Monday, August 06, 2007

What Makes QlikTech So Good: A Concrete Example

Continuing with Friday’s thought, it’s worth giving a concrete example of what QlikTech makes easy. Let’s look at the cross-sell report I mentioned on Thursday.

This report answers a common marketing question: which products do customers tend to purchase together, and how do customers who purchase particular combinations of products behave? (Ok, two questions.)

The report this begins with a set of transaction records coded with a Customer ID, Product ID, and Revenue. The trick is to identify all pairs among these records that have the same Customer ID. Physically, the resulting report is a matrix with products both as column and row headings. Each cell will report on customers who purchased the pair of products indicated by the row and column headers. Cell contents will be the number of customers, number of purchases of the product in the column header, and revenue of those purchases. (We also want row and column totals, but that’s a little complicated so let’s get back to that later.)

Since each record relates to the purchase of a single product, a simple cross tab of the input data won’t provide the information we want. Rather, we need to first identify all customers who purchased a particular product and group them on the same row. Columns will then report on all the other products they purchased.

Conceptually, Qlikview and SQL do this in roughly the same way: build a list of existing Customer ID / Product ID combinations, use this list to select customers for each row, and then find all transactions associated with those customers. But the mechanics are quite different.

In QlikView, all that’s required is to extract a copy of the original records. This keeps the same field name for Customer ID so it can act as a key relating to the original data, but renames Product ID as Master Product so it can treated as an independent dimension. The extract is done in a brief script that loads the original data and creates the other table from it:

Columns: // this is the table name
load
Customer_ID,
Product_ID,
Revenue
from input_data.csv (ansi, txt, delimiter is ',', embedded labels); // this code will be generated by a wizard

Rows: // this is the table name
load
Customer_ID,
Product_ID as Master_Product
resident Columns;

After that, all that’s needed is to create a pivot table report in the QlikView interface by specifying the two dimensions and defining expressions for the cell contents: count (distinct Customer ID); count (Product ID), and sum(Revenue). QlikView automatically limits the counts to the records qualified for each cell by the dimension definitions.

SQL takes substantially more work. The original extract is similar, creating a table with Customer ID and Master Product. But more technical skill is needed: the user must know to use a “select distinct” command to avoid creating multiple records with the same Customer ID / Product ID combination. Multiple records would result in duplicate rows, and thus double-counting, when the list is later joined back to the original transactions. (QlikView gives the same, non-double-counted results whether or not “select distinct” is used to create its extract.)

Once the extract is created, SQL requires the user to create a table with records for the report. This must contain two records for each transaction: one where the original product is the Master Product, and other where it is the Product ID. This requires a left join (I think) of the extract table against the original transaction table: again, the user needs enough SQL skill to know which kind of join is needed and how to set it up.

Next, the SQL user must create the report values themselves. We’ve now reached the limits of my own SQL skills, but I think you need two selections. The first is a “group by” on the Master Product, Product ID and Customer ID fields for the customer counts. The second is another “group by” on just the Master Product and Product ID for the product counts and revenue. Then you need to join the customer counts back to the more summarized records. Perhaps this could all be done in a single pass, but, either way, it’s pretty trickly.

Finally, the SQL user must display the final results in a report. Presumably this would be done in a report writer that hides the technical details from the user. But somebody skilled will still need to set things up the first time around.

I trust it’s clear how much easier it will be to create this report in QlikView than SQL. QlikView required one table load and one extract. SQL required one table load, one extract, one join to create the report records, and one to three additional selects to create the final summaries. Anybody wanna race?

But this is a very simple example that barely scratches the surface of what users really want. For example, they’ll almost certainly ask to calculate Revenue per Customer. This will be simple for QlikTech: just add a report expression of sum(Revenue) / count(distinct Customer_ID). (Actually, since QlikView lets you name the expressions and then use the names in other expressions, the formula would probably be something simpler still, like “Revenue / CustomerCount”.) SQL will probably need another data pass after the totals are created to do the calculation. Perhaps a good reporting tool will avoid this or at least hide it from the user. But the point is that QlikTech lets you add calculations without any changes to the files, and thus without any advance planning.

Another thing users are likely to want is row and column totals. These are conceptually tricky because you can’t simply add up the cell values. For the row totals, the same customer may appear in multiple columns, so you need to eliminate those duplicates to get correct values for customer count and revenue per customer. For the column totals, you need to remove transactions that appear on two rows (one where they are the Master Product, and other where they are the Product_ID). QlikTech automatically handles both situations because it is dynamically calculating the totals from the original data. But SQL created several intermediate tables, so the connection to the original data is lost. Most likely, SQL will need another set of selections and joins to get the correct totals.

QlikTech’s approach becomes even more of an advantage when users start drilling into the data. For example, they’re likely to select transactions related to particular products or on unrelated dimensions such as customer type. Again, since it works directly from the transaction details, QlikVeiw will instantly give correct values (including totals) for these subsets. SQL must rerun at least some of its selections and aggregations.

But there’s more. When we built the cross sell report for our client, we split results based on the number of total purchases made by each customer. We did this without any file manipulation, by adding a “calculated dimension” to the report: aggr(count (Product ID), Customer ID). Admittedly, this isn’t something you’d expect a casual user to know, but I personally figured it out just looking at the help files. It’s certainly simpler than how you’d do it in SQL, which is probably to count the transactions for each customer, post the resulting value on the transaction records or a customer-level extract file, and rebuild the report.

I could go on, but hope I’ve made the point: the more you want to do, the greater the advantage of doing it in QlikView. Since people in the real world want to do lots of things, the real world advantage of QlikTech is tremendous. Quod Erat Demonstrandum.

(disclaimer: although Client X Client is a QlikTech reseller, contents of this blog are solely the responsibility of the author.)

No comments: