Market Basket Analysis, Part VI: The Quartal Graph is Worth Several Thousand Words

The growth of digitally stored data volume in the world is staggering: it is estimated that by 2010, the world will be storing more than 10^21 (1 followed by 21 zeros) bytes of data, or 1 zettabyte (ZB).1 This is roughly the number of stars in the universe. A large percentage (70 percent) of the total data is created by individuals in their daily interactions; furthermore, 85 percent of the total data generated will be of use to business organizations.2

Data mining, a key business intelligence (BI) tool, uses statistical and mathematical algorithms to find useful correlations and patterns in large customer transaction databases. The traditional BI routine of trying to describe patterns from pages and pages of tables and graphs in large reports has become unwieldy and is not very effective in providing useful and actionable information to business managers.3 This is where visualization tools come in. As they say in China, “One picture is worth (10) thousand words.”4

This was recognized by the famous statistician John W. Tukey, who, back in 1962,5 invented a wide variety of new and simple graphic displays for his Exploratory Data Analysis (EDA) concept.

The question, “Why is a picture worth a thousand words?”, has also been answered by researchers who have conducted experiments for testing the effectiveness of visual representation as an aid to problem solving. Beveridge and Parkins6 used diagrams and colored strips as cues, and their results suggested that colored strips of varying intensities were more effective for facilitating recall. Robinson and Kiewra7 demonstrated that students learned to apply concepts better when presented with text and graphics, compared with students who were provided with text outlines only. Mayer et al.8 compared different instructional treatments (a 600-word passage with summary, a summary alone, and a visual summary) and showed that scientific cause-and-effect explanations are best taught by using visual summaries.

Visualizations for Multidimensional Data
There are several ways to display 3-D data on a 2-D graph: the heat map, contour map and wireframe plot, to name a few. The problem gets quite complex when a visualization tool is needed for very large data sets with more than three dimensions. The parallel coordinates9 system of Inselberg can be used to plot data of any dimensionality, but the graph is hard to understand and may not work well for very large data sets. Several parallel plot software packages are available (Parallax10, GGOBI11, Mondrian12). GGOBI and Mondrian are available as free downloads, and both can show the interactions among variables using bar charts and histograms.
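As a rough sketch of what a parallel coordinates plot does: each variable gets its own vertical axis, every value is rescaled to the extent of that axis, and each record becomes a polyline running across the axes. The snippet below only computes the polyline vertices. The field names and values are invented for illustration, and this is not how Mondrian or GGOBI are implemented.

```python
def to_parallel_coordinates(records, variables):
    """Return one polyline per record as a list of (axis_index, scaled_value)."""
    # The min and max of each variable define the extent of its vertical axis.
    lows = {v: min(r[v] for r in records) for v in variables}
    highs = {v: max(r[v] for r in records) for v in variables}

    polylines = []
    for r in records:
        line = []
        for axis, v in enumerate(variables):
            span = highs[v] - lows[v]
            # A constant variable has no spread, so place it mid-axis.
            scaled = 0.5 if span == 0 else (r[v] - lows[v]) / span
            line.append((axis, scaled))
        polylines.append(line)
    return polylines


# Hypothetical slot-machine records (all values are made up).
machines = [
    {"coin_in": 1200.0, "par_percent": 4.0, "end_unit": 1, "bank_size": 6},
    {"coin_in": 300.0, "par_percent": 12.0, "end_unit": 0, "bank_size": 8},
    {"coin_in": 750.0, "par_percent": 8.0, "end_unit": 0, "bank_size": 4},
]
axes = ["coin_in", "par_percent", "end_unit", "bank_size"]
lines = to_parallel_coordinates(machines, axes)
```

With three records the polylines are easy to follow. With thousands of records they overlap heavily, which is the clutter problem noted for large data sets.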

Figure 1 shows a parallel plot of a data set collected from more than 1,500 slot machines on a casino floor. The variables shown in the parallel plot are coin-in (C1), ParPercent, End_Unit (EU, binary), End_Cap (EC, binary), and BankSize. A histogram of the variable ParPercent is also shown in Figure 1. The software Mondrian was used to draw Figure 1, which shows that the coin-in for slot machines with high ParPercent tends to be low. It should also be clear that a parallel plot with 10,000 or more data points may be too cluttered and hence may not be all that useful. Even though the parallel coordinates plot is able to show interactions among variables, for large data sets of high dimensionality, it would not be very easy to quickly find patterns and correlations that are needed to make business decisions.
There is, however, a need to make business decisions faster in today’s complex business environment, which means that the visual display of information has to show several dimensions on a 2-D plot and still be easy to understand. In response to this need, there is a demand for visualization tools that generate Super Graphics. These represent the next generation of BI tools for strategic, operational and analytical users of business data. Quartal Super Graphics™ tools enable users to plot vast amounts of information on a 2-D graph that shows them patterns and relationships among several variables that are very difficult to see otherwise.

Customer and Other Analytics—The Quadrant Evolves to the Quartal
Traditional quadrant analysis is a time-proven management tool for dividing business problems into understandable pieces. For example, if we divide our slot machine customers into four quadrants based on recency and frequency, as shown in Figure 2, these four quadrants give a very useful breakdown of the customers.

Each quadrant calls for its own specific treatment. The limitation of the quadrant method is that the revenue of each quadrant is different, so the relative importance of the quadrants is not apparent. For example, it is quite typical for the majority of the revenue to be in the HH quadrant. Quartal analysis takes this analysis to the next step by splitting the customers into quadrants in such a way that each quadrant has “equal” revenue.13
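The limitation of the traditional quadrant can be illustrated with a minimal sketch. It assumes the quadrants are cut at the median of each measure and that the first letter stands for recency (H = visited recently) and the second for frequency; the field names and the eight customers are invented for illustration.

```python
from statistics import median


def quadrant_revenue(customers):
    """Label each customer HH/HL/LH/LL and total the revenue per quadrant."""
    recency_cut = median(c["days_since_visit"] for c in customers)
    frequency_cut = median(c["visits"] for c in customers)

    totals = {"HH": 0, "HL": 0, "LH": 0, "LL": 0}
    for c in customers:
        # Fewer days since the last visit means higher recency; ties go to L.
        recent = "H" if c["days_since_visit"] < recency_cut else "L"
        frequent = "H" if c["visits"] > frequency_cut else "L"
        totals[recent + frequent] += c["revenue"]
    return totals


# Hypothetical customers (all values are made up).
customers = [
    {"days_since_visit": 2, "visits": 20, "revenue": 900},
    {"days_since_visit": 5, "visits": 3, "revenue": 150},
    {"days_since_visit": 7, "visits": 15, "revenue": 700},
    {"days_since_visit": 10, "visits": 2, "revenue": 100},
    {"days_since_visit": 30, "visits": 12, "revenue": 250},
    {"days_since_visit": 45, "visits": 1, "revenue": 40},
    {"days_since_visit": 60, "visits": 18, "revenue": 300},
    {"days_since_visit": 90, "visits": 4, "revenue": 60},
]
totals = quadrant_revenue(customers)
```

In this made-up data each quadrant holds two customers, yet the HH quadrant carries 1,600 of the 2,500 in total revenue, which is the imbalance that the equal-revenue split is meant to remove.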

The Quartal Super Graphic takes this process and repeatedly applies the allocation method into “equal parts,” further splitting the customers layer by layer. This produces 4^depth quartal cells (for example, 16 cells at a depth of 2), each with approximately equal revenue. This allows a business organization to see what its customers are doing in essentially one single picture. This single picture packs four dimensions into a two-dimensional scatter plot.
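One way to picture the repeated splitting is the sketch below. It is an assumption about how such an allocation could work, not the actual Quartal algorithm: each layer cuts the customers along recency at the point where cumulative revenue reaches half, then cuts each half along frequency in the same way, and recurses into the four resulting quadrants. The field names are hypothetical.

```python
def split_by_revenue(customers, key):
    """Split into two groups of roughly equal revenue, ordered by `key`."""
    ordered = sorted(customers, key=lambda c: c[key])
    half = sum(c["revenue"] for c in ordered) / 2
    running = 0.0
    for i, c in enumerate(ordered):
        running += c["revenue"]
        if running >= half:
            # Whole customers are allocated, so the split is only approximate.
            return ordered[: i + 1], ordered[i + 1:]
    return ordered, []  # only reached for an empty list


def quartal_cells(customers, depth):
    """Recursively split customers into 4**depth cells of similar revenue."""
    if depth == 0:
        return [customers]
    cells = []
    low_recency, high_recency = split_by_revenue(customers, "recency")
    for half in (low_recency, high_recency):
        low_frequency, high_frequency = split_by_revenue(half, "frequency")
        for quadrant in (low_frequency, high_frequency):
            cells.extend(quartal_cells(quadrant, depth - 1))
    return cells


# Hypothetical customers with equal revenue and distinct recency and frequency.
customers = [
    {"recency": i, "frequency": (i * 7) % 16, "revenue": 10} for i in range(16)
]
cells = quartal_cells(customers, depth=2)
```

With 16 customers of equal revenue, a depth of 2 yields 16 cells of one customer each. On real data the cells would hold different numbers of customers, and cells can come out empty when there are too few customers for the chosen depth.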

In the following illustrative example, we show how quartal visualization tools can be used to perform the classic recency, frequency, monetary value (RFM) analysis on customer transaction data. These three measures of customer behavior are commonly used because they are known to be excellent predictors of future customer behavior. If a customer has interacted recently with one of your products or services, then this customer is more likely to interact with it again. The same can be said for the frequency of interaction. The third coordinate, monetary value (profit in this case), is obviously very important to any business organization.
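As a minimal sketch of how the three RFM measures might be derived from raw transactions, assume each transaction is a (customer_id, visit_date, profit) tuple. That layout, the function name and the sample rows are all illustrative assumptions rather than anything taken from the data behind the figures.

```python
from datetime import date


def rfm_table(transactions, as_of):
    """Summarize (customer_id, visit_date, profit) rows into R, F and M."""
    table = {}
    for customer_id, visit_date, profit in transactions:
        row = table.setdefault(
            customer_id,
            {"last_visit": visit_date, "frequency": 0, "monetary": 0.0},
        )
        row["last_visit"] = max(row["last_visit"], visit_date)
        row["frequency"] += 1       # F: number of visits
        row["monetary"] += profit   # M: total profit
    for row in table.values():
        # R: days since the most recent visit (smaller means more recent).
        row["recency_days"] = (as_of - row["last_visit"]).days
    return table


# Hypothetical transactions (all values are made up).
transactions = [
    ("A", date(2009, 1, 5), 40.0),
    ("A", date(2009, 3, 20), 25.0),
    ("B", date(2008, 11, 2), 300.0),
    ("A", date(2009, 2, 14), 10.0),
]
rfm = rfm_table(transactions, as_of=date(2009, 3, 31))
```

Here customer A is recent and frequent but of modest value, while customer B is valuable but has not visited for months.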

Figure 3 shows an illustrative example of classic RFM analysis applied to a large database of customer transactions. Thousands of correlated data points are visually depicted, allowing the eye to quickly see patterns and relationships that are present in the data. The top right corner of the quartal graph shows customers who have visited more frequently and more recently. The bottom left corner of the graph consists of customers who have visited less frequently and less recently. The values of RFM for each of the four corners are also shown in Figure 3. The revenue heat map uses color to show revenue from the customers (blue = low, red = high). The split of the customers in the illustrative example is based on profit.
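The blue-to-red coloring can be approximated with a simple linear color scale. The sketch below is an assumption about how such a mapping could be built; the actual color ramp used in Figure 3 may differ, for instance by passing through intermediate hues.

```python
def revenue_color(value, low, high):
    """Map a revenue value to an (r, g, b) tuple: blue at `low`, red at `high`."""
    if high == low:
        t = 0.5  # no spread in the data, so use the middle of the scale
    else:
        t = (value - low) / (high - low)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the range
    return (round(255 * t), 0, round(255 * (1 - t)))


coldest = revenue_color(0, low=0, high=100)
hottest = revenue_color(100, low=0, high=100)
```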

If we examine Figure 3, the top right corner depicts customers who visit frequently and have visited recently. As expected, these customers are of high value to the business organization. However, note the relatively high value of customers who visit less often but have visited recently (top left quadrant) compared with those who visit frequently but have not visited recently (bottom right quadrant).

A Working Example—The Use of Super Graphics
In our earlier series, we described how market basket analysis can be used to build groups of customers based on their demand. This demand-based modeling is extremely powerful but presents one final challenge: How can we humans understand these mathematical models? The following process shows how segmentation and quartal analysis can be combined to create visually enabled demand-based marketing programs.

In this working example, we describe a four-step process (Figure 4) that uses quartals and segmentation to turn very high-dimensional data into practical marketing activities.

In this article, we have discussed some of the problems faced by operators who, for years, have been collecting ever more detailed interaction data. Since it is extremely difficult to find patterns from tables and charts in voluminous reports, a tool for visualization of multidimensional data on a 2-D plot is a must for BI applications. There are several such tools available, but the quartal graph enables users to plot large amounts of data on an easy-to-understand 2-D graph that clearly shows them patterns, relationships and interactions among several variables that are very difficult to see otherwise.

Footnotes

1    Singh, Dr. A. K., and Cardno, Andrew. “The Petabyte Era of Gaming Data.” Casino Enterprise Management, September 2008, p. 20-22.

2    Gantz, J. F., Manfrediz, A., Minton, S., Reinsel, D., Schlichting, W., Toncheva, A. (2008). “The Diverse and Exploding Digital Universe – An Updated Forecast of Worldwide Information Growth Through 2011.” IDC White Paper sponsored by EMC Corporation. www.emc.com/collateral/analyst-reports/diverse-exploding-digital-univers….

3    www.itbusinessedge.com/cm/blogs/all/some-advice-on-visual-bi/?cs=11594.

4    www.phrases.org.uk/meanings/a-picture-is-worth-a-thousand-words.html.

5    Tukey, John Wilder (1962). “The Future of Data Analysis.” Annals of Mathematical Statistics, p. 33:1-67 and 81.

6    Beveridge, M., & Parkins, E. (1987). “Visual Representation in Analogical Problem Solving.” Memory & Cognition, 15, p. 230-237.

7    Robinson, D.H., & Kiewra, K.A. (1995). “Visual Argument: Graphic Organizers are Superior to Outlines in Improving Learning from Text.” Journal of Educational Psychology, 87, p. 455-467.

8    Mayer, R.E., Bove, W., Bryman, A., Mars, R., & Tapangco, L. (1996). “When Less is More: Meaningful Learning from Visual and Verbal Summaries of Science Textbook Lessons.” Journal of Educational Psychology, 88, p. 64-73.

9    Inselberg, Alfred (1985). “The Plane with Parallel Coordinates.” The Visual Computer 1(2): p. 69-91.

10    www.kdnuggets.com/software/parallax.html.

11    www.ggobi.org.

12    http://rosuda.org/mondrian/Mondrian.html.

13    This cannot be precise as the granularity of the allocation is customers, and it is unlikely they split exactly into four mathematically equal parts.
