Market Basket Analysis Explained | Alteryx & Tableau


What is Market Basket Analysis?

Market Basket Analysis (MBA) is a type of predictive analytics used to determine what products are usually bought together. It is a popular application of Association Rule Mining, a data mining method to find patterns or associations in data.

Supermarkets often use MBA to analyse the purchasing behaviour of customers, which can help them to come up with data-driven strategies to boost sales. This often includes deciding which products should be placed close to each other, or which products should be discounted or bundled together.

Cartoon depicting how placing groceries that are often bought together on the same shelf can influence the purchasing decisions of customers.

What are Some Use Cases for Market Basket Analysis?

Product Recommendations

E-commerce websites generally use these associations to suggest related products to customers based on their current selections, increasing cross-selling and upselling opportunities.

Screenshot of recommended Amazon products based on previous purchases

Inventory Management

Retailers can optimise their stock levels by identifying which products are commonly purchased together and ensuring they are readily available when needed.

Targeted Marketing

Marketers use MBA to design more effective advertising campaigns and offer insights into customer behaviour and preferences.

Fraud Detection

In some cases, it can be used for fraud detection in the banking and insurance industry. Unusual combinations of products in a transaction can raise red flags for potentially fraudulent activity.

Market Basket Case Study

It’s common to see diapers and baby formula being sold in the same aisle. After all, these two items are daily baby necessities and are often bought together. Hence, stores place these items close to each other to make them easy for customers to find. But have you ever noticed diapers and beer being sold in the same aisle?

Image of a grocery store selling beer and diapers side by side.

The placement of diapers and beer next to each other may be surprising to many, as they seem like an unlikely combination of items to be purchased together. In the data mining world, there is a famous story that a big retailer discovered that the sales of diapers and beer were correlated.

One possible explanation for this is that fathers who make a late-night run to the grocery store to buy diapers for their kids would often pick up a few cans of beer too. Once this correlation was established, the retailer moved the beer next to the diapers section and, as a result, the sales of both diapers and beer increased.

Regardless of whether this is a true story or not, one thing is for sure – there are certainly many unexpected combinations of items in data waiting to be uncovered.

How Does Market Basket Analysis Work?

The most important concept to understand is the Apriori Algorithm, which is used in MBA to identify frequent itemsets in a dataset and subsequently generate association rules from them. The Apriori Algorithm relies on the property that if an itemset, e.g. {egg, beer, diaper}, is frequent, then all its subsets, e.g. {egg, diaper}, {beer, diaper}, {beer, egg}, must also be frequent. Conversely, if an itemset is infrequent, none of its supersets can be frequent, so they never need to be counted. An association rule (A → B) is a pattern that states that when A occurs, B occurs with a certain probability.
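The subset property described above is what lets Apriori discard candidates before counting them. Here is a minimal Python sketch of that check. The item names, the set of frequent pairs and the helper `all_subsets_frequent` are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def all_subsets_frequent(candidate, frequent_smaller):
    """Return True if every (k-1)-item subset of `candidate` is known to be frequent."""
    k = len(candidate)
    return all(
        frozenset(subset) in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )

# Hypothetical frequent 2-itemsets found in an earlier pass (illustrative only).
frequent_pairs = {
    frozenset({"beer", "diaper"}),
    frozenset({"beer", "milk"}),
    frozenset({"diaper", "milk"}),
    frozenset({"bread", "milk"}),
}

# Kept: all three pairs inside it are frequent, so its support is worth counting.
keep = all_subsets_frequent(frozenset({"beer", "diaper", "milk"}), frequent_pairs)

# Pruned: {beer, bread} is not frequent, so the triple cannot be frequent either.
prune = all_subsets_frequent(frozenset({"beer", "bread", "milk"}), frequent_pairs)

print(keep, prune)
```

Passing the check does not make a candidate frequent. It only means the candidate cannot be ruled out without counting its support.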

Three Key Metrics for MBA

Before deep-diving into how the Apriori Algorithm works, there are three key metrics to know: support, confidence and lift. Their definitions use the following notation.

P(A + B): Probability of both items A and B being purchased together.

P(A): Probability of item A being purchased.

P(B): Probability of item B being purchased.

Support

Support measures how frequently a combination of items occurs in transactional data. If support is high, it means that the set of items (A and B) is bought together frequently. To calculate the support of A → B, take the number of transactions that contain the set of items (A and B) and divide by the total number of transactions.

 

 

Confidence

Confidence measures how likely it is that item B is bought when item A is bought. A high confidence means that when customers buy A, they are likely to buy B as well. To calculate the confidence of buying B when A is already in the basket, take the number of transactions that contain the set of items (A and B) and divide by the number of transactions that contain A.

Lift

Lift measures the strength of association between items in a set of transactions. It compares how often A and B are actually purchased together with how often they would be purchased together if they were bought independently of each other. A lift value greater than one means that buying A increases the likelihood of buying B (i.e. A & B are complementary products), a value of one means the two purchases are independent, and a value of less than one means that buying A decreases the likelihood of buying B (i.e. A & B are substitute products). To calculate the lift of A → B, take the support of the set of items (A and B) and divide it by the product of the support of A and the support of B.

Image depicting Lift(A->B) Calculation
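To make the three metrics concrete, here is a small Python sketch that computes them directly from their definitions. The five transactions are made up for illustration and are not the article's dataset. The function names `support`, `confidence` and `lift` are likewise just illustrative helpers.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(a, b, transactions):
    """support(A and B) / support(A): how often B appears when A is in the basket."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

def lift(a, b, transactions):
    """support(A and B) / (support(A) * support(B)); a value of 1 means independence."""
    return support(set(a) | set(b), transactions) / (
        support(a, transactions) * support(b, transactions)
    )

# Hypothetical transactions (illustrative only).
transactions = [
    {"beer", "diaper", "milk"},
    {"beer", "diaper"},
    {"beer", "diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "milk", "egg"},
]

print("support   :", round(support({"beer", "diaper"}, transactions), 2))
print("confidence:", round(confidence({"beer"}, {"diaper"}, transactions), 2))
print("lift      :", round(lift({"beer"}, {"diaper"}, transactions), 2))
```

In this toy data, beer appears in 3 of 5 baskets, diapers in 4 of 5, and both together in 3 of 5, so the lift of beer → diaper is 0.6 / (0.6 × 0.8) = 1.25.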

The Apriori Algorithm

To illustrate the concept of Apriori Algorithm in MBA, let us consider the groceries transaction data below.

1. Let k = 1. Generate candidate itemsets that contain k items and calculate their support. Set the minimum support to a suitable threshold, e.g. 60% (3/5). Remove the itemsets whose support is below the minimum support. The remaining itemsets are considered frequent.

Image depicting itemsets (grocery items) and support calculation results.

In our example, egg has a support of 20% (1/5), which is lower than the minimum support of 60%. Hence, we drop egg from our itemsets.

2. Use the frequent k-itemsets to generate candidate itemsets that contain (k+1) items, and keep those that meet the minimum support.

In our example, we now generate candidate itemsets that contain two items, using only the items that survived step 1, and calculate their support. Any 2-itemset with a support lower than the minimum support of 60%, such as one with a support of 40% (2/5), is dropped.

3. Repeat until no frequent itemsets are found.

Image depicts itemsets (groceries) and support results.

In this example, since no 3-itemset is frequent, we calculate the confidence of the previous frequent itemsets (i.e. 2-itemset).

4. Set the minimum confidence to a suitable threshold, e.g. 70%. To generate strong rules, form association rules from the frequent itemsets and remove the rules whose confidence is below the minimum confidence.

Image depicts item set (groceries), support and confidence.

In this example, since all the rules are above the minimum confidence of 70%, no rule is eliminated.
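Step 4 can be sketched in Python as follows. The support counts are hypothetical (assumed to come from 5 transactions) and are not the article's data. The function name `strong_rules` is an illustrative helper rather than part of any library.

```python
from itertools import combinations

def strong_rules(support_counts, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold."""
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so its count is known.
                conf = count / support_counts[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

# Hypothetical support counts out of 5 transactions (illustrative only).
support_counts = {
    frozenset({"beer"}): 3,
    frozenset({"diaper"}): 5,
    frozenset({"beer", "diaper"}): 3,
}

rules = strong_rules(support_counts, min_confidence=0.7)
for antecedent, consequent, conf in rules:
    print(set(antecedent), "->", set(consequent), f"confidence={conf:.2f}")
```

With these counts, beer → diaper has a confidence of 3/3 and is kept, while diaper → beer has a confidence of 3/5 and is removed. The same itemset can therefore yield a strong rule in one direction only.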

This flowchart summarises how the Apriori Algorithm works:

The output of MBA is a set of strong rules that can be used to make business decisions. Although the Apriori Algorithm is simple, it takes a long time to scan through a large database to find frequent itemsets.
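The frequent-itemset part of the procedure (steps 1 to 3) can be sketched in plain Python. This is a simplified illustration on made-up transactions, not an optimised implementation and not the code that any particular tool runs. The function name `apriori` and the data are assumptions for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting `min_support`."""
    n = len(transactions)

    def frequent_only(candidates):
        # One pass over the transactions per candidate: this is the costly scan.
        result = {}
        for cand in candidates:
            sup = sum(1 for basket in transactions if cand <= basket) / n
            if sup >= min_support:
                result[cand] = sup
        return result

    items = {item for basket in transactions for item in basket}
    current = frequent_only({frozenset([item]) for item in items})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent_only(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions (illustrative only).
transactions = [
    {"beer", "diaper", "milk"},
    {"beer", "diaper"},
    {"beer", "diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "milk", "egg"},
]

frequent = apriori(transactions, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The repeated passes over `transactions` inside `frequent_only` are where the running time goes on a large database.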

The Eclat Algorithm

Another useful algorithm to know is the Eclat Algorithm, which stands for Equivalence Class Clustering and Bottom-Up Lattice Traversal. It takes a vertical approach to data mining: instead of storing a list of items for each transaction, it stores, for each item, the set of IDs of the transactions that contain it. The support of an itemset is then found by intersecting the transaction ID sets of its items, and equivalence classes (groups of itemsets that share common items) organise the search. While the Apriori Algorithm repeatedly scans the original database, the Eclat Algorithm only works with the transaction ID sets it has already generated to identify frequent itemsets.
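The vertical layout and the intersection step can be sketched in Python as follows. This is a simplified depth-first illustration on made-up transactions. The function name `eclat`, the `min_count` parameter (a minimum support expressed as a number of transactions) and the data are assumptions for the example.

```python
def eclat(transactions, min_count):
    """Return {itemset: support count} using transaction ID set intersections."""
    # Vertical layout: item -> set of IDs of the transactions that contain it.
    tidsets = {}
    for tid, basket in enumerate(transactions):
        for item in basket:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that can extend `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect with the remaining items to reach longer itemsets.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)  # depth-first descent

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent

# Hypothetical transactions (illustrative only).
transactions = [
    {"beer", "diaper", "milk"},
    {"beer", "diaper"},
    {"beer", "diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "milk", "egg"},
]

frequent = eclat(transactions, min_count=3)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
```

After the vertical layout is built, the original transactions are never read again. Every support count comes from the size of an intersection.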

How to Choose from the Apriori or Eclat Algorithm

The Apriori Algorithm can perform well when dealing with sparse data, where most of the itemsets have low support. It may also be more memory efficient than the Eclat Algorithm for larger datasets. Thus, Apriori is suitable if the data is large and items are rarely bought together, whereas Eclat is more suitable for small and medium datasets.

However, if computational efficiency is crucial, Eclat might be a better choice. It is generally faster than Apriori, especially on dense datasets, because it uses a depth-first search approach rather than a breadth-first one.

Ultimately, it is a good idea to experiment with both algorithms and compare the performance on your specific data.

How to Implement Market Basket Analysis in Alteryx and Tableau

Here is a step-by-step video tutorial on how to perform MBA using Alteryx and Tableau.

Image depicts Alteryx Workflow for Market Basket Analysis

1. Data Input

Load transactional data into Alteryx. This data should include information about which items were purchased together in each transaction. Ensure the data is in a format suitable for MBA, with one row per transaction and a column indicating the items purchased. Alteryx’s data cleansing tools can be used to manipulate the data into the desired format.

Image depicts a table in Alteryx with 'Record', 'Transaction ID' and 'Item Description' columns.
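If the source data has one row per purchased item, as a 'Transaction ID' / 'Item Description' layout suggests, it can be grouped into one basket per transaction. Outside Alteryx, the same reshaping can be sketched in Python. The rows, the transaction IDs and the helper `to_baskets` are hypothetical, and the clean-up rules (trimming and lower-casing) are assumptions that may not suit every dataset.

```python
from collections import defaultdict

# Hypothetical rows in the long layout: (Transaction ID, Item Description).
rows = [
    (1001, "Beer"),
    (1001, "diaper "),
    (1002, "milk"),
    (1002, "bread"),
    (1002, "Milk"),    # same item listed twice within one transaction
    (1003, "diaper"),
]

def to_baskets(rows):
    """Group item rows into one set of cleaned item names per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        # Normalise spelling so "Milk" and "milk" count as the same item.
        baskets[transaction_id].add(item.strip().lower())
    return dict(baskets)

baskets = to_baskets(rows)

# One row per transaction, with the items joined into a single column.
for transaction_id, items in baskets.items():
    print(transaction_id, ", ".join(sorted(items)))
```

Using a set per transaction removes duplicate line items, which matters because the metrics count whether an item is in a basket, not how many units were bought.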

2. Market Basket Analysis Tools

Image depicts the Alteryx 'MB Rules' and 'MB Inspect' tool icons.

Under the ‘Predictive Grouping’ tab, use the ‘MB Rules’ tool, which uses algorithms like Apriori or Eclat to generate association rules or frequent itemsets. A summary report of both the transaction data and the rules/itemsets is produced, along with a model object that can be further investigated in an ‘MB Inspect’ tool.

In the ‘MB Rules’ tool configuration, select the correct transaction key field and the field that contains the item identifier. Set parameters such as minimum support and confidence levels to filter the rules generated.

Image depicts the Alteryx MB configuration window.

3. Explore Results

Use Alteryx tools like ‘Browse’ or ‘Output Data’ to explore and analyse the results of the MBA. This may include frequent itemsets and association rules.

The report generated by the ‘MB Inspect’ tool includes details about the association rules (support, confidence, lift), a grouped matrix and a network graph. The fields include:

    • LHS: A comma-separated list of left-hand side items in each rule.
    • RHS: A comma-separated list of right-hand side items in each rule.
    • Support: The level of support for each rule.
    • Confidence: The level of confidence for each rule.
    • Lift: The level of lift each rule possesses.

4. Visualise Results

We can export the metrics generated by Alteryx as a Tableau Data Extract using the ‘Output Data’ tool and use it to create a workbook. Visualisations can include charts, scatterplots, matrices or other visual representations of the association rules and their metrics. Examples of visualisations include:

Image depicts a Bubble Plot with the results in Tableau.

5. Iterate and Refine

Depending on initial results, users may want to iterate on the analysis. Parameters can be adjusted or different algorithms can be implemented to refine the market basket analysis.

6. Export Results

Once satisfied with the analysis, the results can be exported for further use, such as incorporating insights into business strategies or reporting.

Summary

Ready to unlock the power of market basket analysis for your business? Consult with us today to leverage the advanced capabilities of Tableau or Alteryx. Discover valuable insights into customer behaviour, optimise product recommendations and enhance your decision-making process.