Building a Recommender System Using Embeddings

As part of the Data Science and Machine Learning team at Drop, I shared a blog post on how we built a recommender model to create a personalized experience for our members. Check it out! :)

User-User Collaborative Filtering Using Neo4j Graph Database

Motivation

Recommendation systems fall under two categories: personalized and non-personalized recommenders. My previous post on association rules mining is an example of a non-personalized recommender, as the recommendations generated are not tailored to a specific user. By contrast, a personalized recommender system takes into account user preferences in order to make recommendations for that user. There are various personalized recommender algorithms, and in this post, I will be implementing user-user collaborative filtering. This algorithm finds users similar to the target user in order to generate recommendations for the target user. And to add to the fun, I will be implementing the algorithm using a graph database versus the more traditional approach of matrix factorization.

User-User Collaborative Filtering

This personalized recommender algorithm assumes that past agreements predict future agreements. It uses the concept of similarity in order to identify users that are "like" the target user in terms of their preferences. Users identified as most similar to the target user become part of the target user's neighbourhood. Preferences of these neighbours are then used to generate recommendations for the target user.

Concretely, here are the steps we will be implementing to generate recommendations for an online grocer during the user check-out process:

  1. Select a similarity metric to quantify likeness between users in the system.
  2. For each user pair, compute similarity metric.
  3. For each target user, select the top k neighbours based on the similarity metric.
  4. Identify products purchased by the top k neighbours that have not been purchased by the target user.
  5. Rank these products by the number of purchasing neighbours.
  6. Recommend the top n products to the target user.

In this particular demonstration, we are building a recommender system based on the preferences of 100 users. As such, we would like the neighbourhood size, k, to be large enough to identify clear agreement between users, but not too large that we end up including those that are not very similar to the target user. Hence we choose k=10. Secondly, in the context of a user check-out application for an online grocer, the goal is to increase user basket by surfacing products that are as relevant as possible, without overwhelming the user. Therefore we limit the number of recommendations to n=10 products per user.

Jaccard Index

The Jaccard index measures similarity between two sets, with values ranging from 0 to 1. A value of 0 indicates that the two sets have no elements in common, while a value of 1 implies that the two sets are identical. Given two sets A and B, the Jaccard index is computed as follows:

$$J(A,B) = \frac{| A \cap B |} {| A \cup B |}$$

The numerator is the number of elements that A and B have in common, while the denominator is the number of elements that are unique to each set.

In this implementation of user-user collaborative filtering, we will be using the Jaccard index to measure similarity between two users. This is primarily due to the sparse nature of the data, where there is a large number of products and each user purchases only a small fraction of those products. If we were to model our user preferences using binary attributes (ie: 1 if user purchased product X and 0 if user did not purchase product X), we would have a lot of 0s and very few 1s. The Jaccard index is effective in this case, as it eliminates matching attributes that have a value of 0 for both users. In other words, when computing similarity between two users, we only consider products that have been purchased, either by both users, or at least one of the users. Another great thing about the Jaccard index is that it accounts for cases where one user purchases significantly more products than the user it's being compared to. This can result in higher overlap of products purchased between the two, but this does not necessarily mean that the two users are similar. With the equation above, we see that the denominator will be large, thereby resulting in a smaller Jaccard index value.

Graph Database

A graph database is a way of representing and storing data. Unlike a relational database which represents data as rows and columns, data in a graph database is represented using nodes, edges and properties. This representation makes graph databases conducive to storing data that is inherently connected. For our implementation of user-user collaborative filtering, we are interested in the relationships that exist between users based on their preferences. In particular, we have a system comprised of users who ordered orders, and these orders contain products that are in a department and in an aisle. These relationships are easily depicted using a property graph data model:

Property Graph Model

The nodes represent entities in our system: User, Order, Product, Department and Aisle. The edges represent relationships: ORDERED, CONTAINS, IN_DEPARTMENT and IN_AISLE. The attributes in the nodes represent properties (eg: a User has property "user_id"). Mapping to natural language, we can generally think of nodes as nouns, edges as verbs and properties as adjectives.

In this particular graph, each node type contains the same set of properties (eg: all Orders have an order_id, order_number, order_day_of_week and order_hour_of_day), but one of the interesting properties of graph databases is that it is schema-free. This means that a node can have an arbitrary set of properties. For example, we can have two user nodes, u1 and u2, with u1 having properties such as name, address and phone number, and u2 having properties such as name and email. The concept of a schema-free model also applies to the relationships that exist in the graph.

We will be implementing the property graph model above using Neo4j, arguably the most popular graph database today. The actual creation and querying of the database will be done using the Cypher query language, often thought of as SQL for graphs.

Input Dataset

Similar to my post on association rules mining, we will once again be using data from Instacart. The datasets can be downloaded from the link below, along with the data dictionary:

“The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on Oct 10, 2017.

Part 1: Data Preparation

In [1]:
import pandas as pd
import numpy as np
import sys
In [2]:
# Utility functions 

# Returns the size of an object in MB
def size(obj):
    return "{0:.2f} MB".format(sys.getsizeof(obj) / (1000 * 1000))

# Displays dataframe dimensions, size and top 5 records
def inspect_df(df_name, df):
    print('{0} --- dimensions: {1};  size: {2}'.format(df_name, df.shape, size(df)))  
    display(df.head())
    
# Exports dataframe to CSV, the format for loading data into Neo4j 
def export_to_csv(df, out):
    df.to_csv(out, sep='|', columns=df.columns, index=False)

A. Load user order data

Data from Instacart contains 3-99 orders per user. Inspecting the distribution of number of orders per user, we see that users place 16 orders on average, with 75% of the user base placing at least 5 orders. For demonstration purposes, we will be running collaborative filtering on a random sample of 100 users who purchased at least 5 orders.

In [9]:
min_orders   = 5     # minimum order count per user
sample_count = 100   # number of users to select randomly

# Load data from evaluation set "prior" (please see data dictionary for definition of 'eval_set') 
order_user           = pd.read_csv('orders.csv')
order_user           = order_user[order_user['eval_set'] == 'prior']

# Get distribution of number of orders per user
user_order_count     = order_user.groupby('user_id').agg({'order_id':'count'}).rename(columns={'order_id':'num_orders'}).reset_index()
print('Distribution of number of orders per user:')
display(user_order_count['num_orders'].describe())

# Select users who purchased at least 'min_orders'
user_order_atleast_x = user_order_count[user_order_count['num_orders'] >= min_orders]

# For reproducibility, set seed before taking a random sample
np.random.seed(1111)
user_sample          = np.random.choice(user_order_atleast_x['user_id'], sample_count, replace=False)

# Subset 'order_user' to include records associated with the 100 randomly selected users
order_user           = order_user[order_user['user_id'].isin(user_sample)]
order_user           = order_user[['order_id','user_id','order_number','order_dow','order_hour_of_day']]
inspect_df('order_user', order_user)
Distribution of number of orders per user:

count    206209.000000
mean         15.590367
std          16.654774
min           3.000000
25%           5.000000
50%           9.000000
75%          19.000000
max          99.000000
Name: num_orders, dtype: float64


order_user --- dimensions: (1901, 5);  size: 0.09 MB
order_id user_id order_number order_dow order_hour_of_day
11334 2808127 701 1 2 14
11335 2677145 701 2 3 11
11336 740361 701 3 1 13
11337 2866491 701 4 3 12
11338 1676999 701 5 4 11

B. Load order details data

In [10]:
# Load orders associated with our 100 selected users, along with the products contained in those orders
order_product = pd.read_csv('order_products__prior.csv')
order_product = order_product[order_product['order_id'].isin(order_user.order_id.unique())][['order_id','product_id']]
inspect_df('order_product', order_product)
order_product --- dimensions: (19840, 2);  size: 0.48 MB
order_id product_id
1855 209 39409
1856 209 20842
1857 209 16965
1858 209 8021
1859 209 23001

C. Load product data

In [11]:
# Load products purchased by our 100 selected users
products = pd.read_csv('products.csv')
products = products[products['product_id'].isin(order_product.product_id.unique())]
inspect_df('products', products)
products --- dimensions: (3959, 4);  size: 0.46 MB
product_id product_name aisle_id department_id
0 1 Chocolate Sandwich Cookies 61 19
33 34 Peanut Butter Cereal 121 14
44 45 European Cucumber 83 4
98 99 Local Living Butter Lettuce 83 4
115 116 English Muffins 93 3

D. Load aisle data

In [12]:
# Load entire aisle data as it contains the names related to the aisle IDs from the 'products' data
aisles = pd.read_csv('aisles.csv')
inspect_df('aisles', aisles)
aisles --- dimensions: (134, 2);  size: 0.01 MB
aisle_id aisle
0 1 prepared soups salads
1 2 specialty cheeses
2 3 energy granola bars
3 4 instant foods
4 5 marinades meat preparation

E. Load department data

In [13]:
# Load entire department data as it contains the names related to the department IDs from the 'products' data
departments = pd.read_csv('departments.csv')
inspect_df('departments', departments)
departments --- dimensions: (21, 2);  size: 0.00 MB
department_id department
0 1 frozen
1 2 other
2 3 bakery
3 4 produce
4 5 alcohol

F. Export dataframes to CSV, which in turn will be loaded into Neo4j

In [14]:
export_to_csv(order_user,    '~/neo4j_instacart/import/neo4j_order_user.csv')
export_to_csv(order_product, '~/neo4j_instacart/import/neo4j_order_product.csv')    
export_to_csv(products,      '~/neo4j_instacart/import/neo4j_products.csv')
export_to_csv(aisles,        '~/neo4j_instacart/import/neo4j_aisles.csv')
export_to_csv(departments,   '~/neo4j_instacart/import/neo4j_departments.csv')

Part 2: Create Neo4j Graph Database

A. Set up authentication and connection to Neo4j

In [15]:
# py2neo allows us to work with Neo4j from within Python
from py2neo import authenticate, Graph

# Set up authentication parameters
authenticate("localhost:7474", "neo4j", "xxxxxxxx") 

# Connect to authenticated graph database
g = Graph("http://localhost:7474/db/data/")

B. Start with an empty database, then create constraints to ensure uniqueness of nodes

In [16]:
# Each time this notebook is run, we start with an empty graph database
g.run("MATCH (n) DETACH DELETE n;")    

# We drop and recreate our node constraints
g.run("DROP CONSTRAINT ON (order:Order)             ASSERT order.order_id            IS UNIQUE;")
g.run("DROP CONSTRAINT ON (user:User)               ASSERT user.user_id              IS UNIQUE;")
g.run("DROP CONSTRAINT ON (product:Product)         ASSERT product.product_id        IS UNIQUE;")
g.run("DROP CONSTRAINT ON (aisle:Aisle)             ASSERT aisle.aisle_id            IS UNIQUE;")
g.run("DROP CONSTRAINT ON (department:Department)   ASSERT department.department_id  IS UNIQUE;")

g.run("CREATE CONSTRAINT ON (order:Order)           ASSERT order.order_id            IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (user:User)             ASSERT user.user_id              IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (product:Product)       ASSERT product.product_id        IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (aisle:Aisle)           ASSERT aisle.aisle_id            IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (department:Department) ASSERT department.department_id  IS UNIQUE;")
Out[16]:
<py2neo.database.Cursor at 0x109ea3e10>

C. Load product data into Neo4j

In [19]:
query = """
        // Load and commit every 500 records
        USING PERIODIC COMMIT 500 
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_products.csv' AS line FIELDTERMINATOR '|' 
        WITH line 
        
        // Create Product, Aisle and Department nodes
        CREATE (product:Product {product_id: toInteger(line.product_id), product_name: line.product_name}) 
        MERGE  (aisle:Aisle {aisle_id: toInteger(line.aisle_id)}) 
        MERGE  (department:Department {department_id: toInteger(line.department_id)}) 

        // Create relationships between products and aisles & products and departments 
        CREATE (product)-[:IN_AISLE]->(aisle) 
        CREATE (product)-[:IN_DEPARTMENT]->(department);
        """

g.run(query)
Out[19]:
<py2neo.database.Cursor at 0x10a9e4208>

D. Load aisle data into Neo4j

In [137]:
query = """
        // Aisle data is very small, so there is no need to do periodic commits
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_aisles.csv' AS line FIELDTERMINATOR '|' 
        WITH line 
        
        // For each Aisle node, set property 'aisle_name' 
        MATCH (aisle:Aisle {aisle_id: toInteger(line.aisle_id)}) 
        SET aisle.aisle_name = line.aisle;
        """

g.run(query)
Out[137]:
<py2neo.database.Cursor at 0x1101457b8>

E. Load department data into Neo4j

In [138]:
query = """
        // Department data is very small, so there is no need to do periodic commits
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_departments.csv' AS line FIELDTERMINATOR '|' 
        WITH line
        
        // For each Department node, set property 'department_name' 
        MATCH (department:Department {department_id: toInteger(line.department_id)}) 
        SET department.department_name = line.department;
        """

g.run(query)
Out[138]:
<py2neo.database.Cursor at 0x11751acf8>

F. Load order details data into Neo4j

In [139]:
query = """
        // Load and commit every 500 records        
        USING PERIODIC COMMIT 500
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_order_product.csv' AS line FIELDTERMINATOR '|'
        WITH line
        
        // Create Order nodes and then create relationships between orders and products
        MERGE (order:Order {order_id: toInteger(line.order_id)})
        MERGE (product:Product {product_id: toInteger(line.product_id)})
        CREATE (order)-[:CONTAINS]->(product);
        """

g.run(query)
Out[139]:
<py2neo.database.Cursor at 0x110145160>

G. Load user order data into Neo4j

In [140]:
query = """
        // Load and commit every 500 records 
        USING PERIODIC COMMIT 500
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_order_user.csv' AS line FIELDTERMINATOR '|'
        WITH line
        
        // Create User nodes and then create relationships between users and orders 
        MERGE (order:Order {order_id: toInteger(line.order_id)})
        MERGE (user:User   {user_id:  toInteger(line.user_id)})

        // Create relationships between users and orders, then set Order properties
        CREATE(user)-[o:ORDERED]->(order)              
        SET order.order_number = toInteger(line.order_number),
            order.order_day_of_week = toInteger(line.order_dow), 
            order.order_hour_of_day = toInteger(line.order_hour_of_day);
        """

g.run(query)
Out[140]:
<py2neo.database.Cursor at 0x1149251d0>

H. What our graph looks like

This is what the nodes and relationships we have created look like in Neo4j for a small subset of the data. Please use the legend on the top left corner only to determine the colours associated with the different nodes (ie: ignore the numbers).

Instacart Graph

Part 3: Implement User-User Collaborative Filtering Algorithm

In [221]:
# Implements user-user collaborative filtering using the following steps:
#   1. For each user pair, compute Jaccard index
#   2. For each target user, select top k neighbours based on Jaccard index
#   3. Identify products purchased by the top k neighbours that have not been purchased by the target user
#   4. Rank these products by the number of purchasing neighbours
#   5. Return the top n recommendations for each user

def collaborative_filtering(graph, neighbourhood_size, num_recos):

    query = """
           // Get user pairs and count of distinct products that they have both purchased
           MATCH (u1:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)<-[:CONTAINS]-(:Order)<-[:ORDERED]-(u2:User)
           WHERE u1 <> u2
           WITH u1, u2, COUNT(DISTINCT p) as intersection_count

           // Get count of all the distinct products that they have purchased between them
           MATCH (u:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)
           WHERE u in [u1, u2]
           WITH u1, u2, intersection_count, COUNT(DISTINCT p) as union_count

           // Compute Jaccard index
           WITH u1, u2, intersection_count, union_count, (intersection_count*1.0/union_count) as jaccard_index

           // Get top k neighbours based on Jaccard index
           ORDER BY jaccard_index DESC, u2.user_id
           WITH u1, COLLECT(u2)[0..{k}] as neighbours
           WHERE LENGTH(neighbours) = {k}                                              // only want users with enough neighbours
           UNWIND neighbours as neighbour
           WITH u1, neighbour

           // Get top n recommendations from the selected neighbours
           MATCH (neighbour)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)             // get all products bought by neighbour
           WHERE not (u1)-[:ORDERED]->(:Order)-[:CONTAINS]->(p)                        // which target user has not already bought
           WITH u1, p, COUNT(DISTINCT neighbour) as cnt                                // count neighbours who purchased product
           ORDER BY u1.user_id, cnt DESC                                               // sort by count desc
           RETURN u1.user_id as user, COLLECT(p.product_name)[0..{n}] as recos         // return top n products
           """

    recos = {}
    for row in graph.run(query, k=neighbourhood_size, n=num_recos):
        recos[row[0]] = row[1]

    return recos

Part 4: Execute User-User Collaborative Filtering

Our collaborative filtering function expects 3 parameters: a graph database, the neighbourhood size and the number of products to recommend to each user. A reminder that our graph database, g, contains nodes and relationships pertaining to user orders. And as previously discussed, we have chosen k=10 as the neighbourhood size and n=10 as the number of products to recommend to each of our users. We now invoke our collaborative filtering function using these parameters.

In [277]:
%%time
recommendations = collaborative_filtering(g,10,10)
display(recommendations)
{701: ['Strawberries',
  'Organic Zucchini',
  'Organic Strawberries',
  'Limes',
  'Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Baby Carrots',
  'Organic Black Beans',
  'Organic Fuji Apple',
  'Red Vine Tomato'],
 1562: ['Organic Whole String Cheese',
  'Honeycrisp Apple',
  'Organic Strawberries',
  'Sparkling Water Grapefruit',
  'Organic Zucchini',
  'Organic Yellow Onion',
  'Organic Ginger Root',
  'Asparagus',
  'Salted Butter',
  'Whole Almonds'],
 4789: ['Organic Granny Smith Apple',
  'Limes',
  'Organic Green Cabbage',
  'Organic Cilantro',
  'Creamy Almond Butter',
  'Corn Tortillas',
  'Organic Grape Tomatoes',
  'Unsweetened Almondmilk',
  'Organic Blackberries',
  'Organic Lacinato (Dinosaur) Kale'],
 5225: ['Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Strawberries',
  'Organic Blueberries',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Honeycrisp Apple',
  'Sweet Kale Salad Mix',
  'Banana',
  'Green Onions'],
 5939: ['Organic Lemon',
  'Organic Kiwi',
  'Fresh Cauliflower',
  'Organic Red Onion',
  'Organic Small Bunch Celery',
  'Organic Raspberries',
  'Banana',
  'Organic Garlic',
  'Organic Large Extra Fancy Fuji Apple',
  'Frozen Organic Wild Blueberries'],
 6043: ['Organic Garlic',
  'Organic Zucchini',
  'Bag of Organic Bananas',
  'Organic Yellow Onion',
  'Large Lemon',
  'Small Hass Avocado',
  'Organic Cilantro',
  'Organic Tomato Paste',
  'Organic Blueberries',
  'Organic Grape Tomatoes'],
 6389: ['Organic Zucchini',
  'Half & Half',
  'Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Organic Yellow Onion',
  'Banana',
  'Organic Garlic',
  'Penne Rigate',
  'Organic Garnet Sweet Potato (Yam)',
  'Organic Raspberries'],
 7968: ['Strawberries',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Organic Lemon',
  '100% Recycled Paper Towels',
  'Organic Blueberries',
  'Organic Garlic',
  'Bag of Organic Bananas',
  'Limes',
  'Organic Grape Tomatoes'],
 12906: ['Organic Garlic',
  'Bag of Organic Bananas',
  'Organic Raspberries',
  'Organic Black Beans',
  'Free & Clear Natural Laundry Detergent For Sensitive Skin',
  'Organic Blueberries',
  'Original Hummus',
  'Organic Hass Avocado',
  'Organic Red Onion',
  'Organic Zucchini'],
 24670: ['Organic Blueberries',
  'Organic Baby Carrots',
  'Organic Raspberries',
  'Organic Hass Avocado',
  'Organic Garlic',
  'Orange Bell Pepper',
  'Organic Italian Parsley Bunch',
  'Organic Cilantro',
  'Honeycrisp Apple',
  'Organic Baby Spinach'],
 25442: ['Organic Blueberries',
  'Organic Red Onion',
  'Organic Peeled Whole Baby Carrots',
  'Organic Garlic',
  'Organic Baby Arugula',
  'Jalapeno Peppers',
  'Large Lemon',
  'Limes',
  'Bunched Cilantro',
  'Organic Avocado'],
 25490: ['Bag of Organic Bananas',
  'Organic Garlic',
  'Organic Strawberries',
  'Banana',
  'Extra Virgin Olive Oil',
  'Organic Avocado',
  'Organic Navel Orange',
  'Carrots',
  'Small Hass Avocado',
  'Organic Hass Avocado'],
 26277: ['Bag of Organic Bananas',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Cereal',
  'Pure Vanilla Extract',
  'Hass Avocado Variety',
  'Peaches',
  'Original Beef Jerky',
  'Unsweetened Vanilla Almond Breeze',
  'Organic Lemonade'],
 32976: ['Bag of Organic Bananas',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Apple Honeycrisp Organic',
  'Peanut Butter Creamy With Salt',
  'Organic Baby Spinach',
  'Organic Grape Tomatoes',
  'Organic Zucchini',
  'Seedless Red Grapes',
  'Organic Yellow Onion'],
 37120: ['Organic Strawberries',
  'Organic Baby Spinach',
  'Strawberries',
  'Sour Cream',
  'Organic Avocado',
  'Banana',
  'Carrots',
  'Organic Baby Carrots',
  'Limes',
  'Russet Potato'],
 40286: ['Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Banana',
  'Organic Yellow Onion',
  'Sparkling Water Grapefruit',
  'Organic Strawberries',
  'Large Lemon',
  'Limes',
  'Organic Hass Avocado',
  'Asparagus'],
 42145: ['Organic Baby Spinach',
  'Large Lemon',
  'Organic Baby Carrots',
  'Fresh Cauliflower',
  'Organic Whole Milk',
  'Organic Large Extra Fancy Fuji Apple',
  'Bunched Cilantro',
  'Organic Kiwi',
  'Organic Ginger Root',
  'Grated Parmesan'],
 43902: ['Large Lemon',
  'Banana',
  'Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Organic Strawberries',
  'Yellow Onions',
  'Organic Yellow Onion',
  'Organic Granny Smith Apple',
  'Limes',
  'Organic Raspberries'],
 45067: ['Organic Yellow Onion',
  'Limes',
  'Organic Grape Tomatoes',
  'Organic Black Beans',
  'Cucumber Kirby',
  'Bunched Cilantro',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Organic Avocado',
  'Asparagus'],
 47838: ['Organic Ginger Root',
  'Organic Lacinato (Dinosaur) Kale',
  'Organic Baby Carrots',
  'Organic Italian Parsley Bunch',
  'Banana',
  'Organic Cucumber',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Avocado',
  'Limes',
  'Organic Zucchini'],
 49441: ['Organic Yellow Onion',
  'Organic Hass Avocado',
  'Honeycrisp Apple',
  'White Corn',
  'Organic Baby Carrots',
  'Organic Whole String Cheese',
  'Red Vine Tomato',
  'Extra Virgin Olive Oil',
  'Pineapple Chunks',
  'Organic Peeled Whole Baby Carrots'],
 50241: ['Organic Raspberries',
  'Yellow Onions',
  'Organic Baby Spinach',
  'Red Vine Tomato',
  'Organic Granny Smith Apple',
  'Bunched Cilantro',
  'Organic Black Beans',
  'Organic Yellow Onion',
  'Basil Pesto',
  'Limes'],
 51076: ['Strawberries',
  'Red Onion',
  'Organic Hass Avocado',
  'Blueberries',
  'Organic Peeled Whole Baby Carrots',
  'Original Hummus',
  'Organic Chicken & Apple Sausage',
  'Organic Grape Tomatoes',
  'Organic Fuji Apple',
  'Green Bell Pepper'],
 52784: ['Bag of Organic Bananas',
  'Organic Hass Avocado',
  'Large Lemon',
  'Red Vine Tomato',
  'Bunched Cilantro',
  'Organic Strawberries',
  'Honeycrisp Apple',
  'Green Bell Pepper',
  'Jalapeno Peppers',
  'Banana'],
 53304: ['Organic Baby Carrots',
  'Sweet Kale Salad Mix',
  'Blackberries',
  'Celery Sticks',
  'Brussels Sprouts',
  'Apple Honeycrisp Organic',
  'Brioche Hamburger Buns',
  'Apricots',
  'Packaged Grape Tomatoes',
  'Cereal'],
 53968: ['Strawberries',
  'Organic Baby Spinach',
  'Large Lemon',
  'Organic Garlic',
  'Organic Extra Firm Tofu',
  'Organic Zucchini',
  'Organic Avocado',
  'Organic Basil',
  'Red Onion',
  'Organic Raspberries'],
 55720: ['Organic Garlic',
  'Organic Zucchini',
  'Asparagus',
  'Fresh Cauliflower',
  'Organic Raspberries',
  'Organic Tomato Cluster',
  'Red Vine Tomato',
  'Organic Red Onion',
  'Organic Avocado',
  'Organic Yellow Onion'],
 56266: ['Banana',
  'Bag of Organic Bananas',
  'Limes',
  'Organic Strawberries',
  'Organic Garlic',
  'Roasted Red Pepper Hummus',
  'Organic Blueberries',
  'Organic Hass Avocado',
  'Honeycrisp Apple',
  'Ground Black Pepper'],
 58959: ['Organic Large Extra Fancy Fuji Apple',
  'Chicken Base, Organic',
  'Organic Baby Spinach',
  'Banana',
  'Organic Baby Carrots',
  'Organic Unsweetened Almond Milk',
  'Organic Garlic',
  'Large Lemon',
  'Grape White/Green Seedless',
  'Sparkling Lemon Water'],
 59889: ['Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Organic Raspberries',
  'Organic Hass Avocado',
  'Large Lemon',
  'Organic Peeled Whole Baby Carrots',
  'Organic Baby Carrots',
  'Fresh Cauliflower',
  'Organic Black Beans'],
 61065: ['Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Banana',
  'Organic Small Bunch Celery',
  'Organic Yellow Onion',
  'Strawberries',
  'Organic Ginger Root',
  'Yellow Onions',
  'Organic Raspberries'],
 63472: ['Carrots',
  'Bag of Organic Bananas',
  'Organic Avocado',
  'Organic Strawberries',
  'Banana',
  'Organic Grape Tomatoes',
  'Organic Hass Avocado',
  'Philadelphia Original Cream Cheese',
  'Half & Half',
  'Gluten Free Chocolate Dipped Donuts'],
 66265: ['Blueberries',
  'Organic Strawberries',
  'Bag of Organic Bananas',
  'Large Lemon',
  'Tilapia Filet',
  'Bartlett Pears',
  'Half & Half',
  'Banana',
  'Unsweetened Almondmilk',
  'Organic Peeled Whole Baby Carrots'],
 67941: ['Large Lemon',
  'Organic Red Onion',
  'Organic Baby Spinach',
  'Bag of Organic Bananas',
  'Strawberries',
  'Sparkling Water Grapefruit',
  'Fresh Cauliflower',
  'Organic Garlic',
  'Banana',
  'Organic Small Bunch Celery'],
 69178: ['Organic Blueberries',
  'Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Strawberries',
  'Raspberries',
  'Banana',
  'Creamy Almond Butter',
  'Organic Garlic',
  'Brussels Sprouts'],
 72791: ['Orange Bell Pepper',
  'Bag of Organic Bananas',
  'Organic Peeled Whole Baby Carrots',
  'Organic Yellow Onion',
  'Organic Baby Spinach',
  'Organic Zucchini',
  'Bunched Cilantro',
  'Organic Avocado',
  'Organic Raspberries',
  'Sugar Snap Peas'],
 73171: ['Organic Blueberries',
  'Organic Zucchini',
  'Bag of Organic Bananas',
  'Large Lemon',
  'Extra Virgin Olive Oil',
  'Strawberries',
  'Organic Yellow Onion',
  'Organic Baby Spinach',
  'Basil Pesto',
  'Limes'],
 73477: ['Organic Baby Spinach',
  'Large Lemon',
  'Organic Kiwi',
  'Strawberries',
  'Organic Baby Carrots',
  'Organic Tomato Paste',
  'Organic Italian Parsley Bunch',
  'Organic Raspberries',
  'Organic Fuji Apple',
  'Limes'],
 75993: ['Bag of Organic Bananas',
  'Large Lemon',
  'Organic Garlic',
  'Organic Strawberries',
  'Organic Avocado',
  'Banana',
  'Organic Blueberries',
  'Organic Ground Korintje Cinnamon',
  'Organic Hass Avocado',
  'Organic Zucchini'],
 85028: ['Organic Avocado',
  'Organic Strawberries',
  'Organic Baby Spinach',
  'Organic Garlic',
  'Bag of Organic Bananas',
  'Yellow Onions',
  'Organic Hass Avocado',
  'Bunched Cilantro',
  'Organic Raspberries',
  'Organic Baby Carrots'],
 85238: ['Organic Avocado',
  'Organic Raspberries',
  'Organic Zucchini',
  'Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Yellow Onion',
  'Organic Small Bunch Celery',
  'Limes',
  'Organic Ginger Root',
  'Organic Lemon'],
 87350: ['Organic Avocado',
  'Apple Honeycrisp Organic',
  'Original Hummus',
  'Organic Cucumber',
  'Carrots',
  'Yellow Onions',
  'Small Hass Avocado',
  'Extra Virgin Olive Oil',
  'Organic Zucchini',
  'Organic Blueberries'],
 89776: ['Organic Zucchini',
  'Large Lemon',
  'Red Onion',
  'Orange Bell Pepper',
  'Organic Fuji Apple',
  'Organic Avocado',
  'Carrots',
  'Red Vine Tomato',
  'Organic Yellow Onion',
  'Organic Extra Firm Tofu'],
 93241: ['Organic Strawberries',
  'Large Lemon',
  'Organic Garlic',
  'Organic Raspberries',
  'Organic Creamy Peanut Butter',
  'Organic Baby Carrots',
  'Creamy Almond Butter',
  'Lime',
  'Bag of Organic Bananas',
  'Organic Red Bell Pepper'],
 95686: ['Banana',
  'Organic Avocado',
  'Organic Hass Avocado',
  'Organic Strawberries',
  'Large Lemon',
  'Organic Zucchini',
  'Organic Baby Spinach',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Baby Carrots',
  'Asparagus'],
 96466: ['Organic Zucchini',
  'Organic Small Bunch Celery',
  'Organic Fuji Apple',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Baby Carrots',
  'Fresh Cauliflower',
  'Organic Baby Spinach',
  'Michigan Organic Kale',
  'Organic Whole Strawberries',
  'Large Lemon'],
 98570: ['Strawberries',
  'Organic Hass Avocado',
  'Organic Whole String Cheese',
  'Organic Baby Spinach',
  'Original Hummus',
  'Organic Tomato Paste',
  'Organic Yellow Onion',
  'Extra Virgin Olive Oil',
  'Organic Peeled Whole Baby Carrots',
  'Organic Black Beans'],
 99282: ['Organic Raspberries',
  'Organic Small Bunch Celery',
  'Organic Cucumber',
  'Original Hummus',
  'Organic Yellow Onion',
  'Organic Avocado',
  'Organic Zucchini',
  'Strawberries',
  'Organic Lemon',
  'Organic Baby Carrots'],
 100253: ['Organic Strawberries',
  'Organic Hass Avocado',
  'Organic Avocado',
  'Bing Cherries',
  'Organic Yellow Onion',
  'Organic Grape Tomatoes',
  'Organic Lemon',
  'Organic Tomato Cluster',
  'Extra Virgin Olive Oil',
  'Organic Garlic'],
 102099: ['Bag of Organic Bananas',
  'Organic Strawberries',
  'Organic Zucchini',
  'Red Vine Tomato',
  'Organic Hass Avocado',
  'Organic Blueberries',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Avocado',
  'Organic Small Bunch Celery',
  'Organic Raspberries'],
 104175: ['Banana',
  'Organic Blueberries',
  'Organic Strawberries',
  'Large Lemon',
  'Bag of Organic Bananas',
  'Extra Virgin Olive Oil',
  'Organic Baby Spinach',
  'Cucumber Kirby',
  'Organic Yellow Onion',
  'Organic Avocado'],
 107051: ['Organic Zucchini',
  'Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Limes',
  'Organic Avocado',
  'Organic Extra Firm Tofu',
  'Fresh Cauliflower',
  'Asparagus',
  'Chocolate Chip Cookie Dough Ice Cream',
  'Organic Blueberries'],
 107931: ['Large Lemon',
  'Organic Granny Smith Apple',
  'Pineapple Chunks',
  'Organic Baby Carrots',
  'Organic Raspberries',
  'Limes',
  'Organic Italian Parsley Bunch',
  'White Corn',
  'Bag of Organic Bananas',
  'Yellow Onions'],
 111387: ['Bag of Organic Bananas',
  'Garlic',
  'Banana',
  'Organic Raspberries',
  'Large Lemon',
  'Organic Lemon',
  'Organic Cilantro',
  'Clementines, Bag',
  'Organic Red Bell Pepper',
  '100% Whole Wheat Bread'],
 114336: ['Organic Strawberries',
  'Strawberries',
  'Organic Blueberries',
  'Sweet Kale Salad Mix',
  'Blackberries',
  'Organic Baby Carrots',
  'Cereal',
  'Hass Avocados',
  'Organic Baby Spinach',
  'Meyer Lemons'],
 114764: ['Bag of Organic Bananas',
  'Organic Strawberries',
  'Banana',
  'Organic Ginger Root',
  'Organic Small Bunch Celery',
  'Organic Baby Spinach',
  'Organic Kiwi',
  'Organic Cucumber',
  'Unsweetened Almondmilk',
  'Frozen Organic Wild Blueberries'],
 118102: ['Bag of Organic Bananas',
  'Banana',
  'Organic Baby Spinach',
  'Organic Zucchini',
  'Large Lemon',
  'Organic Blueberries',
  'Lime Sparkling Water',
  'Organic Lacinato (Dinosaur) Kale',
  'Organic Ginger Root',
  'Organic Whole Strawberries'],
 118981: ['Large Lemon',
  'Organic Blueberries',
  'Organic Avocado',
  'Organic Fuji Apple',
  'Strawberries',
  'Basil Pesto',
  'Organic Baby Spinach',
  'Organic Garlic',
  'Organic Raspberries',
  'Organic Baby Carrots'],
 120138: ['Banana',
  'Organic Grape Tomatoes',
  'Strawberries',
  'Red Onion',
  'Large Lemon',
  'Blueberries',
  'Organic Fuji Apple',
  'Organic Avocado',
  'Organic Cilantro',
  'Organic Peeled Whole Baby Carrots'],
 120660: ['Bag of Organic Bananas',
  'Organic Yellow Onion',
  'Organic Lemon',
  'Limes',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Banana',
  'Organic Blueberries',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Hass Avocado'],
 125120: ['Organic Blueberries',
  'Strawberries',
  'Organic Strawberries',
  'Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Blackberries',
  'Raspberries',
  'Fresh Asparagus',
  'Honeycrisp Apples',
  'Sweet Kale Salad Mix'],
 129124: ['Organic Baby Spinach',
  'Organic Strawberries',
  'Banana',
  'Organic Avocado',
  '100% Recycled Paper Towels',
  'Bag of Organic Bananas',
  'Extra Virgin Olive Oil',
  'Organic Grape Tomatoes',
  'Feta Cheese Crumbles',
  'Large Lemon'],
 131280: ['Organic Strawberries',
  'Limes',
  'Orange Bell Pepper',
  'Bag of Organic Bananas',
  'Organic Blueberries',
  'Strawberries',
  'Organic Baby Spinach',
  'Organic Grape Tomatoes',
  'Organic Tomato Cluster',
  'Blueberries'],
 132038: ['Organic Strawberries',
  'Large Lemon',
  'Organic Blueberries',
  'Grape White/Green Seedless',
  'Organic Zucchini',
  'Organic Avocado',
  'Limes',
  'Orange Bell Pepper',
  'Organic Baby Spinach',
  'Organic Whole Milk'],
 132551: ['Bag of Organic Bananas',
  'Strawberries',
  'Organic Blueberries',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Blackberries',
  'Fresh Asparagus',
  'Organic Lemon',
  'Soda',
  'Real Mayonnaise'],
 133738: ['Organic Avocado',
  'Organic Garlic',
  'Cucumber Kirby',
  'Organic Baby Carrots',
  'Organic Whole Milk',
  'Bag of Organic Bananas',
  'Organic Blueberries',
  'Organic Small Bunch Celery',
  'Organic Hass Avocado',
  'Organic Strawberries'],
 133964: ['Large Lemon',
  'Strawberries',
  'Organic Blueberries',
  'Organic Zucchini',
  'Organic Strawberries',
  'Basil Pesto',
  'Organic Baby Spinach',
  'Original Fresh Stack Crackers',
  'Bag of Organic Bananas',
  'No Salt Added Black Beans'],
 138067: ['Organic Whole Milk',
  'Kale & Spinach Superfood Puffs',
  'Shredded Mild Cheddar Cheese',
  'Organic Grape Tomatoes',
  'Granny Smith Apples',
  'Large Lemon',
  'Pure & Natural Sour Cream',
  'Cherubs Heavenly Salad Tomatoes',
  'Lime',
  'Limes'],
 138203: ['Organic Avocado',
  'Organic Garlic',
  'Limes',
  'Organic Peeled Whole Baby Carrots',
  'Yellow Onions',
  'Organic Small Bunch Celery',
  'Organic Cilantro',
  'Organic Tomato Paste',
  'Organic Garnet Sweet Potato (Yam)',
  'Organic Hass Avocado'],
 139656: ['Organic Raspberries',
  'Organic Strawberries',
  'Organic Blueberries',
  'Organic Garlic',
  'Organic Avocado',
  'Organic Granny Smith Apple',
  'Organic Cilantro',
  'Organic Blackberries',
  'Large Lemon',
  'Banana'],
 141719: ['Organic Raspberries',
  'Strawberries',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Organic Hass Avocado',
  'Organic Ginger Root',
  'Organic Cilantro',
  'Red Onion',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Peeled Whole Baby Carrots'],
 147179: ['Organic Avocado',
  'Organic Zucchini',
  'Organic Strawberries',
  'Bag of Organic Bananas',
  'Basil Pesto',
  'Strawberries',
  'Organic Whole Milk',
  'Organic Raspberries',
  'Seedless Red Grapes',
  'Organic Baby Arugula'],
 151119: ['Large Lemon',
  'Banana',
  'Pure & Natural Sour Cream',
  'Red Onion',
  'Organic Grape Tomatoes',
  'Blueberries',
  'Limes',
  'Eggo Homestyle Waffles',
  'Sauvignon Blanc',
  'Elbow Macaroni Pasta'],
 151410: ['Organic Strawberries',
  'Banana',
  'Bag of Organic Bananas',
  'Red Vine Tomato',
  'Large Lemon',
  'Organic Avocado',
  'Organic Cucumber',
  'Organic Zucchini',
  'Limes',
  'Organic Mint'],
 151564: ['Organic Strawberries',
  'Organic Baby Spinach',
  'Banana',
  'Large Lemon',
  'Original Hummus',
  'Organic Raspberries',
  'Organic Ginger Root',
  'Organic Baby Carrots',
  'Limes',
  'Organic Cilantro'],
 154852: ['Organic Strawberries',
  'Bag of Organic Bananas',
  'Organic Garlic',
  'Organic Yellow Onion',
  'Organic Avocado',
  'Organic Cilantro',
  'Organic Granny Smith Apple',
  'Organic Baby Spinach',
  'Organic Blueberries',
  'Extra Virgin Olive Oil'],
 156537: ['Strawberries',
  'Large Lemon',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Red Onion',
  'Blackberries',
  'Beef Loin New York Strip Steak',
  'Bunched Cilantro',
  'Organic Baby Arugula',
  'Cherrios Honey Nut'],
 157497: ['Organic Baby Spinach',
  'Strawberries',
  'Banana',
  'Orange Bell Pepper',
  'Organic Zucchini',
  'Organic Blueberries',
  'Organic Garlic',
  'Organic Strawberries',
  'Organic Garnet Sweet Potato (Yam)',
  '100% Recycled Paper Towels'],
 157798: ['Seedless Small Watermelon',
  'Organic Baby Spinach',
  'Hass Avocados',
  'Seedless Red Grapes',
  'Heavy Duty Aluminum Foil',
  'Creamy Almond Butter',
  'Organic Whole String Cheese',
  'Meyer Lemons',
  'Brussels Sprouts',
  'Brioche Hamburger Buns'],
 158373: ['Banana',
  'Organic Avocado',
  'Organic Quick Oats',
  'Seedless Red Grapes',
  'Organic Blueberries',
  'Organic Grape Tomatoes',
  'Limes',
  'Organic Broccoli Florets',
  'Organic Lemon',
  'Organic Baby Carrots'],
 159308: ['Red Onion',
  'Blueberries',
  'Large Lemon',
  'Plain Whole Milk Yogurt',
  'Orange Bell Pepper',
  'Organic Reduced Fat Milk',
  'Provolone',
  'Seedless Red Grapes',
  'Organic Beans & Rice Cheddar Cheese Burrito',
  'Supergreens!'],
 161574: ['Bag of Organic Bananas',
  'Sweet Kale Salad Mix',
  'Soda',
  'Raspberries',
  'Green Bell Pepper',
  'Apricots',
  'Real Mayonnaise',
  'Organic Navel Orange',
  'Fat Free Skim Milk',
  'Brussels Sprouts'],
 166707: ['Organic Baby Carrots',
  'Organic Avocado',
  'Strawberries',
  'Organic Small Bunch Celery',
  'Yellow Onions',
  'Banana',
  'Organic Ginger Root',
  'Organic Baby Spinach',
  'Cucumber Kirby',
  'Red Vine Tomato'],
 169583: ['Organic Hass Avocado',
  'Organic Zucchini',
  'Extra Virgin Olive Oil',
  'Organic Blueberries',
  'Honeycrisp Apple',
  'Organic Yellow Onion',
  'Organic Raspberries',
  'Basil Pesto',
  'Bing Cherries',
  'Organic Italian Parsley Bunch'],
 177453: ['Banana',
  'Large Lemon',
  'Organic Baby Arugula',
  'Strawberries',
  'Bag of Organic Bananas',
  'Limes',
  'Organic Rainbow Carrots',
  'Chocolate Peanut Butter Ice Cream',
  'Instant Coffee',
  'Organic Medjool Dates'],
 179429: ['Banana',
  'Large Lemon',
  'Organic Grape Tomatoes',
  'Organic Garlic',
  'Orange Bell Pepper',
  'Organic Cilantro',
  'Sustainably Soft Bath Tissue',
  'Organic Diced Tomatoes',
  'Organic Fuji Apple',
  'Strawberries'],
 180305: ['Bag of Organic Bananas',
  'Organic Yellow Onion',
  'Organic Baby Spinach',
  'Banana',
  'Organic Whole Milk',
  'Asparagus',
  'Organic Peeled Whole Baby Carrots',
  'Apple Honeycrisp Organic',
  'Organic Grape Tomatoes',
  'Large Lemon'],
 180461: ['Organic Black Beans',
  'Original Hummus',
  'Strawberries',
  'Free & Clear Natural Laundry Detergent For Sensitive Skin',
  'Organic Blackberries',
  'Organic Tomato Paste',
  'Basil Pesto',
  'Asparagus',
  '100% Recycled Paper Towels',
  'Extra Virgin Olive Oil'],
 182863: ['Bag of Organic Bananas',
  'Organic Hass Avocado',
  'Organic Strawberries',
  'Organic Red Bell Pepper',
  'Organic Baby Spinach',
  'Organic Black Beans',
  'Banana',
  'Organic Small Bunch Celery',
  'Organic Garnet Sweet Potato (Yam)',
  "Organic D'Anjou Pears"],
 185153: ['Organic Peeled Whole Baby Carrots',
  'Organic Baby Spinach',
  'Large Lemon',
  'Cucumber Kirby',
  'Blueberries',
  'Organic Avocado',
  'Feta Cheese Crumbles',
  'Organic Garlic',
  'Organic Strawberries',
  'Tomato Sauce'],
 187019: ['Organic Blueberries',
  'Organic Baby Carrots',
  'Blackberries',
  'Bag of Organic Bananas',
  'Limes',
  'Organic Lemon',
  'Sweet Kale Salad Mix',
  'Organic Strawberries',
  'Brussels Sprouts',
  'Organic Avocado'],
 187754: ['Organic Yellow Onion',
  'Organic Cilantro',
  'Organic Lemon',
  'Banana',
  'Organic Garlic',
  'Organic Granny Smith Apple',
  'Large Lemon',
  'Organic Peeled Whole Baby Carrots',
  'Frozen Organic Wild Blueberries',
  'Organic Black Beans'],
 192587: ['Banana',
  'Organic Grape Tomatoes',
  'Organic Grade A Free Range Large Brown Eggs',
  'Organic Peeled Whole Baby Carrots',
  'Strawberries',
  'Cucumber Kirby',
  'Organic Blueberries',
  'Organic Baby Arugula',
  'Organic Baby Spinach',
  'Fresh Cauliflower'],
 197989: ['Organic Strawberries',
  'Organic Cucumber',
  'Large Lemon',
  'Organic Avocado',
  'Organic Yellow Onion',
  'Asparagus',
  'Extra Virgin Olive Oil',
  'Granny Smith Apples',
  'Organic Gala Apples',
  'Organic Blueberries'],
 199124: ['Red Onion',
  'Banana',
  'Strawberries',
  'Classic Hummus',
  'Seedless Red Grapes',
  'Orange Bell Pepper',
  'Organic Strawberries',
  'Organic Avocado',
  'Blueberries',
  'Uncured Genoa Salami'],
 200078: ['Large Lemon',
  'Jalapeno Peppers',
  'Bag of Organic Bananas',
  'Carrots',
  'Red Onion',
  'Orange Bell Pepper',
  'Yellow Onions',
  'Organic Hass Avocado',
  'Organic Peeled Whole Baby Carrots',
  'Limes'],
 201135: ['Organic Avocado',
  'Large Lemon',
  'Organic Yellow Onion',
  'Organic Garlic',
  'Organic Baby Carrots',
  'Original Hummus',
  'Organic Whole String Cheese',
  'Pineapple Chunks',
  'Organic Cucumber',
  'Organic Red Onion'],
 201870: ['Large Lemon',
  'Organic Baby Spinach',
  'Strawberries',
  'Organic Baby Carrots',
  'Organic Half & Half',
  'Organic Ginger Root',
  'Organic Zucchini',
  'Organic Fuji Apple',
  'Bag of Organic Bananas',
  'Organic Avocado'],
 203111: ['Strawberries',
  'Packaged Grape Tomatoes',
  'Unsweetened Vanilla Almond Breeze',
  'Organic Sage',
  'Organic Tortilla Chips',
  'Soda',
  'Vanilla Milk Chocolate Almond Ice Cream Bars Multi-Pack',
  'Organic Zucchini',
  'Coconut Water',
  'Teriyaki & Pineapple Chicken Meatballs']}
CPU times: user 46.5 ms, sys: 2.75 ms, total: 49.2 ms
Wall time: 1min

As can be seen above, our collaborative filtering function returns a dictionary of users and their top 10 recommended products. To see how we arrived at the output above, let's break down our function using a speficic example, user 4789. For reference, below are the products that this user has been recommended:

In [278]:
recommendations[4789]
Out[278]:
['Organic Granny Smith Apple',
 'Limes',
 'Organic Green Cabbage',
 'Organic Cilantro',
 'Creamy Almond Butter',
 'Corn Tortillas',
 'Organic Grape Tomatoes',
 'Unsweetened Almondmilk',
 'Organic Blackberries',
 'Organic Lacinato (Dinosaur) Kale']

The first main component of our collaborative filtering function identifies the top 10 neighbours for user 4789. It does so by creating user pairs, where u1 is always user 4789 and u2 is any other user who has purchased products that u1 has purchased. It then computes the Jaccard index for each user pair, by taking the number of distinct products that u1 and u2 have purchased in common (intersection_count) and dividing it by the number of distinct products that are unique to each user (union_count). The 10 users with the highest Jaccard index are selected as user 4789's neighbourhood.

In [285]:
query = """
        // Get count of all distinct products that user 4789 has purchased and find other users who have purchased them
        MATCH (u1:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)<-[:CONTAINS]-(:Order)<-[:ORDERED]-(u2:User)
        WHERE u1 <> u2
          AND u1.user_id = {uid}
        WITH u1, u2, COUNT(DISTINCT p) as intersection_count
        
        // Get count of all the distinct products that are unique to each user
        MATCH (u:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)
        WHERE u in [u1, u2]
        WITH u1, u2, intersection_count, COUNT(DISTINCT p) as union_count
       
        // Compute Jaccard index
        WITH u1, u2, intersection_count, union_count, (intersection_count*1.0/union_count) as jaccard_index
        
        // Get top k neighbours based on Jaccard index
        ORDER BY jaccard_index DESC, u2.user_id
        WITH u1, COLLECT([u2.user_id, jaccard_index, intersection_count, union_count])[0..{k}] as neighbours
     
        WHERE LENGTH(neighbours) = {k}                // only want to return users with enough neighbours
        RETURN u1.user_id as user, neighbours
        """

neighbours = {}
for row in g.run(query, uid=4789, k=10):
    neighbours[row[0]] = row[1]

print("Labels for user 4789's neighbour list: user_id, jaccard_index, intersection_count, union count")
display(neighbours)
Labels for user 4789's neighbour list: user_id, jaccard_index, intersection_count, union count

  
{4789: [[42145, 0.12794612794612795, 38, 297],
  [138203, 0.10497237569060773, 38, 362],
  [87350, 0.09390862944162437, 37, 394],
  [49441, 0.0912280701754386, 26, 285],
  [187754, 0.0912280701754386, 26, 285],
  [180461, 0.09115281501340483, 34, 373],
  [120660, 0.08641975308641975, 21, 243],
  [107931, 0.08360128617363344, 26, 311],
  [73477, 0.07855626326963906, 37, 471],
  [154852, 0.0735930735930736, 17, 231]]}

The second main component of our collaborative filtering function generates recommendations for user 4789 using the neighbours identified above. It does so by considering products that the neighbours have purchased which user 4789 has not already purchased. The function then counts the number of neighbours who have purchased each of the candidate products. The 10 products with the highest neighbour count are selected as recommendations for user 4789.

In [287]:
%%time
query = """
        // Get top n recommendations for user 4789 from the selected neighbours
        MATCH (u1:User),
              (neighbour:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)        // get all products bought by neighbour
        WHERE u1.user_id = {uid}
          AND neighbour.user_id in {neighbours}
          AND not (u1)-[:ORDERED]->(:Order)-[:CONTAINS]->(p)                        // which u1 has not already bought
        
        WITH u1, p, COUNT(DISTINCT neighbour) as cnt                                // count times purchased by neighbours
        ORDER BY u1.user_id, cnt DESC                                               // and sort by count desc
        RETURN u1.user_id as user, COLLECT([p.product_name,cnt])[0..{n}] as recos  
        """

recos = {}
for row in g.run(query, uid=4789, neighbours=[42145,138203,87350,49441,187754,180461,120660,107931,73477,154852], n=10):
    recos[row[0]] = row[1]
    
print("Labels for user 4789's recommendations list: product, number of purchasing neighbours")
display(recos)
Labels for user 4789's recommendations list: product, number of purchasing neighbours


{4789: [['Organic Granny Smith Apple', 6],
  ['Limes', 5],
  ['Organic Green Cabbage', 5],
  ['Organic Cilantro', 5],
  ['Creamy Almond Butter', 5],
  ['Corn Tortillas', 5],
  ['Organic Grape Tomatoes', 5],
  ['Unsweetened Almondmilk', 4],
  ['Organic Blackberries', 4],
  ['Organic Lacinato (Dinosaur) Kale', 4]]}
  

CPU times: user 6.85 ms, sys: 2.08 ms, total: 8.93 ms
Wall time: 716 ms

Part 5: Evaluating Recommender Performance

If we were to actually integrate our recommender system in to a production environment, we would need a way to measure its performance. As mentioned, in the context of a user check-out application for an online grocer, the goal is to increase basket size, by surfacing a short list of products that are as relevant as posssible to the user. For this particular application, we could choose precision as our metric for evaluating our recommender's performance. Precision is computed as the proportion of products that the user actually purchased, out of all the products that user has been recommended. To determine overall recommender performance, average precision can be calculated using the precision values for all the users in the system.

Conclusion

We have demonstrated how to build a user-based recommender system leveraging the principles of user-user collaborative filtering. We've discussed the key concepts underlying this algorithm, from identifying neighbourhoods using a similarity metric, to generating recommendations for a user based on its neighbours' preferences. In addition, we have shown how easy and intuitive modeling connected data can be with a graph database. One final point worth noting: in real world applications, we may want to implement non-personalized recommendation strategies for users who are new to the system and those who have not yet made sufficient purchases. Strategies may include recommending top selling products for new users, and for the latter group, products identified to have high affinity with other products that the user has already purchased. This can be done through association rules mining, also known as market basket analysis.

Association Rules Mining Using Python Generators to Handle Large Datasets

Motivation

I was looking to run association analysis in Python using the apriori algorithm to derive rules of the form {A} -> {B}. However, I quickly discovered that it's not part of the standard Python machine learning libraries. Although there are some implementations that exist, I could not find one capable of handling large datasets. "Large" in my case was an orders dataset with 32 million records, containing 3.2 million unique orders and about 50K unique items (file size just over 1 GB). So, I decided to write my own implementation, leveraging the apriori algorithm to generate simple {A} -> {B} association rules. Since I only care about understanding relationships between any given pair of items, using apriori to get to item sets of size 2 is sufficient. I went through various iterations, splitting the data into multiple subsets just so I could get functions like crosstab and combinations to run on my machine with 8 GB of memory. :) But even with this approach, I could only process about 1800 items before my kernel would crash... And that's when I learned about the wonderful world of Python generators.

Python Generators

In a nutshell, a generator is a special type of function that returns an iterable sequence of items. However, unlike regular functions which return all the values at once (eg: returning all the elements of a list), a generator yields one value at a time. To get the next value in the set, we must ask for it - either by explicitly calling the generator's built-in "next" method, or implicitly via a for loop. This is a great property of generators because it means that we don't have to store all of the values in memory at once. We can load and process one value at a time, discard when finished and move on to process the next value. This feature makes generators perfect for creating item pairs and counting their frequency of co-occurence. Here's a concrete example of what we're trying to accomplish:

  1. Get all possible item pairs for a given order

    eg:  order 1:  apple, egg, milk    -->  item pairs: {apple, egg}, {apple, milk}, {egg, milk}
         order 2:  egg, milk           -->  item pairs: {egg, milk}
  2. Count the number of times each item pair appears

    eg: {apple, egg}: 1
        {apple, milk}: 1
        {egg, milk}: 2

Here's the generator that implements the above tasks:

In [1]:
import numpy as np
from itertools import combinations, groupby
from collections import Counter

# Sample data
orders = np.array([[1,'apple'], [1,'egg'], [1,'milk'], [2,'egg'], [2,'milk']], dtype=object)

# Generator that yields item pairs, one at a time
def get_item_pairs(order_item):
    
    # For each order, generate a list of items in that order
    for order_id, order_object in groupby(orders, lambda x: x[0]):
        item_list = [item[1] for item in order_object]      
    
        # For each item list, generate item pairs, one at a time
        for item_pair in combinations(item_list, 2):
            yield item_pair                                      


# Counter iterates through the item pairs returned by our generator and keeps a tally of their occurrence
Counter(get_item_pairs(orders))
Out[1]:
Counter({('apple', 'egg'): 1, ('apple', 'milk'): 1, ('egg', 'milk'): 2})

get_item_pairs() generates a list of items for each order and produces item pairs for that order, one pair at a time. The first item pair is passed to Counter which keeps track of the number of times an item pair occurs. The next item pair is taken, and again, passed to Counter. This process continues until there are no more item pairs left. With this approach, we end up not using much memory as item pairs are discarded after the count is updated.

Apriori Algorithm

Apriori is an algorithm used to identify frequent item sets (in our case, item pairs). It does so using a "bottom up" approach, first identifying individual items that satisfy a minimum occurence threshold. It then extends the item set, adding one item at a time and checking if the resulting item set still satisfies the specified threshold. The algorithm stops when there are no more items to add that meet the minimum occurrence requirement. Here's an example of apriori in action, assuming a minimum occurence threshold of 3:

order 1: apple, egg, milk  
order 2: carrot, milk  
order 3: apple, egg, carrot
order 4: apple, egg
order 5: apple, carrot


Iteration 1:  Count the number of times each item occurs   
item set      occurrence count    
{apple}              4   
{egg}                3   
{milk}               2   
{carrot}             2   

{milk} and {carrot} are eliminated because they do not meet the minimum occurrence threshold.


Iteration 2: Build item sets of size 2 using the remaining items from Iteration 1 (ie: apple, egg)  
item set           occurence count  
{apple, egg}             3  

Only {apple, egg} remains and the algorithm stops since there are no more items to add.

If we had more orders and items, we can continue to iterate, building item sets consisting of more than 2 elements. For the problem we are trying to solve (ie: finding relationships between pairs of items), it suffices to implement apriori to get to item sets of size 2.

Association Rules Mining

Once the item sets have been generated using apriori, we can start mining association rules. Given that we are only looking at item sets of size 2, the association rules we will generate will be of the form {A} -> {B}. One common application of these rules is in the domain of recommender systems, where customers who purchased item A are recommended item B.

Here are 3 key metrics to consider when evaluating association rules:

  1. support
    This is the percentage of orders that contains the item set. In the example above, there are 5 orders in total and {apple,egg} occurs in 3 of them, so:

                 support{apple,egg} = 3/5 or 60%

    The minimum support threshold required by apriori can be set based on knowledge of your domain. In this grocery dataset for example, since there could be thousands of distinct items and an order can contain only a small fraction of these items, setting the support threshold to 0.01% may be reasonable.

  2. confidence
    Given two items, A and B, confidence measures the percentage of times that item B is purchased, given that item A was purchased. This is expressed as:

                 confidence{A->B} = support{A,B} / support{A}

    Confidence values range from 0 to 1, where 0 indicates that B is never purchased when A is purchased, and 1 indicates that B is always purchased whenever A is purchased. Note that the confidence measure is directional. This means that we can also compute the percentage of times that item A is purchased, given that item B was purchased:

                 confidence{B->A} = support{A,B} / support{B}

    In our example, the percentage of times that egg is purchased, given that apple was purchased is:

                 confidence{apple->egg} = support{apple,egg} / support{apple}
                                        = (3/5) / (4/5)
                                        = 0.75 or 75%

    A confidence value of 0.75 implies that out of all orders that contain apple, 75% of them also contain egg. Now, we look at the confidence measure in the opposite direction (ie: egg->apple):

                 confidence{egg->apple} = support{apple,egg} / support{egg}
                                        = (3/5) / (3/5)
                                        = 1 or 100%  

    Here we see that all of the orders that contain egg also contain apple. But, does this mean that there is a relationship between these two items, or are they occurring together in the same orders simply by chance? To answer this question, we look at another measure which takes into account the popularity of both items.

  3. lift
    Given two items, A and B, lift indicates whether there is a relationship between A and B, or whether the two items are occuring together in the same orders simply by chance (ie: at random). Unlike the confidence metric whose value may vary depending on direction (eg: confidence{A->B} may be different from confidence{B->A}), lift has no direction. This means that the lift{A,B} is always equal to the lift{B,A}:

                 lift{A,B} = lift{B,A} = support{A,B} / (support{A} * support{B})

    In our example, we compute lift as follows:

                 lift{apple,egg} = lift{egg,apple} = support{apple,egg} / (support{apple} * support{egg})
                                                   = (3/5) / (4/5 * 3/5) 
                                                   = 1.25

    One way to understand lift is to think of the denominator as the likelihood that A and B will appear in the same order if there was no relationship between them. In the example above, if apple occurred in 80% of the orders and egg occurred in 60% of the orders, then if there was no relationship between them, we would expect both of them to show up together in the same order 48% of the time (ie: 80% * 60%). The numerator, on the other hand, represents how often apple and egg actually appear together in the same order. In this example, that is 60% of the time. Taking the numerator and dividing it by the denominator, we get to how many more times apple and egg actually appear in the same order, compared to if there was no relationship between them (ie: that they are occurring together simply at random).

    In summary, lift can take on the following values:

     * lift = 1 implies no relationship between A and B. 
       (ie: A and B occur together only by chance)
    
     * lift > 1 implies that there is a positive relationship between A and B.
       (ie:  A and B occur together more often than random)
    
     * lift < 1 implies that there is a negative relationship between A and B.
       (ie:  A and B occur together less often than random)

    In our example, apple and egg occur together 1.25 times more than random, so we conclude that there exists a positive relationship between them.

Armed with knowledge of apriori and association rules mining, let's dive into the data and code to see what relationships we unravel!

Input Dataset

Instacart, an online grocer, has graciously made some of their datasets accessible to the public. The order and product datasets that we will be using can be downloaded from the link below, along with the data dictionary:

“The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on September 1, 2017.

In [2]:
import pandas as pd
import numpy as np
import sys
from itertools import combinations, groupby
from collections import Counter
In [3]:
# Function that returns the size of an object in MB
def size(obj):
    return "{0:.2f} MB".format(sys.getsizeof(obj) / (1000 * 1000))

Part 1: Data Preparation

A. Load order data

In [4]:
orders = pd.read_csv('order_products__prior.csv')
print('orders -- dimensions: {0};   size: {1}'.format(orders.shape, size(orders)))
display(orders.head())
orders -- dimensions: (32434489, 4);   size: 1037.90 MB
order_id product_id add_to_cart_order reordered
0 2 33120 1 1
1 2 28985 2 1
2 2 9327 3 0
3 2 45918 4 1
4 2 30035 5 0

B. Convert order data into format expected by the association rules function

In [5]:
# Convert from DataFrame to a Series, with order_id as index and item_id as value
orders = orders.set_index('order_id')['product_id'].rename('item_id')
display(orders.head(10))
type(orders)
order_id
2    33120
2    28985
2     9327
2    45918
2    30035
2    17794
2    40141
2     1819
2    43668
3    33754
Name: item_id, dtype: int64

pandas.core.series.Series

C. Display summary statistics for order data

In [6]:
print('orders -- dimensions: {0};   size: {1};   unique_orders: {2};   unique_items: {3}'
      .format(orders.shape, size(orders), len(orders.index.unique()), len(orders.value_counts())))
orders -- dimensions: (32434489,);   size: 518.95 MB;   unique_orders: 3214874;   unique_items: 49677

Part 2: Association Rules Function

A. Helper functions to the main association rules function

In [7]:
# Returns frequency counts for items and item pairs
def freq(iterable):
    if type(iterable) == pd.core.series.Series:
        return iterable.value_counts().rename("freq")
    else: 
        return pd.Series(Counter(iterable)).rename("freq")

    
# Returns number of unique orders
def order_count(order_item):
    return len(set(order_item.index))


# Returns generator that yields item pairs, one at a time
def get_item_pairs(order_item):
    order_item = order_item.reset_index().as_matrix()
    for order_id, order_object in groupby(order_item, lambda x: x[0]):
        item_list = [item[1] for item in order_object]
              
        for item_pair in combinations(item_list, 2):
            yield item_pair
            

# Returns frequency and support associated with item
def merge_item_stats(item_pairs, item_stats):
    return (item_pairs
                .merge(item_stats.rename(columns={'freq': 'freqA', 'support': 'supportA'}), left_on='item_A', right_index=True)
                .merge(item_stats.rename(columns={'freq': 'freqB', 'support': 'supportB'}), left_on='item_B', right_index=True))


# Returns name associated with item
def merge_item_name(rules, item_name):
    columns = ['itemA','itemB','freqAB','supportAB','freqA','supportA','freqB','supportB', 
               'confidenceAtoB','confidenceBtoA','lift']
    rules = (rules
                .merge(item_name.rename(columns={'item_name': 'itemA'}), left_on='item_A', right_on='item_id')
                .merge(item_name.rename(columns={'item_name': 'itemB'}), left_on='item_B', right_on='item_id'))
    return rules[columns]               

B. Association rules function

In [8]:
def association_rules(order_item, min_support):

    print("Starting order_item: {:22d}".format(len(order_item)))


    # Calculate item frequency and support
    item_stats             = freq(order_item).to_frame("freq")
    item_stats['support']  = item_stats['freq'] / order_count(order_item) * 100


    # Filter from order_item items below min support 
    qualifying_items       = item_stats[item_stats['support'] >= min_support].index
    order_item             = order_item[order_item.isin(qualifying_items)]

    print("Items with support >= {}: {:15d}".format(min_support, len(qualifying_items)))
    print("Remaining order_item: {:21d}".format(len(order_item)))


    # Filter from order_item orders with less than 2 items
    order_size             = freq(order_item.index)
    qualifying_orders      = order_size[order_size >= 2].index
    order_item             = order_item[order_item.index.isin(qualifying_orders)]

    print("Remaining orders with 2+ items: {:11d}".format(len(qualifying_orders)))
    print("Remaining order_item: {:21d}".format(len(order_item)))


    # Recalculate item frequency and support
    item_stats             = freq(order_item).to_frame("freq")
    item_stats['support']  = item_stats['freq'] / order_count(order_item) * 100


    # Get item pairs generator
    item_pair_gen          = get_item_pairs(order_item)


    # Calculate item pair frequency and support
    item_pairs              = freq(item_pair_gen).to_frame("freqAB")
    item_pairs['supportAB'] = item_pairs['freqAB'] / len(qualifying_orders) * 100

    print("Item pairs: {:31d}".format(len(item_pairs)))


    # Filter from item_pairs those below min support
    item_pairs              = item_pairs[item_pairs['supportAB'] >= min_support]

    print("Item pairs with support >= {}: {:10d}\n".format(min_support, len(item_pairs)))


    # Create table of association rules and compute relevant metrics
    item_pairs = item_pairs.reset_index().rename(columns={'level_0': 'item_A', 'level_1': 'item_B'})
    item_pairs = merge_item_stats(item_pairs, item_stats)
    
    item_pairs['confidenceAtoB'] = item_pairs['supportAB'] / item_pairs['supportA']
    item_pairs['confidenceBtoA'] = item_pairs['supportAB'] / item_pairs['supportB']
    item_pairs['lift']           = item_pairs['supportAB'] / (item_pairs['supportA'] * item_pairs['supportB'])
   

    # Return association rules sorted by lift in descending order
    return item_pairs.sort_values('lift', ascending=False)

Part 3: Association Rules Mining

In [9]:
%%time
rules = association_rules(orders, 0.01)  
Starting order_item:               32434489
Items with support >= 0.01:           10906
Remaining order_item:              29843570
Remaining orders with 2+ items:     3013325
Remaining order_item:              29662716
Item pairs:                        30622410
Item pairs with support >= 0.01:      48751

CPU times: user 9min 26s, sys: 34.5 s, total: 10min 1s
Wall time: 10min 13s
In [10]:
# Replace item ID with item name and display association rules
item_name   = pd.read_csv('products.csv')
item_name   = item_name.rename(columns={'product_id':'item_id', 'product_name':'item_name'})
rules_final = merge_item_name(rules, item_name).sort_values('lift', ascending=False)
display(rules_final)
itemA itemB freqAB supportAB freqA supportA freqB supportB confidenceAtoB confidenceBtoA lift
0 Organic Strawberry Chia Lowfat 2% Cottage Cheese Organic Cottage Cheese Blueberry Acai Chia 306 0.010155 1163 0.038595 839 0.027843 0.263113 0.364720 9.449868
1 Grain Free Chicken Formula Cat Food Grain Free Turkey Formula Cat Food 318 0.010553 1809 0.060033 879 0.029170 0.175788 0.361775 6.026229
3 Organic Fruit Yogurt Smoothie Mixed Berry Apple Blueberry Fruit Yogurt Smoothie 349 0.011582 1518 0.050376 1249 0.041449 0.229908 0.279424 5.546732
9 Nonfat Strawberry With Fruit On The Bottom Gre... 0% Greek, Blueberry on the Bottom Yogurt 409 0.013573 1666 0.055288 1391 0.046162 0.245498 0.294033 5.318230
10 Organic Grapefruit Ginger Sparkling Yerba Mate Cranberry Pomegranate Sparkling Yerba Mate 351 0.011648 1731 0.057445 1149 0.038131 0.202773 0.305483 5.317849
11 Baby Food Pouch - Roasted Carrot Spinach & Beans Baby Food Pouch - Butternut Squash, Carrot & C... 332 0.011018 1503 0.049878 1290 0.042810 0.220892 0.257364 5.159830
12 Unsweetened Whole Milk Mixed Berry Greek Yogurt Unsweetened Whole Milk Blueberry Greek Yogurt 438 0.014535 1622 0.053828 1621 0.053794 0.270037 0.270204 5.019798
23 Uncured Cracked Pepper Beef Chipotle Beef & Pork Realstick 410 0.013606 1839 0.061029 1370 0.045465 0.222947 0.299270 4.903741
24 Organic Mango Yogurt Organic Whole Milk Washington Black Cherry Yogurt 334 0.011084 1675 0.055586 1390 0.046128 0.199403 0.240288 4.322777
2 Grain Free Chicken Formula Cat Food Grain Free Turkey & Salmon Formula Cat Food 391 0.012976 1809 0.060033 1553 0.051538 0.216142 0.251771 4.193848
25 Raspberry Essence Water Unsweetened Pomegranate Essence Water 366 0.012146 2025 0.067202 1304 0.043274 0.180741 0.280675 4.176615
13 Unsweetened Whole Milk Strawberry Yogurt Unsweetened Whole Milk Blueberry Greek Yogurt 440 0.014602 1965 0.065210 1621 0.053794 0.223919 0.271437 4.162489
14 Unsweetened Whole Milk Peach Greek Yogurt Unsweetened Whole Milk Blueberry Greek Yogurt 421 0.013971 1922 0.063783 1621 0.053794 0.219043 0.259716 4.071849
44 Oh My Yog! Pacific Coast Strawberry Trilayer Y... Oh My Yog! Organic Wild Quebec Blueberry Cream... 860 0.028540 2857 0.094812 2271 0.075365 0.301015 0.378688 3.994083
55 Mighty 4 Kale, Strawberry, Amaranth & Greek Yo... Mighty 4 Essential Tots Spinach, Kiwi, Barley ... 390 0.012943 2206 0.073208 1337 0.044370 0.176791 0.291698 3.984498
20 Unsweetened Whole Milk Peach Greek Yogurt Unsweetened Whole Milk Strawberry Yogurt 499 0.016560 1922 0.063783 1965 0.065210 0.259625 0.253944 3.981352
65 0% Greek, Blueberry on the Bottom Yogurt Nonfat Strawberry With Fruit On The Bottom Gre... 305 0.010122 1391 0.046162 1666 0.055288 0.219267 0.183073 3.965918
15 Unsweetened Whole Milk Mixed Berry Greek Yogurt Unsweetened Whole Milk Peach Greek Yogurt 410 0.013606 1622 0.053828 1922 0.063783 0.252774 0.213319 3.963014
43 Unsweetened Whole Milk Peach Greek Yogurt Unsweetened Whole Milk Mixed Berry Greek Yogurt 407 0.013507 1922 0.063783 1622 0.053828 0.211759 0.250925 3.934016
26 Unsweetened Blackberry Water Unsweetened Pomegranate Essence Water 494 0.016394 3114 0.103341 1304 0.043274 0.158638 0.378834 3.665867
19 Unsweetened Whole Milk Mixed Berry Greek Yogurt Unsweetened Whole Milk Strawberry Yogurt 383 0.012710 1622 0.053828 1965 0.065210 0.236128 0.194911 3.621024
16 Unsweetened Whole Milk Strawberry Yogurt Unsweetened Whole Milk Peach Greek Yogurt 444 0.014735 1965 0.065210 1922 0.063783 0.225954 0.231009 3.542526
56 Mighty 4 Sweet Potato, Blueberry, Millet & Gre... Mighty 4 Essential Tots Spinach, Kiwi, Barley ... 398 0.013208 2534 0.084093 1337 0.044370 0.157064 0.297681 3.539900
74 Sweet Potatoes Stage 2 Organic Stage 2 Winter Squash Baby Food Puree 322 0.010686 2077 0.068927 1322 0.043872 0.155031 0.243570 3.533734
79 Compostable Forks Plastic Spoons 321 0.010653 1528 0.050708 1838 0.060996 0.210079 0.174646 3.444151
75 Organic Stage 2 Carrots Baby Food Organic Stage 2 Winter Squash Baby Food Puree 337 0.011184 2306 0.076527 1322 0.043872 0.146141 0.254917 3.331080
42 Unsweetened Whole Milk Strawberry Yogurt Unsweetened Whole Milk Mixed Berry Greek Yogurt 352 0.011681 1965 0.065210 1622 0.053828 0.179135 0.217016 3.327938
21 Unsweetened Whole Milk Blueberry Greek Yogurt Unsweetened Whole Milk Strawberry Yogurt 350 0.011615 1621 0.053794 1965 0.065210 0.215916 0.178117 3.311071
17 Unsweetened Whole Milk Blueberry Greek Yogurt Unsweetened Whole Milk Peach Greek Yogurt 341 0.011316 1621 0.053794 1922 0.063783 0.210364 0.177419 3.298101
83 Cream Top Blueberry Yogurt Cream Top Peach on the Bottom Yogurt 313 0.010387 1676 0.055620 1748 0.058009 0.186754 0.179062 3.219399
... ... ... ... ... ... ... ... ... ... ... ...
22444 Large Lemon Hass Avocados 468 0.015531 152177 5.050136 49246 1.634274 0.003075 0.009503 0.001882
2577 Red Onion Bag of Organic Bananas 1008 0.033451 42906 1.423876 376367 12.490090 0.023493 0.002678 0.001881
250 Roasted Pine Nut Hummus Banana 327 0.010852 11176 0.370886 470096 15.600574 0.029259 0.000696 0.001876
655 Organic Large Green Asparagus Banana 556 0.018451 19228 0.638099 470096 15.600574 0.028916 0.001183 0.001854
40897 Banana Organic Extra Virgin Olive Oil 369 0.012246 470096 15.600574 12788 0.424382 0.000785 0.028855 0.001850
2652 Spinach Bag of Organic Bananas 383 0.012710 16766 0.556395 376367 12.490090 0.022844 0.001018 0.001829
2722 Sour Cream Bag of Organic Bananas 486 0.016128 21481 0.712867 376367 12.490090 0.022625 0.001291 0.001811
11143 Organic Blueberries Blueberries 329 0.010918 99359 3.297321 55703 1.848556 0.003311 0.005906 0.001791
2537 Green Onions Bag of Organic Bananas 592 0.019646 26467 0.878332 376367 12.490090 0.022367 0.001573 0.001791
1386 2% Reduced Fat Milk Organic Strawberries 574 0.019049 36768 1.220180 263416 8.741706 0.015611 0.002179 0.001786
3291 2% Reduced Fat Milk Organic Baby Spinach 523 0.017356 36768 1.220180 240637 7.985763 0.014224 0.002173 0.001781
530 Chocolate Chip Cookies Banana 377 0.012511 13688 0.454249 470096 15.600574 0.027542 0.000802 0.001765
10681 Half & Half Organic Half & Half 302 0.010022 68842 2.284586 75334 2.500029 0.004387 0.004009 0.001755
5446 Organic Reduced Fat 2% Milk Organic Whole Milk 379 0.012577 47593 1.579418 136832 4.540898 0.007963 0.002770 0.001754
11455 Banana Soda 864 0.028673 470096 15.600574 33008 1.095401 0.001838 0.026175 0.001678
11421 Bag of Organic Bananas Fridge Pack Cola 366 0.012146 376367 12.490090 18005 0.597513 0.000972 0.020328 0.001628
2568 Asparation/Broccolini/Baby Broccoli Bag of Organic Bananas 317 0.010520 16480 0.546904 376367 12.490090 0.019235 0.000842 0.001540
19596 Banana Organic Tortilla Chips 320 0.010619 470096 15.600574 13458 0.446616 0.000681 0.023778 0.001524
2319 Fridge Pack Cola Bag of Organic Bananas 341 0.011316 18005 0.597513 376367 12.490090 0.018939 0.000906 0.001516
11017 Organic Baby Spinach 2% Reduced Fat Milk 403 0.013374 240637 7.985763 36768 1.220180 0.001675 0.010961 0.001373
22572 Organic Raspberries Raspberries 322 0.010686 136621 4.533895 56858 1.886886 0.002357 0.005663 0.001249
11012 Organic Strawberries 2% Reduced Fat Milk 371 0.012312 263416 8.741706 36768 1.220180 0.001408 0.010090 0.001154
246 Soda Banana 531 0.017622 33008 1.095401 470096 15.600574 0.016087 0.001130 0.001031
11555 Banana Clementines 397 0.013175 470096 15.600574 29798 0.988874 0.000845 0.013323 0.000854
1474 Strawberries Organic Strawberries 706 0.023429 141805 4.705931 263416 8.741706 0.004979 0.002680 0.000570
7271 Organic Strawberries Strawberries 640 0.021239 263416 8.741706 141805 4.705931 0.002430 0.004513 0.000516
6763 Organic Hass Avocado Organic Avocado 464 0.015398 212785 7.061469 176241 5.848722 0.002181 0.002633 0.000373
4387 Organic Avocado Organic Hass Avocado 443 0.014701 176241 5.848722 212785 7.061469 0.002514 0.002082 0.000356
2596 Banana Bag of Organic Bananas 654 0.021704 470096 15.600574 376367 12.490090 0.001391 0.001738 0.000111
670 Bag of Organic Bananas Banana 522 0.017323 376367 12.490090 470096 15.600574 0.001387 0.001110 0.000089

48751 rows × 11 columns

Part 4: Conclusion

From the output above, we see that the top associations are not surprising, with one flavor of an item being purchased with another flavor from the same item family (eg: Strawberry Chia Cottage Cheese with Blueberry Acai Cottage Cheese, Chicken Cat Food with Turkey Cat Food, etc). As mentioned, one common application of association rules mining is in the domain of recommender systems. Once item pairs have been identified as having positive relationship, recommendations can be made to customers in order to increase sales. And hopefully, along the way, also introduce customers to items they never would have tried before or even imagined existed! If you wish to see the Python notebook corresponding to the code above, please click here.