Optimising Starbucks’ product promotions

Alberto Serrano
Dec 12, 2020

Project Overview

Using historical customer data, past transactions, and customers' interactions with Starbucks promotional offers, this post presents a strategy to optimise how specific products are targeted and to improve customer loyalty.

Every few days, Starbucks promotes a specific product or offer to capture customers' attention. The dataset contains three different types of offer:

  • Informational: a simple promotion of a product, with no reward and no minimum spend
  • BOGO (Buy One Get One): as the name states, spending a certain amount on a product is rewarded with a second one for free
  • Discount: spending a certain amount is rewarded with a price reduction on the next purchase

Problem statement

The idea is then to determine which demographic groups a specific promotion should be targeted at, based on their responses in past transactions.

For that, three datasets are used:

  • One containing customer data, or “Profile”
  • Another with information about the types of promotion offered by Starbucks, or “Portfolio”
  • And a last one with past interactions between customers and offers, or “Transcript”

A Machine Learning model will be proposed to predict the type of promotion that should be offered to a specific customer.

Specifically, the following steps will be followed:

  1. Data Exploration and Visualisation
  2. Data Preprocessing
  3. Model Implementation
  4. Model Evaluation and Validation
  5. Conclusions

Metrics

For each model, the metric used to assess its performance will be how many correct predictions of a customer completing a promotion it is able to make.

In particular, the F1 score has been chosen to avoid the skewed results that other metrics, such as accuracy, can give on imbalanced data. The F1 score, as stated by IBM, is “a measurement that considers both precision and recall to compute the score”.
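
As a point of reference, here is a minimal sketch of how the F1 score can be computed with scikit-learn; the labels below are illustrative placeholders, not values from the actual dataset.

from sklearn.metrics import f1_score

# Illustrative placeholder labels: 1 = offer completed, 0 = not completed
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# The F1 score is the harmonic mean of precision and recall
print(f"F1 score: {f1_score(y_true, y_pred):.2f}")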

1. Data Exploration and Visualisation

The first thing that stands out in the data is that 2175 customers are listed as 118 years old, clearly outliers, as can be seen in the figure below.

Outliers in the age distribution of customers

Additionally, there are exactly 2175 null entries in the “income” and “gender” features, all of them associated with an age of 118. This strongly suggests that these values are false entries.

Null gender values:  2175
Null income values: 2175
Age corresponding to null values: 118

These data points are therefore not valid and have been discarded from the dataset for the rest of the analysis.
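
A minimal pandas sketch of this cleaning step is shown below; the file path, and the assumption that the profile data loads into a DataFrame with “age”, “gender” and “income” columns, are mine rather than taken from the project code.

import numpy as np
import pandas as pd

# Assumed file name/location for the customer data
profile = pd.read_json("data/profile.json", orient="records", lines=True)

# Treat the placeholder age of 118 as missing, then drop every row that
# still contains a NaN -- this removes the 2175 invalid entries in one pass
profile["age"] = profile["age"].replace(118, np.nan)
profile = profile.dropna()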

Distribution of customers by gender

Then, looking at the distribution of customers by gender, 57.23% are male while 41.34% are female, as shown in the figure on the left.

The clients also have a mean age of 54, with a large share of people between 40 and 60 years old. Nonetheless, a non-negligible part of the customers is below the age of 30, as shown in the figure below.

Distribution of customers by age
Distribution of customers by income

Regarding the distribution of income, customers have a mean income of $65K. Moreover, even though the maximum income is around $120K, 75% of the customers earn less than $80K, as presented in the figure above.

Regarding the events recorded in the transcript dataset, around 24.88% are received offers, but only 10% are completed offers, as shown in the figure on the left.

The transaction data remains the most interesting part, since it reveals how customers buy products without needing an offer. From it, valuable information can be extracted to assess their reaction to a product offer.

2. Data Preprocessing

All the features of this dataset will be converted to numerical values so they can later be used in an ML model. Where applicable, the data points will also be classified into groups for easier analysis.

In summary, the following steps have been followed (a sketch of the grouping step is shown right after the list):

  • The false data points with age 118 are removed to clean the dataset: those values are first set to NaN, and then all rows containing NaN values are dropped
  • Four age groups are established: teenager, young-adult, adult, and elderly
  • Three categories of income are established: low, average and high
  • The year when the customer signed up and the number of days since they became a member are extracted as new columns. Depending on how long a person has been a customer, three categories are established: newcomer, regular, loyal
  • Convert the complicated customer IDs to a simpler enumeration, as done with the profile dataset
  • Finally, convert the offer information into different columns: amount, reward and offerID
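
The sketch below shows how the age, income and loyalty groupings above could be built with pandas. The bin edges and the “became_member_on” column format are assumptions for illustration; the exact cut-off values used in the project are not given in the post.

import pandas as pd

# Assumed bin edges -- the real cut-offs used in the project may differ
profile["age_group"] = pd.cut(
    profile["age"],
    bins=[0, 20, 35, 60, 120],
    labels=["teenager", "young-adult", "adult", "elderly"],
)
profile["income_group"] = pd.cut(
    profile["income"],
    bins=[0, 50_000, 80_000, 150_000],
    labels=["low", "average", "high"],
)

# Membership year and days since sign-up, assuming a YYYYMMDD date column
profile["member_since"] = pd.to_datetime(profile["became_member_on"], format="%Y%m%d")
profile["member_year"] = profile["member_since"].dt.year
profile["days_as_member"] = (profile["member_since"].max() - profile["member_since"]).dt.days

profile["loyalty"] = pd.cut(
    profile["days_as_member"],
    bins=[-1, 365, 3 * 365, 10_000],
    labels=["newcomer", "regular", "loyal"],
)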

3. Model implementation

The first thing to do to implement the model is to identify which feature we want to predict (the target) and which features will be used to describe the customers (the inputs).

In this case, the target will be the “event” information, which at this point can only be “offer viewed” or “offer completed”, since the other cases were eliminated. The rest of the columns will be the features used to predict the target.

To make things easier for the model, the offer-related features that originally had numerical values will be normalized: time, amount and duration (in hours).
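
A sketch of this step is shown below, assuming the merged, preprocessed data lives in a DataFrame called df with an “event” column and the three numerical columns mentioned above (column names assumed).

from sklearn.preprocessing import MinMaxScaler

# Target and features (column names assumed)
X = df.drop(columns=["event"])
y = df["event"]  # "offer viewed" or "offer completed"

# Scale the originally numerical offer features to the [0, 1] range
# (in a stricter setup the scaler would be fitted on the training split only)
scaler = MinMaxScaler()
X[["time", "amount", "duration"]] = scaler.fit_transform(X[["time", "amount", "duration"]])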

Also, to be able to implement the model, the dataset is split into training and test data of the following sizes:

Training data size:  59781
Test data size: 19927
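
These sizes correspond to roughly a 75/25 split. A sketch using scikit-learn's train_test_split is shown below; the stratify and random_state arguments are assumptions.

from sklearn.model_selection import train_test_split

# 75/25 split, keeping the same proportion of viewed vs. completed offers
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print("Training data size:", len(X_train))
print("Test data size:", len(X_test))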

And they have the following event distribution:

Now, several simple classification models will be trained on the dataset, and their performance (F1 score) will be calculated.

Their scores on the training data will be compared with those on the test data to identify the best model. The following well-known models from Scikit-Learn will be implemented:

  • K-Nearest Neighbors
  • Support Vector Machines
  • Decision Tree Classifiers: Random Forest
  • Naive Bayes
  • Logistic Regression

All five models have been implemented and trained on the training data, and then evaluated on the test data.
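
A sketch of how the five classifiers could be trained and compared is shown below; since the post does not list the hyperparameters used, scikit-learn defaults are assumed.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

models = {
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    train_f1 = f1_score(y_train, model.predict(X_train), average="weighted")
    test_f1 = f1_score(y_test, model.predict(X_test), average="weighted")
    print(f"{name}: train F1 = {train_f1:.4f}, test F1 = {test_f1:.4f}")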

4. Model evaluation and validation

After the scores for all the models mentioned were obtained (on both the training and test data sets), the following results were produced for each case:

F1 scores for all ML models implemented

It can be observed that four out of the five models achieved a 100% score on the training data. Moreover, SVC, Random Forest, and Logistic Regression also achieved 100% on the test predictions.

As an additional refinement, the K-Nearest Neighbors model was tuned, since it was the one with the “lowest” score. After performing a GridSearch, the optimal parameters for this model were found to be the following:

  • algorithm = “kd_tree”
  • n_neighbors = 5
  • weights = “distance”

This gives a slightly improved F1-score:

k-Nearest Neighbors Classifier
F1-score Completed Offers: 99.62 %
F1-score Viewed Offers: 99.75 %

Hence, the chosen model would be the K-Nearest Neighbors with the following hyperparameters:

{'algorithm': 'kd_tree', 'n_neighbors': 5, 'weights': 'distance'}
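
The sketch below shows the kind of grid search that could produce these parameters; the candidate grid and cross-validation settings are assumptions, and only the winning values above come from the post.

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumed candidate grid -- only the reported best values are confirmed
param_grid = {
    "algorithm": ["ball_tree", "kd_tree", "brute"],
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
}

grid = GridSearchCV(KNeighborsClassifier(), param_grid, scoring="f1_weighted", cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
# e.g. {'algorithm': 'kd_tree', 'n_neighbors': 5, 'weights': 'distance'}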

This model has been chosen because it provides a very high F1-score (the metric chosen to evaluate the models) while reducing the risk of overfitting, which may have occurred with the other models that reached 100% scores.

In addition, the K-Nearest Neighbors algorithm is a robust choice for binary classification problems such as this one.

5. Conclusions

A Machine Learning model has been implemented to predict what kind of offer should be targeted at a specific customer depending on their demographics.

The customers have been grouped into smaller segments, which has allowed better targeting of the promotions, depending on:

  • Age: teenager, young-adult, adult, and elderly
  • Gender: male, female, and other
  • Income: low, average, and high
  • Loyalty: newcomer, regular, loyal

Thorough data cleaning and processing, classifying the customers into groups for each feature, has allowed the ML model to pick up the features most strongly linked to an offer, and very good scores have been obtained.

Even though all the models used have shown impressive results, they can also lead to overfitting. Given the type of data set used in the analysis, and after performing the refinement, K-Nearest Neighbors appears to be the most suitable candidate to obtain the best results without overfitting.

Reflection

The most tedious part has clearly been the preprocessing of the data. Transforming categorical data into numerical features that the ML model could process, as well as finding the right group classification for the customers' demographics, have been the most challenging tasks.

More generally, envisioning the real objective of the exercise, and how an ML model could help predict how a specific promotion could be optimised for a certain demographic, was not straightforward at the beginning either.

Future Improvements

Further customer data would be useful to avoid targeting offers at such large groups of people. For instance, employment status, place of residence, frequency of visits to a Starbucks, etc. might provide important information for better modelling. Nevertheless, this might only apply to regular customers who are willing to join a loyalty program so that this information can be collected.
