Portfolio

Microsoft Excel is an often overlooked tool when it comes to the serious data scientist. Excel has many limits, it can't handle large data sets well, it has an ugly and convoluted way of writing logic and formulas and it becomes laggy when you add too much data to your file. Not to mention auto-correcting your inputs to random dates!

But, let's not forget Excel is most people's first data playground. The vast majority of businesses in the world rely on Excel and often leaders expect business results to be shared with them in an Excel spreadsheet. For the most part spreadsheets are here to stay, and Excel has a lot of nice features that make it most people's go to tool when they want to quickly explore a data set before deciding what to do with it. That's why in this article I'm going to be working through a dataset from UberEats, analysing some marketing data and highlighting some simple features that makes Excel a quick and easy to use tool for data analysts.

This is a dataset provided as an Excel sheet from between 2015-2016. Let's have a look.

We start with 36 columns and 2206 rows. Each row in the data set represents an individual customer over a given time period. Each column is an attribute of that customer; such as how much they spent, what kind of products they bought, if they bought through a promotional campaign and so on.

When I'm given a dataset like this I like to focus on what the data is telling me, but more importantly on what questions can I answer with this dataset? So that's where I began. The objective of this dataset is to maximise business revenue, so let's focus on that. We can start by plotting a scatter plot of the customers income vs the amount they spent (MntTotal) in this dataset and let's take a look.

From this plot we can see there is a reasonably strong correlation with someone's annual income and how much they spent with UberEats. The correlation looks pretty strong but we can be more accurate with our estimations. When creating a scatter plot in Excel we can opt to include the R2 value (it's a little hidden, you have to add a trend line in add chart element and then format the trend line).

Ok, so now we've got an R2 value on our scatter plot, but what does this mean? Well it is similar to the R value of a chart, and is another method of measuring correlation between two continuous variables, so a value of -0.9 means there is a very strong negative correlation and a value of 0.9 means there is a very strong positive correlation. Values of around -0.5 to 0.5 are considered reasonably weak. More precisely however, this value (0.6774) tells us that 68% of the variation in this dataset can be attributed to the relationship between a customers income and the amount of money they spent. It is calculated as the variance (the average of the sum of the squares) of the mean of the sample minus the variance of the trend line divided by the variance of the mean. But Excel has this functions neatly built in for us. It's a useful way to quantify the relationship between two variables.

Ok so from this data it looks like people with an income >£60,000 spend more with Uber Eats. But there is a better way to show this, a Pareto distribution. Similar to a histogram, however it rank orders the categories we specify and tells you the distribution of each group. But to do this, we need to better clean our data. We do this with an Excel formula, by taking our income column and applying IF/ELSE logic to it.

=IF(AND(B2<=10000),"£0 - 10,000",IF(AND(B2<=20000,B2>10001),"£10,001 - £20,000",IF(AND(B2<=30000,B2>20001),"£20,001 - £30,000",IF(AND(B2<=40000,B2>30001),"£30,001 - £40,000",IF(AND(B2<=50000,B2>40001),"£40,001 - £50,000",IF(AND(B2<=60000,B2>50001),"£50,001 - £60,000",IF(AND(B2<=70000,B2>60001),"£60,001 - £70,000",IF(AND(B2<=80000,B2>70001),"£70,001 - £80,000",IF(AND(B2<=90000,B2>80001),"£80,001 - £90,000",IF(AND(B2<=100000,B2>90001),"£90,001 - £100,000",IF(B2>100000,"> £100,000","?")))))))))))

What we're telling Excel here is that if the continuous variable (income) value lands between these two set values, label that group with our desired label. This is a good way to change continuous data into categorical data. Now plotting a Pareto distribution of our income groups vs total spent we see a clearer picture.

Well, that's much more helpful than our initial scatter plot. From this we can see that people with an income between £70,001 - £80,000 accounted for approximately 30% of the total population, the majority for any category, shortly followed by the previous income bracket (£60,001 - £70,000). For brevity, we'll focus on this first group. However now we can deep dive into this income group to find out how Uber Eats might maximise revenue for this customer type.

We know that this group spends the most out of all income groups, but what products do they mostly purchase? What age group are they typically, and how do the companies previous marketing campaigns affect purchases. Lets take a look.

Firstly, let's take a look at what products are purchased most by this group. A simple bar chart is good for this or anything where you want to display categorical data.

Unfortunately our dataset comes with 'MntRegularProd' amount of regular produce as an aggregated category. We don't know what this is made up of or what sub groups it contains, however it clearly tops what most people spend their money on. What is really interesting however is the amount of purchases made on wines! Let's dive a little deeper into our demographic and see what else we can find out about them.

We can look at age group next, though we'll have to use a similar formula to what we did for income groups to create categorical groups. Now let's look at another plot.

Ok, so for our target demographic we know that the majority of people are aged 36-50, closely followed by the next age group. This also forms a nice looking distribution from our initial data too, we're starting to build up more of an idea about our target demographic. Let's take a look at the companies marketing campaigns next.

Within our dataset we know if customers bought during a marketing campaign as it's given as a 1 for yes and a 0 for no.

There are different ways of achieving this next step, but what I like to do to prevent my main datasource getting too complex is create a copy and reduce the columns down to the data I need. I did this for each marketing campaign. Applying a filter to select all the 1's for a given campaign, the sum can be calculated using Excel's subtotal function.

=SUBTOTAL(9,D2:D2203)

Excel's subtotal function is an easy way to create aggregate functions on datasets outside of using pivot tables.

From this data we can see that marketing campaign 6 has been the most successful one launched by Uber Eats (we don't know any of the specifics of this campaign) and it seems that revenue is increasing with each campaign. Let's see if this trend is the same with our targeted group from earlier.

The trend definitely follows the same pattern, so this group was likely influenced by the marketing campaigns as much as any other customer. Campaign 2 definitely didn't seem to do well!

Ok, lastly it looks like the amount of revenue is increasing from these last two plots. But it's likely that the customer base is also increasing to account for the increased spend. Let's take a look at when customers sign up to use the service to see if there are any patterns in the data.

Our original dataset has the date that each customer signed up to the service. But let's find out what month they joined in. We can use the function MONTH() on our cell to give us the numerical month that they signed up but that's not very readable. Instead use the TEXT function as so: =TEXT(AN2,"mmmm"). This gives the following:

Much better, now let's create a pivot table from this. I do this by selecting the data I want (The MonthName column) and clicking 'Recommend Pivot Tables' from the Insert ribbon. This is a really convenient way to explore your data without having to type out any formulas. We want to know how many people signed up in each month so drag MonthName to the Rows tab and we select COUNT from the Values tab.

Now we can see that we had the most sign ups in January and a total of 2205 sign ups in the given time period.

So what insights can we gain from this analysis?

Customers with an annual income of £60,000 to £80,000 spent the most with Uber Eats 💵
The most commonly bought product for this group (after regular products) was wine 🍷
Our target group is between 35 and 65 years old 👨 👴
Marketing campaign 6 was the most successful for all groups 📊
Most customers sign up in January 🗓️

What can we do with this information? Run promotional campaigns in January with offers on wine products? Maybe, there is a lot more that can be done with this data set and it was fun to explore. Excel does have some useful techniques to quickly look through some data and gain insights quickly.

Thanks for reading.

Storytelling with Data​: Analysing purchasing trends from Uber Eats with Microsoft Excel

Storytelling with Data: Analysing purchasing trends from Uber Eats with Microsoft Excel