September 2019 Recap – Cluster Analysis with Dr. Michael Levin

Our September meetup featured a strong turnout for the always-popular Dr. Michael Levin from Otterbein University, who spoke about cluster analysis. We’ve checked the records, and this was Dr. Levin’s 4th time presenting! That is an impressive feat, and it puts him close to the free tote bag for members of the five-timers club. Considering the quality of the content and the great questions it prompted, we had better start designing that tote bag!

So what exactly is cluster analysis? “K-means cluster analysis” sounds kind of esoteric and difficult, but Dr. Levin showed both how crucial this kind of analysis is and the ease with which it can be implemented. We might have 10,000 individual customers, but if we want to actually analyze those customers and then act on what we find, we really need to split them up into a manageable number of groups. K-means does this by assigning each customer to the nearest of k group centers, moving each center to the average of its members, and repeating until the groups settle.
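For readers who would rather see the idea in code than in a spreadsheet, here is a minimal sketch of plain k-means in Python using only the standard library. It is not Dr. Levin's Excel template or Winston's method. The customer data is made up, and the names `kmeans`, `make_customers` and `squared_distance` are hypothetical helpers invented for this illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each customer joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty group's centroid alone
        if updated == centroids:
            break  # nothing moved, so the groups have settled
        centroids = updated
    return centroids, labels


def make_customers(per_group=50, seed=1):
    """Made-up customers as (orders per year, average basket size) pairs."""
    rng = random.Random(seed)
    group_centers = [(2.0, 2.0), (10.0, 3.0), (6.0, 9.0)]  # assumed, for illustration
    return [
        (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in group_centers
        for _ in range(per_group)
    ]


customers = make_customers()
centroids, labels = kmeans(customers, k=3)
for c, centroid in enumerate(centroids):
    rounded = tuple(round(value, 2) for value in centroid)
    print(f"group {c}: {labels.count(c)} customers, center {rounded}")
```

Two caveats apply to this sketch. K-means depends on its random starting centers, so a different `seed` can produce a different (and sometimes worse) grouping. It is also sensitive to scale, so with real data you would normally standardize each dimension before clustering.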

Don’t forget that one group containing everyone, or one group per customer, still counts as a grouping, just not a very useful one! A useful grouping uses the smallest number of groups that still separates our customers along the dimensions we are interested in.
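To make the point about the two useless extremes concrete, here is a small Python sketch that scores a grouping by its within-cluster sum of squares: the total squared distance from each customer to the center of its group. The six customers and the function name `wcss` are made up for this illustration and did not come from the talk.

```python
def wcss(points, labels):
    """Within-cluster sum of squares for a given assignment of points to groups."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    total = 0.0
    for members in groups.values():
        # The center of a group is the mean of its members in each dimension.
        center = [sum(dim) / len(members) for dim in zip(*members)]
        total += sum((x - c) ** 2 for p in members for x, c in zip(p, center))
    return total


# Six hypothetical customers that visibly fall into two bunches.
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

everyone_together = [0] * len(customers)       # one group for all customers
two_groups = [0, 0, 0, 1, 1, 1]                # the split a person would pick by eye
one_group_each = list(range(len(customers)))   # every customer alone

for name, labels in [
    ("one group", everyone_together),
    ("two groups", two_groups),
    ("six groups", one_group_each),
]:
    print(f"{name}: {wcss(customers, labels):.2f}")
```

One group gives a large score because a single center fits nobody well. Six groups give a score of exactly zero, but only because each customer is its own center, which tells us nothing. The two-group split already removes almost all of the spread, which is the kind of payoff we look for when choosing how many groups to keep.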

Dr. Levin walked us through an example of this kind of grouping with real-world data and was brave enough to actually bring up Excel for a live demo. Typically that’s a good way to make sure everything explodes, but the only breakage was a few brief projector outages.

He was also kind enough to share both his slides and his Excel templates! The four-cluster approach comes from Wayne Winston’s book “Marketing Analytics: Data-Driven Techniques with Microsoft Excel”.

Excel Templates:

Three Cluster Solution Template
Four Cluster Solution Template
Five Cluster Solution Template

This kind of analysis can of course also be done in your statistics package or programming language of choice. Here are a couple of links on how it can be done in R or Python to satisfy our toolset “fairness doctrine” requirements, as mandated by the cbusdaw bylaws. As a bonus, these also show just how simple Excel can make it!

R
http://markedmondson.me/intro-to-machine-learning-with-web-analytics-random-forests-and-k-means
https://uc-r.github.io/kmeans_clustering

Python
https://towardsdatascience.com/an-introduction-to-clustering-algorithms-in-python-123438574097

Please join us next month when Martijn Scheijbeler from RV Share will discuss product analytics!