February 2020 Recap – Factor Analysis with Ahmad Ahmad

Our February meetup was a fun and informative session on factor analysis with Ahmad Ahmad downtown at Hopewell. Thanks to everyone who showed up even though the weather was kind of awful! Ahmad first gave us an introduction to the concept of factor analysis and when it might be helpful, then in good cbusdaw style proceeded to show us some real data and analysis, fielding some solid questions from the crowd along the way.

Some data is easily reducible to a smaller set of groups. For example, during Ahmad’s talk there was a significant number of water droplets falling from the sky. These droplets could very easily be sorted into a few types: rain, snow, freezing rain, and sleet. This is a case where we definitely don’t need factor analysis, because these are directly observed variables, i.e. we know which precipitation is which without using any statistical methods. Nobody would ever ask an analyst to quantify whether it was snowing or not, but they would ask us to figure out from a bunch of different web stats why their users aren’t converting. How do we boil those dozens of dimensions down to a few useful groups, each sharing some common underlying dimension that was not directly observed? And then what happens with even less obviously reducible data? We need statistical tools! Ahmad first walked us through a couple of thought exercises on this kind of dimension reduction to see how it might function at a high level.

Ahmad knew to bring data, though, so he walked us through an analysis he did of Boeing employee survey data, in which he attempted to turn 30-something different questions into a useful number of higher-level groups, in this case 4. He did the analysis in SPSS but brought the results into Excel for presentation, and he has been kind enough to provide that Excel file as well as his slides.

Included in his slides are links to more info on factor analysis using the following tools:
SPSS (with Varimax Rotation)
Excel (with XLSTAT add-on)
Python (with Factor Analyzer package)
R
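
For anyone who wants a feel for the mechanics before opening one of those tools, here is a rough sketch in plain NumPy. It is not Ahmad's analysis and does not use SPSS or the Factor Analyzer package: the survey data is synthetic, the two latent factors ("career stage" and a made-up "morale") are assumptions for illustration, and the extraction step is the simple principal-component method on the correlation matrix followed by a varimax rotation.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate an (items x factors) loading matrix using the varimax criterion."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Standard SVD-based update for the orthogonal rotation matrix
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_items
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

def extract_factors(data, n_factors):
    """Principal-component extraction from the correlation matrix, then varimax."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]      # indices of the largest ones
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    return varimax(loadings)

# Synthetic survey: 500 respondents, 7 questions (all hypothetical).
rng = np.random.default_rng(0)
n_people = 500
career_stage = rng.normal(size=n_people)   # unobserved factor 1
morale = rng.normal(size=n_people)         # unobserved factor 2 (made up)
noise = rng.normal(scale=0.5, size=(n_people, 7))
# Questions 0-3 are driven by career stage, questions 4-6 by morale.
data = np.column_stack([career_stage] * 4 + [morale] * 3) + noise

loadings = extract_factors(data, n_factors=2)
# Assign each question to the factor it loads on most strongly.
assignment = np.abs(loadings).argmax(axis=1)

print(np.round(loadings, 2))
print(assignment)
```

Each row of the printed loading matrix is a question and each column is a factor; questions that load heavily on the same column are the ones that "go together". Which column ends up being which factor, and the sign of the loadings, are arbitrary.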

Those of us who also saw Dr. Michael Levin talk on cluster analysis last September may want to revisit his talk to remind ourselves of the differences between these two closely related topics, which go hand-in-hand but have different uses. Factor analysis is about discovering underlying groups of variables (like finding the survey questions that could be represented by an underlying factor of “career stage”: age, expected retirement date, and less obvious dimensions like perception of selling potential). Cluster analysis is about sorting the surveyed people into groups using the existing dimensions, such that members of a group are more similar to each other than to people in another group.
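
One way to keep the two straight is to picture the survey as a table with people in rows and questions in columns: factor analysis looks at how the columns relate to each other, while cluster analysis looks at how the rows relate to each other. The sketch below illustrates that on made-up data. The respondents, the questions, and the tiny two-cluster k-means helper are all assumptions for illustration, not anything from either talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical respondents: 100 early-career and 100 late-career people.
true_group = np.repeat([0, 1], 100)
career_stage = np.where(true_group == 0, -2.0, 2.0) + rng.normal(scale=0.5, size=200)

# Questions 0-2 are driven by career stage; question 3 is unrelated.
related = career_stage[:, None] + rng.normal(scale=0.5, size=(200, 3))
unrelated = rng.normal(size=(200, 1))
answers = np.hstack([related, unrelated])   # shape: (people, questions)

# Factor-analysis view: relationships between COLUMNS (questions).
question_corr = np.corrcoef(answers, rowvar=False)   # questions x questions

# Cluster-analysis view: distances between ROWS (people).
def two_means(points, n_iter=50):
    """Minimal k-means with k=2, started from the first point and the point farthest from it."""
    first = points[0]
    farthest = points[np.linalg.norm(points - first, axis=1).argmax()]
    centers = np.array([first, farthest])
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([points[labels == j].mean(axis=0) for j in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

person_labels = two_means(answers)

print(np.round(question_corr, 2))    # which questions move together
print(np.bincount(person_labels))    # how many people landed in each cluster
```

The correlation matrix has one entry per pair of questions, which is the raw material for finding factors. The cluster labels have one entry per person, which is the output you would want for segmenting respondents.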

Please join us next month when we’ll be back at Rev1 learning about service design with Monica and Anthony Weiler.
