August 2024 Recap – Being Intentional, Privacy by Design, and More with Matt Gershoff
For this month’s event, Matt Gershoff, CEO of Conductrics, traveled from the land of triple-digit heat on the regular (Austin) to the land of pleasant-temps-even-in-August (Columbus) to share some thoughts and examples about being intentional when it comes to data collection. If you attended the meetup, then you’re here because we promised that we’d post some useful supplemental information, and we’re going to flip the script of a normal recap post by putting those up front:
- [20-Minute Video] Matt’s talk at PEPR ’24: a Venn diagram of this talk and the one he gave at DAW would show something like a 63% overlap, although the DAW circle is larger because it included additional material! Since we don’t record DAW talks, the PEPR talk is a good one to share with a colleague who is kicking themselves for not attending DAW.
- [<2-Minute Video] Matt talking about intentionality: not even remotely by design, this clip is a piece of an interview that another one of the DAW sponsors, Piwik PRO, did with Matt. Useful and thoughtful stuff.
- [5-Page PDF] Privacy by Design: The 7 Foundational Principles. A very worthwhile read; Matt focused primarily on Principle #2, but you’ll never believe what Principles #4 and #7 are! (Seriously, if you give it a real read, it will blow your mind a little bit; it’s almost three decades old and was an underpinning of GDPR!)
- Matt will also be on an upcoming (as of this writing) episode of the Analytics Power Hour podcast, so, if “audio only” is your or a colleague’s jam, smash the subscribe button there.
Matt’s presentation—with annotations added to make it an upgrade from “just the slides”—is included at the end of this post, but a few of the highlights from his presentation were:
- “Just Enough” vs. “Just in Case” Data Collection—Matt made a stronnnnnng case that the industry bias is for the latter, while “privacy by default” demands the former. “Just Enough” data means aligning to a specific and explicit task or objective and then collecting as little data as needed to complete the task. “Just in Case” is a “maximize optionality” play—hoovering up as much data as possible at as granular a level as possible so that there are as many possible “options” for doing “stuff” with it in the future. We are so wired to the latter that it’s uncomfortable to recognize why that Is. Not. Good.
- This doesn’t mean there are no cases where high granularity / high cardinality data is warranted—throughout the talk, Matt was clear that he was not speaking in any absolutes (unless we count as an absolute that, “all data collection should be performed with intentionality”).
- Many types of A/B tests, both univariate and multivariate, can be statistically evaluated without recording data at the individual user level: if you’re like the author of this recap, you’ve always operated under the assumption that A/B tests require capturing each user’s session, including which variant they were assigned to, maybe some other metadata about them (what level of a loyalty program they belong to, for instance, for “deeper dive analysis”), whether or not they converted, and maybe the amount of their purchase. 10,000 visitors in the test? That’s 10,000 rows of data! What Matt demonstrated was, um, no. That’s incorrect thinking. By using equivalence classes and some understanding of the maths underlying the statistical tests of interest (t-test, OLS regression, and more), it’s possible to simply capture and increment aggregated counts (visitor count, sum of sales, sum of the squares of sales) and perform the exact same statistical tests in a way that is computationally less intensive, requires much less data storage, and aligns with privacy by design Principle #2: privacy by default (as well as Principles #3, #4, and #7). Matt outlined a lot of this in this blog post (although he has since extended his research and thinking on the subject… and it continues to hold up!)
- There are several techniques and concepts worth being familiar with when embracing privacy by design: k-anonymity, differential privacy, global vs. local privacy, and more! The key with all of them is that they work best when approached as part of privacy by design rather than as privacy-tacked-on-later-to-maintain-regulatory-compliance.
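To make the “aggregates instead of rows” idea from the A/B testing highlight a bit more concrete, here is a minimal sketch in Python. It is not Matt’s or Conductrics’ implementation. The class `VariantTotals`, the function `welch_t`, and the sales figures are all hypothetical, and the choice of Welch’s unequal-variance t-test is our own illustrative pick of “a t-test.” Each variant is treated as one equivalence class that keeps only three running numbers: visitor count, sum of sales, and sum of squared sales. That is enough to recover each variant’s mean and sample variance, and from those the t statistic.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantTotals:
    """Hypothetical running aggregates for one variant (one equivalence class).

    Only three numbers are stored; no per-visitor rows are kept.
    """
    n: int = 0             # visitor count
    total: float = 0.0     # sum of sales
    total_sq: float = 0.0  # sum of the squares of sales

    def record(self, sale: float) -> None:
        # Fold one visitor's outcome into the counters; the visitor is not stored.
        self.n += 1
        self.total += sale
        self.total_sq += sale * sale

    def mean(self) -> float:
        return self.total / self.n

    def variance(self) -> float:
        # Sample variance (n - 1 denominator), recovered from the aggregates alone.
        return (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)


def welch_t(a: VariantTotals, b: VariantTotals) -> tuple[float, float]:
    """Welch's t statistic for mean(a) - mean(b) and its Welch-Satterthwaite df."""
    se2_a = a.variance() / a.n  # squared standard error of a's mean
    se2_b = b.variance() / b.n
    t = (a.mean() - b.mean()) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (a.n - 1) + se2_b ** 2 / (b.n - 1))
    return t, df


# Toy sales figures, made up purely for illustration.
control, treatment = VariantTotals(), VariantTotals()
for sale in [0.0, 20.0, 0.0, 35.0, 10.0]:
    control.record(sale)
for sale in [15.0, 0.0, 40.0, 25.0, 30.0]:
    treatment.record(sale)

t_stat, dof = welch_t(treatment, control)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The means and variances come out identical to what you would get by keeping the raw lists, but only three numbers per variant ever get stored. To split results by something like loyalty tier, you would keep one set of counters per (variant, tier) combination rather than per visitor. For a p-value, the statistic and degrees of freedom can be handed to a t-distribution survival function (for example, SciPy’s `scipy.stats.t.sf`). One caveat of this simple version: the sum-of-squares formula can lose floating-point precision when values are very large relative to their spread.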
A lot of ground was covered with pretty lively audience engagement and more than a few laughs!
The annotated slides:
And, as always, a few pictures to capture the atmosphere: