
June 2024 Recap – Under the Hood of A/B Testing

Our June 2024 meetup featured Dr. Maria Copot from OSU delving into some of the theory underlying our favorite A/B testing platforms. But before we get into the fun math part (yes, it’s fun, don’t look at me like that), we all need to remember that there has to be a question behind your experiment. If you don’t have a hypothesis you’re trying to validate, then what’s the point of testing something? Once you’ve got something you want to test, then you can test it, but testing just for the sake of saying how many A/B tests your department ran last year isn’t going to get you where you want to be.

A lot of us have been asked, “is this result statistically significant?” And maybe we’ve even said, “well, the P-value is <0.05 so it’s significant”… But what exactly is a P-value, and why is 0.05 such a big deal? Dr. Copot explained the basics of P-values, including that 0.05 is an arbitrary benchmark, and that a P-value can’t tell you anything about the size of an effect, its validity, or the reason behind it. If that still sounds a bit confusing, it’s time to cue the memes about scientists being unable to explain P-values in an intuitive way. We think Dr. Copot’s explanation would be in the upper tail of that distribution at any rate. Even if math is fun, it isn’t always intuitive.
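
To make the "a P-value says nothing about effect size" point a bit more concrete, here is a minimal sketch in Python. It is not from Dr. Copot's talk: the function name, the conversion counts, and the choice of a pooled two-proportion z-test are all assumptions made for illustration. Both scenarios have the same lift (10% vs. 12% conversion), and only the sample size changes.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test.

    Hypothetical helper for illustration; real testing tools may use
    a different test or apply corrections.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a |z| at least this large if the null were true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up data: same 10% vs. 12% conversion rates, different sample sizes.
p_small_sample = two_proportion_p_value(100, 1_000, 120, 1_000)
p_large_sample = two_proportion_p_value(1_000, 10_000, 1_200, 10_000)

print(f"1,000 users per arm:  p = {p_small_sample:.4f}")
print(f"10,000 users per arm: p = {p_large_sample:.6f}")
```

With these made-up numbers, the smaller test lands above 0.05 and the larger one lands well below it, even though the observed lift is identical. The P-value is responding to how much data you have as much as to how big the effect is.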

Dr. Copot also talked about sample sizes and power analysis (here is one online calculator I’ve used many times: https://www.evanmiller.org/ab-testing/sample-size.html), and then moved on to Bayesian methods. Traditional A/B tools (like Google Optimize, RIP) have typically used frequentist methods, like the P-values we’ve been talking about. Newer tools have folded in some Bayesian methods, which thankfully are a little more intuitive, if perhaps more mathematically and computationally expensive.
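
For a rough feel of both ideas, here is a small Python sketch. Everything in it is an assumption for illustration rather than something shown at the meetup: the function names, the baseline and lift values, the textbook normal-approximation sample-size formula (which is not necessarily what the calculator linked above uses), and the uniform Beta(1, 1) prior in the Bayesian part.

```python
import random
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect an absolute lift.

    Uses a common normal-approximation formula for two proportions;
    other calculators may give somewhat different numbers.
    """
    p1, p2 = baseline, baseline + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / lift ** 2)


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm is Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Made-up scenario: 10% baseline conversion, hoping to detect a 2-point lift.
n_two_points = sample_size_per_arm(baseline=0.10, lift=0.02)
n_one_point = sample_size_per_arm(baseline=0.10, lift=0.01)
print(f"Users per arm for a 2-point lift: {n_two_points}")
print(f"Users per arm for a 1-point lift: {n_one_point}")

# Made-up results: 100/1,000 conversions for A vs. 120/1,000 for B.
p_b_better = prob_b_beats_a(100, 1_000, 120, 1_000)
print(f"Estimated probability that B beats A: {p_b_better:.2f}")
```

The first half shows why power analysis matters before you launch: halving the lift you want to detect roughly quadruples the traffic you need. The second half is the kind of output that makes Bayesian tools feel more intuitive, since "the probability that B is better than A" is the question most stakeholders are actually asking. The simulation loop also hints at the extra computational cost.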

Finally, we talked about how privacy regulations, sampling, and cookie limitations can make doing these kinds of experiments more difficult. One way around these limitations is to use paid platforms like Prolific where you can make your own sample group and run a group of fully consented users through an experiment of your choosing.


Please join us next month when Lauren Burke-McCarthy will talk about how to succeed as a solo data scientist.