Archive | DAW Recaps

April 2025 Recap – Using Predictive Modeling to Prevent Homelessness with Ty Henkaline

Our April 2025 event featured Ty Henkaline talking about his work with non-profits in Franklin County to better understand homelessness. Ty has been working with Smart Columbus’ Columbus Community Information Exchange Initiative (CIE) to produce research that uses data from the Mid-Ohio Food Collective (MOFC) and the Community Shelter Board (CSB) to shed light on this growing problem.

As Ben Franklin — for whom our county is named — famously said, “An ounce of prevention is worth a pound of cure.” No question this is doubly true for homelessness, and providing early warning to the agencies that help prevent these crises is a great use of data.

But as Ty pointed out, this data is not always easy to come by. Our existing systems were all built separately, and data integration was never a priority. Sensitive data about at-risk individuals is a challenging arena to work in, and Ty emphasized both the value of having partners that were truly invested in making this system work as well as the potential value of additional data sources.

This “spike chart” was a huge hit with the audience, and shows the following things:

  1. Growing usage of services (food banks in particular) was a strong leading indicator of homelessness.
  2. With far fewer data sources compared to LA, Franklin County was able to see a very similar effect. How often do you see that in data modeling?
  3. Individuals experiencing first-time homelessness continue to need an elevated level of services after the initial crisis. This reinforces the notion that prevention can do a lot to improve the overall load on the system.

As promised, Ty provided us with his slides, which contain lots of links and some calls to action! Try scrolling to navigate the slides, or check out the direct link here.

If you’re interested in helping or learning more, please feel free to message Ty on LinkedIn.

Check out the engaged audience!

March 2025 Recap – A/B Testing with Melanie Bowles

headstone from Google Graveyard

When we last had Melanie Bowles as a speaker in 2019, she led an informative session on building a sustainable experimentation strategy. Since nothing at all has changed since 2019, we just replayed that talk and then everyone went home. I’m obviously being facetious — many of the strategies that Melanie laid out in that talk are still very relevant! But the landscape has changed a lot since then, from big changes in browser privacy and client-side technology to the shutdown of the most widely used tool in the industry, Google Optimize.

While there’s no clear successor to Optimize, there are many good testing tools out there, including popular options like AB Tasty, Convert.com, Visual Website Optimizer, and Optimizely. Most of these tools also offer integration with GA4.

Deployment count of tools via BuiltWith data

As you can see from this chart based on deployments in the top 1M sites, none of these tools are exactly catching fire with popularity. A big reason for that may be that none of them are free for unlimited usage like Optimize was. Melanie also pointed out that A/B testing and similar functionality like feature flagging have in some cases moved into all-in-one suites like Amplitude, Salesforce, etc. The sunset of Optimize can be looked at as a chance to mature our A/B testing practices and focus them where they can have the most impact.

Melanie also suggested that we embrace AI tools, especially on the ideation side of testing. There’s no substitute for human expertise when building out tests, but it’s certainly not cheating to let ChatGPT come up with some potential variations for your test! Just remember to give the AI as much context as you can. Melanie ran through a quick example which included providing the AI with a customer persona, which you can find in her slides below!

 

As a new twist for the meetup in 2025, we’re making a donation to a speaker-selected non-profit at each event. Melanie chose to designate Columbus Cultural Orchestra — a program for young people 13-25 to develop their musical skills and enhance diversity in orchestral music — for a $250 donation!

February 2025 Recap – Analytics the Right Way with Tim Wilson

Our first event of 2025 was a book release party for CBUSDAW’s very own Tim Wilson!

If you’ve ever been to a CBUSDAW event before (or listened to his podcast the Analytics Power Hour) you’ll know that Tim has a lot of things to say about analytics. Smart things, funny things, cranky things, etc. To our benefit, he’s organized many of these thoughts together into a book (with co-author Dr. Joe Sutherland) called “Analytics the Right Way: A Business Leader’s Guide to Putting Data to Productive Use”.

This is an excellent book that may be targeted towards “business leaders” in its title, but it can also be incredibly useful for analysts themselves in terms of how to think about doing analytics in a productive way. There are a lot of books out there covering tools, methods, and technology — but Tim and Joe’s book stands out in being about actually using analytics within an organization to further business goals. (NB: this is Jason writing this recap, not Tim awkwardly hyping his own book in the third person. Also, Tim, I promise I will get around to writing that Amazon review at some point.)

But this wasn’t just a book signing with free beer; Tim also gave a talk about some of the topics he covered in the book! We had a great crowd, with friends and colleagues of Tim’s coming in from as far away as Chicago, Nashville, and Boston.

 

We also had Jim Gianoglio jump in behind the camera (Tim’s normal job) and get some great action shots:

 

November 2024 Recap – Piwik PRO and Clarity with Josh Silverbauer

For our November 2024 event, we brought Josh Silverbauer in from Philly to talk about behavior analytics (in the form of MS Clarity) and marketing analytics (in the form of Piwik PRO) and when you might want to use each one.

Since Josh is well-known for writing parody songs to introduce speakers, here at CBUSDAW we flipped the script on him and opened the event with a surprise parody song about Josh.

We present “It’s Josh Silverbauer” to the tune of “In the Midnight Hour” by Wilson Pickett, sung by Jim Gianoglio and featuring Jason Packer on kazoo.

 

While Josh is a fan of both Clarity and Piwik PRO, he’s pointed out that he’s not paid anything by either organization… so he’s free to tell it like it is. And “how it is” is that both tools are great additions to any analyst’s arsenal — and with a generous free tier for Piwik PRO and a totally free product with Clarity there’s not much barrier to entry.

Josh pointed out how the two tools can easily be used to supplement each other. For example, one could use Piwik PRO to find a particular aggregate group of users that aren’t converting well, and then review those users’ entire sessions with the session recording feature in Clarity. Or check the heatmap of the landing page for that same group.

If you’ve used session recording tools in the past you know that it can be pretty tedious to watch the recordings one-by-one. It’s like, “geez, just click the button already user #23341, it’s RIGHT THERE”.

Microsoft has recently integrated Copilot into Clarity, so it can now help save you from watching a ton of videos and instead summarize and do some basic analysis for you.

Josh described Piwik PRO as “what Google’s Universal Analytics 2.0 could have been if GA4 didn’t exist”. If you were a serious user of our dearly departed UA you’ll feel right at home in Piwik PRO, and you’ll be pleased to see how well thought-out the platform is.

Josh’s slides:

Ok, so Clarity and Piwik PRO are both pretty cool tools, but what about rock operas?

Josh (and Jason) are releasing Volume 2 of Josh’s epic analytics rock opera entitled “User Journey Volume 2: The Custom Dimension” on November 18th.

You can listen to the first volume, “Universal Sunset” now on Spotify and most other streaming platforms.

Finally, don’t forget to join us next month for our yearly holiday event at the Grandview Theater. No speakers, but there will be a movie this year and you can vote on what you’d like it to be!

Disclaimer: Piwik PRO is a sponsor of CBUSDAW — but they only pay for the monthly pizza, not our (or Josh’s) endorsements.

October 2024 Recap – Geo Testing with Sanjay Tamrakar

As analysts, we love to optimize everything that we possibly can — so when we have a speaker that gives us a new way to think about testing we are here for it!

For our October event we had Sanjay Tamrakar talk to us about geo-testing. Sanjay covered everything from basic methods like traditional pilot testing, through difference-in-differences, all the way to the current state of the art with GeoLift.

Back in the pre-digital era, Columbus was considered a top location in the US to pilot test new products, since its demographics closely matched the country as a whole. This allowed companies to try out new products, but it only gave marketers an idea of how well a new product might do nationwide, not how much incremental lift different product variations might give or how differently Columbus might perform vs. Chicago or Charleston. This kind of granularity requires more powerful tools.
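At its core, the difference-in-differences method Sanjay mentioned boils down to a single subtraction. Here is a toy Python sketch with entirely made-up numbers (the values and markets are hypothetical, not from the talk):

```python
# Hypothetical average weekly sales per store, before and after a campaign
# that ran only in the treated market.
treated_pre, treated_post = 100.0, 130.0     # market that got the campaign
control_pre, control_post = 95.0, 110.0      # comparable market that did not

# Difference-in-differences: the treated market's change, minus the change we
# would have expected anyway (as measured by the untreated control market).
did_lift = (treated_post - treated_pre) - (control_post - control_pre)
print(f"Estimated incremental lift: {did_lift:.1f}")   # 30 - 15 = 15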

These days there are much more expedient and statistically rigorous methods to test things, like GeoLift. GeoLift is an open-source package from Meta that allows the creation of artificial control groups, which we can use to test against treatment groups without having to worry as much about building control groups from user-specific information and the privacy issues that brings. GeoLift’s synthetic control methods create control groups by amalgamating different untreated areas whose performance is expected to match the treated areas.
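To give a flavor of the synthetic control idea, here is a simplified conceptual sketch with simulated data. This is not GeoLift itself (which adds market selection, power analysis, and proper inference on top); the markets, weights, and lift are all invented for illustration:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(42)

# Hypothetical weekly sales: rows = weeks, columns = untreated control markets.
true_w = np.array([0.5, 0.3, 0.0, 0.2, 0.0])
control_pre = rng.normal(100, 10, size=(20, 5))            # 20 pre-period weeks, 5 control geos
treated_pre = control_pre @ true_w + rng.normal(0, 2, 20)  # treated geo tracks a mix of controls

# Fit non-negative weights so the weighted controls reproduce the treated geo's pre-period.
weights, _ = nnls(control_pre, treated_pre)

# Post-period: compare what actually happened in the treated geo to the synthetic
# counterfactual built from markets that never received the treatment.
control_post = rng.normal(100, 10, size=(8, 5))            # 8 post-period weeks
treated_post = control_post @ true_w + 5.0                 # pretend the campaign added ~5/week
synthetic_post = control_post @ weights

lift = treated_post - synthetic_post
print(f"Estimated average weekly lift: {lift.mean():.2f}")
```

That is just the intuition; the GeoLift package handles the market selection, weighting, and inference far more rigorously.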

There was even some R code showing up on the big screen, which sadly Tim Wilson missed!

Sanjay was also kind enough to provide us with his slides:

 

September 2024 Recap – Data and Analytics Interns? At my company? I’d never really thought about it!

We mixed things up a bit for our September event, both with location and format. Given the topic, we hosted the event at Denison Edge, which is a really cool venue!

The topic was inspired by the experiences of a rising senior at Kenyon College who, despite excellent qualifications and impeccable due diligence, barely managed to land an analytics internship in the summer of 2024. Some relevant details of that internship:

  • The company that hired him was a small agency that had not really thought about having an intern
  • Through a string of improbable but fortunate events, they hired him for the summer
  • The student had a great experience, and the agency found that he added real value to their work
  • Things went so well that the company kept him on for ~10 hours/week once he returned to school in the fall

That’s the happiest of endings, sure, but the CBUSDAW organizers were struck that this specific tale almost certainly represented countless similar stories that never came to pass. And that’s a miss.

Consider:

  • Companies of all sizes (including small ones!) have data at their disposal that is underutilized due to a lack of resources
  • College students today—across all types of programs—are developing valuable skills in programming, statistics, and analytics in the classroom
  • Academic programs recognize the importance of their students getting hands-on, real-world experience, and there are any number of resources in place to support getting them that experience

We brought together four panelists from central Ohio-based higher education to have a discussion about getting all of those factors to work together to create more win-win situations. The panelists:


  • Matt Miller, Denison University
  • Nimet Alpay, Franklin University
  • Tom Metzger, The Ohio State University
  • Kristen Astorian, Ohio Wesleyan University

While the initial idea for the panel was “internships,” the panelists made it clear that internships are simply one way for students to get real-world experience while delivering value to organizations. Many data and analytics programs—both undergraduate and graduate level—require a capstone project that works with an organization and their data to deliver value (and capstone projects have the benefit of having instructor oversight and coaching).

Some keys to making an internship successful:

  • The project should be meaningful—using interns to work on projects that are menial doesn’t benefit the intern or the organization that hired them
  • The project should be manageable—dropping an intern into a monstrously complex data environment with overly ambitious ideas for what they will be able to deliver in a finite period of time is setting them up for failure
  • The intern should have a primary point of contact for oversight—this should be someone who actually wants to take on the work. They’re playing the role of a guide, mentor, and manager all at once.
  • Consider pairing the intern with someone deeply knowledgeable of the data itself—it can take months to ramp up on the intricacies of many organizations’ data environments. While students do need to get exposure to the messiness of real-world data and the often-daunting level of effort to “clean it up” as part of a project, it can be useful to have someone who knows the ins and outs of the various tables assist them in getting queries written.

There are also a surprising number of programs (if only the moderator of the panel was not also the author of this post—something of a hindrance to note-taking!) that provide support to companies who are open to taking on interns (or to working with students on capstone or other projects):

  • The career centers at most universities have staff who are deeply familiar both with their students and with what it takes to scope work and provide support in order to make student work productive and impactful
  • Through various programs (a range of funding sources), companies can actually have interns’ pay subsidized (partly or fully)! The career centers at any school can point interested companies to resources for that.

It was very clear that, once an organization tries out tapping into student talent, they consistently extend and expand their programs over time. Have you given that a thought? Reach out to one or more of the people above to find out more!

August 2024 Recap – Being Intentional, Privacy by Design, and More with Matt Gershoff

For this month’s event, Matt Gershoff, CEO of Conductrics, traveled from the land of triple-digit heat on the regular (Austin) to the land of pleasant-temps-even-in-August (Columbus) to share some thoughts and examples about being intentional when it comes to data collection. If you attended the meetup, then you’re here because we promised that we’d post some useful supplemental information, and we’re going to flip the script of a normal recap post by putting those up front:

  • [20-Minute Video] Matt’s talk at PEPR ’24—a Venn diagram of that talk and the one he gave at DAW would show something like a 63% overlap, although his DAW talk is the larger circle, as there was additional material! But, since we don’t record DAW talks, the PEPR talk is a good one to share with a colleague who is kicking themselves for not attending DAW.
  • [<2-Minute Video] Matt talking about intentionality—not even remotely by design, this was a piece of an interview that another one of the DAW sponsors, Piwik PRO, did with Matt. Useful and thoughtful stuff.
  • [5-Page PDF] Privacy by Design: The 7 Foundational Principles—a very worthwhile read; Matt focused primarily on Principle #2, but you’ll never believe what Principles #4 and #7 are! (Seriously, if you give it a real read, it will blow your mind a little bit; it’s almost three decades old and was an underpinning of GDPR!)
  • Matt will also be on an upcoming (as of this writing) episode of the Analytics Power Hour podcast, so, if “audio only” is your or a colleague’s jam, smash the subscribe button there.

Matt’s presentation—with annotations added to make it an upgrade from “just the slides”—is included at the end of this post, but a few of the highlights from his presentation were:

  • “Just Enough” vs. “Just in Case” Data Collection—Matt made a stronnnnnng case that the industry bias is for the latter, while “privacy by default” demands the former. “Just Enough” data means aligning to a specific and explicit task or objective and then collecting as little data as needed to complete the task. “Just in Case” is a “maximize optionality” play—hoovering up as much data as possible at as granular a level as possible so that there are as many possible “options” for doing “stuff” with it in the future. We are so wired to the latter that it’s uncomfortable to recognize why that Is. Not. Good.
  • This doesn’t mean there are no cases where high granularity / high cardinality data is warranted—throughout the talk, Matt was clear that he was not speaking in any absolutes (unless we count as an absolute that, “all data collection should be performed with intentionality”).
  • Many types of A/B tests, both univariate and multivariate, can be statistically evaluated without recording data at the individual user level—if you’re like the author of this recap, you’ve always operated under the assumption that A/B tests require capturing each user’s session, including which variant they were assigned to, maybe some other metadata about them (what level of a loyalty program they belong to, for instance, for “deeper dive analysis”), whether or not they converted, and maybe the amount of their purchase. 10,000 visitors in the test? That’s 10,000 rows of data! What Matt demonstrated was, um, no. That’s incorrect thinking. By using equivalence classes and some understanding of the maths underlying the statistical tests of interest (t-test, OLS regression, and more), it’s possible to simply capture/increment aggregated counts (visitor count, sum of sales, sum of the squares of sales) and perform the exact same statistical tests in a way that is computationally less intensive, much less storage intensive, and aligned with privacy by design principle #2: privacy by default (as well as principles #3, #4, and #7). Matt outlined a lot of this in this blog post (although he has since extended his research and thinking on the subject… and it continues to hold up!). A minimal sketch of the aggregated-counts idea appears just after this list.
  • There are different techniques and concepts that are good to be familiar with when embracing privacy by design—K-anonymity, differential privacy, global vs. local privacy, and more! The key with all of them is that they’re best employed when approaching them as privacy by design rather than privacy-tacked-on-later-to-maintain-regulatory-compliance.
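To make the aggregated-counts point a bit more concrete, here is a minimal Python sketch (our own illustration with made-up numbers, not Matt’s implementation) of running Welch’s t-test on revenue per visitor using only three running totals per variant:

```python
import math
from scipy import stats

def welch_t_from_aggregates(n_a, sum_a, sumsq_a, n_b, sum_b, sumsq_b):
    """Welch's t-test using only per-variant aggregates:
    visitor count, sum of sales, and sum of squared sales."""
    mean_a, mean_b = sum_a / n_a, sum_b / n_b
    # Sample variance from sufficient statistics: (sum of squares - n * mean^2) / (n - 1)
    var_a = (sumsq_a - n_a * mean_a**2) / (n_a - 1)
    var_b = (sumsq_b - n_b * mean_b**2) / (n_b - 1)
    se = math.sqrt(var_a / n_a + var_b / n_b)
    t = (mean_b - mean_a) / se
    # Welch-Satterthwaite degrees of freedom
    df = (var_a / n_a + var_b / n_b) ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

# Hypothetical running totals incremented as visitors arrive -- no per-user rows stored.
print(welch_t_from_aggregates(
    n_a=5000, sum_a=61000.0, sumsq_a=1_900_000.0,
    n_b=5000, sum_b=64500.0, sumsq_b=2_050_000.0,
))
```

The same aggregate-only trick extends to proportions and to OLS regression, as Matt’s post describes.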

A lot of ground was covered with pretty lively audience engagement and more than a few laughs!

The annotated slides:

And, as always, a few pictures to capture the atmosphere:

 

July 2024 Recap – Solo Data Science with Lauren Burke-McCarthy

Fresh from another successful DataConnect Conference, Lauren Burke-McCarthy led our July session of Data & Analytics Wednesday talking about how to survive and succeed as a solo practitioner of data science.

Being a “solo practitioner” could mean being the only data scientist on your team, being siloed in some way, or even being a freelance contractor. The strategies that Lauren presented were focused on how to best communicate and set expectations with stakeholders. We’ve all been there when a project has gone off the rails because what a practitioner implemented didn’t match at all what a stakeholder had envisioned. Let’s nip these misalignments in the bud as best we can before they can blossom into fully grown issues.

In fact, it turns out many (perhaps most!) of these techniques could work for us in any data-related role. What, after all, even is a data scientist? Lauren also took a crack at answering that age-old question off the top of her head. To paraphrase her answer, a Data Scientist focuses on models and experiments to make forward-looking predictions, whereas a Data Analyst works on analysis of current and historical data to identify trends and develop insights. If those two things seem to blur into each other at times, that just shows how Lauren’s advice on processes and communication works for both! Perhaps even for those of us who have now added “AI” to our job titles? Could well be…

Looking to learn more about these techniques? Lauren was kind enough to provide us with her slides so you can take a look for yourself:

And, of course, pictures!

Please join us next month when the ever-delightful Matt Gershoff will be in town to discuss how to think purposefully about data as we move towards privacy by design.

 

June 2024 Recap – Under the Hood of A/B Testing

Our June 2024 meetup featured Dr. Maria Copot from OSU delving into some of the underlying theories behind our favorite A/B testing platforms. But before we get into the fun math part (yes, it’s fun, don’t look at me like that) — we all need to remember that there needs to be a question behind your experiment. If you don’t have a hypothesis you’re trying to validate, then what’s the point of testing something? Once you’ve got something you want to test, then by all means test it, but testing just for the sake of saying how many A/B tests your department ran last year isn’t going to get you where you want to be.

A lot of us have been asked, “is this result statistically significant?” And maybe we’ve even said, “well, the P-value is <0.05 so it’s significant”… But what exactly is a P-value, and why is 0.05 such a big deal? Dr. Copot explained the basics of P-values, including that 0.05 is an arbitrary benchmark, and that a P-value can’t tell you anything about the size of an effect, its validity, or the reason behind it. If that still sounds a bit confusing, it’s time to cue the memes about scientists being unable to explain P-values in an intuitive way. We think Dr. Copot’s explanation would be in the top quantile of that distribution, at any rate. Even if math is fun, it isn’t always intuitive.
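To make that concrete, here is a small sketch (with made-up conversion numbers) of the kind of calculation that produces a P-value for a simple A/B conversion test, a two-proportion z-test:

```python
import math
from scipy.stats import norm

# Hypothetical results: conversions out of visitors for each variant.
conv_a, n_a = 200, 5000   # control: 4.0% conversion
conv_b, n_b = 245, 5000   # variant: 4.9% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under the null hypothesis
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))                      # two-sided P-value

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
# The P-value says how surprising this gap would be if A and B truly converted at
# the same rate -- it says nothing about the size or business value of the effect.
```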

Dr. Copot also talked about sample sizes and power analysis (a handy online calculator I’ve used many times: https://www.evanmiller.org/ab-testing/sample-size.html), but then moved on to Bayesian methods. Traditional A/B tools (like Google Optimize, RIP) have typically used frequentist methods like the P-values we’ve been talking about. Newer tools have folded in some Bayesian methods, which thankfully are a little more intuitive, if perhaps more mathematically and computationally expensive.
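For a taste of the Bayesian flavor (again a toy sketch with the same made-up numbers as above, not how any particular tool implements it), you can model each variant’s conversion rate with a Beta posterior and ask directly how likely it is that B beats A:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same hypothetical results as above: conversions / visitors per variant.
conv_a, n_a = 200, 5000
conv_b, n_b = 245, 5000

# With a flat Beta(1, 1) prior, the posterior for each conversion rate is
# Beta(conversions + 1, non-conversions + 1). Sample both posteriors.
post_a = rng.beta(conv_a + 1, n_a - conv_a + 1, size=100_000)
post_b = rng.beta(conv_b + 1, n_b - conv_b + 1, size=100_000)

prob_b_beats_a = (post_b > post_a).mean()
expected_lift = (post_b - post_a).mean()

print(f"P(B beats A) = {prob_b_beats_a:.1%}")
print(f"Expected lift = {expected_lift:.2%} points")
```

A statement like “there’s a 98% chance B is better than A” tends to land with stakeholders much more readily than a P-value does.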

Finally, we talked about how privacy regulations, sampling, and cookie limitations can make doing these kinds of experiments more difficult. One way around these limitations is to use paid platforms like Prolific, where you can build your own sample group of fully consented users and run them through an experiment of your choosing.


Please join us next month when Lauren Burke-McCarthy will talk about how to succeed as a solo data scientist.

 

May 2024 Recap – Getting Real with AI

At our May 2024 event, Nick Woo from AlignAI shared a thoughtful and pragmatic perspective about how to approach figuring out what use cases are (and are not!) appropriate for AI. The turnout for the meetup was strong, and the discussion was lively!

Nick started off with a handy definition of machine learning:

“Machine Learning is an approach to learn complex patterns from existing data to make predictions on new data.”

Oh. Sure. Seems simple enough, right? But that doesn’t include generative AI, does it? As a matter of fact, it does:

  • The existing data is what was used to train the model
  • The new data is the prompt that is provided to the model (!)
  • The response to the prompt is really a prediction when the model processes that new data (!!!)
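That definition maps neatly onto the familiar fit/predict pattern. A tiny, entirely hypothetical scikit-learn example, just to make the “existing data vs. new data” split concrete:

```python
from sklearn.linear_model import LogisticRegression

# "Existing data": past sessions (pages viewed, minutes on site) and whether they converted.
X_existing = [[3, 1.5], [8, 6.0], [1, 0.5], [12, 9.0], [5, 2.0], [9, 7.5]]
y_existing = [0, 1, 0, 1, 0, 1]

model = LogisticRegression().fit(X_existing, y_existing)   # learn patterns from existing data

# "New data": a session the model has never seen; the output is a prediction.
X_new = [[7, 5.0]]
print(model.predict(X_new), model.predict_proba(X_new))
```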

Nick also outlined the anatomy of an AI use case:

  1. Business Problem
  2. Data
  3. Training
  4. Model
  5. Accuracy Metrics
  6. UX/UI

Which step is the most common stumbling block for organizations’ proposed use cases? The “Data” one—there needs to be sufficiently scaled, cleansed, and complete data to actually develop a model that is useful. Oh, and then that model will likely need to be refreshed and refined with new data over time.

The most neglected step in the planning of an AI project? The last step: actually thinking through what the user experience should ultimately be when the model is put into production!

Nick was quick to point out that it is easy to treat AI as a hammer and then see all the world as a nail. If there is a simpler, cheaper, equally effective way to address a particular business problem, then addressing it with AI probably doesn’t make sense! He also acknowledged (as did several audience members) that we’re currently at a point where there are executives who truly do just want to be able to say, “We use AI,” which means some projects can be a bit misguided. This phase shall pass, we assume!

Another discussion that cropped up was measuring the ROI of an AI use case. Nick noted that this can be shaky ground:

  • AI technology platforms pushing to measure impact simply based on the adoption of the technology (rather than quantifying actual business impact)
  • Minimal use of techniques like controlled experimentation to quantify the impact (there is simply too much excitement currently to create interest in withholding the magic from a control group in a disciplined way)
  • The ROI of an AI project can be thought of as “the ROI of an OPEX project”—organizations that are disciplined about measuring the impact of non-AI OPEX projects should be pretty good about quantifying the impact of their investments; it’s just another tool in their toolkit, so the measurement mindset can be the same

And… there was more, including an example scoring matrix for prioritizing use cases across multiple criteria!

A recap post and the slides really can’t do the evening justice, but it’s better than nothing. The recap was above. The slides are right here:

And some pics from the evening: