Archive | DAW Recaps

September 2024 Recap – Data and Analytics Interns? At my company? I’d never really thought about it!

We mixed things up a bit for our September event, both with location and format. Given the topic, we hosted the event at Denison Edge, which is a really cool venue!

The topic was inspired by the experiences of a rising senior at Kenyon College who, despite excellent qualifications and impeccable due diligence, barely managed to land an analytics internship in the summer of 2024. Some relevant details of that internship:

  • The company that hired him was a small agency that had not really thought about having an intern
  • Through a string of improbable but fortunate events, they hired him for the summer
  • The student had a great experience, and the agency found that he added real value to their work
  • Things went so well that the company kept him on for ~10 hours/week once he returned to school in the fall

That’s the happiest of endings, sure, but the CbusDAW organizers were struck that this specific tale almost certainly represented countless similar stories that never came to pass. And that’s a miss.

Consider:

  • Companies of all sizes (including small ones!) have data at their disposal that is underutilized due to a lack of resources
  • College students today—across all types of programs—are developing valuable skills in programming, statistics, and analytics in the classroom
  • Academic programs recognize the importance of their students getting hands-on, real-world experience, and there are any number of resources in place to support getting them that experience

We brought together four panelists from central Ohio-based higher education to have a discussion about getting all of those factors to work together to create more win-win situations. The panelists:


  • Matt Miller, Denison University
  • Nimet Alpay, Franklin University
  • Tom Metzger, The Ohio State University
  • Kristen Astorian, Ohio Wesleyan University

While the initial idea for the panel was “internships,” the panelists made it clear that internships are simply one way for students to get real-world experience while delivering value to organizations. Many data and analytics programs—both undergraduate and graduate level—require a capstone project that works with an organization and its data to deliver value (and capstone projects have the benefit of instructor oversight and coaching).

Some keys to making an internship successful:

  • The project should be meaningful—using interns to work on projects that are menial doesn’t benefit the intern or the organization that hired them
  • The project should be manageable—dropping an intern into a monstrously complex data environment with overly ambitious ideas for what they will be able to deliver in a finite period of time is setting them up for failure
  • The intern should have a primary point of contact for oversight—this should be someone who actually wants to take on the work. They’re playing the role of a guide, mentor, and manager all at once.
  • Consider pairing the intern with someone deeply knowledgeable of the data itself—it can take months to ramp up on the intricacies of many organizations’ data environments. While students do need to get exposure to the messiness of real-world data and the often-daunting level of effort to “clean it up” as part of a project, it can be useful to have someone who knows the ins and outs of the various tables assist them in getting queries written.

There are also a surprising number of programs (if only the moderator of the panel was not also the author of this post—something of a hindrance to note-taking!) that provide support to companies who are open to taking on interns (or to working with students on capstone or other projects):

  • The career centers at most universities have staff who are deeply familiar both with their students and with what it takes to scope work and provide support in order to make student work productive and impactful
  • Through various programs (a range of funding sources), companies can actually have interns’ pay subsidized (partly or fully)! The career centers at any school can point interested companies to resources for that.

It was very clear that once organizations try tapping into student talent, they consistently extend and expand their programs over time. Have you given it a thought? Reach out to one or more of the people above to find out more!

August 2024 Recap – Being Intentional, Privacy by Design, and More with Matt Gershoff

For this month’s event, Matt Gershoff, CEO of Conductrics, traveled from the land of triple-digit heat on the regular (Austin) to the land of pleasant-temps-even-in-August (Columbus) to share some thoughts and examples about being intentional when it comes to data collection. If you attended the meetup, then you’re here because we promised that we’d post some useful supplemental information, and we’re going to flip the script of a normal recap post by putting those up front:

  • [20-Minute Video] Matt’s talk at PEPR ’24—as a Venn diagram, it has something like a 63% overlap with the talk he gave at DAW, although his DAW talk is the larger circle, as there was additional material! But, since we don’t record DAW talks, the PEPR talk is a good one to share with a colleague who is kicking themselves for not attending DAW.
  • [<2-Minute Video] Matt talking about intentionality—not even remotely by design, this was a piece of an interview that another one of the DAW sponsors, Piwik PRO, did with Matt. Useful and thoughtful stuff.
  • [5-Page PDF] Privacy by Design: The 7 Foundational Principles—a very worthwhile read; Matt focused primarily on Principle #2, but you’ll never believe what Principles #4 and #7 are! (Seriously, if you give it a real read, it will blow your mind a little bit; it’s almost three decades old and was an underpinning of GDPR!)
  • Matt will also be on an upcoming (as of this writing) episode of the Analytics Power Hour podcast, so, if “audio only” is your or a colleague’s jam, smash the subscribe button there.

Matt’s presentation—with annotations added to make it an upgrade from “just the slides”—is included at the end of this post, but a few of the highlights from his presentation were:

  • “Just Enough” vs. “Just in Case” Data Collection—Matt made a stronnnnnng case that the industry bias is for the latter, while “privacy by default” demands the former. “Just Enough” data means aligning to a specific and explicit task or objective and then collecting as little data as needed to complete the task. “Just in Case” is a “maximize optionality” play—hoovering up as much data as possible at as granular a level as possible so that there are as many possible “options” for doing “stuff” with it in the future. We are so wired to the latter that it’s uncomfortable to recognize why that Is. Not. Good.
  • This doesn’t mean there are no cases where high granularity / high cardinality data is warranted—throughout the talk, Matt was clear that he was not speaking in any absolutes (unless we count as an absolute that, “all data collection should be performed with intentionality”).
  • Many types of A/B tests, both univariate and multivariate, can be statistically evaluated without recording data at the individual user level—if you’re like the author of this recap, you’ve always operated under the assumption that A/B tests require capturing each user’s session, including which variant they were assigned to, maybe some other metadata about them (what level of a loyalty program they belong to, for instance, for “deeper dive analysis”), whether or not they converted, and maybe the amount of their purchase. 10,000 visitors in the test? That’s 10,000 rows of data! What Matt demonstrated was, um, no. That’s incorrect thinking. By using equivalence classes and some understanding of the math underlying the statistical tests of interest (t-tests, OLS regression, and more), it’s possible to simply capture/increment aggregated counts (visitor count, sum of sales, sum of the squares of sales) and perform the exact same statistical tests in a way that is computationally less intensive, far less storage intensive, and aligned with privacy by design principle #2: privacy by default (as well as principles #3, #4, and #7). Matt outlined a lot of this in this blog post (although he has since extended his research and thinking on the subject… and it continues to hold up!). A rough sketch of the aggregate-only bookkeeping appears just after this list.
  • There are different techniques and concepts that are good to be familiar with when embracing privacy by design—K-anonymity, differential privacy, global vs. local privacy, and more! The key with all of them is that they’re best employed when approaching them as privacy by design rather than privacy-tacked-on-later-to-maintain-regulatory-compliance.
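
For anyone who wants to poke at the aggregated-counts idea, here is a minimal Python sketch of the general approach (this is our illustration, not Matt’s implementation, and the purchase amounts are made up): it keeps only a running count, sum, and sum of squares per variant, yet still yields a standard Welch t-test.

```python
import math
from scipy import stats

# Per-variant running aggregates -- the only "data" ever stored.
# Incremented as each visitor's purchase amount comes in; no user-level rows.
agg = {
    "A": {"n": 0, "sum": 0.0, "sum_sq": 0.0},
    "B": {"n": 0, "sum": 0.0, "sum_sq": 0.0},
}

def record(variant, amount):
    """Update the aggregates for one visitor; the raw amount is then discarded."""
    a = agg[variant]
    a["n"] += 1
    a["sum"] += amount
    a["sum_sq"] += amount ** 2

def welch_t_from_aggregates(a, b):
    """Welch's t-test computed purely from (n, sum, sum of squares)."""
    mean_a, mean_b = a["sum"] / a["n"], b["sum"] / b["n"]
    # Sample variance recovered from the sum of squares: (sum_sq - n*mean^2) / (n - 1)
    var_a = (a["sum_sq"] - a["n"] * mean_a ** 2) / (a["n"] - 1)
    var_b = (b["sum_sq"] - b["n"] * mean_b ** 2) / (b["n"] - 1)
    se = math.sqrt(var_a / a["n"] + var_b / b["n"])
    t = (mean_a - mean_b) / se
    # Welch-Satterthwaite degrees of freedom
    df = (var_a / a["n"] + var_b / b["n"]) ** 2 / (
        (var_a / a["n"]) ** 2 / (a["n"] - 1) + (var_b / b["n"]) ** 2 / (b["n"] - 1)
    )
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

# Toy usage: stream a few made-up purchase amounts through the counters.
for amount in (12.0, 0.0, 33.5, 0.0, 8.0):
    record("A", amount)
for amount in (15.0, 0.0, 41.0, 9.5, 0.0):
    record("B", amount)

t, p = welch_t_from_aggregates(agg["A"], agg["B"])
print(f"t = {t:.3f}, p = {p:.3f}")
```

The same trick extends to conversion rates and to the regression-style tests mentioned above, since those tests only need sufficient statistics rather than individual rows.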

A lot of ground was covered with pretty lively audience engagement and more than a few laughs!

The annotated slides:

And, as always, a few pictures to capture the atmosphere:

 

July 2024 Recap – Solo Data Science with Lauren Burke-McCarthy

Fresh from another successful DataConnect Conference, Lauren Burke-McCarthy led our July session of Data & Analytics Wednesday talking about how to survive and succeed as a solo practitioner of data science.

Being a “solo practitioner” could mean being the only data scientist on your team, being siloed in some way, or even being a freelance contractor. The strategies that Lauren presented were focused on how to best communicate and set expectations with stakeholders. We’ve all been there when a project has gone off the rails because what a practitioner implemented didn’t match at all what a stakeholder had envisioned. Let’s nip these misalignments in the bud as best we can before they can blossom into fully grown issues.

In fact, it turns out that many (perhaps most!) of these techniques could work for us in any data-related role. What, after all, even is a data scientist? Lauren also took a crack at answering that age-old question off the top of her head. To paraphrase her answer: a Data Scientist focuses on models and experiments to make future-looking predictions, while a Data Analyst analyzes current and historical data to identify trends and develop insights. If those two things seem to blur into each other at times, that just shows how Lauren’s advice on processes and communication works for both! Perhaps even for those of us who have now added “AI” to our job titles? Could well be…

Looking to learn more about these techniques? Lauren was kind enough to provide us with her slides so you can take a look for yourself:

And, of course, pictures!

Please join us next month when the ever-delightful Matt Gershoff will be in town to discuss how to think purposely about data as we move towards privacy by design.

 

June 2024 Recap – Under the Hood of A/B Testing

Our June 2024 meetup featured Dr. Maria Copot from OSU delving into some of the underlying theories behind our favorite A/B testing platforms. Though before we get into the fun math part (yes, it’s fun, don’t look at me like that), we all need to remember that there needs to be a question behind your experiment. If you don’t have a hypothesis you’re trying to validate, then what’s the point of testing something? Once you’ve got something you want to test, go test it, but testing just for the sake of reporting how many A/B tests your department ran last year isn’t going to get you where you want to be.

A lot of us have been asked, “is this result statistically significant?” And maybe we’ve even said, “well, the P-value is <0.05 so it’s significant”… But what exactly is a P-value, and why is 0.05 such a big deal? Dr. Copot explained the basics of P-values, including that 0.05 is an arbitrary benchmark, and that a P-value can’t tell you anything about the size of an effect, its validity, or the reason behind it. If that still sounds a bit confusing, it’s time to cue the memes about scientists being unable to explain P-values in an intuitive way. We think Dr. Copot’s explanation would be in the top quantile of that distribution, at any rate. Even if math is fun, it isn’t always intuitive.

Dr. Copot also talked about sample sizes and power analysis (one such online calculator I’ve used many times: https://www.evanmiller.org/ab-testing/sample-size.html), but then moved on to talking about Bayesian methods. Traditional A/B tools (like Google Optimize, RIP) have typically used Frequentist methods like we’ve been talking about with P-values. Newer tools have folded in some Bayesian methods, which thankfully are a little more intuitive, if perhaps more mathematically and computationally expensive.
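
For the curious, the kind of calculation that sample-size calculator performs can be sketched in a few lines of Python with statsmodels; the baseline conversion rate and minimum detectable effect below are placeholder numbers, not anything from the talk.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05   # current conversion rate (placeholder)
mde = 0.01        # minimum detectable effect: a lift from 5% to 6%
alpha = 0.05      # significance level (the arbitrary-but-standard 0.05)
power = 0.80      # probability of detecting the effect if it's really there

# Convert the two proportions into a standardized effect size, then solve
# for the per-variant sample size that achieves the desired power.
effect = proportion_effectsize(baseline + mde, baseline)
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=alpha, power=power, alternative="two-sided"
)
print(f"~{n_per_variant:.0f} visitors needed per variant")
```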

Finally, we talked about how privacy regulations, sampling, and cookie limitations can make doing these kinds of experiments more difficult. One way around these limitations is to use paid platforms like Prolific where you can make your own sample group and run a group of fully consented users through an experiment of your choosing.


Please join us next month when Lauren Burke-McCarthy will talk about how to succeed as a solo data scientist.

 

May 2024 Recap – Getting Real with AI

At our May 2024 event, Nick Woo from AlignAI shared a thoughtful and pragmatic perspective about how to approach figuring out what use cases are (and are not!) appropriate for AI. The turnout for the meetup was strong, and the discussion was lively!

Nick started off with a handy definition of machine learning:

“Machine Learning is an approach to learn complex patterns from existing data to make predictions on new data.”

Oh. Sure. Seems simple enough, right? But that doesn’t include generative AI, does it? As a matter of fact, it does:

  • The existing data is what was used to train the model
  • The new data is the prompt that is provided to the model (!)
  • The response to the prompt is really a prediction when the model processes that new data (!!!)

Nick also outlined the anatomy of an AI use case:

  1. Business Problem
  2. Data
  3. Training
  4. Model
  5. Accuracy Metrics
  6. UX/UI

Which step is the most common stumbling block for organizations’ proposed use cases? The “Data” one—there needs to be sufficiently scaled, cleansed, and complete data to actually develop a model that is useful. Oh, and then that model will likely need to be refreshed and refined with new data over time.

The most neglected step in the planning of an AI project? The last step: actually thinking through what the user experience should ultimately be when the model is put into production!

Nick was quick to point out that it is easy to treat AI as a hammer and then see all the world as a nail. If there is a simpler, cheaper, equally effective way to address a particular business problem, then addressing it with AI probably doesn’t make sense! He also acknowledged (as did several audience members) that we’re currently at a point where there are executives who truly do just want to be able to say, “We use AI,” which means some projects can be a bit misguided. This phase shall pass, we assume!

Another discussion that cropped up was measuring the ROI of an AI use case. Nick noted that this can be shaky ground:

  • AI technology platforms pushing to measure impact simply based on the adoption of the technology (rather than quantifying actual business impact)
  • Minimal use of techniques like controlled experimentation to quantify the impact (there is simply too much excitement right now for anyone to be interested in withholding the magic from a control group in a disciplined way)
  • The ROI of an AI project can be thought of as “the ROI of an OPEX project”—organizations that are disciplined about measuring the impact of non-AI OPEX projects should be pretty good about quantifying the impact of their investments; it’s just another tool in their toolkit, so the measurement mindset can be the same

And… there was more, including an example scoring matrix for prioritizing use cases across multiple criteria!

A recap post and the slides really can’t do the evening justice, but it’s better than nothing. The recap was above. The slides are right here:

And some pics from the evening:

April 2024 Recap – Data Science & AI Trends: an Audience-Guided Discussion

We tried something a little different at this month’s DAW. Actually, we tried two things that were a little different.

What we intended to be different was that we were going to have a panel of experts who would field a bunch of questions from the audience, capture them on a whiteboard, and then talk through them. Ultimately, we did that—not exactly as it had been drawn up (so to speak), but it worked out.

The unintended difference was seeing just how many things could go wrong while we still pulled off a successful and engaging meetup. Speculation was that the questions and answers were going to be so good that our robot overlords became concerned and flexed their AI capabilities to undermine the meetup. To wit:

  • On Monday, one of the three intended panelists pulled out of the event. No problem, Brian Sampsel was hastily recruited and graciously accepted the last-minute invitation.
  • On Wednesday morning at 4:00 AM, one of the other panelists went into labor. Did she take the time to email us that she had become unavailable? Yes. Yes she did. Katie Schafer is a machine in her own right (as our other panelist, Pete Gordon, had already noted several days earlier). But, no problem. We could do this with two panelists. What else ya’ got to throw at us, HAL? Well…
  • Weather, anyone? A tornado watch was issued for the area late in the afternoon, and the venue—Rev1—was squarely inside it. The watch lasted until 7:00 PM (the event started at 6:30 PM). There was rain. There was wind. There was hail, for Pete’s sake!

Apparently, though, analytics types take their cues from the USPS. Or have poor judgment. Or some combination? We wound up with a great turnout, with lots of good pre-talk discussion over pizza and beer:

Conveniently, the event space is in the interior of the building! #tornadosafety

The discussion itself covered a wide range of topics—skewing heavily towards AI and less towards data science (data science is involved in AI, of course, so it was still there):

A Range of Topics to Discuss

There is no deck to share, no recording, and this attendee didn’t take scrupulous notes, so we’ll go with a smattering of the discussion that could be retrieved from his brain the following day:

  • When will AGI (artificial general intelligence) be achieved? Pete’s estimate (which seemed serious) was: 2033. But, he also noted that AlphaGo’s infamous Move 37 (in 2016) was a glimpse into that future.
  • To RAG or not to RAG? Well… that’s a hot topic. It depends.
  • Poisoning of training data? Why, and what are the ramifications? It sounds bad, but it’s got its uses—see Nightshade.
  • Should newly minted software engineers be worried about AI making their jobs obsolete? No. Full stop. They’ll have some powerful new tools to employ, but their jobs aren’t going anywhere.
  • What about marketing analysts? Will AI take their jobs? This prompted quite a bit of discussion. Brian made the point that AI can do some pretty impressive exploratory data analysis (EDA), which is definitely useful! One attendee asked if he could see getting to a point where you could tell an AI-based tool what your KPIs were, and it could then just analyze the campaign. The answer was, “Yeah… but a human still needs to set appropriate KPIs!” Even MMM came up—is that AI, or is that just… sophisticated linear regression (statistics)? Kinda more the latter, but “AI” gets slapped on it for branding purposes and we get excited!

And, of course, lots and lots more! Some pics from the event:

 

March 2024 Recap – the End of Third Party Cookies ☠️

Our March event featured Bill Balderaz from Futurety getting us up to date on the impending demise of third-party cookies. As most of us have heard, third-party cookies in Chrome are scheduled to be turned off later this year. What we’re less sure about is what the heck is coming in to replace them.

Love them or hate them, third-party cookies have been a fundamental building block of the $600B+ digital ads ecosystem for years — and this turndown will be one of the biggest changes in the history of the industry. While browsers such as Safari and Firefox have blocked third-party cookies for years, it’s Chrome’s impending Q3 2024 turndown of these cookies that has caught the industry’s attention due to Chrome’s dominant market share (64%).

So what comes next? Surely we’re not going back to completely un-targeted “punch the monkey” style ads… One answer is first-party data, where, instead of relying on third-party cookies tracking our cross-site activity, we rely upon our own customer data. What does that mean in practical terms?

Bill gave us an example of how first party data can be used to build audiences with Futurety’s own product HUCKLE.

As per usual, the CBUSDAW audience had plenty of good questions. As we move towards this (third-party) cookieless future, we’re all still collectively figuring out what exactly it means and how to balance user privacy with effective marketing solutions.

And, action shots from the event. The main author of this recap neglected to note that he had some fun with the event by running a LinkedIn poll as a competition between local (real) cookie purveyors. Lion Cub’s won the poll and, as a result, their cookies were provided at the event, along with a “cookie consent” form. The event photographer had both a picture of that consent form and edit access to this recap, so he snuck this paragraph in when he added the photo gallery below!

Please join us next month when we have a panel of AI aficionados on tap for an audience-driven talk on the future of data science and AI.

 

 

February 2024 – Are We Dangerously Obsessed with Data Collection?

This month’s meetup was our first since we were rechristened as “Columbus Data & Analytics Wednesday.” In an unintentional twist, the topic for the event was centered around the speaker’s contention that we (the broad, collective “we”) devote too much of our time and energy to the collection and management of data, and not enough effort to actually putting that data to productive and impactful use within our organizations.

Tim started by pointing out that, if we consider any task that has any relationship to data to be “data work,” then we can further categorize each of those tasks into one of two buckets:

  • Data Collection and Management
  • Data Usage

He noted that there is no inherent business value in the collection and management of data. There is only the potential for value. To realize that value requires putting the data to meaningful, applicable business use.

All too often, data workers get so caught up in data collection and management tasks, though, that they start to believe that there is inherent business value in those tasks alone. Tim pointed to three reasons for this happening:

  • Technology vendors tend to have business models that are high fixed cost and low variable cost, which means they’re incentivized to drive aggressive customer growth. This results in heavy investments in marketing and sales organizations that wind up distilling down their messaging to, “Buy our technology and you will realize business value.” And they spend a lot of time and money promoting that message.
  • Consultants have the opposite business model—low fixed costs and high variable costs—which means they grow profitably by selling engagements that use repeatable processes that can tap into a scalable workforce. That pulls them to “technology implementation” work over “deeply engage with the businesses of their clients and all of the complexity therein.” So, they wind up promoting a similar message: “Buy (our partners’) technology, let us help you implement it, and you will then realize business value.”
  • Human nature within organizations drives us to do tangible “things”—adding new data sources, cleaning up data quality issues, building or augmenting dashboards, etc. This leads us to telling ourselves that these tactical activities, which skew heavily towards data collection and management, bring value to the business in and of themselves.

According to Tim, recognizing and pushing back against this mindset means embracing the messiness and hard work required to actually use data productively. He proposed that organizations need to put the same level of rigor around their data usage processes as they put around their processes for collecting and maintaining data. As an example, he outlined a framework he uses (but was clear that this wasn’t “the only” framework that’s valid for data usage) that pointed to three distinct ways data can be used to provide value:

  • Performance measurement—objectively and quantitatively answering the question: “Where are we today relative to where we expected to be today at some point in the past?” He described using “two magic questions” for this: 1) What are we trying to achieve, and 2) How will we know if we’ve done that?
  • Hypothesis validation—this is all about improving decision making by reducing uncertainty when determining what to do going forward. For this, he described a 3-part fill-in-the-blank technique: “We believe [some idea] because [some evidence or observation]. If we are right, we will [take some action].”
  • Operational enablement—data when it is actually part of an automated or mostly automated process (for instance, ordering shoes online generates data that is used by the order fulfillment process). He went on to say that every generative AI use case he’s seen put forth falls into operational enablement.

He ended by imploring the crowd to look at the work they and their colleagues do day in and day out through a “data collection & management” vs. “data usage” lens and consider working to shift the balance of their efforts towards the latter!

The slides are available below:

And, of course, some pics from the event, which had a large and lively showing!

November 2023 – Google Analytics 4 and BigQuery with Scott Zakrajsek

Over fifteen years ago, Scott Zakrajsek was one of the founding organizers of Columbus Web Analytics Wednesday (he’s the guy in the green shirt right in the middle of this picture taken in May, 2008; three people in this picture, as well as the photographer, were present at our November 2023 event, and none of us could remember the name of the restaurant; the passage of time does wonders to one’s memories):

Several role changes, the co-founding of a company, several years in Boston before returning to central Ohio, getting married, and a couple of kids later, we finally convinced him it was time to re-take the stage at one of our meetups!

The topic: Google Analytics 4 and BigQuery. That’s a Big(Query) topic to cover briefly, but these two platforms are increasingly intertwined, as it has been evident for a while that Google has decided that the road to flexibility and robustness in accessing and analyzing GA4 data is a path that passes directly through BigQuery.

Scott provided a brief recap of how the fundamental data model in GA4 differs from Universal Analytics. He then made the case for why the ease with which that data can be piped into Google BigQuery (he outlined the steps for turning on that integration, including highlighting the key choices to be made when doing that) enables both deeper analysis as well as easier integration of website and mobile app behavioral data with data from other sources.

Once the data is in BigQuery, though, it has to be made accessible, both to analysts and to business users. For the former, that means SQL, and it means going beyond simply SELECT, FROM, WHERE, ORDER BY, and GROUP BY to also be comfortable with UNNEST, subqueries, and CTEs. He demonstrated how generative AI—Bard, as one option (which led to a brief discussion of Duet AI and Copilot as other options)—could be used to get an initial pass at functional SQL, although some tweaking is generally required. That led to a discussion of the difference between SQL-for-exploration-and-one-time-analysis vs. SQL-to-be-productionalized.
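
To give a flavor of what that looks like in practice (a generic sketch, not Scott’s demo query), the standard GA4 export lands in daily events_YYYYMMDD tables with a repeated event_params field that has to be UNNESTed; the project and dataset names below are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials/project are configured

# Daily page_view counts by page, pulled out of the repeated event_params field.
# `your-project.analytics_123456789` is a placeholder for a real GA4 export dataset.
sql = """
WITH page_views AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS day,
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_location') AS page
  FROM `your-project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
    AND event_name = 'page_view'
)
SELECT day, page, COUNT(*) AS views
FROM page_views
GROUP BY day, page
ORDER BY views DESC
"""

for row in client.query(sql).result():
    print(row.day, row.views, row.page)
```

The scalar subquery over UNNEST is the usual way to pull a single parameter out of the repeated field; CTEs keep things readable once more parameters and joins get involved.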

To wrap the session, Scott conducted a live demo, including pushing the results of a query into Looker Studio.

The presentation was followed by a great discussion that demonstrated the value of in-person meetups—attendees included several individuals who are elbows-deep in GA4 with BigQuery as well as a number of BigCurious individuals who were able to tap into the experience of Scott and the attendees to get a much better sense of what is involved in bringing the two platforms together.

More details? Check out the slides:

And, hey, the same guy who took that picture at the top of this post with 2008 digital photography tech has upgraded his gear a few times since then, so there are pictures from the event, too:

 

Bonus: in the intro, Bryan brought up the User Journey – Vol. 1 rock opera that long-time WAW co-organizer Jason Packer was instrumental in producing!

October 2023 – Marketing Mix Modeling with Jim Gianoglio

Lots of analysts (the author of this recap included) once believed that multi-touch attribution was the long-awaited answer to John Wanamaker’s famous question about which half of our ad spend was wasted. Except even before we started losing lots of data, there were some serious problems with MTA that we’d been turning a blind eye to. In this new era of cookie restrictions and data privacy regulations, MTA has become even more problematic.

Enter MMM. Jim Gianoglio, founder of Cauzle Analytics, gave us an introduction to marketing mix modeling (MMM) — another way to answer this same eternal marketing question about which channels are and aren’t working. MMM has been around for a long time (it was so popular in the ’90s that it was immortalized in multiple songs), but the methods and practices have matured, and it can really be useful for a high-level understanding of performance for many organizations.
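
At its core, an MMM is a regression of an outcome (sales, signups) on channel-level spend over time; real models layer on adstock/carryover, saturation curves, seasonality, and often Bayesian priors. Here is a deliberately tiny sketch of just that core, with simulated numbers rather than anything from Jim’s talk.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
weeks = 104

# Simulated weekly spend by channel (the "inputs" an MMM tries to value).
tv = rng.uniform(0, 50_000, weeks)
search = rng.uniform(0, 20_000, weeks)
social = rng.uniform(0, 10_000, weeks)

# Simulated sales: a baseline plus channel effects plus noise.
sales = 100_000 + 1.8 * tv + 3.0 * search + 0.5 * social + rng.normal(0, 25_000, weeks)

# The core of a (very) basic MMM: regress sales on spend.
X = sm.add_constant(np.column_stack([tv, search, social]))
model = sm.OLS(sales, X).fit()

# The channel coefficients estimate incremental sales per dollar of spend,
# which is what feeds budget-allocation decisions.
for name, coef in zip(["baseline", "tv", "search", "social"], model.params):
    print(f"{name:8s} {coef:12.2f}")
```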

Using MMM doesn’t mean you have to stop using MTA either! MTA can still give you information on a more granular level, and then MMM can help with overall strategy.

If you’re looking to learn more about MMM, check out Jim’s mailing list, podcast, and Slack at mmmhub.org.

https://www.slideshare.net/JasonPacker/cbusdaw-october-23-marketing-mix-modeling