Experimental Methods + Statistics in Online Attribution with R

September 12, 2018

Agenda

Who am I?
Different Approaches to Attribution
- Rule-Based: Canned Approaches
- Ad-hoc Pre-Post Testing
- A/B Testing & Experimental Methods
- Markov Chain Models in R

Materials

https://github.com/ksasso/onlineattrib_inR_talk

Who am I?

Different Approaches to Attribution

Rule-Based: Canned Approaches

Some Examples:

Last Interaction
First Interaction
Last Non-Direct Click
Linear Attribution
Time Decay
Position Based
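
A minimal base R sketch of three of the rules above (last interaction, first interaction, linear). The paths and channel names are made up for illustration, and `attribute()` is a hypothetical helper, not a function from any package.

```r
# Hypothetical converting paths; channel names are invented for illustration.
paths <- list(
  c("facebook", "email", "search"),
  c("search", "direct"),
  c("email")
)

# Hypothetical helper: give each converting path one unit of credit and
# split it across the path's touchpoints according to the chosen rule.
attribute <- function(paths, rule) {
  channels <- sort(unique(unlist(paths)))
  credit <- setNames(numeric(length(channels)), channels)
  for (p in paths) {
    n <- length(p)
    w <- switch(rule,
      last   = as.numeric(seq_len(n) == n),  # all credit to the last touch
      first  = as.numeric(seq_len(n) == 1),  # all credit to the first touch
      linear = rep(1 / n, n)                 # equal credit to every touch
    )
    for (i in seq_len(n)) credit[p[i]] <- credit[p[i]] + w[i]
  }
  credit
}

# One column per rule, one row per channel.
rule_table <- sapply(c("last", "first", "linear"),
                     function(r) attribute(paths, r))
print(round(rule_table, 2))
```

Every rule hands out the same total credit (one unit per conversion); only the split across channels changes.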

Google Analytics

Ad-hoc Pre-Post Testing

An example:

We track the same customers' sales over two 2-week periods: one in which our Facebook ad was live and one in which it was not
- We then compare sales or conversion between the two periods for the same customers (i.e., repeated measures)
We can also treat ad_exposure as a predictor alongside other likely important factors (e.g., purchase history, buying segment, day of week)
- Use more advanced modeling techniques to predict conversion or revenue
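
A rough sketch of the repeated-measures comparison, assuming simulated data. The sample size, sales figures, and the +5 lift are invented for illustration only.

```r
set.seed(42)  # reproducible simulated data
n <- 200      # hypothetical number of customers

# Simulated sales per customer in the two 2-week periods (made-up numbers).
baseline   <- rgamma(n, shape = 2, scale = 25)
sales_pre  <- baseline + rnorm(n, sd = 10)      # ad not live
sales_post <- baseline + rnorm(n, sd = 10) + 5  # ad live, assumed +5 lift

# Paired t-test: each customer serves as their own control.
fit <- t.test(sales_post, sales_pre, paired = TRUE)
print(fit)
```

A significant difference here only says that sales changed between the two periods. Anything else that changed at the same time (seasonality, promotions) is mixed in with the ad effect.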

Statistical Approaches:

Similar in basic form to what we will see for A/B tests, but the inferences are more limited (correlational, not causal)
Can use more complex models (e.g., glm, random forest, or XGBoost); more on this soon!

Some Modeling Options

A/B Testing & Experimental Methods

The 3rd Variable Problem:

Cities with more churches have higher crime rates. But what about population?
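
A small simulation of the churches-and-crime example, assuming both counts are driven only by population. All numbers are invented.

```r
set.seed(7)
n_cities <- 300  # hypothetical number of cities

# Simulated city populations (made-up distribution).
population <- rlnorm(n_cities, meanlog = 11, sdlog = 1)

# Both counts scale with population; neither depends on the other.
churches <- rpois(n_cities, population / 2000)
crimes   <- rpois(n_cities, population / 50)

# Raw counts look strongly related...
r_raw <- cor(churches, crimes)

# ...but the relationship largely disappears on a per-capita basis.
r_rate <- cor(churches / population, crimes / population)

print(c(raw = r_raw, per_capita = r_rate))
```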

Central Limit Theorem & Random Assignment

Randomly assign customers to groups using a chance procedure
Minimizes systematic differences between groups, so that differences in outcomes can be attributed to the experimental treatment
Central Limit Theorem: when independent random variables are summed (or averaged), the distribution of the sum tends toward a normal distribution (informally, a “bell curve”), even if the original variables themselves are not normally distributed
- Helps justify normal approximations for large-sample statistics in controlled experiments
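
A quick simulation sketch of the Central Limit Theorem, assuming an exponential distribution as the skewed starting point. The sample size of 50 and the 5000 replications are arbitrary choices.

```r
set.seed(123)

# Skewness helper (simple moment-based estimate).
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Individual exponential draws are strongly right-skewed.
raw_draws <- rexp(5000, rate = 1)

# Means of 50 draws each: 5000 simulated "experiments".
sample_means <- replicate(5000, mean(rexp(50, rate = 1)))

print(c(
  skew_raw   = skew(raw_draws),     # far from 0 (not bell-shaped)
  skew_means = skew(sample_means),  # much closer to 0
  mean_means = mean(sample_means),  # near the population mean of 1
  sd_means   = sd(sample_means)     # near 1 / sqrt(50)
))
```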

A/B Testing & Experimental Methods

Example A/B test diagram

Statistical Approaches:

The statistical methods are the same as for ad-hoc pre-post testing, but now campaign exposure is randomly assigned, which supports causal conclusions!
Randomization in R

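library(dplyr)

set.seed(2018)  # make the random split reproducible
head(mtcars)    # mtcars stands in for a customer table

# Add a row id so we can track which rows were assigned to which group
dat <- mtcars %>%
  mutate(row_id = row_number())

# Randomly assign half of the rows to the test group
test <- dat %>%
  sample_frac(.5)

# Every row that was not sampled goes to the control group
control <- dat %>%
  filter(!row_id %in% test$row_id)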
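
Once exposure is randomly assigned, the two groups can be compared directly. A minimal sketch on simulated data follows; the customer count and the 10% and 12% conversion rates are assumptions made up for illustration.

```r
set.seed(2018)
n <- 5000  # hypothetical number of customers
customers <- data.frame(id = seq_len(n))

# Random assignment: exactly half to test (sees the campaign), half to control.
customers$group <- sample(rep(c("test", "control"), length.out = n))

# Simulated outcomes; both conversion rates are assumed, not real.
p <- ifelse(customers$group == "test", 0.12, 0.10)
customers$converted <- rbinom(n, 1, p)

# Conversions and group sizes, in the order test, control.
conv <- c(sum(customers$converted[customers$group == "test"]),
          sum(customers$converted[customers$group == "control"]))
size <- c(sum(customers$group == "test"),
          sum(customers$group == "control"))

# Two-sample test of equal conversion proportions.
res <- prop.test(conv, size)
print(res)
```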

A/B Testing & Experimental Methods

Incorporating additional Predictors: Linear Method

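# Randomized controlled trial example from the glm() help page (Dobson, 1990)
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)    # 3-level factor cycling 1,2,3
treatment <- gl(3,3)    # 3-level factor in blocks of 3
print(d.AD <- data.frame(treatment, outcome, counts))
# Poisson regression of counts on outcome and treatment
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(), data = d.AD)
summary(glm.D93)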
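
The same idea applied to conversion, as a sketch: a logistic regression with a randomly assigned exposure flag plus extra predictors. The data are simulated, and the variable names (`ad_exposure`, `past_purchases`, `weekend`) and effect sizes are assumptions for illustration.

```r
set.seed(1)
n <- 2000  # hypothetical number of customers

# Simulated customer table; all columns and effects are made up.
d <- data.frame(
  ad_exposure    = rbinom(n, 1, 0.5),   # randomly assigned treatment
  past_purchases = rpois(n, 2),         # purchase history
  weekend        = rbinom(n, 1, 2 / 7)  # day-of-week indicator
)
lin <- -2 + 0.4 * d$ad_exposure + 0.3 * d$past_purchases + 0.2 * d$weekend
d$converted <- rbinom(n, 1, plogis(lin))

# Logistic regression: exposure effect adjusted for the other predictors.
fit <- glm(converted ~ ad_exposure + past_purchases + weekend,
           data = d, family = binomial())
print(summary(fit)$coefficients)

# Odds ratio for ad exposure.
print(exp(coef(fit)["ad_exposure"]))
```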

Incorporating additional Predictors: xgBoost

Markov Chain Models in R

ChannelAttribution Package in R

Allows us to:

Account for paths that don't result in conversion
Switch from rule-based to probabilistic models
Represent every customer journey as a chain in a directed Markov graph, where each vertex is a possible state (channel/touchpoint) and each edge carries the probability of transition between states (including conversion)
- "Memoryless": the probability of reaching a state depends only on the previous state visited
Take advantage of removal effect
- Remove each channel from the graph in turn and measure how many conversions (or how much value) would still be obtained without it
- If we obtain N conversions without a given channel/touchpoint, compared with T total conversions in the complete model, the channel's removal effect is the relative drop in conversions (or value), 1 - N/T
- Allows you to say "If we had removed channel 1, what would the probability of conversion be?"
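
The removal effect can be sketched in base R on a toy first-order chain. The transition probabilities and channel names below are invented, and `conv_prob()` and `remove_channel()` are hypothetical helpers; this is not how the ChannelAttribution package is implemented.

```r
# Toy transition matrix; all probabilities are made up for illustration.
states <- c("start", "facebook", "email", "search", "conversion", "null")
P <- matrix(0, 6, 6, dimnames = list(states, states))
P["start", c("facebook", "email", "search")] <- c(0.5, 0.3, 0.2)
P["facebook", c("email", "search", "conversion", "null")] <- c(0.3, 0.2, 0.1, 0.4)
P["email", c("search", "conversion", "null")] <- c(0.3, 0.2, 0.5)
P["search", c("facebook", "conversion", "null")] <- c(0.1, 0.4, 0.5)
P["conversion", "conversion"] <- 1  # absorbing state
P["null", "null"] <- 1              # absorbing state (no conversion)

# Probability of eventually converting when starting from "start".
conv_prob <- function(P) {
  transient <- setdiff(rownames(P), c("conversion", "null"))
  Q <- P[transient, transient, drop = FALSE]
  R <- P[transient, "conversion", drop = FALSE]
  B <- solve(diag(length(transient)) - Q, R)  # absorption probabilities
  as.numeric(B)[transient == "start"]
}

# Remove a channel: every transition into it is redirected to "null".
remove_channel <- function(P, ch) {
  P2 <- P
  P2[, "null"] <- P2[, "null"] + P2[, ch]
  P2[, ch] <- 0
  P2[ch, ] <- 0
  P2[ch, "null"] <- 1  # keep the row a valid probability distribution
  P2
}

base <- conv_prob(P)
channels <- c("facebook", "email", "search")

# Removal effect: relative drop in conversion probability without the channel.
effects <- sapply(channels, function(ch) {
  1 - conv_prob(remove_channel(P, ch)) / base
})

# Normalize removal effects into attribution shares.
share <- effects / sum(effects)
print(round(rbind(removal_effect = effects, share = share), 3))
```

Multiplying `share` by total conversions (or total value) gives each channel's attributed amount.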

Markov Chain Models in R: Implementation

Thanks!

Don't forget about the Data Cleaning! Yikes