Building off the post I wrote about the “three revolutions” in data science happening right now (deep learning, Bayesian inference, and causal inference), I thought it might be fun to write down where my work intersects with these “revolutions”. In the original drafts of this, I called this a “quick post”, but pretty soon it became anything but quick. So instead of one post, I split it into three. The first one was two weeks ago, the second one was last week, and the final one is below.
If I spend an additional dollar on Facebook, what will happen?
Causal inference is particularly important to Recast, a company built around a state-of-the-art media mix model (MMM). The model learns the relationship between marketing inputs (e.g. spend on Facebook and TV, direct mail drops, product launches, etc.) and outputs like new customer acquisition or revenue.
MMMs are a hornet’s nest of modeling difficulties, only one of which is that the data is deeply confounded. Confounding means that both your inputs and your outputs are jointly caused by some third category of variables. Seasonality is an easy example: around Christmas, both advertising spend and sales may go up because of the holiday shopping period. Even though both spend and sales are going up at the same time, this doesn’t mean that spend is causing the sales; but if you simply regressed sales on spend, you would get a very strong positive relationship. Seasonality is easy to spot and address, but most other confounders are not.
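The seasonality case can be made concrete with a minimal simulation (all numbers and variable names here are invented for illustration): a holiday indicator drives both spend and sales, the true causal effect of spend on sales is zero, and yet a naive regression of sales on spend finds a strong positive slope. Controlling for the confounder recovers an effect near zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weekly data: a "holiday" indicator drives both spend and sales.
weeks = 104
holiday = (np.arange(weeks) % 52 >= 46).astype(float)  # Christmas run-up

spend = 10 + 20 * holiday + rng.normal(0, 2, weeks)
# The TRUE causal effect of spend on sales is zero in this simulation:
# sales depend only on the holiday, not on spend.
sales = 100 + 80 * holiday + rng.normal(0, 5, weeks)

# A naive regression of sales on spend finds a strong "effect" anyway.
naive_slope = np.polyfit(spend, sales, 1)[0]

# Regressing on spend AND the confounder recovers a slope near zero.
X = np.column_stack([spend, holiday, np.ones(weeks)])
controlled_slope = np.linalg.lstsq(X, sales, rcond=None)[0][0]

print(f"naive slope:      {naive_slope:.2f}")    # strongly positive
print(f"controlled slope: {controlled_slope:.2f}")  # near zero
```

Controlling for seasonality is easy because the holiday is observable; the harder confounders discussed below are the ones you cannot simply add as a column.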
Unless you control for these confounders, your estimates will be biased. This bias can be completely hidden from you; the usual procedures for assessing model fit, like how well the model predicts held-out data, will lie to you and tell you your model is great. But as soon as the client starts to make changes in their marketing spend based on your model, your model will no longer work!
Why would this happen? We need an example to go through it. Let’s say your company sells widgets. Widgets are a very competitive market, and recently, one of your competitors started bidding up the same keywords you were bidding on, driving up the price of your online advertising. This caused your advertising spend to increase. Simultaneously, the increased competitor activity caused your sales to fall. What does your model see? Increased ad spend and decreased sales. The natural conclusion: ad spend does not cause increased sales (or may even decrease them).
As competitor activity waxes and wanes, this relationship will hold. When competitors pull back, you’ll spend less to reach the same customers, and you will sell more at the same time. When you evaluate the model you built to see if this relationship is real, you hold out the last few months of data to see how well your model predicts it. Since the held out data contains the same relationship between competitor activity, advertising, and sales as the in-sample data, your model fits very well both in- and out-of-sample.
So based on this modeling, you think that your advertising spend has a zero or possibly negative effect on sales, and make the decision to start pulling back on spend. What happens? Your sales fall! The model no longer captures the relationship!
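This failure mode can be sketched in a few lines of simulation (a hypothetical data-generating process, with all numbers invented): competitor activity pushes spend up and sales down, the naive regression estimates a negative slope even though the true causal effect is positive, and the held-out fit still looks excellent because the confounding persists out of sample.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data-generating process: competitor activity bids up ad
# prices (raising spend) while also stealing sales. The TRUE causal
# effect of one unit of spend on sales is +2.
n = 200
competitor = rng.normal(0, 1, n)
spend = 50 + 5 * competitor + rng.normal(0, 1, n)
sales = 500 + 2 * spend - 30 * competitor + rng.normal(0, 3, n)

# Regress sales on spend alone, holding out the last 50 observations.
train, test = slice(0, 150), slice(150, None)
slope, intercept = np.polyfit(spend[train], sales[train], 1)

# The slope comes out strongly negative despite the true effect of +2 ...
print(f"estimated slope: {slope:.2f}")

# ... yet the holdout fit looks great, because the held-out data contains
# the same confounded relationship as the training data.
pred = slope * spend[test] + intercept
r2 = 1 - np.sum((sales[test] - pred) ** 2) / np.sum(
    (sales[test] - sales[test].mean()) ** 2
)
print(f"holdout R^2: {r2:.2f}")

# Now intervene: cut spend by 10 while competitor activity stays put.
# The (biased) model predicts sales will rise; causally, they fall by 20.
print(f"model-predicted change from cutting spend by 10: {-10 * slope:+.1f}")
print(f"true causal change:                              {-10 * 2:+.1f}")
```

The last two lines are the punchline: the model that validated beautifully on held-out data gives you advice with the wrong sign.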
Do-calculus aficionados will recognize this as E[y | x] ≠ E[y | do(x)], which is a mathy way of saying that the relationship you happen to have observed between two variables is not the relationship you will observe when you actually manipulate one of them — “do(x)” — or, as the saying goes, “correlation does not imply causation”. Umbrellas don’t cause rain, even though you pretty much only see them around when it’s raining.
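The umbrella example can be checked directly in a toy structural model (probabilities invented for illustration): conditioning on seeing an umbrella makes rain likely, while intervening on umbrellas, which overrides the mechanism that normally produces them, leaves the rain distribution untouched.

```python
import numpy as np

rng = np.random.default_rng(2)

# Structural model: rain causes umbrellas, not the other way around.
n = 100_000
rain = rng.random(n) < 0.3
umbrella = np.where(rain, rng.random(n) < 0.9, rng.random(n) < 0.05)

# Observational, E[y | x]: among people carrying umbrellas, rain is likely.
p_rain_given_umbrella = rain[umbrella].mean()

# Interventional, E[y | do(x)]: do(umbrella=1) hands everyone an umbrella,
# severing the arrow from rain to umbrella, so rain keeps its base rate.
umbrella_do = np.ones(n, dtype=bool)
p_rain_do_umbrella = rain[umbrella_do].mean()  # just P(rain)

print(f"P(rain | umbrella)     = {p_rain_given_umbrella:.2f}")  # ~0.89
print(f"P(rain | do(umbrella)) = {p_rain_do_umbrella:.2f}")     # ~0.30
```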
Building Recast has meant using the latest and greatest research in causal inference to find ways to deconfound the model, so that we are estimating E[y | do(x)], not E[y | x]. Obtaining plausibly causal estimates is not the only challenge related to building this model1, but it is a substantial one. And it is one of the several things that were not possible even five years ago, that are possible now, and that make Recast possible today.
1. The others include the bias/variance tradeoff, saturated treedepth, setting priors that actually match our priors, non-iid data, and more.