There’s a widely-held misconception that there are two types of statistical or machine learning models: those for when you care about the parameters (“inference”1) and those for when you care about the predictions.

Here’s the problem with that: it’s all wrong. Inference without a causal model is simply a predictive model. And causal models simply comprise a special class of predictive models.

It’s prediction any way you slice it. This isn’t to knock inference — after all, it’s in this blog’s name. As I’ll try to explain below, statistical inference is understanding *how* predictions are made.

Let’s say you’re building a regression. Let’s also say it’s not a causal model, meaning that you haven’t gone through the exercise of figuring out what you need to control for in order to interpret the coefficients causally.

You find that such-and-such predictor has a beta of 1.2, and so-and-so a beta of -3.4. What do these numbers mean? As we stipulated, they certainly don’t mean that increases in such-and-such *cause* Y to go up by 1.2; or that increases in so-and-so *cause* Y to decrease by 3.4. There is no “why” in these numbers; only “whens”. As in, *when* such-and-such is higher by 1, Y is typically higher by 1.2, and so on.

That is, the model helps you predict what Y is, if you know what such-and-such or so-and-so are. That’s it. So, inference with a non-causal model is simply prediction. The parameters that you can inspect in these models may tell you *how* the model is making its prediction, but this doesn’t change the fact that the only way you can interpret these numbers is through the lens of prediction.
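To make that concrete, here’s a minimal sketch (with made-up variable names standing in for “such-and-such” and “so-and-so”): fit an ordinary least squares regression and read off the betas, remembering that they describe *whens*, not *whys*.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical predictors; the names are stand-ins, not real data.
such_and_such = rng.normal(size=n)
so_and_so = rng.normal(size=n)
y = 1.2 * such_and_such - 3.4 * so_and_so + rng.normal(scale=0.5, size=n)

# Ordinary least squares via numpy's least-squares solver.
X = np.column_stack([np.ones(n), such_and_such, so_and_so])
betas, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted betas are predictive associations: *when* such_and_such
# is higher by 1, the model's prediction of y is higher by ~1.2.
print(betas.round(1))  # roughly [0, 1.2, -3.4]
```

Nothing in the fitting procedure knows whether these relationships are causal; the same numbers come out either way.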

What if we did the same exercise with a causal model? Remember all those times that I said that a non-causal model’s predictions would fail, once you started to act on the data? E.g., you encouraged the kids to study more, and made wrong predictions about test scores; or you reduced your marketing spend, and sales fell much more than you expected.

Well, just as you can frame causal inference as a missing data problem (potential outcomes) or as a graph theory problem (structural causal models), you can frame it as a prediction problem. This influential paper takes the issue above — “non-causal models make inaccurate predictions when you make interventions on the data” — and turns it on its head; they define causal models as those that make accurate predictions across all possible datasets, including those where you make interventions on the underlying data.

As they put it in their abstract:

> What is the difference of a prediction that is made with a causal model and a non-causal model? Suppose we intervene on the predictor variables or change the whole environment. The predictions from a causal model will in general work as well under interventions as for observational data. In contrast, predictions from a non-causal model can potentially be very wrong if we actively intervene on variables. Here, we propose to exploit this invariance of a prediction under a causal model for causal inference

For a variety of reasons I don’t think this framework has as far to run as potential outcomes or structural causal models, but it’s useful to remember that causal models are those that correctly handle the *qualitative* nature of reality, and therefore should make more accurate predictions across a wider range of scenarios (including when you intervene on the process that generated the data).
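You can see the invariance point in a toy simulation. The data-generating process below is an assumption for illustration: an observed “motivation” variable drives both study hours and test scores. A regression of score on hours alone (non-causal, confounded) predicts well observationally but breaks when we intervene by assigning hours at random; a regression that also controls for motivation keeps predicting accurately.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

def simulate(hours=None):
    # Toy data-generating process: motivation drives both
    # study hours and test scores (a classic confounder).
    motivation = rng.normal(size=n)
    if hours is None:  # observational regime
        hours = motivation + rng.normal(scale=0.5, size=n)
    score = 2.0 * motivation + 1.0 * hours + rng.normal(scale=0.5, size=n)
    return motivation, hours, score

def ols(cols, y):
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Fit both models on observational data.
m, h, s = simulate()
b_noncausal = ols([h], s)     # score ~ hours (confounded)
b_causal = ols([h, m], s)     # score ~ hours + motivation

# Intervene: assign study hours at random, severing their
# link to motivation, and see whose predictions survive.
m2, h2, s2 = simulate(hours=rng.normal(size=n))
pred_noncausal = b_noncausal[0] + b_noncausal[1] * h2
pred_causal = b_causal[0] + b_causal[1] * h2 + b_causal[2] * m2

print(np.mean((s2 - pred_noncausal) ** 2))  # large: the "when" broke
print(np.mean((s2 - pred_causal) ** 2))     # small: the "why" held up
```

The confounded model learns an inflated hours coefficient (roughly 2.6 here instead of the true 1.0), which is exactly why its predictions fall apart once you act on the data.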

Where does this leave us? Whichever model we’re building, it’s clear that the “inference” part (inspecting the internal parameters of the model) is simply an exercise in understanding *how the model made its predictions*. The causal-ness (or lack thereof) of the model outlines the boundaries within which those predictions will be accurate.

Whether or not you can do inference is really a property of the model you built — meaning, does the model have transparent, or at least translucent, walls? Or is it a black box?

But how black are these boxes, really? and how sturdy are their walls? Can we do inference with models built for prediction? Can we explain *how* a “purely predictive” model made its predictions?

We can, and here’s how: Make a bunch of predictions with different inputs — and see how the predictions change when you change the inputs. Take your input data on students’ study patterns and their test scores, turn “studies a lot” off for everyone, make predictions, turn it back on again, make more predictions, and compare the sets of predictions. Voilà, now you know how much studying matters according to the model.
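That recipe can be sketched in a few lines. The model below happens to be a linear fit, but the probing step treats it as a black box: all we do is call its `predict` function with the flag toggled off and then on (the variable names and data are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

# Hypothetical training data: a binary "studies a lot" flag
# plus one other feature, prior grades.
studies_a_lot = rng.integers(0, 2, size=n).astype(float)
prior_grades = rng.normal(size=n)
scores = 10.0 * studies_a_lot + 3.0 * prior_grades + rng.normal(size=n)

# Fit a model; from here on we treat it as an opaque box
# whose only interface is predict().
X = np.column_stack([np.ones(n), studies_a_lot, prior_grades])
w, *_ = np.linalg.lstsq(X, scores, rcond=None)

def predict(studies, grades):
    return w[0] + w[1] * studies + w[2] * grades

# Probe the box: force the flag off for everyone, then on,
# and compare the two sets of predictions.
off = predict(np.zeros(n), prior_grades)
on = predict(np.ones(n), prior_grades)
print((on - off).mean())  # how much studying matters, per the model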

Something like this, more or less, is what’s happening inside of LIME (Local Interpretable Model-agnostic Explanations), which was the state of the art for explaining complex models, last I checked on this several years ago. It probably still works for whatever you’re doing.

Somewhere, in the mists of my memory, is a paper or talk called something like “Prediction is all there is”, but, alas, I couldn’t find it, so that became the title of this blog post. But it’s not *all* there is — statistical (and ML) models can do much more than predict; mainly, they can simulate, they can create whole new worlds of data that aren’t governed by some existing set of observations. But “prediction is a lot of what there is”, or “mostly, you’re doing prediction, even when you think you’re not” weren’t quite as pithy, so here we are.

1. Lots of folks in machine learning / AI call “inference” the process by which one runs a trained model, just one of the many instances of jargon collisions across the fields.