NHST falsifies the wrong thing

Jun 26, 2020

(All credit to Richard McElreath and his book Statistical Rethinking)

Falsifiability is what distinguishes science from non-science. Scientific hypotheses can be proven wrong, and can therefore be put to the trial of experiment. Non-falsifiable statements either cannot be tested at all or can only be verified, and verifying that something is true is usually intractable (see below). In this framework, which is due to Karl Popper, one conjectures a hypothesis and then devises experiments that would be capable of proving it wrong.

One can never completely prove a hypothesis true, but one can gain confidence that a hypothesis is true through repeated, failed attempts to prove it wrong. This is in contrast to verification. To see how verification is doomed as a knowledge-producing framework, consider that to prove X is true for all Y, you have to observe every Y. To falsify the same statement, you just have to observe one Y for which X is not true.

Null hypothesis significance testing (NHST) is the dominant framework in statistical methods. P values, asterisks, and "was it significant?" are all symptoms of this framework. P values may be on their way out because they are brittle, noisy, and almost always misinterpreted. A p value is the probability of seeing data at least as extreme as what was observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, but that's what everyone thinks it is.
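To make that misreading concrete, here is a minimal simulation sketch. Everything numeric in it is an assumption chosen for illustration and not something from McElreath's book: 90% of the hypotheses being tested are true nulls, real effects produce an expected z-score of 2, and the cutoff is 0.05. It counts how often the null is actually true among the "significant" results.

```python
import random
from statistics import NormalDist

rng = random.Random(1)
std_normal = NormalDist()

n_experiments = 20_000
share_true_nulls = 0.9   # assumption: most tested effects are not real
shift_when_real = 2.0    # assumption: expected z-score when the effect is real
alpha = 0.05

significant = 0
significant_and_null = 0
for _ in range(n_experiments):
    null_is_true = rng.random() < share_true_nulls
    # Draw a z statistic centered on 0 under the null, on the shift otherwise.
    z = rng.gauss(0.0 if null_is_true else shift_when_real, 1.0)
    # Two-sided p value: P(|Z| >= |z|) computed under the null.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    if p_value < alpha:
        significant += 1
        significant_and_null += null_is_true

# Share of significant results for which the null was in fact true.
share_null_given_significant = significant_and_null / significant
print(f"P(null true | p < {alpha}) is roughly {share_null_given_significant:.2f}")
```

Under these assumed settings, the share of significant results that come from a true null is far larger than 0.05. The exact figure depends entirely on the assumed base rate and effect size, which is the point: the p value alone cannot tell you the probability that the null is true.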

Bayesian estimation methods are better, but in practice they often replicate the NHST framework: the credible interval is plotted with the 0-line highlighted to show whether the posterior excludes the null hypothesis.

This is falsifying the wrong thing. It falsifies the null hypothesis, but that's not how Popper's framework works. In Popper's framework, we should be attempting to falsify the hypothesis we're interested in.

Where this becomes a real problem is when the null hypothesis is not the exact complement of the hypothesis we're interested in. Let's say our hypothesis is that some factor accounts for 30% or more of some observation. The null (at least how it's typically set up) is that there is no relationship. If we "reject the null", we falsify the hypothesis that there is no relationship, but where does that leave us with respect to the hypothesis we actually care about? A factor that accounts for 5% also rejects the null, yet it contradicts the 30% claim.
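The gap can be seen in a small simulation. This is a sketch under stated assumptions, not an analysis from the book: "accounts for 30% or more" is read as "explains at least 30% of the variance" (R² ≥ 0.30), the data are simulated with a true correlation of 0.3 (so R² is about 0.09), and the interval is a standard Fisher z confidence interval. The helper names `pearson_r`, `fisher_ci`, and `r2_interval` are made up for this example.

```python
import math
import random
from statistics import NormalDist

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - q * se), math.tanh(z + q * se)

def r2_interval(lo, hi):
    """Range of r squared implied by an interval (lo, hi) for r."""
    upper = max(lo ** 2, hi ** 2)
    lower = 0.0 if lo <= 0.0 <= hi else min(lo ** 2, hi ** 2)
    return lower, upper

# Assumed data-generating process: true correlation 0.3, i.e. about 9% of variance.
rng = random.Random(0)
n, true_r = 500, 0.3
x = [rng.gauss(0, 1) for _ in range(n)]
y = [true_r * a + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1) for a in x]

r = pearson_r(x, y)
lo, hi = fisher_ci(r, n)
r2_lo, r2_hi = r2_interval(lo, hi)

rejects_null = not (lo <= 0.0 <= hi)   # the usual NHST question
refutes_interest = r2_hi < 0.30        # the question we actually care about

print(f"r = {r:.3f}, 95% CI for R^2 = ({r2_lo:.3f}, {r2_hi:.3f})")
print(f"rejects 'no relationship': {rejects_null}")
print(f"refutes 'at least 30%':    {refutes_interest}")
```

With these settings the same data reject the null and also contradict the 30% hypothesis. A "significant" result here is evidence against the claim we care about, which only shows up when the test is aimed at that claim directly.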

