Does X cause Y? (part four of many)
To stick with the stereotype of economists being dismal, I’d like to keep discussing ways in which experimental evidence can be misleading. Again, use these powers for good, not just to criticize the findings of studies you don’t like.
Yet another potential pitfall of experimental studies (related to the issue of generalizability) is whether the treatment “dose” corresponds to the real-world doses you’re drawing conclusions about. For example, if you randomly fed some mice 10 times their body weight in tannins and observed that they lived longer, you shouldn’t use those results to conclude that consuming red wine will extend human lifespan. And it’s not just because wine also has alcohol in it. It’s because the relationship between an input and an output (in this case, tannins and longevity) can be highly non-linear and non-monotonic. For example, humans will die if deprived of oxygen. But they will also die if given 100% oxygen to breathe. Similarly, you need iron to survive. But you can also overdose on iron. Thus, it’s important that an experiment use a realistic treatment “dose” and that its results not be extrapolated to much higher or lower doses. This applies to program evaluations as well: if you randomly put some kids into an intensive tutoring program, you cannot use the results to say something about a once-a-month tutoring program, and vice versa.
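To make the extrapolation danger concrete, here is a minimal simulation sketch (the inverted-U response function and all the numbers are hypothetical, chosen purely for illustration): an experiment observes only low doses, a straight line fits those data well, and extrapolating that line to higher doses goes badly wrong.

```python
# A hypothetical inverted-U dose-response: the input helps at moderate
# doses and harms at high ones (think oxygen or iron).
import numpy as np

rng = np.random.default_rng(0)

def true_response(dose):
    # Made-up functional form: benefit peaks at dose = 5, declines after.
    return 10 * dose - dose**2

# The "experiment" only observes low doses (0 to 3)...
low_doses = rng.uniform(0, 3, size=200)
outcomes = true_response(low_doses) + rng.normal(0, 2, size=200)

# ...and we naively fit a straight line to those low-dose data.
slope, intercept = np.polyfit(low_doses, outcomes, 1)

for dose in [2, 5, 10]:
    predicted = slope * dose + intercept
    print(f"dose={dose:>2}: linear prediction={predicted:6.1f}, "
          f"true response={true_response(dose):6.1f}")
# Inside the observed range (dose=2) the fit is decent; at dose=10 the
# line predicts a large benefit while the true response has fallen to zero.
```

Within the observed range, the line fits well; the failure only shows up at doses the experiment never actually tested.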
Another issue that applies to all research but can be particularly acute for experimental studies is spurious results. Some experiments are run with relatively few subjects, be they humans or mice. Within a single experiment, this shouldn’t cause a problem if you’re only looking at one outcome, because any test statistic you compute will reflect the sample size. But because small-scale experiments are fairly easy to run, they can create spurious results in the aggregate. Imagine that a researcher runs ten or twenty such experiments each year. Purely by statistical chance, some of them will show a “significant” treatment effect even if the real effect is always zero: at the conventional 5% significance level, roughly one test in twenty of a true null comes out significant. The more experiments are run, the higher the chance of that happening. This applies not just to one researcher running multiple experiments, but also to multiple researchers each running one experiment. Note that the researchers themselves cannot know whether their findings are spurious, especially when multiple researchers each run one experiment. The possibility of spurious findings also applies to non-experimental studies. The good news is that spurious findings are very unlikely to replicate even once, much less two or three times. Thus, it might be wise to be skeptical of findings (experimental or otherwise) that have not been replicated by other researchers, especially if such findings contradict several prior studies.
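Here is a minimal sketch of that arithmetic (the test, sample sizes, and significance level are my assumptions for illustration): twenty small experiments, each with a true effect of exactly zero, analyzed with ordinary two-sample t-tests.

```python
# A minimal sketch of spurious findings across multiple experiments.
# Assumptions (for illustration only): two-sample t-tests at alpha = 0.05,
# 30 subjects per arm, and a true treatment effect of exactly zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n_per_arm, n_experiments = 0.05, 30, 20

false_positives = 0
for _ in range(n_experiments):
    # Treatment does nothing: both arms are drawn from the same distribution.
    treated = rng.normal(0, 1, n_per_arm)
    control = rng.normal(0, 1, n_per_arm)
    _, p_value = stats.ttest_ind(treated, control)
    false_positives += p_value < alpha

print(f"{false_positives} of {n_experiments} null experiments were 'significant'")
# In expectation, alpha * n_experiments = 1 finding is spurious; the chance
# of at least one is 1 - (1 - alpha)**n_experiments, about 64% here.
```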
That concludes the issues that are more or less experiment-specific. In the next blog post in this series, I’ll discuss common statistical pitfalls that can affect experimental, quasi-experimental, and observational studies alike.