Now that I’ve written about why randomized controlled experiments are so great, it’s time to talk about some of the common ways in which they can go wrong. But first I’d like to make an important caveat: finding potential flaws with any research, even randomized controlled experiments, is actually pretty easy. I haven’t come across any study that couldn’t be criticized on one or more grounds. So with the power to criticize also comes great responsibility: don’t use it to dismiss results you don’t like. Don’t selectively apply these criticisms to some studies and accept the findings of others that could be subject to similar criticisms. Use the knowledge wisely.
The main concern with randomized controlled experiments is the question of “external validity”. Sure, you’ve shown that something works in the laboratory or in a carefully controlled setting, but does it work in the real world? It may not if people in the laboratory are different from those who will be subject to the treatment in the real world, or if people (including those administering the treatment) behave differently in the experiment than they would outside of it.
For example, maybe you run a clinical trial for a drug and only recruit men to participate in the trial. Will the drug work as well on women? Will there be different side effects for them? For a long time, clinical trials frequently omitted or under-enrolled women, although that is now changing. Or maybe you enroll obese individuals in a weight-loss trial but only include ones without other health problems like diabetes. But once the drug goes to market, it may be prescribed to all types of obese individuals, and potentially have different effects than what you observed in the laboratory. Or maybe the nurses working on your trial are really good at getting patients to take the drug on time, but in the real world people forget to take it and you observe much lower effectiveness.
External validity is a potential problem with all experiments, not just clinical trials and not just stylized laboratory experiments. As long as people know they are part of an experiment, they may change how they act (maybe to make the experimenter happy, maybe to hide socially unacceptable views or behaviors, or maybe because they don’t take the experimental treatment as seriously as they do things in the real world). This is known as the Hawthorne effect, and it’s essentially impossible to rule out unless your subjects do not know that they are being studied.
Finally, external validity can also be a concern if you’re trying to say something about high-stakes decisions by running a low-stakes experiment. For example, you’re open to this criticism if you want to say something about how people save for retirement and you either run a hypothetical choice experiment or an experiment with low stakes (because who can afford to run an experiment where tens of thousands of dollars are at stake?). In some cases, the low-stakes findings survive in a high-stakes environment, but in others they don’t.
The bottom line is that the most convincing experimental conclusions are those that are based on a representative population that faces stakes similar to what they would be in the real world, and where the experiment closely resembles real-world conditions (including individuals being unaware that they are part of an experiment).
(click here for part 1)
I was going to write more about quasi-experimental methods, but then I realized why these are usually discussed last in econometrics/empirical methods books. In order to see why quasi-experimental methods are useful, it’s first helpful to understand why experiments are good and where non-experimental methods can falter. Of course, experiments have drawbacks too and non-experimental non-quasi-experimental methods can produce valid results under some conditions. But we’ll talk about all that later.
When properly designed and executed, an experiment will easily allow you to estimate the causal effect of a randomly assigned condition (“treatment”), X, on any outcome Y: effect of a job training program on employment, effect of teacher training on student outcomes, effect of a drug on mortality, effect of dog ownership on health, etc. At a very basic level, a valid experiment only requires two things: (1) a control group (let’s say one composed of people) that is not exposed to the treatment X and (2) random assignment to treatment. This kind of setup is called a “randomized controlled experiment”. In this case, you can just compare the differences in Y’s in the two groups to arrive at the causal effect of X (divide by differences in X between the two groups if X is continuous).
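To make the "just compare the two groups" step concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the function name `simulate_experiment`, the baseline distribution of Y, and the true effect of 2.0 are all my assumptions, not numbers from any real study.

```python
import random
import statistics


def simulate_experiment(n=10_000, true_effect=2.0, seed=0):
    """Randomly assign n people to treatment or control and return the
    difference in mean outcomes (all numbers here are hypothetical)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 3.0)  # what Y would be without X
        if rng.random() < 0.5:           # a coin flip decides who gets X
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    # Causal effect estimate: mean Y of treated minus mean Y of control
    return statistics.mean(treated) - statistics.mean(control)


estimate = simulate_experiment()
print(f"estimated effect: {estimate:.2f} (true effect: 2.00)")
```

Because the coin flip is unrelated to anyone's baseline, the difference in means should land close to the true effect, with some sampling noise.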
Why do you need a control group? Because things change over time. Over longer time scales, people age, get sick, get better, gain/lose weight, get/lose jobs, learn/forget things, move, and generally act in ways that could affect Y even without X. Over shorter time scales, people might be affected by the time of day, by the temperature, by changes in their mood, by the building into which you bring them, or even by the fact that they are taking part in an experiment. If you don’t have a control group, it’s essentially impossible to tease out the effect of X on Y from the influence of other forces on Y. Most researchers know this and use a control group to ensure that the estimated effect of X on Y is not confounded by anything else happening to the treated group.
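Here is a rough sketch of that point, under invented assumptions: everyone's outcome drifts upward by 1.5 over the course of the study regardless of treatment, and the treatment itself adds 1.0. The function name and all the numbers are hypothetical.

```python
import random
import statistics


def before_after_vs_control(n=10_000, true_effect=1.0, time_trend=1.5, seed=3):
    """Compare a before/after estimate that uses only treated people with an
    estimate that subtracts the change seen in a control group."""
    rng = random.Random(seed)
    treated_changes, control_changes = [], []
    for _ in range(n):
        # Change in Y over the study period that happens even without X
        change = time_trend + rng.gauss(0.0, 1.0)
        if rng.random() < 0.5:
            treated_changes.append(change + true_effect)
        else:
            control_changes.append(change)
    naive = statistics.mean(treated_changes)               # no control group
    controlled = naive - statistics.mean(control_changes)  # with control group
    return naive, controlled


naive, controlled = before_after_vs_control()
print(f"before/after only: {naive:.2f}, with control group: {controlled:.2f}")
```

In this setup the before/after number mixes the time trend with the treatment effect, while subtracting the control group's change removes the trend.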
One exception I found (there surely are others) is this study, which recruited 4-10 month old infants and mothers for a sleep lab study of “crying it out” (a method by which some parents teach babies to fall asleep on their own by letting them cry and learn to self-soothe). All mothers were instructed to let the babies “cry it out” when falling asleep, so no control group was used. Even after the babies stopped crying on the third day, their cortisol levels were elevated, suggesting that they were stressed out. As this Slate article points out, it is impossible to know whether the babies were stressed out by exposure to “cry it out” (as the research article claims) or by the fact that they were in a foreign environment – the sleep lab. The absence of a control group that faced the same conditions without being exposed to “crying it out” thus fundamentally limits this study’s ability to say anything definitive about how crying it out affects stress levels.
Now you might say, “Sure, for some things, a control group that’s part of the experiment is important. But for outcomes like mortality or income, why can’t we just compare outcomes of people who enrolled in the experiment to outcomes of similar people who are not part of the experiment? That seems easier and cheaper.” The problem with this approach is that it’s hard to be sure you’re comparing treated “oranges” to untreated “oranges” as opposed to treated “oranges” to untreated “apples”. Even if you collect information on hundreds of individual characteristics, it’s hard to be sure that there aren’t other characteristics that differ between your experimental treatment group and your real-world control group. And those unobserved differences might themselves influence outcomes. For example, maybe the group that signed up for your job training experiment is more (less) motivated and would have gotten jobs at higher (lower) rates than the real-world control group even if they didn’t take part in your experiment. Or maybe the experimental group is healthier (sicker) in ways that you aren’t capturing and they would have lived longer (died sooner) than the real-world control group. For these reasons, you should always be suspicious of “experiments” where the control group is non-existent or isn’t drawn from the group that signed up for the study.
Finally, why can’t you let people decide themselves whether to be in the control group or not? For the same reason that your control group needs to consist of people who signed up for your experiment – if you don’t assign people to the treatment group randomly you can’t be sure that the two groups – treatment and control – are alike in every single way that affects Y except for X. It could be that people who sign up for the treatment are more desperate for whatever reason, and desperate people may behave differently in all sorts of ways that then affect all sorts of outcomes. Or it could be that they are more adventurous, which again could affect them in all sorts of ways. Or they eat more broccoli/cheese/ice cream and you didn’t think to ask about that. If there are any such differences that you don’t observe and control for adequately, you can never be sure that differences in Y between the two groups are solely due to the treatment X.
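A small simulation can show how much damage self-selection can do. This is only a sketch under assumptions I made up: an unobserved trait called `motivation` raises Y on its own, and in the self-selected version everyone with above-average motivation opts into treatment. The true effect of X is set to 1.0.

```python
import random
import statistics


def compare_assignment(n=20_000, true_effect=1.0, seed=1):
    """Return (self-selected estimate, randomized estimate) of the effect of X
    for the same simulated population. All numbers are hypothetical."""
    rng = random.Random(seed)
    self_t, self_c, rand_t, rand_c = [], [], [], []
    for _ in range(n):
        motivation = rng.gauss(0.0, 1.0)                # never observed by the researcher
        y0 = 2.0 * motivation + rng.gauss(0.0, 1.0)     # outcome without treatment
        # Scenario 1: people choose for themselves; the motivated opt in
        if motivation > 0:
            self_t.append(y0 + true_effect)
        else:
            self_c.append(y0)
        # Scenario 2: a coin flip assigns treatment
        if rng.random() < 0.5:
            rand_t.append(y0 + true_effect)
        else:
            rand_c.append(y0)
    self_selected = statistics.mean(self_t) - statistics.mean(self_c)
    randomized = statistics.mean(rand_t) - statistics.mean(rand_c)
    return self_selected, randomized


self_selected, randomized = compare_assignment()
print(f"self-selected: {self_selected:.2f}, randomized: {randomized:.2f}")
```

In this made-up world the self-selected comparison badly overstates the effect, because it attributes the payoff from motivation to the treatment, while the randomized comparison stays near 1.0.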
But what if you’re ABSOLUTELY SURE that there’s nothing different between your treatment and non-randomly selected control group that could affect Y other than X and other things you’ve controlled for? The thing is, you can never be sure, otherwise you probably wouldn’t be running an experiment. To be absolutely sure would imply that you know everything about how Y is determined except for the effect of X on Y. And there’s just no way that we know that much about anything that we’d want to study (at least as far as social science and medicine are concerned). But if you have a good counter-example, email me!
That was a long one! Next time, we’ll talk about how even randomized controlled experiments can go wrong.
You just read a fascinating article suggesting that drinking a glass of red wine is equivalent to spending an hour at the gym, that morning people are better positioned for success, or that gun control reduces police shootings. Let’s pretend that instead of immediately posting the article on your favorite social media website (which I’ll admit I’m sometimes guilty of myself), you instead wonder if the scientific methods behind the study are sound and if you can draw conclusions about cause and effect. How do you figure that out?
Unsurprisingly, it can be really hard. Alex Edmans, a Professor of Finance, has a recent excellent blog post about separating causation and correlation. After seeing lots of (often subtly) flawed research shared on social media, I’ve also been planning to write a guide to separating solid findings from not-so-convincing ones. It was going to be a cool flowchart that you can make your way through, with explanations along the way about why each step matters. But after having it on my “fun” to do list for months, I realized that the only way this flowchart will ever see the light of day is if I write it as a series of blog posts and then summarize things in a flowchart. This is part one.
The first question to ask when evaluating a study is whether it is based on an experiment (where researchers manipulated something, either in a laboratory or in the “field”) or is observational (where researchers collected some data). Experiments may be more reliable if done correctly, but they are not panaceas: there are many ways experiments can go wrong and a big issue is whether experimental findings translate to the real world. But we do evaluate experiments slightly differently from observational studies, so this is the first fork in our imaginary flowchart.
Let’s start with observational studies (this will repeat Alex’s post a bit, but I think it’s useful repetition). The first question to ask yourself is whether the researchers used any “quasi-experimental” variation to come to their conclusion. In general, studies that do are more credible than studies that do not. For example, sometimes researchers get lucky and stumble on a seemingly arbitrary rule that separates subjects (firms, individuals, regions) into two or more different groups. Certain scholarships are given to individuals who meet a specific cutoff on a standardized test score. Because it’s very difficult to control your score down to the point, people right below and right above the cutoff should be very similar in ability, except that the ones right below the cutoff did not get a scholarship and those above the cutoff did. Voila – you can study the effect of getting a scholarship on, for example, college completion, without worrying whether people without scholarships are fundamentally different from people with scholarships!
In order for this approach – called a “regression discontinuity” – to work well, (a) it must be impossible, or at least very difficult, for entities to manipulate whether they’re right below or above the cutoff and (b) researchers must not stray so far from the cutoff that the similarity of subjects below and above the cutoff starts becoming questionable. Ultimately, whether these two conditions hold depends on the context and how narrow of a range around the cutoff researchers select. For example, it’s hard to control whether your SAT score is 1480 or 1490, but scoring 1300 versus 1400 is unlikely to be mostly due to chance. In other contexts, small manipulations are easy to do – for example, many firms have enough flexibility in accounting to turn slightly negative earnings into slightly positive earnings, making a regression discontinuity approach not-so-credible in this setting.
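Condition (b) is easier to see with numbers. The sketch below is a deliberately simplified stand-in for a regression discontinuity: it just compares mean outcomes in a window around the cutoff, rather than using the regression-based estimators that real studies tend to use. The cutoff of 1400, the score range, the outcome formula, and the true scholarship effect of 0.10 are all invented for illustration.

```python
import random
import statistics


def window_estimate(bandwidth, n=100_000, cutoff=1400, true_effect=0.10, seed=2):
    """Difference in mean outcomes between people just above and just below
    the cutoff, using only scores within `bandwidth` points of it."""
    rng = random.Random(seed)
    above, below = [], []
    for _ in range(n):
        score = rng.uniform(1000, 1600)
        got_scholarship = score >= cutoff
        # Outcome rises smoothly with score (ability) and jumps at the cutoff
        outcome = 0.3 + 0.0005 * (score - 1000) + rng.gauss(0.0, 0.1)
        if got_scholarship:
            outcome += true_effect
        if abs(score - cutoff) <= bandwidth:
            (above if got_scholarship else below).append(outcome)
    return statistics.mean(above) - statistics.mean(below)


narrow = window_estimate(bandwidth=10)
wide = window_estimate(bandwidth=200)
print(f"narrow window: {narrow:.3f}, wide window: {wide:.3f} (true effect: 0.100)")
```

With the narrow window, people on either side of the cutoff are nearly identical in score, so the estimate should sit near the true effect. With the wide window, the groups differ a lot in ability and the estimate picks up that difference too.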
In the next post in this series (which may or may not be the next post chronologically), we’ll talk about other kinds of quasi-experimental variation. Bonus points to people who email me an article about a study they want scrutinized!
I loved PhD Comics as a grad student. Frankly, I still find them quite relatable as an Assistant Professor (and in a few years, I can let you know if they’re relatable from a tenured professor’s standpoint!). So I was really intrigued when I heard that the author of PhD Comics, Jorge Cham, co-wrote a book (with Daniel Whiteson) called “We Have No Idea”. It’s not about being a grad student (though the title could probably be reused for a book about that as well); it’s “A Guide to the Unknown Universe”. As the title suggests, it’s about some of the current limits of our knowledge.
If you like popular science books, physics, and astronomy, I definitely recommend this book. Most popular science books talk about what we know, and if you end up with a nagging “but” or a question, it’s hard to tell whether it’s because details were left out, because you misunderstood the explanation, or because you stumbled on a question scientists haven’t figured out yet. This book is different because it explicitly discusses big-picture stuff science hasn’t figured out yet (and suggests some possibilities for what the answers could be). It’s written in the easy-to-read and humorous style of Jorge Cham but co-written with an experimental high-energy physicist, so you can be pretty sure the book is both entertaining and correct.
The book covers dark matter, dark energy, elementary particles, and time, among other things. Even if you’ve read books on these subjects before, reading a book that synthesizes our lack of knowledge in these areas is both enlightening and exciting.
Let’s talk about genetically modified organisms (GMOs). But first, let me ask you a question. Are chainsaws good or bad? That’s a weird question, isn’t it? A chainsaw can be very useful if you need to cut something, but it can also be dangerous if you’re not careful or if you deliberately attack someone with it.
Now let’s go back to talking about GMOs. As I elaborate on below, it’s just as silly to ask whether GMOs are good or bad as it is to ask whether chainsaws are good or bad. Genetic modification is a tool. If used wisely, it can provide a significant advantage over traditional plant-breeding techniques. But it can also be used for evil. So my proposal is that we stop treating all GMOs as being the same (this also goes for people who love GMOs!) and instead think about what exactly is being genetically modified.
Let me demonstrate why this is important. Two very common genetic modifications out there are to (1) make crops herbicide-resistant (e.g., "Roundup ready corn") or (2) make crops produce their own pesticides (e.g., "Bt corn"). What effect would the first modification have? Well, it’s likely to increase the amount of herbicide farmers spray on crops because now you don’t have to worry about killing the crops themselves. This may be undesirable to the extent that higher levels of herbicide are more harmful to human health (although there’s no evidence that Roundup is harmful to human health unless you are stupid enough to swallow it in high doses) and to the extent that it contributes to the creation of weeds resistant to Roundup ("superweeds"). But making crops produce their own pesticides will likely decrease the amount of pesticide farmers spray on crops because the crops are making their own (oh, and for the record, organic farmers use Bt as a pesticide all the time). That could be a significant improvement for the environment, for crop productivity and (because less pesticide is used) for human health.
Fine, but these are only the intended consequences of genetic engineering. What about the unintended ones? Well, let’s think about traditional plant breeding where you’re letting the mutations in DNA happen naturally and selecting the offspring with the best traits. We’ve done A TON of that. How else do you think your banana or your “traditional” corn got here? And we really had no idea what was being altered in the plants’ DNA. It was essentially impossible to guarantee that the new variety was different ONLY in the desirable traits. By contrast, because genetic engineering is very targeted, we can be very confident that no other changes are taking place. So it’s pretty hard to claim that genetic engineering will produce unintended consequences (at least on a systematic basis) – I would be much more worried about that traditionally bred apple you’re eating.
But, you say, these traditional varieties have been grown for hundreds or thousands of years so if there were something wrong with the crops that we developed during this time, we would know by now. That’s certainly true if a mutation made a crop poisonous such that eating a bite killed you. But if we accidentally bred something that, say, doubled your chances of developing a certain kind of cancer if eaten for prolonged periods of time, there’s a good chance no one would have noticed because they were too busy dying of other things. And many fruits and vegetables do contain toxins naturally. So enjoy those glycoalkaloids in your "non-genetically modified" potatoes!
In summary, there is absolutely no reason to think that the entire concept of genetically modifying organisms is a bad idea. By all means, we should ask if a specific genetic modification can have adverse health or environmental consequences. But let’s stop being unscientific about this whole GMO thing by saying we shouldn’t do genetic modification at all.
A while back, I posited a simple mechanism by which completely ineffective treatments can appear effective and maybe even gain prominence as “alternative” or “traditional” medicine. So then are all alternative medicines ineffective? After all, there's that famous joke: "Q: What do you call non-traditional medicine that works? A: Traditional medicine."
At first glance, there's a lot of logic to that idea. If something really works, won't it soon get incorporated into mainstream medicine? Here's a simple explanation for why the answer is "no".
In the US, non-traditional medicine can be roughly described as anything that seems like it’s supposed to make you healthier in one way or another, but with the cautionary label "This statement has not been evaluated by the Food and Drug Administration. This product is not intended to diagnose, treat, cure, or prevent any disease." If you want to “legally” be able to claim that something works, you need to have Food and Drug Administration (FDA) approval.
Because FDA approval requires clinical trials, which are expensive, private companies will only undertake such trials if they expect to profit from the results. But private companies will not be able to patent most alternative medicines because most are by definition not novel treatments but ones that have been in use for years, decades, or even centuries. And you cannot patent something that isn’t novel. Instead, a reasonable expectation is that other companies will use the results to market the same medicine and the company that did the testing will not be able to recoup the trial costs by charging more for the medicine.
Thus, testing whether alternative medicine is effective is a “public good”: society (including other companies) captures most of the benefits, while whoever does the testing bears the full cost. This implies that the private market will under-test alternative medicine. In fact, the only reason private companies would test anything that they can’t patent is for PR purposes, which is probably a pretty weak incentive.
The WRONG conclusion to draw from this analysis is that alternative medicine is effective but overlooked by the private sector. But, as my previous post makes clear, alternative medicine could just be “correlated” with feeling cured or work as a placebo. So what do we do about this? The clearest implication is to have public funding of scientific research to test which alternative medicines do and do not work.
It’s true that there is already some testing of alternative medicine. But if you search for “alternative medicine research funding”, you basically get nothing (you get much better results for "dog diabetes research funding"). And given how prevalent the use of alternative medicine is, it seems like we should be funding more research of its effectiveness. It’s worth it (up to a point, of course) to spend some money up front and either put a definitive nail in the coffin of a useless approach or discover medicine that could be incorporated into everyday medical practice. Undoubtedly, some people will keep taking “natural” medicine no matter what research says. But we should figure out what’s true and what’s not.
Recently, the CDC recommended that sexually active women who don’t use birth control don’t drink. I’m not talking about not drinking heavily or not getting buzzed. Not drinking at all. Not even a little bit. Not even half a glass of wine. Because who knows what could happen? Even though there seems to be no good evidence that drinking half a glass of wine here and there will do anything bad to your baby, even if you know you’re pregnant (see here, here, and here, for example), why risk it?
So in the spirit of not risking, I think the CDC should extend their recommendations to women who aren’t on birth control to include: no skydiving, no skiing, no biking, no hot tubs, no ibuprofen, no caffeine, no deli meats, and no jogging. Wait, you say. But can’t pregnant women jog? Yes, but we actually don’t know whether it’s safe or not. Even though there isn’t good evidence that it’s NOT safe, why risk it? Clearly, pregnant women shouldn’t sprint, so maybe jogging is bad too.
Oh, and let’s not forget that pregnant women and their unborn babies die in car accidents all the time (here’s one from yesterday). I’m surprised the CDC has not recommended that pregnant women not get into cars. Or even non-pregnant ones. Let’s stay at home barefoot like nature intended.
Update: this article does a great job discussing other issues with the new CDC recommendations. Summary: CDC, I'm very very disappointed in you.
There has been a lot of sickness around my household, prompting me to try to figure out what I could do to prevent myself from getting sick. I found myself taking probiotic pills, even though the germs around my house were not the kind a probiotic could help against. I also drank vitamin C mixes and in general kept wondering about what other non-clinically-tested thing I could take that maybe marginally works. And then I remembered one thing that we know works very well in many situations - sugar pills, aka placebos. In fact, they sometimes work even when people know they're taking a placebo (see here and here). So here's my great business idea: someone should sell placebo pills that people can take when they feel sick.
Now I know what you're going to say - there are many "placebos" out there in the form of homeopathic treatments and herbal remedies. Those things, however, are fairly expensive. Although there's some evidence that more expensive placebos provide more relief (see here and here), the market needs some cheap placebos too. And the best part is that you don't even have to deceive people. In fact, I was surprised to see that no one makes such a thing already (if you want to have a good laugh, google "placebo pills"). You're welcome.
Ever since a class I took in college, I've been skeptical about whether organic fruits and vegetables are better for you in any meaningful way. What happened in that class? The professor pointed out that the term "organic" does not mean that the produce was grown with no pesticides or herbicides; only that it was grown with non-man-made ones. But surely that's better than synthetic pesticides and herbicides? Maybe. Nature has produced some pretty toxic crap (think of all the plants out there that are poisonous to humans). It's not clear that being restricted to a subset of chemicals (i.e., ones approved for organic farming) will mean that organic food ends up covered with less harmful ones. Indeed, the best available evidence, summarized here, is that organic isn't necessarily better for you.
I think it's entirely appropriate to worry about the pesticide and herbicide levels in our food. But the organic label is at best a distraction because the distinction between man-made and natural chemicals is quite meaningless. At worst, it's harmful because it leads people to believe that they can avoid the negative impacts of chemicals by eating organic. What we really need is a way to know what was sprayed on the food we're eating and how much. Unfortunately, as long as people believe that "organic" = "healthy", that's unlikely to happen.
Last week, I received the following email about an awesome new organization:
I think this is a great initiative to (a) give researchers a better idea of how long journals take and (b) put pressure on journals to be faster with their reviews by creating transparency. Consider joining and tracking your own review experience!