Now that I’ve written about why randomized controlled experiments are so great, it’s time to talk about some of the common ways in which they can go wrong. But first I’d like to make an important caveat: finding potential flaws with any research, even randomized controlled experiments, is actually pretty easy. I haven’t come across any study that couldn’t be criticized on one or more grounds. So with the power to criticize also comes great responsibility: don’t use it to dismiss results you don’t like. Don’t selectively apply these criticisms to some studies and accept the findings of others that could be subject to similar criticisms. Use the knowledge wisely.
The main concern with randomized controlled experiments is the question of “external validity”. Sure, you’ve shown that something works in the laboratory or in a carefully controlled setting, but does it work in the real world? It may not if people in the laboratory are different from those who will be subject to the treatment in the real world, or if people (including those administering the treatment) behave differently in the experiment than they would outside of it.
For example, maybe you run a clinical trial for a drug and only recruit men to participate in the trial. Will the drug work as well on women? Will there be different side effects for them? For a long time, clinical trials frequently omitted or under-enrolled women, although that is now changing. Or maybe you enroll obese individuals in a weight-loss trial but only include ones without other health problems like diabetes. But once the drug goes to market, it may be prescribed to all types of obese individuals and potentially have different effects than what you observed in the trial. Or maybe the nurses working on your trial are really good at getting patients to take the drug on time, but in the real world people forget to take it and you observe much lower effectiveness.
External validity is a potential problem with all experiments, not just clinical trials and not just stylized laboratory experiments. As long as people know they are part of an experiment, they may change how they act (maybe to make the experimenter happy, maybe to hide socially unacceptable views or behaviors, or maybe because they don’t take the experimental treatment as seriously as they do things in the real world). This is known as the Hawthorne effect, and it’s essentially impossible to rule out unless your subjects do not know that they are being studied.
Finally, external validity can also be a concern if you’re trying to say something about high-stakes decisions by running a low-stakes experiment. For example, you’re open to this criticism if you want to say something about how people save for retirement and you either run a hypothetical choice experiment or an experiment with low stakes (because who can afford to run an experiment where tens of thousands of dollars are at stake?). In some cases, the low-stakes findings survive in a high-stakes environment, but in others they don’t.
The bottom line is that the most convincing experimental conclusions are those based on a representative population facing stakes similar to those it would face in the real world, in an experiment that closely resembles real-world conditions (including individuals being unaware that they are part of an experiment).
(click here for part 1)
I was going to write more about quasi-experimental methods, but then I realized why these are usually discussed last in econometrics/empirical methods books. To see why quasi-experimental methods are useful, it helps to first understand why experiments are good and where non-experimental methods can falter. Of course, experiments have drawbacks too, and methods that are neither experimental nor quasi-experimental can produce valid results under some conditions. But we’ll talk about all that later.
When properly designed and executed, an experiment makes it easy to estimate the causal effect of a randomly assigned condition (“treatment”), X, on any outcome Y: the effect of a job training program on employment, of teacher training on student outcomes, of a drug on mortality, of dog ownership on health, etc. At a very basic level, a valid experiment only requires two things: (1) a control group (say, a group of people) that is not exposed to the treatment X and (2) random assignment to treatment. This kind of setup is called a “randomized controlled experiment”. In this case, you can simply compare the average Y in the two groups to arrive at the causal effect of X (dividing by the difference in X between the two groups if X is continuous).
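To make the comparison concrete, here is a minimal Python sketch on simulated data. Everything in it (the outcome model, the true effect of 2.0, the function names) is a made-up assumption for illustration. Each person has an unobserved baseline that affects Y, but because treatment is assigned by a coin flip, the simple difference in average Y between the treated and control groups lands close to the true effect.

```python
import random
import statistics

def simulate_experiment(n=10_000, true_effect=2.0, seed=0):
    """Simulated randomized experiment (hypothetical outcome model).

    Each person has an unobserved baseline that affects Y; treatment is
    assigned by a coin flip that is independent of that baseline.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 3.0)   # unobserved characteristic
        noise = rng.gauss(0.0, 1.0)
        if rng.random() < 0.5:            # random assignment
            treated.append(baseline + true_effect + noise)
        else:
            control.append(baseline + noise)
    return treated, control

def difference_in_means(treated, control):
    """Estimated causal effect of X: average Y among treated minus average Y among controls."""
    return statistics.fmean(treated) - statistics.fmean(control)

treated, control = simulate_experiment()
print(round(difference_in_means(treated, control), 2))  # close to the true effect of 2.0
```

Because the coin flip is unrelated to the baseline, the baseline averages out to roughly the same value in both groups, which is why no other controls are needed.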
Why do you need a control group? Because things change over time. Over longer time scales, people age, get sick, get better, gain/lose weight, get/lose jobs, learn/forget things, move, and generally act in ways that could affect Y even without X. Over shorter time scales, people might be affected by the time of day, by the temperature, by changes in their mood, by the building into which you bring them, or even by the fact that they are taking part in an experiment. If you don’t have a control group, it’s essentially impossible to tease out the effect of X on Y from the influence of other forces on Y. Most researchers know this and use a control group to ensure that the estimated effect of X on Y is not confounded by anything else happening to the treated group.
One exception I found (there surely are others) is this study, which recruited 4- to 10-month-old infants and their mothers for a sleep lab study of “crying it out” (a method by which some parents teach babies to fall asleep on their own by letting them cry and learn to self-soothe). All mothers were instructed to let the babies “cry it out” when falling asleep, so no control group was used. Even after the babies stopped crying on the third day, their cortisol levels were elevated, suggesting that they were stressed out. As this Slate article points out, it is impossible to know whether the babies were stressed out by exposure to “cry it out” (as the research article claims) or by the fact that they were in a foreign environment – the sleep lab. The absence of a control group that faced the same conditions without being exposed to “crying it out” thus fundamentally limits this study’s ability to say anything definitive about how crying it out affects stress levels.
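Here is a rough sketch of the problem using simulated, entirely hypothetical numbers: everyone’s outcome drifts upward between two periods whether or not they are treated (think of getting used to a new environment, or simply aging). A before/after comparison of the treated group alone picks up that drift along with the treatment effect, while comparing the treated group to a randomized control group does not.

```python
import random
import statistics

def simulate_two_periods(n=10_000, true_effect=2.0, time_trend=1.5, seed=1):
    """Hypothetical data: everyone's outcome drifts by `time_trend` between the
    two periods, whether or not they are treated."""
    rng = random.Random(seed)
    rows = []  # (treated, y_before, y_after)
    for _ in range(n):
        treated = rng.random() < 0.5                   # random assignment
        y_before = rng.gauss(10.0, 3.0)
        y_after = (y_before + time_trend
                   + (true_effect if treated else 0.0)
                   + rng.gauss(0.0, 1.0))
        rows.append((treated, y_before, y_after))
    return rows

rows = simulate_two_periods()
treated_rows = [r for r in rows if r[0]]
control_rows = [r for r in rows if not r[0]]

# No control group: before/after change among the treated only.
# This mixes the treatment effect (2.0) with the time trend (1.5).
before_after = statistics.fmean(after - before for _, before, after in treated_rows)

# Randomized control group: compare post-period outcomes across groups.
with_control = (statistics.fmean(after for _, _, after in treated_rows)
                - statistics.fmean(after for _, _, after in control_rows))

print(round(before_after, 2))   # roughly 3.5 (effect plus trend)
print(round(with_control, 2))   # roughly 2.0 (effect alone)
```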
Now you might say, “Sure, for some things, a control group that’s part of the experiment is important. But for outcomes like mortality or income, why can’t we just compare outcomes of people who enrolled in the experiment to outcomes of similar people who are not part of the experiment? That seems easier and cheaper.” The problem with this approach is that it’s hard to be sure you’re comparing treated “oranges” to untreated “oranges” as opposed to treated “oranges” to untreated “apples”. Even if you collect information on hundreds of individual characteristics, it’s hard to be sure that there aren’t other characteristics that differ between your experimental treatment group and your real-world control group. And those unobserved differences might themselves influence outcomes. For example, maybe the group that signed up for your job training experiment is more (less) motivated and would have gotten jobs at higher (lower) rates than the real-world control group even if they didn’t take part in your experiment. Or maybe the experimental group is healthier (sicker) in ways that you aren’t capturing and they would have lived longer (died sooner) than the real-world control group. For these reasons, you should always be suspicious of “experiments” where the control group is non-existent or isn’t drawn from the group that signed up for the study.
Finally, why can’t you let people decide themselves whether to be in the control group or not? For the same reason that your control group needs to consist of people who signed up for your experiment – if you don’t assign people to the treatment group randomly you can’t be sure that the two groups – treatment and control – are alike in every single way that affects Y except for X. It could be that people who sign up for the treatment are more desperate for whatever reason, and desperate people may behave differently in all sorts of ways that then affect all sorts of outcomes. Or it could be that they are more adventurous, which again could affect them in all sorts of ways. Or they eat more broccoli/cheese/ice cream and you didn’t think to ask about that. If there are any such differences that you don’t observe and control for adequately, you can never be sure that differences in Y between the two groups are solely due to the treatment X.
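A small simulation shows how self-selection goes wrong. This is a sketch under made-up assumptions, loosely modeled on the job training example: “motivation” is never observed by the researcher but raises the outcome on its own. When more motivated people are more likely to opt in (modeled here with a hypothetical logistic rule), the treatment-control difference mixes the effect of X with the effect of motivation. When a coin flip decides, it does not.

```python
import math
import random
import statistics

def estimated_effect(assignment, n=20_000, true_effect=2.0, seed=2):
    """Difference in average Y between participants and non-participants in a
    hypothetical job training example. `motivation` is never observed by the
    researcher but raises Y on its own."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        motivation = rng.gauss(0.0, 1.0)
        if assignment == "self-selected":
            # Assumed rule: more motivated people are more likely to sign up.
            joins = rng.random() < 1.0 / (1.0 + math.exp(-2.0 * motivation))
        else:
            # Coin flip, unrelated to motivation.
            joins = rng.random() < 0.5
        y = 5.0 + 3.0 * motivation + (true_effect if joins else 0.0) + rng.gauss(0.0, 1.0)
        (treated if joins else control).append(y)
    return statistics.fmean(treated) - statistics.fmean(control)

print(round(estimated_effect("randomized"), 2))     # close to the true effect of 2.0
print(round(estimated_effect("self-selected"), 2))  # well above 2.0: motivation leaks in
```

Collecting more observations does not fix the self-selected estimate; it just makes the wrong answer more precise.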
But what if you’re ABSOLUTELY SURE that there’s nothing different between your treatment and non-randomly selected control group that could affect Y other than X and other things you’ve controlled for? The thing is, you can never be sure, otherwise you probably wouldn’t be running an experiment. To be absolutely sure would imply that you know everything about how Y is determined except for the effect of X on Y. And there’s just no way that we know that much about anything that we’d want to study (at least as far as social science and medicine are concerned). But if you have a good counter-example, email me!
That was a long one! Next time, we’ll talk about how even randomized controlled experiments can go wrong.
You just read a fascinating article suggesting that drinking a glass of red wine is equivalent to spending an hour at the gym, that morning people are better positioned for success, or that gun control reduces police shootings. Let’s pretend that instead of immediately posting the article on your favorite social media website (which I’ll admit I’m sometimes guilty of myself), you instead wonder if the scientific methods behind the study are sound and if you can draw conclusions about cause and effect. How do you figure that out?
Unsurprisingly, it can be really hard. Alex Edmans, a Professor of Finance, has a recent excellent blog post about separating causation and correlation. After seeing lots of (often subtly) flawed research shared on social media, I’ve also been planning to write a guide to separating solid findings from not-so-convincing ones. It was going to be a cool flowchart that you can make your way through, with explanations along the way about why each step matters. But after having it on my “fun” to do list for months, I realized that the only way this flowchart will ever see the light of day is if I write it as a series of blog posts and then summarize things in a flowchart. This is part one.
The first question to ask when evaluating a study is whether it is based on an experiment (where researchers manipulated something, either in a laboratory or in the “field”) or is observational (where researchers collected some data). Experiments may be more reliable if done correctly, but they are not panaceas: there are many ways experiments can go wrong and a big issue is whether experimental findings translate to the real world. But we do evaluate experiments slightly differently from observational studies, so this is the first fork in our imaginary flowchart.
Let’s start with observational studies (this will repeat Alex’s post a bit, but I think it’s useful repetition). The first question to ask yourself is whether the researchers used any “quasi-experimental” variation to come to their conclusion. In general, studies that do are more credible than studies that do not. For example, sometimes researchers get lucky and stumble on a seemingly arbitrary rule that separates subjects (firms, individuals, regions) into two or more different groups. Certain scholarships are given to individuals who meet a specific cutoff on a standardized test score. Because it’s very difficult to control your score that precisely, people right below and right above the cutoff should be very similar in ability, except that the ones right below the cutoff did not get a scholarship and those above the cutoff did. Voila – you can study the effect of getting a scholarship on, for example, college completion, without worrying whether people without scholarships are fundamentally different from people with scholarships!
In order for this approach – called a “regression discontinuity” – to work well, (a) it must be impossible, or at least very difficult, for entities to manipulate whether they’re right below or above the cutoff and (b) researchers must not stray so far from the cutoff that the similarity of subjects below and above the cutoff starts becoming questionable. Ultimately, whether these two conditions hold depends on the context and how narrow a range around the cutoff researchers select. For example, it’s hard to control whether your SAT score is 1480 or 1490, but scoring 1300 versus 1400 is unlikely to be mostly due to chance. In other contexts, small manipulations are easy to do – for example, many firms have enough flexibility in accounting to turn slightly negative earnings into slightly positive earnings, making a regression discontinuity approach not-so-credible in this setting.
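For readers who want to see the mechanics, here is one minimal way to compute a sharp regression discontinuity estimate in Python, on simulated data where the true jump at the cutoff is known (the 1400 cutoff, the outcome model, and the 0.10 jump are all hypothetical). It fits a separate straight line on each side of the cutoff, using only observations within a chosen bandwidth, and reports the gap between the two lines at the cutoff. Published studies typically use more careful bandwidth selection and standard errors than this sketch does.

```python
import random

def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Sharp regression discontinuity: fit a line on each side of the cutoff
    using only observations within `bandwidth`, then take the gap between
    the two fitted lines at the cutoff."""
    left = [(s - cutoff, y) for s, y in zip(scores, outcomes)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left  # scores are centered, so intercepts are values at the cutoff

# Hypothetical data: the outcome rises smoothly with the score, and the
# scholarship (score >= 1400) adds a jump of 0.10.
rng = random.Random(3)
cutoff, jump = 1400, 0.10
scores = [rng.uniform(1000, 1600) for _ in range(50_000)]
outcomes = [0.3 + 0.0005 * (s - 1000) + (jump if s >= cutoff else 0.0) + rng.gauss(0, 0.1)
            for s in scores]

print(round(rd_estimate(scores, outcomes, cutoff, bandwidth=50), 3))  # near 0.10
```

The bandwidth is where condition (b) shows up: a narrower window keeps subjects more comparable but leaves fewer observations, so the estimate gets noisier.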
In the next post in this series (which may or may not be the next post chronologically), we’ll talk about other kinds of quasi-experimental variation. Bonus points to people who email me an article about a study they want scrutinized!
In my talks with graduate students, I realized that many of them look for research ideas in the conclusion of a paper, where author(s) will frequently say that something is "a fruitful avenue for future research." I always tell the students that this is a terrible idea, and I thought I'd share why that is.
When I write that something is "a fruitful avenue for future research", it generally means one of three things. The first is that it actually IS a great idea, and I'm already working on it. So you'd probably be behind. The second is that the direction for future research is a great idea, but I have no clue how to do it correctly. Of course, a clever graduate student or other researcher could come up with a novel research strategy, but unless you're reading terrible papers, chances are that the paper's author already thought hard about how to do it and gave up. The third possibility is that the "area for future research" is very doable and straightforward but not very interesting (e.g., replicating the findings in a different sample). And unless you're just looking for an additional paper to pad your resume, you shouldn't do this either.
In short, don't look for great research ideas in statements like these. There's no such thing as a free lunch.
I loved PhD Comics as a grad student. Frankly, I still find them quite relatable as an Assistant Professor (and in a few years, I can let you know if they’re relatable from a tenured professor’s standpoint!). So I was really intrigued when I heard that the author of PhD Comics, Jorge Cham, co-wrote a book (with Daniel Whiteson) called “We Have No Idea”. It’s not about being a grad student (though the title could probably be reused for a book about that as well); it’s “A Guide to the Unknown Universe”. As the title suggests, it’s about some of the current limits of our knowledge.
If you like popular science books, physics, and astronomy, I definitely recommend this book. Most popular science books talk about what we know, and if you end up with a nagging “but” or a question, it’s hard to tell whether it’s because details were left out, because you misunderstood the explanation, or because you stumbled on a question scientists haven’t figured out yet. This book is different because it explicitly discusses big-picture stuff science hasn’t figured out yet (and suggests some possibilities for what the answers could be). It’s written in the easy-to-read and humorous style of Jorge Cham but co-written with an experimental high-energy physicist, so you can be pretty sure the book is both entertaining and correct.
The book covers dark matter, dark energy, elementary particles, and time, among other things. Even if you’ve read books on these subjects before, reading a book that synthesizes our lack of knowledge in these areas is both enlightening and exciting.
“Often wrong, always certain”, goes a saying I once heard about economists. Frankly, I hate admitting when I’m wrong or when I don’t know something (don’t try to use that against me in an argument, I’ll totally deny that it applies in that case). I force myself to do it, and I think I succeed most of the time, but it is very unpleasant.
I think the same thing applies to other people. We would rather take an “educated guess” than say “I don’t know”, and we would rather defend our original point in an argument even though halfway through we may have started wondering (very very deep down) if we’re wrong.
A few days ago, I realized that I actually have great data to test this hypothesis. In a survey experiment pilot my colleague Olga Shurchkov and I ran recently, we asked people two multiple choice questions: (1) what is the current concentration of CO2 in the atmosphere? and (2) what is the “albedo effect”? We included “I don’t know” as an answer option and used the number of correct answers as a gauge for objective knowledge about climate science (there was another question asking people to name greenhouse gases, but that one is more complicated because there are multiple correct answers).
Our sample was not necessarily representative of the US (Amazon MTurk workers), but there is definitely a wide range of demographic and economic characteristics in our data. We never looked at what fraction of people answered “I don’t know”, but my prior was that it was low. I got really excited to have some data to test my hypothesis. I was even going to run an informal survey on Facebook, making up a fake city and asking people which continent it was located on to see how many of my friends would admit that they didn’t know.
But I decided to look at the survey data first, and frankly I was shocked. 224 out of 361 respondents (62%) admitted they didn’t know the CO2 concentration in the atmosphere (22% chose the right answer and the rest chose a wrong answer, if you’re wondering). 200 (55%) admitted that they didn’t know what the albedo effect was (20% got it right). Apparently the majority of people have no problem admitting when they don’t know something (at least on a survey). Even though my original hypothesis didn't pan out, I thought the results might be interesting to some of my blog readers.
And there you go: I was wrong.
Obviously, it’s been a while since I’ve blogged. As it turns out, I’m up for tenure review in two years, and with the publication lag being what it is (especially considering my historical rejection probabilities), I’ve been focusing on getting my working papers published. The good news is that it worked – three papers got accepted this year, and two more are under review. I finally get to work on analyzing new-ish data and putting together first drafts, which is my favorite part of the process.
I’ve also been working on Academic Sequitur (slowly but surely). We’re all set up to track new articles in 88 journals and working paper series, which is very exciting. (The website is still being built, but if you want to be notified when it’s ready for prime-time, sign up here). In the meantime, I’ve decided to post some fun facts about our current database. Keep in mind that this database isn’t representative of all research in economics/finance because we have more years of information for some journals. But for blogging purposes, it’s close enough!
First fun fact: the average econ/finance paper has 2.08 authors. About 29 percent of the papers have one author, 42 percent have two, 22 percent have three, and 5 percent have four. That covers 98.7 percent of papers. Then we get into crazy territory with papers that have 5, 10, or even 17 authors! And the record for the largest number of authors goes to…“Everything You Always Wanted to Know about Inventors (But Never Asked): Evidence from the PatVal-EU Survey” (a CEPR Discussion Paper from 2006). Let’s see if another paper comes along in the future to break that record.
Now let’s talk about the content of the articles themselves. If we don’t count word variations as unique words (“rate” and “rated”, “tax” and “taxes”, etc.), only count words that are used 3 times or more (even the internet has spelling errors!), and ignore very common English words like “the”, “I”, and “we”, the abstracts contain over 15,000 unique words. Out of these, what do you think is the most common word that economists use in their abstracts? It is…drum roll…“model”. How stereotypical, right? That is followed by (in order): “effect”, “paper”, “market”, “result”, “increase”, “country”, “policy”, “firm”, and “data”. Interestingly, “increase” is used almost 6 times more than “decrease” (which ranks 195th on the list). So maybe economics is not so dismal after all? Unless all these articles are about tax increases.
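This is a sketch, not the code behind the numbers above, but the basic counting steps just described (fold word variations together, drop very common words, keep only words used at least 3 times) fit in a few lines of Python. The stopword list and the plural-folding rules below are deliberately crude assumptions for illustration; a real analysis would use a full stopword list and a proper stemmer or lemmatizer.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "i", "we", "a", "an", "of", "in", "and", "to", "that",
             "is", "this", "on", "for", "with", "as", "are", "by", "our", "it"}

def normalize(word):
    """Very crude plural folding (an assumption for illustration only):
    'taxes' -> 'tax', 'studies' -> 'study', 'effects' -> 'effect'."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith(("xes", "ches", "shes")):
        return word[:-2]
    if word.endswith("s") and not word.endswith(("ss", "is", "us")) and len(word) > 3:
        return word[:-1]
    return word

def word_counts(abstracts, min_count=3):
    """Count normalized non-stopwords across abstracts, dropping rare words."""
    counts = Counter()
    for text in abstracts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[normalize(token)] += 1
    # Words used fewer than `min_count` times are often typos.
    return Counter({w: c for w, c in counts.items() if c >= min_count})

abstracts = [
    "We build a model of taxes.",
    "This model studies tax effects.",
    "A model of the effect of a tax increase.",
    "Increases in taxes have effects.",
]
print(word_counts(abstracts).most_common())  # [('tax', 4), ('model', 3), ('effect', 3)]
```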
At this point, it would be pretty straightforward for us to release a product that lets you pick which journals, authors, and/or user-specified keywords you want to be notified about. But we’re going further and developing an algorithm that classifies articles into both broad subject areas (e.g., “Development economics”) and narrower topics (e.g., “credit constraints”). Text analysis is a difficult problem, especially when you’re dealing with text that’s not written in everyday English language (because there are fewer existing tools available to process the words). But we have a plan, and we’re confident that it will succeed! Shameless self-promotion over. Stay tuned.
Many charter schools appear to work quite well. Here are two quotes from two articles summarizing the research:
“sound research has shown that, when properly managed and overseen, well-run charter schools give families a desperately needed alternative to inadequate traditional schools in poor urban neighborhoods.” (NY Times, October 13, 2016)
“The briefest summary is this: Many charter schools fail to live up to their promise, but one type has repeatedly shown impressive results.” (NY Times, November 4, 2016)
Because admission to charter schools is in many cases decided by lottery, assignment to charter schools is literally random for students who apply. So the level of confidence in these results should be as high as it gets. There’s also no reason to think that the “one type” of charters that has shown significant results cannot be replicated elsewhere (in fact, it has). Then why do so many liberals appear to be against charter schools?
I don’t have a good answer to that question. Liberals’ resistance to charter schools in any way, shape or form reminds me of conservatives’ resistance to any gun control regulation. No matter what type of gun control legislation is proposed, their answer is always “this is a terrible idea”. They also frequently invoke a slippery slope argument – “first, the Democrats will impose more thorough background checks, next, they will take away all our guns”. My sense is that liberal voters see charter schools as a similar existential threat to public school funding. But just like in the case of gun control, to me that logic is very dubious.
We need more evidence-based education reform. Charter schools that have been shown to work seem worthy of our support. I agree with Sue Dynarski, a prominent economics of education scholar, who was quoted in the second article as saying “To me, it is immoral to deny children a better education because charters don’t meet some voters’ ideal of what a public school should be. Children don’t live in the long term. They need us to deliver now.”
I teach masters students the basics of micro- and macro-economics. When we talk about government intervention, one of the first topics is the effect of taxes in an otherwise competitive market. By this point, it’s pretty easy for them to see that taxes hurt both consumers and producers in that market because, generally, (1) buyers have to pay more for the good than before and sellers receive less in revenue than before and (2) taxes reduce the activity that is being taxed, lowering surplus for everyone. For example, if it costs a seller $1 to make a cup of coffee and every day she was selling one to a buyer who was willing to pay only $1.05 (presumably for some amount between $1 and $1.05), placing a 10-cent tax on that market will probably eliminate that transaction. This second effect is called the “deadweight loss” of taxation because losing these transactions creates only costs (to the affected buyers and sellers) and no benefits (because the government doesn’t get tax revenue and consumers/producers do not benefit from transactions that don’t happen). That doesn’t mean we should never have taxes in competitive markets: if the government puts the tax revenue to good use, then social gains can overcome the deadweight loss. It just means there’s no free lunch!
It’s important to note that the assumption here is that we don’t want to limit the economic activity itself (e.g., because it generates pollution). When we talk about “externalities” such as pollution and how taxes can be used to resolve them, I usually ask “Do taxes to correct an environmental externality create deadweight loss?” By this point, a lot of my students have learned to equate taxes with “deadweight loss”, so many will generally say “yes”. However, that is not the case (but I’ll save that for another post).
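The coffee example from the tax discussion can be checked with a few lines of arithmetic. This Python sketch treats a single potential trade, measures everything in cents to avoid rounding issues, and assumes (for simplicity) that the trade happens whenever the buyer’s value covers the seller’s cost plus the tax. Total surplus counts the buyer’s and seller’s gains plus any tax revenue, so a tax that merely moves money to the government costs nothing, while a tax that kills a trade destroys that trade’s surplus.

```python
def trade_outcome(buyer_value, seller_cost, tax=0):
    """One potential trade, in cents. Assumes the trade happens whenever the
    buyer's value covers the seller's cost plus the tax; the price itself only
    splits the gains between buyer and seller, so it does not change the total."""
    if buyer_value >= seller_cost + tax:
        return {"traded": True,
                "private_surplus": buyer_value - seller_cost - tax,
                "tax_revenue": tax}
    return {"traded": False, "private_surplus": 0, "tax_revenue": 0}

def deadweight_loss(buyer_value, seller_cost, tax):
    """Total surplus (buyer + seller + government) lost because of the tax."""
    before = trade_outcome(buyer_value, seller_cost)
    after = trade_outcome(buyer_value, seller_cost, tax)
    total_before = before["private_surplus"] + before["tax_revenue"]
    total_after = after["private_surplus"] + after["tax_revenue"]
    return total_before - total_after

# The buyer who values coffee at $1.05: a 10-cent tax kills the trade.
print(deadweight_loss(buyer_value=105, seller_cost=100, tax=10))  # 5
# A buyer who values it at $1.50 still buys: the tax is a transfer, not a loss.
print(deadweight_loss(buyer_value=150, seller_cost=100, tax=10))  # 0
```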
After we cover taxes, I ask my students: “Do subsidies (in the form of a payment per unit of something produced/sold) in otherwise competitive markets create deadweight loss?” I always think this is an easy question because a subsidy is just a negative tax. The answer then should clearly be “yes”, but the students are usually stumped. So I thought I would write a post about the economics of subsidies.
Unsurprisingly, subsidies work in the opposite way from taxes: they generally benefit both buyers and sellers by raising the amount a seller receives for selling a good and lowering the amount a buyer pays. No one participating in a subsidized market has an incentive to get rid of the subsidy because both sides benefit! Subsidies also increase the amount of the subsidized activity: add a 10-cent-per-cup subsidy for coffee and people will drink more coffee. Someone who wasn’t willing to pay more than $0.95 for that cup of coffee may now buy it for $1 because they also get a ten-cent subsidy that offsets some of that cost. Alternatively, if the subsidy goes to the seller, the seller may lower the price to $0.93, also inducing the buyer to buy.
But this increase in economic activity is not a good thing because the additional “units” being produced and exchanged are costing more to make than the buyers value them at. The net benefit (value to consumer minus cost to producer) to society of this additional economic activity is negative because the buyer values the good less than what it costs to produce. On top of that, subsidies need to be paid for by taxes, which means possibly creating deadweight loss in another market!
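The same one-trade arithmetic works for the subsidy case (again in cents, and again a simplified sketch rather than a full supply-and-demand model). The subsidized trade happens and the buyer and seller split a private gain, but the government pays the subsidy, so the net change in social surplus equals the buyer’s value minus the seller’s cost, which is negative for the $0.95 buyer.

```python
def surplus_with_subsidy(buyer_value, seller_cost, subsidy=0):
    """One potential trade, in cents. The trade happens if the buyer's value
    plus the per-unit subsidy covers the seller's cost."""
    if buyer_value + subsidy < seller_cost:
        return {"traded": False, "private_gain": 0, "subsidy_cost": 0, "social_surplus": 0}
    private_gain = buyer_value - seller_cost + subsidy   # split between buyer and seller
    return {
        "traded": True,
        "private_gain": private_gain,
        "subsidy_cost": subsidy,                         # paid for by taxpayers
        "social_surplus": private_gain - subsidy,        # = buyer_value - seller_cost
    }

# The $0.95 buyer from the example: no trade without the subsidy...
print(surplus_with_subsidy(95, 100)["social_surplus"])      # 0
# ...but with a 10-cent subsidy the trade happens and society loses 5 cents,
# before counting the deadweight loss of raising the 10 cents through taxes.
print(surplus_with_subsidy(95, 100, 10)["social_surplus"])  # -5
```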
One justification people give for supporting subsidies is distributional concerns. Maybe we’re losing some efficiency, but we’re making sure that (presumably poor) people can afford to buy the good in question. However, subsidies are a crude and expensive way of achieving distributional goals because they help everyone who buys in the market, rich or poor. For example, subsidizing college education will certainly help poor students, but if the subsidy is given to everyone, it becomes much more expensive in terms of the amount of revenue (and deadweight loss of taxation) that needs to be generated.
An obvious way to improve on subsidizing something for everyone is more targeted subsidies (like financial aid for poor students). However, even that is not ideal because it distorts individuals’ choices. If we start subsidizing coffee for low-income individuals, coffee will be more affordable, but people will also drink more of it relative to other goods, and it’s not clear that we (or the individuals) want that. Rather, economists advocate giving poor individuals money and letting them decide what to spend it on. That comes with its own set of issues because it creates a larger incentive to pretend to be low-income, but it also respects individuals’ choices and does not lead to unnecessary distortions.
My representative, Rodney Davis, recently introduced a health care bill "to protect people with pre-existing conditions from discrimination against insurance companies." (yes, if you think about it, that sentence is poorly written).
I just wrote to him to ask a few details about his plan. I'm sharing the letter below because it demonstrates the difficulty of ensuring that individuals with pre-existing conditions can buy affordable insurance.
"I read about your new health care bill to make sure people with pre-existing conditions can buy health insurance. I'm just curious as to what happens if insurers offer someone who has cancer insurance for, say, $50,000 per year. Would you consider that acceptable? If not, what provisions does your plan have in place to ensure that does not happen?
If your plan has limits on whether insurers can charge different prices based on pre-existing conditions, how will the plan ensure that younger and healthier people do not have a disincentive to sign up because they are being offered insurance at a price that is much higher than their expected healthcare costs?"
There are really only two ways (that I can think of) to ensure that (1) people with pre-existing conditions are not being offered health insurance only at exorbitant prices and (2) you don't create a "death spiral" where people buying insurance on the individual market are increasingly sick because the healthier people drop out due to rising prices. The first is having an individual mandate (a stick) and the second is a generous tax credit that makes buying health insurance very cheap on the margin even if the pre-credit price is very high (a carrot). I look forward to seeing what Davis's actual plan is (the "Better Way" Republican agenda does mention a tax credit).
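The death spiral is easy to see in a toy simulation. In this Python sketch (all numbers hypothetical, and the model far simpler than any real insurance market or any actual bill), each person is willing to pay up to 20 percent more than their own expected health costs, the insurer must charge everyone the same premium, and each round it resets that premium to the average expected cost of the people who bought at the previous premium. Without a mandate or credit, the healthy drop out until only the sickest remain; a mandate (the stick) or a large enough tax credit (the carrot) keeps everyone in.

```python
def community_rated_market(expected_costs, markup=0.2, credit=0.0, mandate=False, max_rounds=50):
    """Toy adverse-selection loop (hypothetical model, not any actual plan).

    Each person will pay up to (1 + markup) times their own expected annual cost;
    the markup stands in for risk aversion. `credit` is a flat tax credit that
    lowers the out-of-pocket premium. The insurer charges one premium to everyone
    and resets it each round to the average expected cost of that round's buyers.
    Returns (final premium, number of people insured).
    """
    premium = sum(expected_costs) / len(expected_costs)
    buyers = list(expected_costs)
    for _ in range(max_rounds):
        if mandate:
            new_buyers = list(expected_costs)          # everyone must buy
        else:
            new_buyers = [c for c in expected_costs
                          if premium - credit <= (1 + markup) * c]
        if not new_buyers:
            return premium, 0                          # market unravels completely
        new_premium = sum(new_buyers) / len(new_buyers)
        if new_buyers == buyers and new_premium == premium:
            break                                      # nothing changes any more
        buyers, premium = new_buyers, new_premium
    return premium, len(buyers)

# Hypothetical population: 80 healthy people, 15 with moderate costs, 5 very sick.
population = [2_000] * 80 + [10_000] * 15 + [50_000] * 5

print(community_rated_market(population))                 # (50000.0, 5)
print(community_rated_market(population, mandate=True))   # (5600.0, 100)
print(community_rated_market(population, credit=4_000))  # (5600.0, 100)
```

Note that the carrot is not free: in this toy example the credit costs $4,000 per enrollee, paid for out of general revenue.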