Academic Sequitur is finally here! For those of you thinking “What the wha is that?”, Academic Sequitur is a website I’ve been working on for about two years. It makes staying up to date on current research much easier and more systematic. We track 90+ economics/finance journals and working paper series. You create a “research portfolio”, telling us which journals, authors, and/or keywords you want to follow. I know that academics are super busy, so we made that step as easy as possible. We then deliver information on any new papers we find directly to your inbox daily, weekly or monthly. We provide a link to the article on the publisher’s website when it is available, where you can view the full text if you have access (we don’t provide access ourselves).
Another huge benefit of Academic Sequitur is that you find out about papers much earlier than if you were to subscribe to journals’ tables of contents. AER, JPE, Econometrica, the AEJs, and other journals often post accepted/forthcoming papers months before they are officially “published”. You can also browse your research portfolio updates on our website, save articles to favorites, and discover more things to follow by browsing site-wide updates. We also track the AEA RCT Registry, which means you can learn about papers that aren’t even written yet! And just for fun, we made a network graph for each author, so you can see who’s coauthoring with whom.
Academic Sequitur isn’t free because there’s no such thing as a free lunch. We charge $1.99/month after a 3-month trial period (no payment info required to sign up, so I guess that’s kind of a free lunch), but your time is valuable and your daily cup of coffee probably costs more than that anyway. And this way we don’t have to sell your information to advertisers to recoup development costs!
So go ahead and try Academic Sequitur for free! Let me know what you think.
Last week, I tweeted about a tool I use when I review papers I’ve reviewed before, especially at a different journal (basically, you can upload two pdfs and it will highlight deletions and additions in each one). This prompted a discussion about whether one should ever agree to review a paper for the second time at a different journal. I realized that my thoughts on this did not fit into tweet-sized text, so I decided to blog about the pros and cons of having the same reviewer review a paper at more than one journal. Ultimately, my recommendation for what to do if you’re in such a situation is to ask the editor what she or he prefers, but I think the pros and cons are worth thinking about for their own sake.
First, it’s worth noting that if you get the same person twice, she or he is likely to be quite knowledgeable in your area of research. Otherwise, it’s unlikely that two or more editors would independently think of him or her. You’re certainly not getting that third-year PhD student a second time! That fact on its own should benefit you if your paper is good (which I would hope you think is the case), as someone who is an expert may be better able to appreciate your contribution than someone who is not. It can be bad if you’re challenging a long-standing orthodoxy in your field. But in that case, it’s hard to see how getting a different draw would be that much more helpful – by definition, orthodoxy is something “generally accepted”, so your chances of getting a more favorable reviewer are unlikely to be much higher with a new draw. If your paper is challenging a particular person’s work, then it may be a good idea to note that person’s potential conflict of interest in a cover letter when you first submit the paper (note that I have no idea how often this is actually done).
Second, you may be concerned that someone who has reviewed your paper at a different journal and thus has seen it rejected (or even played a hand in that rejection) now has preconceived notions about its quality and is less likely to be favorable than a new reviewer. Here, it’s hard to know for sure what the objective truth is, but I’ll offer some thoughts. Papers get rejected for many reasons. Sometimes the reviewers think it’s a “crowded literature” and that your contribution is too marginal for publication in that particular journal. In that case, getting the same reviewer at a similarly-ranked journal is unlikely to be good news, unless you failed to spell out your contribution clearly the first time around. But getting the same reviewer at a lower-ranked journal is not necessarily bad – what’s not good enough for AER could be good enough for Journal of Development Economics.
If your paper wasn’t rejected for not making enough of a contribution, it was probably rejected for not being convincing enough in one way or another. Maybe you didn’t cluster your standard errors or maybe your null finding is not precise enough. Maybe the reviewer doesn’t believe your instrument is valid. Maybe you don’t have some robustness check. Maybe it’s just lots of little things that together make the reviewer think your paper is not salvageable. Here, there’s usually something you can do to improve the paper. Yes, you might not be able to find a different instrument, but maybe you can think of some ways to indirectly probe its validity. You can try that robustness check. You can add a footnote explaining why doing X doesn’t make sense in your case. All these should make the reviewer view your paper more favorably next time around. By contrast, suppose you get a different reviewer. If you didn’t do any of the things the previous reviewer suggested, the new reviewer might well have very similar concerns (my experience is that the major reviewer comments are pretty correlated!). If you did do something to address the previous reviewers’ concerns, the new reviewer is sure to think of some new ones to bring to your attention (the correlation coefficient is not 1!). That may mean a more rigorous review for the paper as a whole, but it will also make your life a whole lot harder.
Note that I am not suggesting you address literally everything a reviewer brought up, just in case you get him or her again. My strategy is to try to address things that are likely to also be brought up by different reviewers in the future (I personally have never gotten a referee report that did not contain a single useful suggestion) and that are not too difficult to do.
It’s also worth noting that the reviewer you’re getting again may not have recommended rejection in the first place. In that case, getting him or her again is good news, especially because presumably you changed something in response to their comments (see point above).
I’ve reviewed the same papers multiple times (3 times is my record), and I’ve had a few of my papers reviewed by the same referee at different journals (incidentally, here the record is also 3 times getting the same reviewer). Generally, these have not been negative experiences. The negative experiences I have heard about both involved the authors changing something in between submissions and the reviewer literally copy-pasting the same report. That is unfortunate indeed. But I think that is a failure of the individual reviewer and the reason I recommend all reviewers in such situations use the tool above.
I think the bottom line is that it’s an empirical question whether it’s good to have the same reviewer look at your paper again (someone should do an RCT on this). If most reviewers are reasonable and diligent people, then it’s not a problem. If most reviewers are lazy and vindictive, then it is. I do think we live in the former world, and (on average) I’m optimistic when I think about getting the same reviewer again.
As a reward for reading all that, here’s a fun meme.
During my recent travels, I cleaned 315 typed pages of research notes, which I’ve been accumulating since early grad school days. Some of them weren’t really “research ideas” but just questions or thoughts. Many were boring and/or poorly articulated, some had been done, others were clearly important questions that I had no idea how to answer. A few were good ones that I’m going to pursue in the medium run. I deleted most of the text so that I’m not plagued by the notion that maybe I came up with something Nobel-prize worthy that I didn’t appreciate or couldn’t develop sufficiently at the time (I’m down to 29 pages). I also found a few funny one-liners that I thought I’d share (trust me, I didn’t write them down for entertainment, I really thought it might go somewhere). Overall, I highly recommend writing your research ideas down – at the very worst, you’ll get a laugh out of them later!
Very specific research questions (feel free to use at your own risk):
I also found my first-ever paper presentation, but that’s a story for another post.
I’ve recently had a number of conversations about the value and incentives of publishing in the top 5 economics journals. Why do people try so hard to publish in AER/QJE/JPE/ReStud/Econometrica? Why do departments and the profession reward them for it? For a fun article on Top5itis, click here. For an excellent discussion on the issues by top economists, click here (video of a 2017 AEA session). For a discussion of the information problem that “top” journals help solve, keep reading.
Here’s my theory: the only reason to care about journals in the internet age is because they act as a signal of a paper’s quality (this view is also echoed by one of the economists in the aforementioned video). This is a blog, so don’t ask me for a formal model, but here’s an informal proof by counterexample.
Suppose we lived in a world where the quality of a paper, however you want to define “quality”, were perfectly observable. Why would we ever care if a paper was published in journal A or journal B? One reason is if journal A were read more widely, in which case your articles would have a bigger impact there. But in a world where paper quality is observable, why would journal A be read more widely?
I can think of three reasons. All of them assume that it’s hard to read individual articles from many journals, so you pick your journals first and your articles to read second. Without this assumption, I struggle to think of any reason to care about journals. So here they are:
Clearly, (a) and (b) are bad, while having more competition on the speed and quality of the publication process would be a welcome thing. Also, pretty much every part of the review process is easily replicable except editor quality, so in equilibrium we might see journals improve a lot on all dimensions except this one.
Of course, paper quality is not observable, so the main point of this discussion is to highlight how valuable such quality measures would be. Departments could fret a lot less about how many “A-level” publications you have and so could you. Starting a new journal would likely become a lot easier. World hunger would be history [cross out]. Frankly, you wouldn’t even have to publish your paper!
But, alas, quality is difficult to observe, unless a paper happens to be in your field and you have read it carefully. Rating papers by the number of cites is problematic for at least two reasons. First, it preserves journals’ advantage: I’m 95% confident that there’s a causal effect of publishing in a top 5 on citation counts. Second, it would take a long time for paper quality to be revealed, at which point we’re all dead.
Could a well-designed rating system work to reveal paper quality and move us away from caring about journals? I think so – look at all the people who are providing public goods on Quora, Wikipedia, Amazon, etc. There’s already a recommendation system along these lines for biology and medical publications, though it’s hard to judge how well it works. Because stakes in academia are so low, there are reasons to think that letting ratings be a free-for-all might go terribly wrong, but we’ll never know until we try! In the meantime, I’ve gotta go and try again for that top 5 publication.
About once a month, I come across a Facebook post or online article that expresses a sentiment along the lines of “If you have more than $X million/billion dollars and you’re not giving it away to causes that benefit poor people, you are a bad person”. So I wanted to provide an economic perspective on this view by discussing the distinction between having a lot of money and spending a lot of money.
Let’s start with what “money” is. Given that dollar bills and electronic money have no inherent value, the best way to think about money today is that it gives you the right to control a certain amount of society’s resources that are made available for sale – final goods, services, land, and so on. The more money you have, the more resources you have the right to control.
But of course, having the right to control resources and having the resources themselves are not the same thing. You don’t get the resources unless you hand over the appropriate amount of money. So the most obvious thing a millionaire can decide to do with her money is to finance consumption – buy herself a yacht, an island, and a private jet, and hire herself a personal shopper. But if a millionaire used all her money in this way, she would no longer be a millionaire. Thus, the next question is, what happens to money and the corresponding resources if the millionaire doesn’t use her money for consumption?
The answer depends a bit on where the money is. If I earn a billion dollars and put it under my mattress as cash, no one else has access to the money, and it just sits there (and I have a very tall bed to sleep on). Fewer dollars are now circulating in the economy and because dollars themselves are not worth anything, eventually the price of controlling resources (i.e., of goods and services) will fall just a tiny bit. Quite a boring outcome.
But millionaires and billionaires typically don’t put their money under a mattress. More likely, the money is with financial services firms (including banks) that either keep the money safe or manage the money to try to earn the rich person a return on investment. What happens then? Well, the firm will keep a bit on hand to meet demands for deposits/redemptions and loan out/invest the rest. The individuals and firms who receive that money then decide what to do with it. Generally, financial firms try not to invest in someone else’s consumption, so the money recipients will probably use that money to control resources in a way that could lead to profits down the road. Maybe they start an online retailer, invest in developing a new drug, open a new restaurant, or simply expand their existing operations.
The idea is similar if the billionaire is the one investing in a firm directly – the resources that money can buy will be used by the recipients of the money for various purposes. Finally, if the financial services firm or the millionaire herself buys an existing financial asset from another individual or firm, we’re back to the initial situation where someone just got a bunch of money and has to decide what to do with it.
Why did I make you read through this boring discourse on where money can go? To point out that there is only one case where the rich person is definitely using up resources that otherwise could have benefitted someone less fortunate than him – when he uses his money to finance his own consumption. In the other cases, it’s less clear that the billionaire’s decisions are hurting disadvantaged people, because it depends on how the economy’s resources end up being used. Thus, if you want to make the argument that merely having a lot of money and holding onto it is immoral, you also have to believe that the vast majority of investments that are being made today do not benefit poor people. Finally, it’s worth mentioning that rich people have much higher savings rates than the average person, which helps finance aggregate investment in the economy. And, at the very least, we need investment to offset depreciation of the economy’s productive factors. At best, investment helps us create new technology that improves lives.
To anticipate some criticism, I am not trying to argue that inequality is not a problem and that we as a society shouldn’t do anything about it. I only want to point out that a blanket shaming of rich people misses an important facet of what their money could be doing for the economy.
Moral of this post: if you want to criticize rich people for being immoral, you should criticize their consumption, not the mere fact that they have a lot of money.
To stick with the stereotype of economists being dismal, I’d like to keep discussing ways in which experimental evidence can be misleading. Again, use these powers for good, not to criticize findings of studies you don’t like.
Yet another potential pitfall of experimental studies (related to the issue of generalizability) is whether the treatment “dose” corresponds to the real-world doses you’re drawing a conclusion about. For example, if you randomly fed some mice 10 times their body weight in tannins and observed that they lived longer, you shouldn’t use those results to conclude that consuming red wine will extend human lifespan. And it’s not just because wine also has alcohol in it. It’s because the relationship between an input and an output (in this case, tannins and longevity) can be highly non-linear and non-monotonic. For example, humans will die if deprived of oxygen. But they will also die if given 100% oxygen to breathe. Similarly, you need iron to survive. But you can also overdose on iron. Thus, it’s important that an experiment uses a reasonable treatment “dose” and does not extrapolate to higher or lower doses. This can apply to program evaluations as well: if you randomly put some kids into an intensive tutoring program, you cannot use the results to say something about a once-a-month tutoring program and vice versa.
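A toy calculation makes the extrapolation problem concrete. The dose-response curve below is entirely made up (an inverted U chosen for illustration, not an estimate of anything real), and the function names are hypothetical.

```python
def true_response(dose):
    """Hypothetical inverted-U dose-response: benefit rises, peaks at dose 5, then falls."""
    return dose * (10 - dose)


def linear_extrapolation(low_dose, high_dose, target_dose):
    """Predict the response at target_dose from a straight line through two tested doses."""
    slope = (true_response(high_dose) - true_response(low_dose)) / (high_dose - low_dose)
    return true_response(low_dose) + slope * (target_dose - low_dose)


# An experiment that only compares dose 0 with dose 1 sees a gain of 9 per unit of dose.
predicted = linear_extrapolation(0, 1, target_dose=10)
actual = true_response(10)
print(predicted, actual)  # 90.0 0
```

The straight line through the two tested doses predicts a large benefit at dose 10, while the assumed curve says the benefit there is zero. The experiment was fine at the doses it tested; the mistake is in carrying its slope far outside that range.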
Another issue that applies to all research but can be particularly acute for experimental studies is spurious results. Some experiments are run with relatively few subjects, be they humans or mice. Within the experiment, this shouldn’t cause a problem if you’re only looking at one outcome because any test statistics you create will reflect the sample size. But because small-scale experiments are fairly easy to run, they can create spurious results on aggregate. Imagine that a researcher runs ten or twenty such experiments each year. Purely by statistical chance, some of them may show a non-zero treatment effect even if the real effect is always zero. The more experiments are run, the higher the chances of that happening. This applies not just to one researcher running multiple experiments, but to multiple researchers each running one experiment. Note that the researchers themselves also cannot know whether the findings are spurious or not, especially if we’re talking about multiple researchers each running one experiment. The possibility of spurious findings also applies to non-experimental studies. The good news is that spurious findings are very unlikely to be replicable even once, much less two or three times. Thus, it might be wise to be skeptical of findings (experimental or otherwise) that have not been replicated by other researchers, especially if such findings contradict several prior studies.
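To put rough numbers on the "purely by statistical chance" point, here is a minimal sketch. It assumes the experiments are independent and each is tested at the conventional 5% significance level; both are my simplifying assumptions, and the function names are hypothetical.

```python
import random


def prob_at_least_one_false_positive(n_experiments, alpha=0.05):
    """Chance that at least one of n independent null experiments looks 'significant'."""
    return 1 - (1 - alpha) ** n_experiments


def simulated_share_with_false_positive(n_experiments, alpha=0.05, n_years=20_000, seed=0):
    """Simulate many 'years' of experiments in which the true effect is always zero."""
    rng = random.Random(seed)
    years_with_hit = 0
    for _ in range(n_years):
        # Under a true null the p-value is uniform on [0, 1], so each experiment
        # comes out "significant" with probability alpha.
        if any(rng.random() < alpha for _ in range(n_experiments)):
            years_with_hit += 1
    return years_with_hit / n_years


for n in (1, 10, 20):
    print(n,
          round(prob_at_least_one_false_positive(n), 3),
          round(simulated_share_with_false_positive(n), 3))
```

Under these assumptions, a researcher running ten null experiments a year has about a 40% chance of at least one "significant" result, and about 64% with twenty. By the same logic, a spurious finding has only a 5% chance of showing up again in one independent replication, which is why replication is such a useful filter.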
That concludes the issues that are more or less experiment-specific. In the next blog post in this series, I’ll discuss common statistical pitfalls that can affect experimental, quasi-experimental, and observational studies alike.
I recently had two conversations with third-year PhD students about how to do research. Both of them started with the students asking me if I thought it was a good idea to find a dataset first and then think of a research question. My answer was a resounding “no”. Given the difficulties graduate students have in figuring out how to go about research, I thought I would share my suggestions in a blog post. These are based on wisdom my advisers passed onto me and my experience in grad school in general, and I claim no credit for inventing any of them.
It’s tempting to find a cool dataset and then think of a question you can answer with it because one of the most disappointing experiences of research is coming up with a great research design and not being able to find the data. But it doesn’t work. Empirically, I have only heard of one professor even trying this approach – he was collecting lots of industry data but didn’t have a question in mind yet. There may be individuals who are experienced and talented enough to take this approach – he was a tenured professor at Harvard – but most of us mere mortals shouldn’t expect to be successful in this way and all professors I’ve ever spoken to about this actively discourage this method. Also, my conversation with this professor took place about ten years ago, and there’s still no working paper based on the data he was collecting, so maybe it didn’t work well for him either.
Besides being empirically unpopular among successful professors, why doesn’t the data-driven approach work? I think it’s just too constraining. There are thousands if not hundreds of thousands of datasets out there and limiting yourself to one significantly reduces your choice of research questions. So it’s kind of like trying to win the lottery – it’s possible that you will pick a dataset that will lead to an interesting research question, but it’s not likely. Moreover, many datasets are collected or put together for particular purposes, so you may find it difficult to divert your mind from the most obvious uses of the data, which have probably already been done.
So what should you do instead? Start with a big-picture research question. You can do this by thinking about what got you interested in economics in the first place (or whatever it is you’re studying), by reading the news, by thinking about modern social problems and concerns, or by reading academic overview articles, such as those in the Journal of Economic Literature or Journal of Economic Perspectives. I do not recommend looking for research questions in non-review academic articles (see post here). Make sure your question is big enough by answering “Why is this question important?”.
Once you have a big-picture question, think about a few smaller related research questions that you can try to tackle, i.e., ones that could actually become academic articles. Make sure that you can also answer “Why is this question important?” for each of them. Then write down the ideal “experiment” or quasi-experiment that would be needed to answer each question. Be creative and don’t think about what is feasible at this stage.
Next comes the grueling part – actually looking for settings that come as close to your ideal setting as possible. Brainstorm what could be out there. Consider whether you could run a lab or field experiment. Ask your classmates if they’ve heard of anything. This stage takes time and effort, and this is where a lot of projects stall. I’ve been interested in estimating the effect of economic uncertainty on investment for years (along with hundreds of other economists, I’m sure), but alas I have not come across any good quasi-experiments (one can of course do structural estimation, a stylized laboratory/field experiment, or theory, but these are not the roads I’ve chosen). But if I ever come across the right dataset, I have a great question already!
Finally, once you’ve identified the setting, look for data. Again, brainstorm what could be out there. Then Google around, ask your advisors and peers, contact government officials and private companies until you’re told to go away or given the data. Consider whether you can collect your own data. Yes, projects will fail at this stage too, and it will be very sad. You’ve spent all this time thinking of a question and the setting, you found the perfect natural experiment, but the data just aren’t there or the organization that has it won’t give it to you. Give yourself a big hug and move on. All that effort was not wasted – you’ve thought critically about research questions, you’ve refreshed yourself on methodology, you’ve learned a bit more about the world and what data are/are not out there. And you have a well-developed research question in case data become available in the future or you decide to collect your own.
If your project makes it past this stage, now is the time to check whether it has been done already by doing a thorough literature search. Again, students can get very disappointed to come all this way and find out that the paper they’re thinking of writing has already been written. But I view it as a positive sign in student development, especially if the existing paper published well. It means that you’re thinking like a good researcher and it’s a good sign that you’ll be able to come up with an original question in the not-too-distant future.
To summarize, here’s a template you can fill out for each research question:
Big picture question:
Specific research question:
Why are these questions important?
Ideal setting for answering research question:
Possible actual settings for answering research question:
Possible datasets for answering research question:
This process isn’t easy. You should expect the vast majority of research questions to “die” along the way (or, if you don’t like the idea of permanently giving up, put the ones that stall “on the back burner”). But I think this is still the best way to get started. The good news is that it gets easier. As you get more experienced and more familiar with your field, questions will pop up more naturally and knowing whether they are answerable will be easier. You will think of new related questions while working on an existing paper. They might even involve data you’ve already used. But it takes time and effort to get there. Keep up the good work!
Now that I’ve written about why randomized controlled experiments are so great, it’s time to talk about some of the common ways in which they can go wrong. But first I’d like to make an important caveat: finding potential flaws with any research, even randomized controlled experiments, is actually pretty easy. I haven’t come across any study that couldn’t be criticized on one or more grounds. So with the power to criticize also comes great responsibility: don’t use it to dismiss results you don’t like. Don’t selectively apply these criticisms to some studies and accept the findings of others that could be subject to similar criticisms. Use the knowledge wisely.
The main concern with randomized controlled experiments is the question of “external validity”. Sure, you’ve shown that something works in the laboratory or in a carefully controlled setting, but does it work in the real world? The answer may be no if people in the laboratory are different from those who will be subject to the treatment in the real world, or if people (including those administering the treatment) behave differently in the experiment than they would outside of it.
For example, maybe you run a clinical trial for a drug and only recruit men to participate in the trial. Will the drug work as well on women? Will there be different side effects for them? For a long time, clinical trials frequently omitted or under-enrolled women, although that is now changing. Or maybe you enroll obese individuals in a weight-loss trial but only include ones without other health problems like diabetes. But once the drug goes to market, it may be prescribed to all types of obese individuals, and potentially have different effects than what you observed in the laboratory. Or maybe the nurses working on your trial are really good at getting patients to take the drug on time, but in the real world people forget to take it and you observe much lower effectiveness.
External validity is a potential problem with all experiments, not just clinical trials and not just stylized laboratory experiments. As long as people know they are part of an experiment, they may change how they act (maybe to make the experimenter happy, maybe to hide socially unacceptable views or behaviors, or maybe because they don’t take the experimental treatment as seriously as they do things in the real world). This is known as the Hawthorne effect, and it’s essentially impossible to rule out unless your subjects do not know that they are being studied.
Finally, external validity can also be a concern if you’re trying to say something about high-stakes decisions by running a low-stakes experiment. For example, you’re open to this criticism if you want to say something about how people save for retirement and you either run a hypothetical choice experiment or an experiment with low stakes (because who can afford to run an experiment where tens of thousands of dollars are at stake?). In some cases, the low-stakes findings survive in a high-stakes environment, but in others they don’t.
The bottom line is that the most convincing experimental conclusions are those that are based on a representative population that faces stakes similar to what they would be in the real world, and where the experiment closely resembles real-world conditions (including individuals being unaware that they are part of an experiment).
(click here for part 1)
I was going to write more about quasi-experimental methods, but then I realized why these are usually discussed last in econometrics/empirical methods books. In order to see why quasi-experimental methods are useful, it’s first helpful to understand why experiments are good and where non-experimental methods can falter. Of course, experiments have drawbacks too and non-experimental non-quasi-experimental methods can produce valid results under some conditions. But we’ll talk about all that later.
When properly designed and executed, an experiment will easily allow you to estimate the causal effect of a randomly assigned condition (“treatment”), X, on any outcome Y: effect of a job training program on employment, effect of teacher training on student outcomes, effect of a drug on mortality, effect of dog ownership on health, etc. At a very basic level, a valid experiment only requires two things: (1) a control group (let’s say one composed of people) that is not exposed to the treatment X and (2) random assignment to treatment. This kind of setup is called a “randomized controlled experiment”. In this case, you can just compare the differences in Y’s in the two groups to arrive at the causal effect of X (divide by differences in X between the two groups if X is continuous).
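The "just compare the Y’s" claim can be checked with a small simulation. This is a sketch under made-up assumptions: a binary treatment, a true effect of 2.0 that I choose myself, and normally distributed baseline outcomes. The function name and all parameters are hypothetical.

```python
import random
import statistics


def run_randomized_experiment(n_people=10_000, true_effect=2.0, seed=1):
    """Return the treated-minus-control difference in mean outcomes Y."""
    rng = random.Random(seed)
    treated_y, control_y = [], []
    for _ in range(n_people):
        baseline = rng.gauss(10.0, 3.0)   # the outcome this person would have without X
        if rng.random() < 0.5:            # coin flip: random assignment to treatment
            treated_y.append(baseline + true_effect)
        else:
            control_y.append(baseline)
    return statistics.mean(treated_y) - statistics.mean(control_y)


estimate = run_randomized_experiment()
print(round(estimate, 2))  # should land close to the assumed true effect of 2.0
```

Because the coin flip is unrelated to anyone’s baseline, the two groups have the same average baseline (up to sampling noise), so the simple difference in means recovers the effect built into the simulation. With a continuous X, the analogous step is to divide this difference by the difference in average X between the groups.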
Why do you need a control group? Because things change over time. Over longer time scales, people age, get sick, get better, gain/lose weight, get/lose jobs, learn/forget things, move, and generally act in ways that could affect Y even without X. Over shorter time scales, people might be affected by the time of day, by the temperature, by changes in their mood, by the building into which you bring them, or even by the fact that they are taking part in an experiment. If you don’t have a control group, it’s essentially impossible to tease out the effect of X on Y from the influence of other forces on Y. Most researchers know this and use a control group to ensure that the estimated effect of X on Y is not confounded by anything else happening to the treated group.
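Here is a minimal sketch of why a before/after comparison without a control group can mislead. It assumes that everyone’s outcome drifts upward by 1.5 over the study period regardless of treatment and that the true treatment effect is 2.0; both numbers are invented for illustration, as is the function name.

```python
import random
import statistics


def simulate_before_after(n_people=10_000, true_effect=2.0, time_trend=1.5, seed=2):
    """Compare a treated-only before/after estimate with one that nets out a control group."""
    rng = random.Random(seed)
    treated_change, control_change = [], []
    for _ in range(n_people):
        # Everyone's outcome changes over time, with or without the treatment.
        drift = time_trend + rng.gauss(0.0, 1.0)
        if rng.random() < 0.5:
            treated_change.append(drift + true_effect)
        else:
            control_change.append(drift)
    naive = statistics.mean(treated_change)                 # before/after, treated group only
    with_control = naive - statistics.mean(control_change)  # net of the control group's change
    return naive, with_control


naive, with_control = simulate_before_after()
print(round(naive, 2), round(with_control, 2))
```

The treated-only comparison attributes the whole change (roughly the effect plus the drift) to X, while subtracting the control group’s change strips out the drift and leaves something close to the assumed effect.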
One exception I found (there surely are others) is this study, which recruited 4-10 month old infants and mothers for a sleep lab study of “crying it out” (a method by which some parents teach babies to fall asleep on their own by letting them cry and learn to self-soothe). All mothers were instructed to let the babies “cry it out” when falling asleep, so no control group was used. Even after the babies stopped crying on the third day, their cortisol levels were elevated, suggesting that they were stressed out. As this Slate article points out, it is impossible to know whether the babies were stressed out by exposure to “cry it out” (as the research article claims) or by the fact that they were in a foreign environment – the sleep lab. The absence of a control group that faced the same conditions without being exposed to “crying it out” thus fundamentally limits this study’s ability to say anything definitive about how crying it out affects stress levels.
Now you might say, “Sure, for some things, a control group that’s part of the experiment is important. But for outcomes like mortality or income, why can’t we just compare outcomes of people who enrolled in the experiment to outcomes of similar people who are not part of the experiment? That seems easier and cheaper.” The problem with this approach is that it’s hard to be sure you’re comparing treated “oranges” to untreated “oranges” as opposed to treated “oranges” to untreated “apples”. Even if you collect information on hundreds of individual characteristics, it’s hard to be sure that there aren’t other characteristics that differ between your experimental treatment group and your real-world control group. And those unobserved differences might themselves influence outcomes. For example, maybe the group that signed up for your job training experiment is more (less) motivated and would have gotten jobs at higher (lower) rates than the real-world control group even if they didn’t take part in your experiment. Or maybe the experimental group is healthier (sicker) in ways that you aren’t capturing and they would have lived longer (died sooner) than the real-world control group. For these reasons, you should always be suspicious of “experiments” where the control group is non-existent or isn’t drawn from the group that signed up for the study.
Finally, why can’t you let people decide themselves whether to be in the control group or not? For the same reason that your control group needs to consist of people who signed up for your experiment – if you don’t assign people to the treatment group randomly you can’t be sure that the two groups – treatment and control – are alike in every single way that affects Y except for X. It could be that people who sign up for the treatment are more desperate for whatever reason, and desperate people may behave differently in all sorts of ways that then affect all sorts of outcomes. Or it could be that they are more adventurous, which again could affect them in all sorts of ways. Or they eat more broccoli/cheese/ice cream and you didn’t think to ask about that. If there are any such differences that you don’t observe and control for adequately, you can never be sure that differences in Y between the two groups are solely due to the treatment X.
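Here is a rough sketch of how self-selection can distort the comparison. It assumes a single unobserved trait, which I'll call "motivation", that raises Y and also makes people more likely to opt into the treatment. The trait, the coefficients, and the true effect of 2.0 are all invented for the example.

```python
import random
import statistics

def compare_designs(n=20_000, true_effect=2.0, seed=2):
    """Return (self-selected estimate, randomized estimate) of the effect of X."""
    rng = random.Random(seed)
    self_treated, self_control = [], []
    rand_treated, rand_control = [], []
    for _ in range(n):
        motivation = rng.gauss(0.0, 1.0)  # unobserved by the researcher
        y_untreated = 10.0 + 3.0 * motivation + rng.gauss(0.0, 1.0)
        y_treated = y_untreated + true_effect

        # Design 1: people choose; motivated people opt in more often.
        if motivation + rng.gauss(0.0, 1.0) > 0:
            self_treated.append(y_treated)
        else:
            self_control.append(y_untreated)

        # Design 2: a coin flip decides, regardless of motivation.
        if rng.random() < 0.5:
            rand_treated.append(y_treated)
        else:
            rand_control.append(y_untreated)

    self_selected = statistics.mean(self_treated) - statistics.mean(self_control)
    randomized = statistics.mean(rand_treated) - statistics.mean(rand_control)
    return self_selected, randomized

self_selected, randomized = compare_designs()
print(f"self-selected groups: {self_selected:.2f}")
print(f"randomized groups:    {randomized:.2f}")
```

In this setup the self-selected comparison should overstate the effect by a wide margin, because it attributes the payoff of being motivated to X. The randomized comparison should stay close to 2.0. With a trait that lowers Y instead, the bias would run the other way.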
But what if you’re ABSOLUTELY SURE that there’s nothing different between your treatment and non-randomly selected control group that could affect Y other than X and other things you’ve controlled for? The thing is, you can never be sure; otherwise you probably wouldn’t be running an experiment. To be absolutely sure would imply that you know everything about how Y is determined except for the effect of X on Y. And there’s just no way that we know that much about anything that we’d want to study (at least as far as social science and medicine are concerned). But if you have a good counter-example, email me!
That was a long one! Next time, we’ll talk about how even randomized controlled experiments can go wrong.
You just read a fascinating article suggesting that drinking a glass of red wine is equivalent to spending an hour at the gym, that morning people are better positioned for success, or that gun control reduces police shootings. Let’s pretend that instead of immediately posting the article on your favorite social media website (which I’ll admit I’m sometimes guilty of myself), you wonder if the scientific methods behind the study are sound and if you can draw conclusions about cause and effect. How do you figure that out?
Unsurprisingly, it can be really hard. Alex Edmans, a Professor of Finance, has an excellent recent blog post about separating causation and correlation. After seeing lots of (often subtly) flawed research shared on social media, I’ve also been planning to write a guide to separating solid findings from not-so-convincing ones. It was going to be a cool flowchart that you can make your way through, with explanations along the way about why each step matters. But after having it on my “fun” to-do list for months, I realized that the only way this flowchart will ever see the light of day is if I write it as a series of blog posts and then summarize things in a flowchart. This is part one.
The first question to ask when evaluating a study is whether it is based on an experiment (where researchers manipulated something, either in a laboratory or in the “field”) or is observational (where researchers collected some data). Experiments may be more reliable if done correctly, but they are not panaceas: there are many ways experiments can go wrong and a big issue is whether experimental findings translate to the real world. But we do evaluate experiments slightly differently from observational studies, so this is the first fork in our imaginary flowchart.
Let’s start with observational studies (this will repeat Alex’s post a bit, but I think it’s useful repetition). The first question to ask yourself is whether the researchers used any “quasi-experimental” variation to come to their conclusion. In general, studies that do are more credible than studies that do not. For example, sometimes researchers get lucky and stumble on a seemingly arbitrary rule that separates subjects (firms, individuals, regions) into two or more different groups. Certain scholarships are given to individuals who meet a specific cutoff on a standardized test score. Because it’s very difficult to control your score down to the point, people right below and right above the cutoff should be very similar in ability, except that the ones right below the cutoff did not get a scholarship and those above the cutoff did. Voila – you can study the effect of getting a scholarship on, for example, college completion, without worrying whether people without scholarships are fundamentally different from people with scholarships!
In order for this approach – called a “regression discontinuity” – to work well, (a) it must be impossible, or at least very difficult, for entities to manipulate whether they’re right below or above the cutoff and (b) researchers must not stray so far from the cutoff that the similarity of subjects below and above the cutoff starts becoming questionable. Ultimately, whether these two conditions hold depends on the context and how narrow a range around the cutoff researchers select. For example, it’s hard to control whether your SAT score is 1480 or 1490, but scoring 1300 versus 1400 is unlikely to be mostly due to chance. In other contexts, small manipulations are easy to do – for example, many firms have enough flexibility in accounting to turn slightly negative earnings into slightly positive earnings, making a regression discontinuity approach not-so-credible in this setting.
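To see why condition (b) matters, here is a deliberately simplified sketch. It simulates students whose outcome rises smoothly with their test score, adds a scholarship effect of 1.0 at a hypothetical cutoff of 1300, and then compares average outcomes just above and just below the cutoff using a narrow and a wide window. The cutoff, the outcome equation, and the window widths are all assumptions for illustration. Comparing raw means within a window is the crudest version of the idea; actual studies typically fit a regression on each side of the cutoff instead.

```python
import random
import statistics

CUTOFF = 1300  # hypothetical scholarship cutoff

def simulate_students(n=100_000, true_effect=1.0, seed=3):
    """Return a list of (score, outcome) pairs."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        score = rng.gauss(1200.0, 150.0)
        gets_scholarship = score >= CUTOFF
        # The outcome rises smoothly with the score (a stand-in for ability)
        # and jumps by true_effect for students who get the scholarship.
        outcome = 0.01 * score + rng.gauss(0.0, 1.0)
        if gets_scholarship:
            outcome += true_effect
        students.append((score, outcome))
    return students

def window_estimate(students, bandwidth):
    """Mean outcome just above the cutoff minus mean outcome just below it."""
    above = [y for s, y in students if CUTOFF <= s < CUTOFF + bandwidth]
    below = [y for s, y in students if CUTOFF - bandwidth <= s < CUTOFF]
    return statistics.mean(above) - statistics.mean(below)

students = simulate_students()
narrow = window_estimate(students, bandwidth=10)
wide = window_estimate(students, bandwidth=200)
print(f"narrow window (+/- 10 points):  {narrow:.2f}")
print(f"wide window   (+/- 200 points): {wide:.2f}")
```

With the narrow window, students on either side of the cutoff are nearly identical in score, so the estimate should sit close to the true 1.0. With the wide window, the "above" group has much higher scores on average, and the estimate should absorb that difference and come out well above 1.0.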
In the next post in this series (which may or may not be the next post chronologically), we’ll talk about other kinds of quasi-experimental variation. Bonus points to people who email me an article about a study they want scrutinized!