I recently had two conversations with third-year PhD students about how to do research. Both of them started with the students asking me if I thought it was a good idea to find a dataset first and then think of a research question. My answer was a resounding “no”. Given the difficulties graduate students have in figuring out how to go about research, I thought I would share my suggestions in a blog post. These are based on wisdom my advisers passed on to me and my experience in grad school in general, and I claim no credit for inventing any of them.

It’s tempting to find a cool dataset and then think of a question you can answer with it because one of the most disappointing experiences of research is coming up with a great research design and not being able to find the data. But it doesn’t work. Empirically, I have only heard of one professor even trying this approach – he was collecting lots of industry data but didn’t have a question in mind yet. There may be individuals who are experienced and talented enough to take this approach – he was a tenured professor at Harvard – but most of us mere mortals shouldn’t expect to be successful in this way, and all professors I’ve ever spoken to about this actively discourage this method. Also, my conversation with this professor took place about ten years ago, and there’s still no working paper based on the data he was collecting, so maybe it didn’t work well for him either.

Besides being empirically unpopular among successful professors, why doesn’t the data-driven approach work? I think it’s just too constraining. There are thousands if not hundreds of thousands of datasets out there, and limiting yourself to one significantly reduces your choice of research questions. So it’s kind of like trying to win the lottery – it’s possible that you will pick a dataset that will lead to an interesting research question, but it’s not likely. Moreover, many datasets are collected or put together for particular purposes, so you may find it difficult to divert your mind from the most obvious uses of the data, which have probably already been done.

So what should you do instead? Start with a big-picture research question. You can do this by thinking about what got you interested in economics in the first place (or whatever it is you’re studying), by reading the news, by thinking about modern social problems and concerns, or by reading academic overview articles, such as those in the Journal of Economic Literature or Journal of Economic Perspectives. I do not recommend looking for research questions in non-review academic articles (see post here). Make sure your question is big enough by answering “Why is this question important?”

Once you have a big-picture question, think about a few smaller related research questions that you can try to tackle, i.e., ones that could actually become academic articles. Make sure that you can also answer “Why is this question important?” for each of them. Then write down the ideal “experiment” or quasi-experiment that would be needed to answer each question. Be creative and don’t think about what is feasible at this stage.

Next comes the grueling part – actually looking for settings that come as close to your ideal setting as possible. Brainstorm what could be out there. Consider whether you could run a lab or field experiment. Ask your classmates if they’ve heard of anything. This stage takes time and effort, and this is where a lot of projects stall. I’ve been interested in estimating the effect of economic uncertainty on investment for years (along with hundreds of other economists, I’m sure), but alas I have not come across any good quasi-experiments (one can of course do structural estimation, a stylized laboratory/field experiment, or theory, but these are not the roads I’ve chosen). But if I ever come across the right dataset, I have a great question already!

Finally, once you’ve identified the setting, look for data. Again, brainstorm what could be out there. Then Google around, ask your advisers and peers, and contact government officials and private companies until you’re told to go away or given the data. Consider whether you can collect your own data. Yes, projects will fail at this stage too, and it will be very sad. You’ve spent all this time thinking of a question and the setting, you found the perfect natural experiment, but the data just aren’t there or the organization that has them won’t give them to you. Give yourself a big hug and move on. All that effort was not wasted – you’ve thought critically about research questions, you’ve refreshed yourself on methodology, you’ve learned a bit more about the world and what data are/are not out there. And you have a well-developed research question in case data become available in the future or you decide to collect your own.

If your project makes it past this stage, now is the time to check whether it has been done already by doing a thorough literature search. Again, students can get very disappointed to come all this way and find out that the paper they’re thinking of writing has already been written. But I view it as a positive sign in student development, especially if the existing paper published well. It means that you’re thinking like a good researcher and it’s a good sign that you’ll be able to come up with an original question in the not-too-distant future.

To summarize, here’s a template you can fill out for each research question:

Big picture question:

Specific research question:

Why are these questions important?

Ideal setting for answering research question:

Possible actual settings for answering research question:

Possible datasets for answering research question:

This process isn’t easy. You should expect the vast majority of research questions to “die” along the way (or, if you don’t like the idea of permanently giving up, put the ones that stall “on the back burner”). But I think this is still the best way to get started. The good news is that it gets easier. As you get more experienced and more familiar with your field, questions will pop up more naturally and knowing whether they are answerable will be easier. You will think of new related questions while working on an existing paper. They might even involve data you’ve already used. But it takes time and effort to get there. Keep up the good work!
