People have told me I’m optimistic about generative AI, in the sense that I don’t think it’s going to have a direct catastrophic impact on the labor market. Below I will give you illustrative examples of why I think that’s roughly the right view. But before I tell you the bad, let me start with an example of generative AI at its best. Note that details of both the good and the bad have been changed (with the help of generative AI!) to preserve the anonymity of individuals involved.

In good hands, gen AI can give people skills that would otherwise have taken them years to develop. Many professors (myself included) now use it daily and are getting excellent results. But we’re usually not worried about OUR job market outcomes, so let me give you a more relevant example.

I have an undergraduate research and general assistant who’s not an economics major. I’ve tasked him with compiling structured summaries of recent work by economists in a particular subfield I was exploring—a task that would be hard (and, at the very least, time-consuming) even for an advanced economics major, but for a sophomore in an unrelated field it would be effectively impossible. But with the help of generative AI, producing high-quality summaries of papers is much easier. My assistant can also pull and analyze data from the web despite having limited programming experience.

And, of course, there are probably many examples of excellent AI use that I don’t even notice, precisely because they’re excellent.

Next are examples of bad gen AI use. I’m not sure whether these reflect bad prompts or poor checking. Probably a bit of both, but the ultimate problem is not checking the output carefully. The bottom line is that gen AI is not a very powerful tool without supervision, and potentially a counterproductive one when poorly supervised.

Note that I have examples at multiple levels of seniority, so this is not just about 20-year-olds using it poorly. And the undergraduates in question were not bad students; they had excellent grades and other promising attributes.

  1. An undergraduate student used gen AI to generate a summary and action list from a meeting. The AI made up parts of the action list, adding items that were never discussed and that didn’t make sense. This kind of thing has happened more than once.
  2. An undergraduate student used AI to make graphs illustrating economic trends in a particular region. The graphs looked great. But when I asked for data sources, she returned many links that did not work. After several iterations of me asking her to re-do the task and check each link (a minimal version of that check is sketched after this list), she returned an incomplete set of (now working) links. Upon further investigation, it turned out that some of the data in the original graphs were paywalled, so it wasn’t clear where the AI had pulled them from. Other data were apparently “obtained” by the AI reading numbers off graphs in PDFs, which is obviously not great for high-quality work.
  3. I asked another undergraduate student to make a nice-looking slide template for presentations. He returned something so hideous that I’m pretty sure he didn’t even look at it himself before sending it to me.
  4. A PhD student wrote up some IV results. The first-stage F was around 4 and the instrument’s coefficient flipped signs across subsamples. The description of the results nonetheless called the instrument “strongly predictive” (the second sketch after this list shows why an F that low is a red flag).
  5. A PhD student asked gen AI to micro-found a reduced-form result on minimum wages and small business entry. It produced smart-sounding nonsense with lots of equations that took me forever to wade through. The nonsense also didn’t match the empirical framework, which is when I figured out that gen AI hadn’t just edited the writing but had written all of it.
  6. A literature review in a manuscript I handled materially misrepresented what the papers said (e.g., claiming that a paper found large negative employment effects of a labor market regulation when it actually reports precisely estimated nulls). It was pretty clear to me that no one had read the papers being summarized.
  7. A paper stated that a dataset had been downloaded from a government website; upon further investigation, it turned out that the authors had constructed the dataset themselves.
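
On the link-checking in example 2: much of that human step is cheap to automate. Here’s a minimal sketch in Python (the URLs are placeholders I made up, not the student’s sources; it assumes the `requests` library is installed) that flags links that don’t resolve:

```python
import requests

# Placeholder URLs for illustration; swap in the links you actually need to check.
urls = [
    "https://example.com/dataset-1",
    "https://example.com/report.pdf",
]

for url in urls:
    try:
        # A HEAD request is enough to see whether the link resolves.
        # (Some servers reject HEAD; fall back to requests.get if needed.)
        resp = requests.head(url, allow_redirects=True, timeout=10)
        status = resp.status_code
    except requests.RequestException as exc:
        status = f"error: {exc}"
    print(f"{status}\t{url}")
```

Of course, a working link still doesn’t prove the AI pulled its numbers from that page; that part of the check remains human.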
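
And on example 4, for readers who don’t run IVs: a first-stage F near 4 means the instrument barely moves the endogenous regressor. Below is a minimal simulation sketch (Python with numpy and statsmodels, on fully made-up data) of what a weak first stage looks like; the common rule of thumb asks for F above roughly 10 before trusting the resulting 2SLS estimates.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
z = rng.normal(size=n)                  # instrument
u = rng.normal(size=n)                  # unobserved confounder
x = 0.13 * z + u + rng.normal(size=n)   # weak first stage: z barely moves x
y = 0.5 * x + u + rng.normal(size=n)    # outcome (not needed for the F check, shown for context)

# First-stage regression of x on z; the regression F-statistic tests instrument relevance.
first_stage = sm.OLS(x, sm.add_constant(z)).fit()
print(f"first-stage F: {first_stage.fvalue:.1f}")
# With a first-stage coefficient this small, F typically lands in the single
# digits, well below the F > 10 rule of thumb, so calling the instrument
# "strongly predictive" would be wrong.
```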

You might say that these kinds of issues existed before, and now generative AI has taken them to a new level. That’s exactly the point—garbage in, garbage out. I have pre-gen-AI stories of sloppiness and bad output. Just like before, human judgment is needed to produce good results.

Next, you might claim that generative AI will get even better, so that maybe I won’t even need a research assistant and can have the AI compile the research summaries for me (currently, it hallucinates too much for that approach to be reliable). That is possible, but I suspect diminishing marginal returns: the technology will keep improving, but the next improvements won’t be as transformational as the ones we’ve already seen. Human supervision will still be required.

Does this mean that I’m a 100% optimist? No. I think the biggest risk of the technology is that schools are not going to handle it well, and it’s going to hinder human capital formation among relatively disadvantaged students (and maybe even advantaged ones who are unlucky enough to be in schools with poor supervision of generative AI). We’ve already seen this with COVID, where poorly implemented remote learning put many disadvantaged kids behind, and I don’t have any reason to think that schools will, on average, handle gen AI any better. So we may end up with millions of kids who use gen AI at the expense of developing their own reasoning abilities. And that, to me, is the real danger of AI.
