Opinion

NYT’s edge in suit against OpenAI

by Noah Feldman

The lawsuit filed by the New York Times against OpenAI and Microsoft for copyright infringement pits one of the great establishment media institutions against the purveyor of a transformative new technology. Symbolically, the case promises a clash of the titans: labor-intensive human newsgathering against pushbutton information produced by artificial intelligence. But legally, the case represents something different: a classic instance of the lag between established law and emerging technology.

Copyright law, a set of rules that date back to the printing press, was not designed to cover large language models like ChatGPT. It will have to be consciously evolved by the courts — or amended by Congress — to fit our current circumstances.

The key legal issue in the case will be the doctrine known as fair use. Codified in the Copyright Act of 1976, fair use tells you when it’s acceptable to use text copyrighted by someone else. The fair use test has four factors. Educational and nonprofit uses are more likely to be found to be fair use. Creative work gets more copyright protection than technical writing or news. The amount of the work that has been copied matters, as does the centrality to the copied work of the material that’s been copied. And perhaps most important for the Times’ lawsuit, courts also consider whether the copying will harm the present or future market for the work copied.

Once you know the law, you can guess roughly how the legal arguments in the case are going to go. The Times will point to examples where a user asks a question of ChatGPT or Bing and it replies with something substantially like a New York Times article. The newspaper will observe that ChatGPT is part of a business and charges fees for access to its latest versions, and that Bing is a core part of Microsoft’s business. The Times will emphasize the creative aspects of journalism. Above all, it will argue that if you can ask an LLM-powered search engine for the day’s news, and get content drawn directly from the New York Times, that will substantially harm and maybe even kill the Times’ business model.

Most of these points are plausible legal arguments. But OpenAI and Microsoft will be prepared for them. They’ll likely respond by saying that their LLM doesn’t copy; rather, it learns and makes statistical predictions to produce new answers. If I read an article in the New York Times and then write a Bloomberg opinion column on the same topic, that isn’t copyright infringement, even though I may have learned a great deal from the Times piece and relied on that information to form my own opinion. For this reason, many copyright experts have been theorizing that it cannot be a copyright violation for an LLM to learn from existing online material, even if it’s under copyright. The defendants can also be expected to argue that news consists of facts and should therefore be treated more permissively than creative material.

But Microsoft and OpenAI will have a hard time refuting the final point — that their product, which relies on newsgathering businesses like the Times, will harm those businesses. ChatGPT and other LLMs cannot go out into the world to gather and vet new facts. They are restricted, for the foreseeable future, to “learning” from information that has already been published.

It follows that for LLMs to provide useful information, someone else (that is, a human being) must first gather the information, ascertain that it is accurate, and publish it. This is the essence of newsgathering. It’s costly to get it right.

What’s more, to know that we can rely on news, we need it to come from an institution that we can trust — one with a track record and a reputation it has a business interest in upholding. Otherwise, we would not have news. We would have an iterative echo chamber untethered from reality.

Here is where the fundamental public interest in the maintenance of the free press becomes relevant to the fair use question. If you can get information more cheaply from an LLM than from the New York Times, you might drop your subscription. But if everyone did that, there would be no New York Times at all. Put another way, OpenAI and Microsoft need the New York Times and other news organizations to exist if they are to provide reliable news as part of their service. Rationally and economically, therefore, they ought to be obligated to pay for the information they are using.

Fitting this powerful public interest into copyright law won’t be simple for the courts. Literal copying is the easiest form of infringement to punish. In ordinary legal circumstances, if LLMs change words sufficiently to be summarizing rather than copying, that weakens the Times’ case. Yet summaries in different words would still be sufficient to kill the Times and similar organizations — and leave us newsless.

The courts will need to be attuned to all this. If they don’t get it right, Congress will have to act. The news infrastructure is already tottering. If we destroy it altogether, democracy will be the loser.

Noah Feldman is a Bloomberg Opinion columnist and a professor of law at Harvard University.