Millions in the hand or billions in the bush: will negotiation, litigation or legislation give the best answer to the AI copyright question?

Context: This week again put the differences between publishers’ approaches to AI on display. Various major media companies have already granted copyright licenses to AI providers, most recently (last Monday) the Financial Times (April 29, 2024 Financial Times article). Near-simultaneously, a group of U.S. newspapers belonging to a hedge fund filed a copyright infringement lawsuit against Microsoft and OpenAI (April 30, 2024 ai fray article) that will presumably be consolidated with the New York Times case pending in the same district, in which the question has arisen of whether NYT lawyers might use knowledge obtained from confidential litigation documents to influence the vaunted newspaper’s future reporting on the case (May 4, 2024 ai fray article).

What’s new: This article looks at the AI copyright situation in strategic and probabilistic terms. It’s just an effort to understand different organizations’ agendas.

The FT was not the first and presumably will not be the last major media company to enter into a content license agreement with a major AI provider. Previously, the Associated Press (July 13, 2023 announcement), Europe-based Axel Springer (which also has a presence in the U.S. through Business Insider and Politico) (December 13, 2023 announcement) and France’s Le Monde (March 13, 2024 announcement (in French)) signed similar contracts.

“Trust but verify” (as Ronald Reagan used to say) describes the FT’s approach. They decided to do the deal, but they also made it clear they would keep watching the space and see how things work out.

There are three approaches by publishers:

negotiated agreements with AI providers;
copyright infringement litigation (sometimes with claims that are not strictly copyright, such as DMCA or trademark dilution); and
lobbying for proposals such as the Generative AI Copyright Disclosure Act (Wikipedia page).

The first option rules out the second: if you have a deal, you can’t sue until it expires. The aim of litigation is to reach settlements that some people apparently believe will likely give them a better deal. And all publishing companies, whether they enter into agreements or sue or simply wait, are free to lobby lawmakers about the subject.

License agreements of the kind that have been announced appear to generate royalties for publishers in the millions of dollars per year. By contrast, the NYT has made it clear that it believes the appropriate damages figure is in the billions. That is a huge discrepancy, but neither its reasons nor its consequences are always fully understood. Misunderstandings arise if one doesn’t understand the probabilistic nature of litigation, litigation economics, and what could be some companies’ political endgame.

One pretty common misconception is that AI providers allegedly concede infringement by paying for licenses, and that this weakens them in litigation. But the amounts that are apparently paid for those AI content licenses are nowhere near the level of the NYT’s damages claims. They are low enough that AI providers, who almost certainly put clauses into those agreements that clarify they do not concede any infringement, can truthfully argue that

they’d rather avoid litigation expenses (if the NYT case went to trial, which is at least doubtful as discussed further below, defendants would easily spend tens of millions, if not even north of $100 million, enough to pay for dozens of content license deals) and
they like to be on good terms with media companies for all intents and purposes.

Even if one assumed (incorrectly) that infringement has been conceded, deals like the one between OpenAI and the FT would not really help the NYT:

The key defense, fair use, is going to be resolved on summary judgment: by the judge, not a jury. Judges, unlike laypersons, won’t attach much importance to “without prejudice” license agreements. They know how often they actually encourage litigants to just enter into deals like that without that constituting any kind of admission. It’s about litigation economics in the end.
If the NYT overcame the fair use defense, the case would end up before a jury, but those media companies that entered into content license agreements in the range of millions (rather than suing for billions) do far more harm than good to the NYT’s pursuit of damages.

In complex, high-stakes commercial litigation, many things can happen. But the Supreme Court made it clear in its Google v. Oracle decision (and elsewhere) that fair use is for judges, not juries, to decide. If that defense succeeds on summary judgment, then there won’t be a copyright trial (maybe just one on other claims, such as trademark dilution and/or unfair competition). The NYT can then appeal, and needs to prevail on appeal to bring the copyright part of its case back to life. If the judge does not consider the training of LLMs with copyrighted material fair use, then there’ll be a copyright trial, and unless a Daubert motion by Microsoft and OpenAI requires the NYT to lower its damages claim, the jury may indeed be presented with a huge damages claim. And the jury could hand down a major verdict. But then the question of fair use would also end up before the appeals court and possibly go up to the Supreme Court. And any damages award could be adjusted on appeal.
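To make the probabilistic angle concrete, here is a minimal back-of-the-envelope sketch of the sequence just described: the plaintiff only collects if the fair use defense fails, the jury then awards damages, and the award survives appeal. Every number in it is a hypothetical placeholder chosen for illustration, not an estimate of the actual odds, costs, or amounts in any pending case, and the names (LitigationScenario, expected_litigation_value, break_even_probability) are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class LitigationScenario:
    """All fields are hypothetical inputs, not estimates for any real case."""
    p_fair_use_fails: float     # chance the fair use defense is ultimately rejected
    p_jury_awards: float        # chance a jury then awards the damages sought
    p_survives_appeal: float    # chance the award is upheld on appeal
    award: float                # damages collected if every step goes the plaintiff's way
    plaintiff_costs: float      # plaintiff's own litigation spend (assumed sunk either way)


def expected_litigation_value(s: LitigationScenario) -> float:
    """Expected net value of suing: probability of collecting times award, minus costs."""
    p_collect = s.p_fair_use_fails * s.p_jury_awards * s.p_survives_appeal
    return p_collect * s.award - s.plaintiff_costs


def break_even_probability(license_total: float, award: float, plaintiff_costs: float) -> float:
    """Overall win probability at which suing is worth exactly as much as licensing."""
    return (license_total + plaintiff_costs) / award


# Hypothetical illustration: a $1B award sought, $50M in plaintiff-side costs,
# versus a license worth $5M per year over five years.
scenario = LitigationScenario(
    p_fair_use_fails=0.5,
    p_jury_awards=0.5,
    p_survives_appeal=0.5,
    award=1_000_000_000,
    plaintiff_costs=50_000_000,
)
license_total = 5_000_000 * 5

print(f"Expected value of suing: ${expected_litigation_value(scenario):,.0f}")
print(f"Value of licensing:      ${license_total:,.0f}")
print(f"Break-even win chance:   {break_even_probability(license_total, scenario.award, scenario.plaintiff_costs):.1%}")
```

The sketch ignores settlements, injunctive relief, the time value of money, and the possibility of a reduced award, all of which matter in practice. Its only point is that a party seeking billions and a party accepting millions can both be acting rationally if they assign very different probabilities to the same chain of events.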

So why do the NYT and the hedge fund behind some other newspapers (Alden Global Capital) apparently think that the FT and others are stupid to take OpenAI’s money? Why do they believe it’s worth litigating?

In many commercial litigations, a settlement ahead of or during trial is fairly likely if the defendant otherwise runs the risk of a major damages award. For instance, last year the same major law firm that is the NYT’s lead counsel against Microsoft and OpenAI, Susman Godfrey, secured a $787.5 million settlement for its client Dominion Voting Systems with Fox News (Susman Godfrey announcement). Here, the NYT’s lawyers clearly have a litigation strategy that is all about a narrative for the jury: they’ll point to how valuable Microsoft and OpenAI are, how much money there is in play, and above all they’ll capitalize on fears of AI killing jobs (and potentially not only jobs). They’ll try to blow unintentional issues such as hallucination and regurgitation (which OpenAI is working hard to address) out of proportion.

The possibility of a huge damages award is not the only driver of pretrial settlements. Injunctive relief can also be very powerful and is sometimes even more important. In a hypothetical scenario in which publishers were totally free to withhold content licenses from AI providers, the NYT may hope that prices would go up substantially. But those who ask for too much would still not get anything, as AI systems don’t need to be trained with 100% of all content that exists, a lot of which is duplicative.

The NYT would definitely settle if the price is right. But when you bring in a couple of law firms, and one of them is a litigation powerhouse like Susman, a settlement for a few million dollars means you lost the case in economic terms. Would the NYT take $200M? That’s quite possible. But Microsoft and OpenAI know that it would just invite the next media company to bring a similar lawsuit.

What everyone needs is legal certainty. When some seek billions while others take millions, it’s clear that the latter group believes copyright infringement litigation is not very likely to succeed. But there are now more than 20 U.S. AI copyright lawsuits pending, and over time there may be significant litigation activity in Europe (where there is no fair use doctrine like in the U.S.). That’s the opposite of legal certainty.

The NYT’s calculus may be that if its lawsuit fails, it has another bite at the apple: publishers can tell Congress that journalism is allegedly in jeopardy unless the law is changed and AI providers are required to license the copyrighted content with which they train their language models.

But the defendants may view it the same way: if the NYT succeeded, the AI revolution would face a new kind of threat. And if it’s strictly about economic policy, the potential of AI to contribute to economic growth is a major reason not to let copyright law get in the way of the most important technological revolution of our times.

This is the largest and most important copyright dispute ever between copyright owners and the technology industry, but not the first. Individual copyright holders and, to an even greater extent, collecting societies first had to deal with companies making photocopiers, then with digital storage media, and more recently with social networks. If the past is any indication, it will take many years for all of this to play out in court, and the related lobbying may still be ongoing, in one form or another, in a few decades.