In-depth reporting and analytical commentary on artificial intelligence regulation. No legal advice.

Five proportionality considerations concerning the New York Times’ copyright lawsuit over ChatGPT

Context: The New York Times Company sued Microsoft and OpenAI this week (Southern District of New York, case no. 1:23-cv-11195; previous article).

What’s new: This article is not about a new development in the case but the result of further reflection on the complaint.

Direct impact: Proportionality considerations will complicate the NYT’s pursuit of major leverage in five legal contexts: three of the four fair use factors and, should the NYT prevail on the merits, damages and access to injunctive relief. One particularly important proportionality consideration is the likely negligible percentage of ordinary users’ ChatGPT queries that result in an alleged infringement of the NYT’s copyrighted works.

Wider ramifications: In light of those proportionality considerations, this case is unlikely to jeopardize Generative AI in the end, but a jury trial is a bit of a coin toss and what the NYT is attempting here could have a chilling effect on innovation regardless.

What the New York Times Company can realistically achieve in its AI copyright action against Microsoft and OpenAI is unlikely to be so far superior to a negotiated outcome as to be worth the time, money and energy, which would probably be better spent on making the NYT’s business fit for the future rather than on trying to slow down progress. That raises the question of whether the decision to embark on litigation was ultimately driven by emotions (such as fear of the future) and/or ideology.

Theoretically, one can always imagine a huge jackpot. The complaint almost suggests that a large part of the recent trillion-dollar increase in Microsoft’s market capitalization is attributable to copyright infringement, which appears to be the primary reason that Microsoft is the first-named defendant even though it is OpenAI’s ChatGPT that is at issue. The complaint also makes it sound like the NYT would be entitled to more compensation for it than any other company on this planet. But before the NYT’s lawyers can even present their case to a jury, they will have to overcome certain hurdles (different types of pretrial motions) as the judge will play a gatekeeper role. If and when this case goes to trial, the jury will hear both sides. And thereafter, the case will be in the hands of professional judges again for post-trial proceedings and final remedy determinations.

Apart from the skepticism expressed herein, ai fray does see the need for, and predicts for the near future, technological improvements to Generative AI (GAI) that will feature (among other things) two critical safeguards:

  • GAI must not serve as a paywall circumvention mechanism.
  • Fake news or other forms of unreliable and inaccurate information should not be misattributed, much less to reputable publications such as the NYT.

The NYT has identified shortcomings and loopholes that ai fray is not attempting to defend. But its prayers for relief (i.e., the remedies sought) appear overreaching and contrary to the public interest in AI innovation.

GAI is already increasing people’s productivity in many areas. Its teething problems don’t justify shutting down GAI the way the NYT requests (such as a wide-ranging prohibition or the court-ordered destruction of data generated as a result of LLM training). We all must live with certain temporary problems in the name of progress. Standstill is not a serious proposal in this context.

The pursuit of space exploration cost lives. People may also have died when the New York Times building was under construction, or when they were delivering newspapers to people’s homes. By contrast, no one is going to die because of ChatGPT’s alleged imperfections in the copyright and trademark dilution contexts raised by the NYT’s complaint.

Copyright is not such an absolute property right that infringements of minor scale give rise to draconian remedies such as multi-billion dollar damages awards or devastating injunctions. Instead, things will be put into perspective every step of the way. There are five major respects in which proportionality considerations will matter during the course of this litigation (should it not be settled before certain junctures are reached).

Fair use factor #1 (purpose and character of use)

As the previous article already explained, U.S. copyright case law has been rather expansive and permissive in recent years with respect to what constitutes transformative use. If the question is direct copying, Google Books would be a much clearer case than ChatGPT, yet it was deemed permissible. Whether a type of use is transformative is, at least in part, potentially also a proportionality question: how much value (not in a strictly quantitative but usually rather qualitative sense) does the allegedly infringing product add? Here, the court might even decide on summary judgment (i.e., ahead of trial) that the biggest quantum leap in digital innovation since the invention of the computer itself is undoubtedly transformative.

Fair use factor #3 (amount and substantiality of portion used in relation to copyrighted work as a whole)

Copyright holders always try to argue that what is used constitutes a copyrighted work in its own right, in which case the “portion” is 100% of the base. But courts don’t view it that way. Here, the NYT will argue that virtually all of its articles were used for LLM training, but LLM training per se is not a copyright violation. The quantity at issue here may very well go beyond Google News-style snippets, but it is not clear how much of the NYT’s body of copyrighted works is really reproduced by ChatGPT in certain situations that are edge cases anyway.

Fair use factor #4 (effect of defendant’s use upon potential market for or value of copyrighted work)

This is the economic part of fair use: is something harmful on the bottom line? There have been copyright cases in which right holders benefited far more from the alleged infringements than they were harmed by them.

Here, one reasonable way of looking at it is to ask what percentage of all ChatGPT queries “copy” copyrighted works belonging to the NYT. The complaint does not say. It has an exhibit with 100 examples in which, simply put, ChatGPT was instructed to produce an output of NYT material. Even 1,000 or 10,000 such examples wouldn’t change anything about the fact that the average user (not someone who is trying to create evidence on the NYT’s behalf) will rarely ever ask those types of questions. To the extent that ChatGPT can be used for paywall circumvention, the commercial effect is probably limited by the delays involved, and the NYT’s complaint may now have encouraged more people to try to elicit paywalled material, paragraph by paragraph, than would have done so in the absence of the complaint.

In the same effects-centric context, the NYT may argue that misattributions of false information are harmful, but if ChatGPT puts out something that the NYT didn’t say, there likely isn’t any copyright infringement at issue in the given context, no matter how much the NYT’s lawyers may try to confuse the jury about it (which in turn depends on what the court lets them get away with).

If one focuses on the legally relevant economic question here, chances are that it is actually even beneficial to the NYT if ChatGPT (without misattribution of falsehoods, of course) points to that paper as a source. ChatGPT as a whole (i.e., if we look at 100% of the queries, including the 99.999999% that don’t “copy” from the NYT’s copyrighted material) probably is, to be fair, a problem for the media industry. It means that search engines will increasingly be not just a portal, not just a starting point, but quite often give answers that end users consider sufficient for their purposes. That may be the NYT’s real concern, but the impact of undoubtedly lawful activity (such as “reading” texts in order to acquire knowledge) is not a legally valid concern.

Fair use decisions by judges and juries (there are often even disputes over who should make the related equitable determination of fact) are very hard to predict. There is a pattern, however, that U.S. courts have increasingly applied the concept to the benefit of software innovation.

Damages

The question of copyright damages won’t even be reached if the judge finds before the trial, or the jury determines after it, that the use in question is covered by the fair use doctrine.

The problem for a copyright holder is, however, that unless the trial is bifurcated (one jury for the merits, one for the damages), a jury that considers the defendant liable but struggled with whether the conduct in question wasn’t actually fair use is unlikely to be inclined to award a huge amount. And even if a jury did so, an unreasonable damages award would probably not stand. Many U.S. damages verdicts (not only, but particularly, in patent cases, which belong to another field of intellectual property law) made news because they seemed huge, but were tossed or at least curtailed, often beyond recognition.

Injunctive relief

Intellectual property rights holders don’t easily win injunctions in the United States, unlike in some foreign jurisdictions where a liability finding more or less automatically results in the grant of an injunction.

Again, the case may not even prove meritorious, but even if some kind of liability finding were made, that would still not guarantee an injunction. The destruction claim appears to be a long shot, possibly just meant to be a sword of Damocles in hopes of extracting a more favorable settlement. But even a merely prohibitive injunction would involve a balancing of the equities and the question of whether the NYT can simply be compensated with money.

Here, again, there’s the important question of how many of the ChatGPT queries that regular users (not the NYT’s lawyers, experts or employees) perform will actually raise any issues concerning the NYT’s copyrighted works. The answer is, in fact, negligibly few, and that number must be weighed against all of the huge benefits that GAI brings, which is also a public-interest consideration (beyond the balancing of the private equities).

The NYT’s 100 examples of queries that allegedly result in copyright infringements are not ordinary use cases. GAI is about contextualization, and if the context is narrow enough, a certain kind of output may result. But users who use ChatGPT (be it via Bing or by other means) phrase questions differently, and especially more broadly. They don’t ask ChatGPT about what the NYT wrote. They don’t provide one paragraph and then follow up with a request for the subsequent one.

Also, this kind of litigation will take years before any remedies are realistically enforceable. Whatever (if anything) the district court might order would likely be stayed pending an appeal. In the meantime, some of the issues may already have gone away.

To the extent that the NYT’s complaint raises issues that policy makers may also be concerned with, there is the important question of whether the problem is the tool itself or that it could be used in bad faith. Let’s leave that for another day, though.