In-depth reporting and analytical commentary on artificial intelligence regulation. No legal advice.

New York Times shifts focus of AI copyright case from output to input, surprisingly says Exhibit J (regurgitation of articles) no longer matters

Context: No exhibit to a copyright infringement complaint has ever received as much attention as Exhibit J to New York Times v. Microsoft & OpenAI (PDF): 100 examples of NYT articles being reproduced, in response to very specific prompts, by ChatGPT. In a March 12, 2024 article, law.com (published by ALM) described it as “the crux of the [NYT’s] copyright-infringement lawsuit [against Microsoft and OpenAI]” and furthermore wrote: “Attorneys tell ALM publication and Law.com affiliate Legaltech News that Exhibit J is the most striking part of the complaint, and likely what elevates this suit above its predecessors.”

What’s new: In response to a letter by which OpenAI’s lawyers ask the court to obligate the NYT to come clean on how those examples of regurgitation were produced (including how many failed attempts were made before they got there) (May 24, 2024 ai fray article), the NYT’s lawyers told Judge Sidney H. Stein on Tuesday: “The Times will not present Exhibit J to the jury.” That statement is qualified elsewhere in the filing: “The Times does not intend to rely on Exhibit J at trial so long as OpenAI complies with its discovery obligations.”

Direct impact: What the NYT’s lawyers seek to avoid here is discovery into the making of Exhibit J. But instead of a definitive and irrevocable commitment not to rely on it at trial (which would also involve the question of what to do about references to Exhibit J in the original complaint), the NYT’s lawyers leave the door open. If OpenAI did not “compl[y] with its discovery obligations,” the court (not a party) would have to make that determination and enforce the law. The NYT makes it sound, however, like it could just use Exhibit J in self-defense if it, in its sole discretion, arrives at the conclusion that OpenAI wasn’t sufficiently forthcoming. That’s not how litigation works.

Wider ramifications: Regardless of whether the declaration of an intent not to use Exhibit J (unless the NYT simply decides otherwise) disposes of the discovery dispute, it increasingly looks like OpenAI’s lawyers were correct when they wrote in their March 18, 2024 reply in support of a motion to dismiss that “[the NYT] has changed its story” to the effect that “this case is fundamentally about inputs and not outputs”: it’s about training language models, not about the fear that NYT readers would cancel subscriptions because of free-of-charge access via ChatGPT prompts. Another question here is whether the NYT secretly believes that some hypothetical jury members or people close to them may breach the rules and look up internet coverage of the case, in which event Exhibit J would come up. The NYT’s litigation strategy is quite PR-oriented.

Here’s the NYT’s latest letter to the court (filed on Tuesday):

The second section of the letter is the one related to the discovery dispute over the making of Exhibit J. The qualified intent not to use it at trial (if the case even goes to trial, which will not be the case if it’s thrown out on summary judgment) is stated in different ways in that second section.

Essentially, the NYT now says it just had to put that Exhibit J together in order to have a sufficiently specific basis for bringing a complaint:

“The Times was forced to create this exhibit because OpenAI has repeatedly refused to tell the public what works were used to train its models.”

What an understatement of the most likely original intent. Copyright infringement is shown by showing copying. And a fair use defense is overcome in part (though not only) by proving a negative impact of the derivative works on the market opportunity for the original works (fourth statutory factor).

Apart from the NYT not even ruling out that it may show Exhibit J to the jury anyway (all it takes would be for the NYT to declare itself dissatisfied with OpenAI’s discovery responses, and it will probably never be happy about them anyway), it’s clear that Exhibit J was (and may ultimately still be) intended to take center stage. And now they would have the court (and observers like us) believe that all they really want is to know what material was processed by OpenAI in the training of its language models, and they’ll then make a copyright infringement argument on that basis as opposed to showing that significant passages of their copyrighted works are reproduced by ChatGPT.

After the NYT’s lawyers filed their opposition to OpenAI’s motion to dismiss parts of the case (March 17, 2024 ai fray article), OpenAI’s lawyers replied in support of their motion and described the NYT’s new legal strategy as follows:

“The New York Times has changed its story. No longer is this case about a fundamental threat to journalism itself, on the mistaken premise that access to Times articles through ChatGPT will drive readers to cancel subscriptions in droves or imperil the newspaper’s very existence. [reference to Exhibit J] Now that OpenAI has highlighted the falsity of that core narrative in the Complaint, the Times has charted a new course.

“What the Times’s opposition belatedly clarifies is that this case is fundamentally about inputs and not outputs. The focus now is on the use of some Times articles among billions of other texts and trillions of other words to train OpenAI’s large language models. This “lead claim” about inputs runs headlong into decades of case law affirming that it is lawful to use pre-existing content in the service of creating a new technology for a new and different service. […]

“The Times explains its backtracking by saying that its allegations regarding the appearance of passages from Times articles in ChatGPT outputs were just part of an ‘investigation’ into what content was used to train OpenAI’s model. […] But as OpenAI explained in its opening brief, none of that was a secret that required any kind of investigation at all.”

ai fray wouldn’t have quoted a mid-March court filing in late May if the NYT’s lawyers’ own latest filing didn’t serve as an indication that OpenAI’s lawyers had a point.

Exhibit J was (and maybe secretly still is) supposed to be the NYT’s key output-related evidence. It was going to be used to portray ChatGPT as a copyright infringement machine of unprecedented proportions.

This copyright case is indeed a “different animal” if the focus has shifted ever more clearly to inputs and the black box part of how LLMs are trained. It will then come down to the question of whether LLMs, which obviously can’t be equated to human readers, are allowed to read publicly available material in order to draw inferences.

The NYT’s lawyers obviously take the position that the case will go to trial. But that is not a given. A settlement may indeed be unlikely as the NYT has invested heavily in this litigation based on a decision to reach for billions in the bush instead of having millions in the hand (May 5, 2024 ai fray article). And the motions to dismiss that OpenAI and Microsoft brought wouldn’t dispose of the entire case even if granted in full. But the most important stage in this litigation will be summary judgment, particularly on the fair use defense. If the NYT lost that part, it would have to appeal to the Second Circuit and possibly on to the Supreme Court in an attempt to bring the (copyright part of the) case back to life. With a clearer-than-ever shift of focus from outputs to inputs, the probability of the court deciding on summary judgment that OpenAI’s use of the NYT’s asserted copyrighted works is fair use has increased substantially.