New York Times triples down on OpenAI, Microsoft copyright case: seeks to go from 3 million documents to 10 million, up to 96 years old

Context: Judge Sidney Stein of the United States District Court for the Southern District of New York is currently working on a ruling on Microsoft’s and OpenAI’s motions to dismiss parts of the New York Times’s copyright infringement complaint (March 17, 2024 ai fray article). A dispute between the parties concerning the protection of confidential business information, which raises the interesting question of whether access to such material could influence the NYT’s “extensive” reporting on its own case, is also pending judicial resolution (May 4, 2024 ai fray article).

What’s new: Yesterday (Monday, May 20, 2024), the New York Times’s lawyers filed a motion for leave to amend the original (December 2023) complaint in order to “(1) correct errors in the identification of copyright registration numbers for previously asserted works, and (2) to add approximately 7 million additional works to the suit.” A little less than half of the additional copyrighted works are from the period from April 1928 through August 1950, while the original complaint related to copyrighted works from September 1950 through September 2023. Other than that, nothing changes. The NYT is not even trying to add a reputational harm cause of action, which a case brought by eight newspapers belonging to a hedge fund asserted in addition to the claims the NYT brought (April 30, 2024 ai fray article). The court treats the two cases as related, which means they are pending before the same judge.

Direct impact: According to the memorandum in support of the motion, it is not known yet whether OpenAI and Microsoft will oppose the amendment. In any event, the amended complaint will at the earliest become operative 21 days after the court’s ruling on the pending motions to dismiss. And that ruling could result in a narrowing of the case in some other respects, in which case the ultimately filed First Amended Complaint may differ from the one for which the NYT is now seeking permission. A recent Supreme Court ruling may help the NYT defend its pursuit of damages going back by a few more years (May 10, 2024 ai fray article), though it was always clear that the most recent period is the economically most important one.

Wider ramifications: There are presently different schools of thought in the media industry concerning the use of copyrighted material for the training of language models. License deals worth millions of dollars continue to be signed by some publishers while others, such as the NYT, hope for billions of dollars in damages and/or engage in lobbying to change the law in publishers’ favor (May 5, 2024 ai fray article).

Here’s the memorandum of law that explains what’s different about the NYT’s First Amended Complaint and what’s not:

Out of the approximately 7 million additional documents, more than half are from the same period as the ones that were exhibits to the original complaint: September 1950 through September 2023. According to the memorandum, those were “inadvertently omitted” from the original complaint due to a “data processing issue.”

A little less than half are from April 1928 through August 1950, and the reason for going back to that period only now is that “the Copyright Office’s online database for such works consists only of scanned images of card catalog files, which had to be reviewed and correlated with The Times’s own records of its online works by hand.” What the memorandum does not explain is why the Times filed the original complaint if it was clear that millions of documents would have to be added later.

The Times also adds about 10,000 copyrighted works from October 2023 to January 2024. Those are newer than the ones attached to the original complaint.

The New York Times has been seeking damages in the billions of dollars all along. By going from 3 million documents to 10 million, they probably hope to have a psychologically stronger message for the jury, but in order to get to the jury, the complaint has to survive summary judgment, which is the procedural stage where the fair use defense to copyright infringement will be resolved.

There is now the practical question of how to provide those millions of additional copyrighted works as exhibits to the complaint. The New York Times’s lawyers propose to do so in the form of Excel files rather than PDF documents.

Finally, a redlined version highlights the very few changes to the text of the complaint, which are just of an editorial nature:

ai fray will continue to follow developments in major AI-related lawsuits, and this one has a very high profile because of the parties, which is separate from the question of whether it has merit.