New York Times alleges copyright infringement by Microsoft and OpenAI

Context/What’s new: New copyright infringement lawsuit (United States District Court for the Southern District of New York, case no. 1:23-cv-11195, New York Times Company v. Microsoft, OpenAI et al.).

Direct impact: The NYT is seeking damages (to be determined by a jury) as well as injunctive relief against the defendants. According to the complaint, the NYT expects defendants to argue that their use of NYT material is protected by fair use with a focus on the transformative nature of this kind of use.

Wider ramifications: Many publishing companies have expressed skepticism about the impact of Generative AI (GAI) on their business models. The NYT apparently brought this litigation to set a precedent. What is at stake is the ability of companies to train GAI systems. The NYT seeks to reverse a judicial trend in the U.S. that has gradually expanded the scope of fair use (and particularly of the transformative nature criterion) in digital contexts.

First, here’s the complaint, which was filed today:

The first two sentence of paragraph 47 explain the challenges the newspaper industry is facing because its traditional business models are no longer working the way they used to:

“Making great journalism is harder than ever. Over the past two decades, the traditional business models that supported quality journalism have collapsed, forcing the shuttering of newspapers all over the country.”
New York Times copyright complaint against Microsoft and OpenAI

This would be the first time, however, for U.S. copyright enforcement to impede innovation. The closest case here was brought by the Authors Guild over Google Books (in the same district, in fact), and ultimately the way Google Books organized and presented content was considered sufficiently transformative to support a fair use defense.

While different in many ways, Oracle v. Google doesn’t bode well for the NYT’s complaint either. Contrary to widespread misbelief, what was at issue was not generally whether application programming interfaces (APIs) could be used in order to make new products. It was actually just about whether the wholesale copying (again by Google) of thousands of lines of highly original program code in order to build a competing product would be allowed. And despite the fact that Oracle’s Java programming language was actually very popular on mobile phones when Android entered the market, the Supreme Court deemed Android’s use of those Java APIs transformative and sided with Google on fair use grounds.

In today’s complaint, the NYT points to examples where certain ChatGPT answers purportedly amount to direct copying of some of its content. The NYT points to almost 200 years of quality journalism though a certain part of that material is certainly beyond the term of copyright protection (70 years). The NYT says that millions of its articles have been used to train ChatGPT, but obviously the complaint doesn’t (and can’t) prove that millions of those articles are actually used in a way that amounts to direct copying.

Microsoft is the first-named defendant though the company is actually two steps removed from where the NYT’s material is copied. Microsoft is a ChatGPT partner, and neither Microsoft nor ChatGPT appear to be the ones who copy any NYT material: an organization named Common Crawl.

The NYT alleges direct as well as vicarious and contributory copyright infringement; a violation of the Digital Millennium Copyrights Act (DMCA) based on the alleged removal of information used for digital rights management (DRM); unfair competition by misappropriation; and trademark dilution because of incorrect answers allegedly being attributed to the NYT in some situations.

Besides damages and an injunction, the complaint asks the court to order the “destriction […] of all GPT or other LLM models and trainings that incorporate [the New York Times’ copyrighted material].” The harm to OpenAI and Microsoft from such destruction would be unimaginable. But that is presumably not the NYT’s actual priority:

Most likely, the NYT’s plan is to gain leverage in order to extract a settlement enabling the publisher (and, by extension, other publishers) to collect royalties on content used for LLM training.

Presumably Microsoft and OpenAI will seek a routine extension of the deadline to answer or otherwise respond to the complaint (which could be a motion to dismiss or an answer to the complaint), given the importance and complexity of the issues raised. In that case, their motion to dismiss or answer to the complaint will probably be filed in the second half of February.