OpenAI motion seeks to narrow scope of NY Times case and of discovery, counters complaint’s narrative

Context: In late December 2023, the New York Times brought a copyright infringement complaint against OpenAI and Microsoft (December 27, 2023 ai fray article). The court’s decision on the claims as well as any remedies will involve proportionality considerations (December 29, 2023 ai fray article).

What’s new: On Monday, OpenAI brought a motion (PDF) to dismiss parts of the NYT’s complaint. After stating OpenAI’s position on the case in general and countering the NYT’s narrative, the motion seeks to limit the relevant period of the NYT’s first claim and to have three other claims dismissed in their entirety. OpenAI also requested an oral hearing on its motion. While Microsoft is not the moving party, it will also benefit from the motion if it succeeds.

Direct impact: The immediate effect is that the court will resolve this motion before the defendants can answer the complaint, as there is a realistic possibility of the case being narrowed. Also, in addition to a public statement made by OpenAI in early January (January 8, 2024 OpenAI blog post), the general public can now read OpenAI’s position on the issues in the case, with a particular focus on fair use.

Wider ramifications: This is the highest-profile of several AI-related copyright actions pending before U.S. courts. There is also lobbying activity in different jurisdictions concerning the use of copyrighted material by AI systems for training purposes.

OpenAI’s counter-narrative

There are three reasons why OpenAI’s memorandum in support of its motion to dismiss goes beyond what is strictly necessary to argue for dismissal:

With fair use being the most important defense, the overall economic and societal impact of AI as well as the negligible impact on NYT revenues are going to be key arguments.
At the motion-to-dismiss stage, the court will not yet weigh the statutory fair use factors, but it is obviously in a defendant’s interest to contradict a narrative of reckless infringement as early as possible.
The latter also matters in the court of public opinion.

The “juiciest” part, which various media reports already picked up earlier in the week, is that “the Times paid someone to hack OpenAI’s products” in order to generate certain ChatGPT answers that allegedly violate the NYT’s rights. Moreover, “[t]hey were able to do so only by targeting and exploiting a bug (which OpenAI has committed to addressing) by using deceptive prompts that blatantly violate OpenAI’s terms of use.”

Politically and psychologically, one of the strongest defenses is to not only deny wrongdoing, but to argue that the other party violated certain rights.

The motion states the key issue in this litigation: “whether it is fair use under copyright law to use publicly accessible content to train generative AI models to learn about language, grammar, and syntax, and to understand the facts that constitute humans’ collective knowledge.” They’re not there yet, but that question reads almost like a question for review that would be proposed by a petition for writ of certiorari (request for Supreme Court review).

OpenAI does not appear to argue that there is no copyrightable material involved at all, but the less there is in terms of copyrightable material that has been used, the easier it is to prevail on a fair use defense. The motion says “no one—not even the New York Times—gets to monopolize facts or the rules of language.” That part unmistakably reflects an abstraction-filtration-comparison (AFC) approach. AFC is a sequential method for evaluating a copyright infringement claim that the Second Circuit (the very appeals court to which the losing party would have to appeal this case) developed in its 1992 Computer Associates v. Altai case. It is one of the most influential decisions in the history of copyright law, throughout and even beyond the United States. It comes down to filtering out the parts of the asserted material that are not copyrightable, focusing then on the remainder.

Another key Second Circuit copyright decision that is referenced shortly thereafter is the Google Books case. In that one, the fair use criterion of transformativeness was deemed satisfied because of a technological process that created a new and innovative product (as opposed to the more traditional understanding of transformative use, which was about creative modifications, such as a parody versus the original).

And just like ai fray expected, the Supreme Court’s Oracle v. Google decision (January 24, 2024 ai fray article) is also mentioned as a key precedent with an expansive take on fair use, again in a technological context.

The fair use question will ultimately come down to whether

humanity and the economy at large should be deprived of the benefits of Generative AI,
though what GAI really uses is uncopyrightable material (facts, linguistic structure) and
to the extent any expressive and copyrightable material is involved, the wholesale regurgitation of entire paragraphs from articles is a bug that the NYT maliciously exploited, but has nothing to do with how people actually use ChatGPT in practice.

With a view to the viability of journalism, OpenAI points out that it is a profession that has undergone other transformations, and that the NYT’s copyright infringement allegations only came up after ChatGPT became a viral sensation.

Arguments for dismissal of certain claims

At the motion-to-dismiss stage, the court will have to take the plaintiff’s allegations as true unless they are not plausible. For the most part, motions to dismiss are resolved on the basis of missing links in a legal theory.

The NYT’s first claim alleges direct copyright infringement. The motion does not ask for it to be dismissed in its entirety, but argues that it should be limited to acts of reproduction that occurred in the three years preceding the complaint (which apart from a few days means the calendar years 2021-2023). However, if the motion succeeded, there might actually be nothing left of the first claim. OpenAI says that the creation of the “WebText” database and its second version (“WebText2”) and the use of that material by OpenAI for training purposes “occurred more than three years ago,” making claims over those acts “stale claims” as the Supreme Court called them.
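The three-year window can be checked with simple calendar arithmetic. The following is a minimal sketch, not legal analysis: it assumes a filing date of December 27, 2023, ignores any accrual or discovery-rule questions, and uses hypothetical helper names (lookback_cutoff, is_within_window) and purely illustrative dates for the acts being tested.

```python
from datetime import date

# Assumption: the complaint was filed on December 27, 2023.
FILING_DATE = date(2023, 12, 27)
LIMITATION_YEARS = 3  # the three-year period the motion relies on


def lookback_cutoff(filed: date, years: int = LIMITATION_YEARS) -> date:
    """Return the earliest date that still falls inside the window."""
    try:
        return filed.replace(year=filed.year - years)
    except ValueError:
        # Filed on February 29 and the target year is not a leap year.
        return filed.replace(year=filed.year - years, day=28)


def is_within_window(act: date, filed: date = FILING_DATE) -> bool:
    """True if an act dated `act` falls within the lookback window."""
    return lookback_cutoff(filed) <= act <= filed


cutoff = lookback_cutoff(FILING_DATE)
# Days of the window that fall before calendar year 2021.
days_before_2021 = (date(2021, 1, 1) - cutoff).days

print(cutoff)            # 2020-12-27
print(days_before_2021)  # 5

# Hypothetical dates, for illustration only.
print(is_within_window(date(2022, 6, 1)))  # True
print(is_within_window(date(2020, 6, 1)))  # False
```

Under these assumptions, the window opens on December 27, 2020, so it covers the calendar years 2021-2023 plus the last five days of 2020.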

The statutory basis is 17 U.S.C. § 507(b), the Copyright Act’s three-year statute of limitations for civil actions. Just last week, the Supreme Court heard a case, Warner v. Nealy, in which the question of whether there can be exceptions to that three-year rule will be decided. That case has a very peculiar fact pattern, so the decision may or may not be relevant to the NYT case.

The next claim to be challenged by the motion to dismiss is the NYT’s Count IV (contributory infringement). Contributory infringement presupposes the defendant’s knowledge of the acts in question. According to the motion, this means OpenAI would have had to have knowledge of the NYT’s use of ChatGPT for the purpose of eliciting certain answers.

Thereafter, the motion takes aim at Count V, which is about the removal of copyright management information (CMI) in violation of the Digital Millennium Copyright Act (DMCA). The motion says the complaint doesn’t even specify the CMI (a deficiency that may be cured, but that would require an amended complaint). Here, too, OpenAI argues that claims about anything that happened more than three years ago are time-barred (which might not dispose of the claim as a whole, but at least of parts of it). The motion also raises other issues in this regard, such as that the NYT doesn’t plead a sufficient injury.

Finally, the motion argues that the NYT can’t pursue copyright claims and then, in addition, an unfair competition claim over certain copyright infringement allegations. In such cases, the motion essentially argues, the copyright claim pre-empts the unfair competition claim.

All in all, a narrowing of the case appears reasonably likely, though ai fray would not predict that the motion to dismiss will necessarily succeed to its maximum extent.