This NYT-OpenAI discovery issue can complicate any news organization’s copyright lawsuit against AI providers: which parts are original?

Context: The New York Times and defendants OpenAI and Microsoft are still embroiled in a number of discovery disputes (June 12, 2024 ai fray article). Questions also keep piling up for Judge Sidney H. Stein to decide in the earlier-filed, related Authors Guild v. OpenAI & Microsoft case (June 22, 2024 ai fray article). And new cases of this kind keep getting filed (June 27, 2024 Reveal article).

What’s new: Yesterday (July 1, 2024), OpenAI’s lawyers made a filing (PDF) raising an issue that may considerably up the ante for any news publisher asserting copyrights against AI providers. By touting the quality of its reporting and its massive investments in the editorial process, the NYT has provoked discovery requests relating to reporters’ notes and other information that would make it possible to distinguish the NYT’s original creative works from those parts of the asserted articles that actually come from other sources (such as interviewees, news agencies or press releases; OpenAI’s filing does not give those examples, but ai fray believes they are relevant).

Direct impact: This question confronts the court with a dilemma between what’s legally reasonable and what’s practically feasible. It’s not that derivative works enjoy no protection at all; OpenAI concedes as much. But someone who seeks billions of dollars in damages and faces a fair use defense is not entitled to damages or other remedies for what others created. There is, however, a quantitative problem given the sheer number of articles at issue, and the NYT also argues that discovery into reporters’ notes and other material used and produced while researching articles runs counter to the First Amendment and other laws protecting the press.

Wider ramifications: Apart from authors of fictional works, virtually all AI plaintiffs asserting rights in text material will run into the same issue. Many parties will therefore be closely watching how it is resolved in New York Times v. Microsoft & OpenAI (a process that may even involve appeals).

The July 1, 2024 letter by OpenAI’s lawyers to the court says “[t]he Times should be ordered to provide discovery showing the copyrighted works are original works of authorship.” That’s because the Times can only bring claims over what its own journalists created as opposed to “preexisting material employed in the work” (which also includes whatever may have been in the public domain).

Those text passages must also be human-authored. So if the NYT itself had used AI tools to create any articles, it could not sue over that material either. (As an aside, while U.S. copyright law is clear that human authorship is a hard requirement, courts in jurisdictions all over the world are now grappling with the question of whether AI-assisted or AI-made inventions are patentable.)

OpenAI’s Request for Production (RFP) 12 seeks “documents sufficient to show each and every written work that informed the preparation of each of [the NYT’s] Asserted Works, regardless of its length, format, or medium.” The NYT’s lawyers call this request overbroad and unduly burdensome. They furthermore say the terms “written work,” “informed the preparation,” “format” and “medium” are “vague and ambiguous.” But what such terms mean is a question courts routinely resolve.

Furthermore, the NYT raises First Amendment and state-law issues involving reporters’ privilege (the shielding of reporters’ work from discovery). But that privilege has limits. Here, no one is seeking discovery into reporters’ notes in order to bring claims against sources or to otherwise compromise the freedom of the press. This dispute is all about copyright, and OpenAI argues that without more information, it is impossible to tell which parts of the asserted articles were actually authored by the NYT’s own human journalists.

This is far from the only interesting discovery issue in these AI copyright disputes. The two ai fray articles linked above (in the Context paragraph) discuss just some of the other relevant questions the court has to resolve.