Context: More than two dozen AI-related copyright lawsuits are pending in the U.S. (July 2, 2024 ai fray article). In addition to copyright infringement claims proper (and sometimes other theories), several of them allege violations of Section 1202 of the Digital Millennium Copyright Act (DMCA), which prohibits, among other things, the removal of copyright management information.
What’s new: Microsoft’s lawyers have drawn Judge Sidney H. Stein’s attention to a decision that has finally been published in the Northern District of California (notice of supplemental authority (PDF)). Judge Stein, of the Southern District of New York, presides over various AI copyright cases there, including the ones brought by the New York Times and by a group of other newspapers including the Daily News. In the California case, Judge Jon S. Tigar dismissed an open-source developer class action’s DMCA § 1202 claim for a second time, and definitively this time, barring a successful appeal (order on motions to dismiss (PDF)). The class-action lawyers had alleged that GitHub, using OpenAI-powered AI technology, would suggest code snippets of approximately 150 characters to developers without providing copyright management information such as the name of the original author.
Direct impact: A decision by another district judge is not binding on Judge Stein, but it can be persuasive authority. The newspapers will presumably not only disagree but also argue that the decision is fact-specific, and it would not be a surprise to see them file a response to Microsoft’s notice. But there is a clear commonality. DMCA § 1202 has an identicality requirement, and ChatGPT, like GitHub’s AI-powered assistant, won’t reproduce (“regurgitate”) identical copyrighted works unless a prompt contains a lengthy passage from the original work. In the GitHub case, that possibility was not enough for Judge Tigar to conclude that the allegation crossed the line from “conceivable” to “plausible” (plausibility being the so-called Twiqbal standard that a complaint must meet at the motion-to-dismiss stage). In the newspapers’ AI copyright cases, the problem is essentially the same: unless one prompts ChatGPT with extensive passages from the original article, one won’t realistically elicit an output that amounts to a reproduction of identical works. Microsoft’s motion to dismiss in the NYT case places greater emphasis on identicality than OpenAI’s, but the latter also referenced, in a footnote, the earlier decision in the GitHub case (which was not yet a dismissal with prejudice, so the class-action plaintiffs still had a chance to amend their pleadings).
Wider ramifications: Perplexity, if or when it gets sued, may have a bigger problem, as it is apparently easier to get it to reproduce lengthy passages from copyrighted works (June 15, 2024 ai fray article). The decision has no implications for the Authors Guild class action against OpenAI and Microsoft (June 22, 2024 ai fray article), which is related to the newspaper actions and therefore pending before the same New York judge: that case is about input, not output, so there is no DMCA § 1202 claim in that litigation.
Doe 1 v. GitHub was brought in November 2022 by the Joseph Saveri Law Firm and an individual lawyer. The Saveri firm sued Microsoft the following month over its acquisition of Activision Blizzard, piggybacking on a Federal Trade Commission lawsuit that didn’t prevent the merger from closing.
Parts of the GitHub case are still alive, such as the claim that GitHub breached open-source license agreements. But the DMCA claim was dismissed without prejudice the first time and with prejudice the second and final time. After the first dismissal, the Saveri firm and its co-counsel filed a second amended complaint in January 2024, and GitHub, which is owned by Microsoft (itself a co-defendant), and OpenAI renewed their motions to dismiss. Some claims had already been dismissed along the way, and on June 24, 2024, Judge Tigar threw out a few more (but not all of them). Before the order on the motions to dismiss became public, the court worked out potential redactions with the parties. Once that process was complete, Microsoft’s lawyers submitted the decision to Judge Stein in NYC.
The plaintiffs in the GitHub case failed to persuade Judge Tigar that the DMCA’s identicality requirement should be disregarded or at least relaxed. They tried again at this stage, once again to no avail.
In an effort to make an identicality claim plausible, they pointed to a study (Quantifying Memorization Across Neural Language Models by Nicholas Carlini et al.), but that study is not exclusively about Codex and Copilot, nor does it address the open-source code that the class-action plaintiffs (programmers) published on GitHub. In any event, the study says the GitHub Copilot model “rarely emits memorized code in benign situations, and most memorization occurs only when the model has been prompted with long code excerpts that are very similar to the training data.”
That was enough to show that identical reproduction is conceivable (as pretty much anything is with a large language model trained on vast amounts of data), but not enough to establish plausibility. The claim was therefore thrown out.
It’s worth re-reading the quoted sentence from the study in light of the NYT’s claims that ChatGPT reproduces lengthy passages from its articles (replacing “code” with “text”): ChatGPT “rarely emits memorized [text] in benign situations, and most memorization occurs only when the model has been prompted with long [text] excerpts that are very similar to the training data.”
The NYT, too, has apparently been unable to show that its articles are regurgitated in normal situations.
Technically, the notice of supplemental authority was filed only in the Daily News case, but the cases are related and before the same judge, who will bear the California decision in mind when ruling on the motions to dismiss in the newspaper publishers’ cases.