Context: Last month, OpenAI was embroiled in two major copyright infringement lawsuits on opposite sides of the world: one by a German music rights collecting society in Munich (November 14, 2024 ai fray article) and the other by India’s Asian News International (ANI) agency (November 18, 2024 ai fray article).
What’s new: The AI provider is now facing fresh copyright infringement claims in Canada. The country’s top news organizations have asked the Ontario Superior Court of Justice to grant them an injunction, as well as up to CAD20,000 in statutory damages for every article that OpenAI has allegedly used unlawfully to train its ChatGPT software.
Direct impact: OpenAI must now face another lawsuit that adds to the ongoing debate over “fair use”, known under Canada’s Copyright Act as “fair dealing”. Lead counsel for the news organizations claims that this defence will not “hold water” in this particular case, as the “scraping” of the articles violates the terms and conditions of each publisher’s website.
Wider ramifications: Canada’s “fair dealing” exception is indeed a much narrower concept that applies only to specified purposes, but nothing in the fair dealing debate is certain at the moment. The action also appears to be a move by the publishers to force OpenAI to the negotiation table. This suit alone may not be enough to put pressure on the ChatGPT owner, but it joins dozens of other suits that, together, could.
Seven of Canada’s leading news organizations have sued tech giant OpenAI, alleging that the company is illegally using news articles to train its ChatGPT software. The suit was filed on Friday in Ontario’s Superior Court of Justice and marks the first time all of a country’s major news publishers have come together in litigation against OpenAI.
The plaintiffs include the Toronto Star, Metroland Media, Postmedia, The Globe and Mail, The Canadian Press, CBC and PNI Maritimes, and are being represented by Toronto-based litigation firm Lenczner Slaght.
The coalition is seeking damages of up to CAD20,000 (US$14,000) per article. With at least 16 million works potentially at issue, total damages could run into billions of dollars. At the maximum rate, 16 million articles would come to CAD320 billion. The coalition is also seeking the disgorgement of any profits OpenAI made from using the news organizations’ articles, and a permanent injunction restraining OpenAI from using any of the articles in the future.
The suit alleges:
“To obtain the significant quantities of text data needed to develop their GPT models, OpenAI deliberately ‘scrapes’ (i.e., accesses and copies) content from the [plaintiffs’] websites […] It then uses that proprietary content to develop its GPT models, without consent or authorization.”
In a memo to employees, Toronto Star’s chief executive Neil Oliver said: “We will not stand by while tech companies steal our content.” He wrote: “While we embrace the opportunities that technological innovation can bring, all participants must follow the law, and any use of our intellectual property must be on fair terms.” He added that the work produced by the company’s journalists is vital both to democracy and to the company’s bottom line.
Fair dealing
Lenczner Slaght partner Sana Halwani, lead counsel for the plaintiffs, argued that OpenAI’s licensing agreements with other media organizations show that the company knows it is in the wrong. She acknowledged that OpenAI’s use of the articles might differ from copyright infringement of an earlier era, but said that this does not make it right.
“The uses (to) which they are putting those copies is something new and different because it’s this new technology, but copying is copying,” she added, noting that OpenAI’s “scraping” of the articles violates the terms and conditions of each publisher’s website.
According to Halwani, the fair use argument, known as “fair dealing” under Canadian copyright law, doesn’t hold water here.
“Fair dealing” is an exception under Canada’s Copyright Act that permits the use of copyright-protected work in certain circumstances. It is a narrower concept than its U.S. counterpart, “fair use”. To qualify, a dealing must first serve one of the purposes listed in the Act, such as research, private study, education, parody, satire, criticism, review or news reporting. Whether it is fair is then assessed against six factors established by the Supreme Court of Canada in CCH Canadian Ltd. v. Law Society of Upper Canada (2004), described as follows:
- “A fair dealing analysis should attempt to make an objective assessment of the user’s primary purpose or motive in using the copyrighted work.
- If multiple copies of works are being widely distributed, this will tend to be unfair. If, however, a single copy of a work is used for a specific legitimate purpose, then it may be easier to conclude that the dealing was fair.
- Although in general use of a greater proportion of a work may tend toward unfairness, it may be possible to deal fairly with an entire work.
- If there is a non-copyright-protected or openly licensed equivalent of the work that could have been used instead of the work, this should be considered in the determination of fairness.
- Although certainly not determinative, if a work was intended to be published but is not widely available, the dealing may be more fair in that its reproduction with acknowledgement could lead to a wider public dissemination of the work – one of the goals of copyright law.
- If the reproduced work is likely to compete with the market of the original work, this may suggest that the dealing is not fair. Although the effect of the dealing on the market of the copyright owner is an important factor, it is neither the only factor nor the most important factor to be considered in deciding whether the dealing is fair.”
“Let’s remember that this is a commercial entity making money from the content that they’re taking … That’s not an allowable purpose under the fair dealing exception,” Halwani said.
In response, a spokesperson for OpenAI has stated: “Hundreds of millions of people around the world rely on ChatGPT to improve their daily lives, inspire creativity, and solve hard problems. Our models are trained on publicly available data, grounded in fair use and related international copyright principles that are fair for creators and support innovation.”
Forcing OpenAI to the negotiation table?
Professor Michael Geist, Canada Research Chair in Internet and E-commerce Law at the University of Ottawa, notes that the lawsuit itself isn’t a huge surprise, but the “relatively weak, narrow scope of the claims” is. He believes the media companies filed the suit because they want a settlement under which OpenAI pays licence fees for including their content in its large language models. In his view, the action is simply meant to “kickstart” negotiations.
The claim itself hints at this goal, stating that OpenAI “was and is well aware of its obligations to obtain a valid licence to use [their articles]”.
However, one of the biggest issues with this case, Professor Geist believes, is that the Canadian media companies have admitted that they don’t actually know how much of their work is being used:
“The full particulars of when, from where, and exactly how, the Works were accessed, scraped, and/or copied is within the knowledge of OpenAI and not the News Media Companies.”
Because the companies do not allege any unauthorized public publication of their works, Geist comments, the claim focuses instead on the scraping of data for inclusion in training materials. That scraping will have to be tested against Canada’s fair dealing rules. The first hurdle is whether the use qualifies as research. The six-factor analysis noted above could then produce an “interesting debate”, Geist says, “that is by no means certain”.
“There will be arguments that the tokens derived from the underlying works involve statistical analysis rather than copying of those works,” he says.
Geist also notes that this action is much narrower than suits filed in the U.S. The New York Times, for example, targeted both the inputs (the materials used to train ChatGPT) and the outputs (allegations that ChatGPT occasionally produces infringing results), whereas the Canadian suit targets inputs only. “Further, unlike the NY Times, they also don’t sue Microsoft, a major investor and user of ChatGPT, which also suggests that a licence from OpenAI is the real goal,” he comments.