Top EU court receives first AI copyright case, raising general questions about text and data mining and questions specific to news snippets

Context: Copyright enforcement actions against AI providers are pending around the globe, most but not all of them in U.S. courts (May 23, 2025 ai fray article). Lawsuits have been brought in different EU member states, among them Germany (January 21, 2025 ai fray article) and France (March 13, 2025 ai fray article). Plaintiffs in those EU cases do not conceal their objective to have some fundamental questions of copyright enforceability in the AI context clarified by the European Court of Justice (ECJ), whose interpretation of EU statutes is binding on courts throughout the 27-member bloc (which is more than the top national courts of major member states like Germany can say).

What’s new: The first AI copyright case to have reached the top EU court originated from Hungary. The case caption is Like Company v. Google Ireland Limited and the rather catchy case no. is C-250/25. On April 3, 2025, the Budapest Környéki Törvényszék, a court in the Hungarian capital, made a preliminary reference to the ECJ that has now surfaced. The Hungarian court raises four questions of interpretation of EU law, the first one of which has two parts.

Direct impact & wider ramifications: The first question concerns only news publishers as it invokes a special EU rule for news snippets that was passed into law in 2019, but all other questions (even the follow-up question that is part of the first question) are of far broader relevance. Given that this case is now ahead of any other AI copyright case in the EU that has the potential to reach the ECJ, there will now be a huge EU-wide lobbying battle over what the European Commission (EC) and the EU member states will say in their submissions to the court. Unlike U.S. courts, the ECJ does not entertain amicus curiae briefs by stakeholders. Only certain EU institutions and member state governments may make submissions. So far the ECJ case that got more member states involved than any other was European Superleague v. UEFA & FIFA, which reflects the societal importance of soccer, but the question of copyright enforcement against LLMs objectively eclipses a sports governance case.

The plaintiff, Like Company, is a news publisher. It argued to the trial court that it is possible, through particular questions, to elicit the alleged regurgitation of a news article from Google’s Gemini (or its predecessor named Bard) chatbot. Here’s a sample question:

Can you provide a summary in Hungarian of the online press publication that appeared on balatonkornyeke.hu regarding Kozsó’s plan to introduce dolphins into the lake?

The referring court is not Hungary’s top court, but it has the status of a regional court and is, therefore, above the lower echelon of Hungarian courts, which in English would be called district courts. Any court in the EU has the right (and only national courts of final appeal have an obligation, unless a question is pointless) to make requests to the ECJ for a preliminary ruling on questions of EU law. This is also called a “preliminary reference.” Unlike a final appeal, it can be made at an early stage of the proceedings, which was the case here: the Hungarian court was persuaded that the parties should first know the ECJ’s interpretation of applicable EU laws before further briefing the court. Such a preliminary reference at an extremely early stage of the proceedings is indeed possible (which will strike anyone familiar with the U.S. “appeal from final judgment” rule as odd), and it also happened in the above-mentioned European Superleague case.

Here are the questions that the ECJ has been asked to answer, which won’t happen this year but may before the end of 2026, with some quick commentary below each question regarding its relevance in basic terms:

Must Article 15(1) of Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 [on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC], and Article 3(2) of Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society, be interpreted as meaning that the display, in the responses of an LLM-based chatbot, of a text partially identical to the content of web pages of press publishers, where the length of that text is such that it is already protected under Article 15 of Directive 2019/790, constitutes an instance of communication to the public? If the answer to that question is in the affirmative, does the fact that [the responses in question are] the result of a process in which the chatbot merely predicts the next word on the basis of observed patterns have any relevance?

As the two question marks indicate, these are actually two questions, but the second question is contingent upon the first one being answered in the affirmative (i.e., with “yes”).

The first question is about whether an AI copyright claim can be based on the Infosoc (information society) Directive of 2001, which among other things governs copyright enforcement in an internet context, in cases where the news snippets rule of the 2019 EU Copyright Directive applies.

It’s a valid question because normally lex specialis, meaning the law more specifically tailored to a question at hand (here, the news snippets statute of the EU Copyright Directive), trumps lex generalis, a more general rule (here, the statute in the Infosoc Directive that relates to the communication of protected material to the public). In this case, the lex specialis is also considerably more recent.

The second question above will be reached only if the Infosoc Directive (and not just the news snippets rule) applies, though the ECJ is free to comment on it regardless, and if it wants, it can make its answer binding as opposed to an obiter dictum (something that a court says in a decision but which was not key to reaching a decision and is therefore less influential going forward than the core of the judgment). And that one is about whether the functioning of LLMs, which comes down to predictive algorithms, has any relevance to the question of potential copyright infringement. That could be an additional defense for AI providers. Colloquially put, they could say “we don’t really reproduce, we just predict what word comes next.”

That second part would be relevant not only in connection with news snippets but all other types of copyrighted content that can be reproduced.

Must Article 15(1) of Directive 2019/790 and Article 2 of Directive 2001/29 be interpreted as meaning that the process of training an LLM-based chatbot constitutes an instance of reproduction, where that LLM is built on the basis of the observation and matching of patterns, making it possible for the model to learn to recognise linguistic patterns?

This question also amounts, at least potentially, to two because the ECJ may feel compelled now to answer it for the Infosoc Directive and additionally for the Copyright Directive.

It is, simply put, about whether training in and of itself can already constitute copyright infringement. That is also a big question in U.S. copyright cases, where some plaintiffs initially emphasized output (the New York Times Company, for instance), but the focus has increasingly shifted to the input (training) side.

If the answer to the second question referred is in the affirmative, does such reproduction of lawfully accessible works fall within the exception provided for in Article 4 of Directive 2019/790, which ensures free use for the purposes of text and data mining?

This is another conditional question: it’s contingent upon the answer to the second question. It’s about the Text and Data Mining (TDM) exception. A similar question was addressed last year by a court of first instance in Germany (September 30, 2024 ai fray article), but now the “final judge” may be called upon to address the TDM exception. That is a key element of many AI copyright cases in the EU, at least those involving text and other data.

Must Article 15(1) of Directive 2019/790 and Article 2 of Directive 2001/29 be interpreted as meaning that, where a user gives an LLM-based chatbot an instruction which matches the text contained in a press publication, or which refers to that text, and the chatbot then generates its response based on the instruction given by the user, the fact that, in that response, part or all of the content of a press publication is displayed constitutes an instance of reproduction on the part of the chatbot service provider?

Here, again, the question could be answered twice: once per statute. But the ECJ could also focus on only one of the statutes if it thinks that this case is exclusively about news snippets.

It raises a question that has also come up in the U.S. already and will have to be addressed there: how meaningful is it that an AI system regurgitates text if the question that elicits a certain response already contains a certain chunk of copyrighted material? In other words, what if the one who asks the question can be realistically assumed already to have that material in front of them? This, too, could lead to a novel and AI-specific defense.

This will now be one of the most-watched AI copyright cases in the world.