The question of how far "fair use" can stretch is now at the centre of the AI copyright fight. As large language models have moved from research labs into everyday use, one of the earliest and most persistent accusations against companies such as OpenAI and Anthropic has been that they built profitable systems on the back of other people’s creative work. Their defenders argue that training on books, articles and other protected material is transformative and therefore lawful; critics say the output can compete directly with the original works and, in some cases, reproduce them far too closely.

That legal argument has already produced mixed results in the United States. In June 2025, a federal judge in San Francisco ruled that Anthropic’s use of books to train Claude without permission fell within fair use, comparing the process to a reader learning from existing writing in order to create something new. But the same ruling also found that Anthropic’s storage of pirated books in a central library amounted to copyright infringement, leaving the company exposed to a separate damages trial. Reporting at the time suggested the decision strengthened the case for training itself, while still drawing a clear line around how the material was obtained.

The dispute is made more complex by the behaviour of the models themselves. Research reported in early 2026 by researchers at Stanford and Yale found that leading systems, including OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini and xAI’s Grok, can reproduce substantial stretches of copyrighted material almost verbatim. That finding undermines the repeated claim by AI firms that their models do not retain training data in a way that resembles copying. For authors and publishers, the worry is not only that their work may have been used without consent, but also that the systems may be capable of generating close substitutes that compete with the originals.

Anthropic’s own recent troubles underline how unsettled the sector remains. The company said it had identified more than 24,000 fraudulent accounts involved in large-scale "distillation" attacks on Claude, with activity linked to Chinese AI firms including DeepSeek, Moonshot AI and MiniMax. Separately, it agreed to a $1.5 billion settlement in a class-action case over the use of pirated books in training its models, in what is being described as a landmark payout. The company has also said it does not, by default, use customer prompts and responses to train its systems unless users opt in, reflecting the wider pressure on AI developers to prove they can use data more responsibly than their critics allege.

What emerges is an industry still operating in a legal grey zone, even as courts begin to define its boundaries. For now, AI companies appear willing to push copyright law to its limits first and negotiate later, often only after being challenged in court. That sequence has sharpened the sense among writers, artists and publishers that the technology sector spent years treating protected material as a free resource, before turning to settlements and safety measures once the legal risks became impossible to ignore.

Source Reference Map

Source: Noah Wire Services