The question of how far "fair use" can stretch is now at the centre of the AI copyright fight. As large language models have moved from research labs into everyday use, one of the earliest and most persistent accusations against companies such as OpenAI and Anthropic has been that they built profitable systems on the back of other people’s creative work. Their defenders argue that training on books, articles and other protected material is transformative and therefore lawful; critics say the output can compete directly with the original works and, in some cases, reproduce them far too closely.

That legal argument has already produced mixed results in the United States. In June 2025, a federal judge in San Francisco ruled that Anthropic’s use of books to train Claude without permission fell within fair use, comparing the process to a reader learning from existing writing in order to create something new. But the same ruling also found that Anthropic’s storage of pirated books in a central library amounted to copyright infringement, leaving the company exposed to a separate damages trial. Reporting at the time suggested the decision strengthened the case for training itself, while still drawing a clear line around how the material was obtained.

The dispute is made more complex by the behaviour of the models themselves. Research reported in early 2026 by researchers at Stanford and Yale found that leading systems, including OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini and xAI’s Grok, can reproduce substantial stretches of copyrighted material almost verbatim. That finding undermines the repeated claim by AI firms that their models do not retain training data in a way that resembles copying. For authors and publishers, the worry is not only that their work may have been used without consent, but also that the systems may be capable of generating close substitutes that compete with the originals.

Anthropic’s own recent troubles underline how unsettled the sector remains. The company said it had identified more than 24,000 fraudulent accounts involved in large-scale "distillation" attacks on Claude, with activity linked to Chinese AI firms including DeepSeek, Moonshot AI and MiniMax. Separately, it agreed to a $1.5 billion settlement in a class-action case over the use of pirated books in training its models, in what is being described as a landmark payout. The company has also said it does not, by default, use customer prompts and responses to train its systems unless users opt in, reflecting the wider pressure on AI developers to prove they can use data more responsibly than their critics allege.

What emerges is an industry still operating in a legal grey zone, even as courts begin to define its boundaries. For now, AI companies appear willing to push copyright law to its limits first and negotiate later, often only after being challenged in court. That sequence has sharpened the sense among writers, artists and publishers that the technology sector spent years treating protected material as a free resource, before turning to settlements and safety measures once the legal risks became impossible to ignore.

Source Reference Map

Source: Noah Wire Services