Recent revelations about artificial intelligence training practices have raised significant questions about copyright infringement, particularly at Meta. A new study indicates that certain AI models, notably Meta's, have not only been trained on copyrighted books but have also memorised extensive portions of their content verbatim. The finding could have far-reaching legal ramifications, with billions of dollars potentially at stake as cases over the legality of these practices unfold in the U.S. and the UK.

Authors and publishers have been increasingly vocal in opposing AI companies' use of their works for model training without permission. In April, a coalition of authors and publishers voiced concerns about Meta's use of copyrighted materials, which they argue violates their rights. The core legal dispute centres on whether tech firms can legitimately use copyrighted materials without prior consent, with many AI developers claiming that their models generate entirely new compositions rather than direct replications of the original texts.

The new research, by a team including Mark Lemley of Stanford University, highlights a significant discrepancy among AI models. While most models do not recall the exact wording of texts in their training data, Meta's Llama 3.1 model shows a concerning aptitude for reproducing sizeable segments of well-known works, including "Harry Potter and the Philosopher’s Stone" and "The Great Gatsby." The potential consequences are severe: if a court determines that Meta's model has infringed the copyright of even a small fraction of the Books3 dataset (a trove of nearly 200,000 titles used for training), damages could reach nearly $1 billion.

Despite evidence suggesting that Meta's AI practices may not align with fair use, company representatives assert that their approach is legitimate. Emil Vazquez, a spokesperson for Meta, contends that "fair use of copyrighted materials is vital" for the evolution of AI technologies. However, this stance has drawn scrutiny from legal experts, who argue that the core issue is not merely whether AI models can generate new works but whether they can legally draw on copyrighted content in the first place.

Further complicating matters, European legal action against Meta adds another layer to the debate. In March 2025, prominent French authors and publishers accused Meta of "monumental pillage" for its inequitable use of protected works, drawing attention to the importance of adhering to copyright law as outlined in the European Union's AI Act. Lawyers in other jurisdictions are also weighing in: Robert Lands notes that under UK law, the memorisation finding could lead to unfavourable results for AI firms, given that the available exceptions are narrower than those in the U.S. framework.

The ethical ramifications of using pirated materials for AI model training cannot be overlooked. A report indicates that Meta employees considered relying on piracy to acquire the necessary training data because of the prohibitive cost of legal sourcing. The episode reflects broader challenges facing tech companies, which risk undermining the very creative work and intellectual discourse that AI technologies aim to enhance.

As litigation continues and the technology evolves, the balance between innovation and intellectual property rights remains a contentious issue. Judges and legal experts are tasked with drawing clear boundaries, a complex challenge illustrated by Judge Vince Chhabria's observations during a recent court hearing, where he questioned the fairness of allowing companies to profit from unlicensed use of copyrighted material.

The outcome of these legal disputes may shape the landscape of AI development, potentially setting precedents that either facilitate or hinder the use of copyrighted materials for training generative models. As opinions clash and new evidence emerges, the debate over copyright, creativity, and technology is far from settled.

Source: Noah Wire Services