Generative AI has become as much a contest over data as over algorithms. As models improve, the quality, breadth and legality of the material used to train them are increasingly shaping which companies can move fastest and which must slow down to manage risk. That shift has made proprietary datasets, licensing deals and compliance strategy central to the next phase of AI development.
That contest is now colliding with copyright law. In June 2025, judges in California took different approaches in two closely watched cases: one ruled that training on copyrighted books could qualify as transformative fair use while still leaving room for claims tied to pirated copies, while another found that Meta’s use of books from shadow libraries was transformative and that the plaintiffs had failed to show market harm. Earlier, a Delaware federal court rejected a fair use defence in Thomson Reuters v. ROSS Intelligence, underscoring that the legal landscape remains unsettled and highly fact-specific.
That uncertainty is pushing technology firms away from indiscriminate scraping and towards more controlled data strategies. Companies are investing in licensed material, internal datasets, synthetic data and other forms of data augmentation in order to reduce exposure while preserving model performance. For larger players, that can mean exclusive partnerships and negotiated rights; for smaller companies, it raises the bar for entry because compliant data can be expensive to secure at scale.
Regulators and policymakers are adding to the pressure. The U.S. Copyright Office has indicated that some AI training uses may fall within fair use, but it has also warned of possible market harm to creators and pointed to voluntary licensing as a practical response. A Congressional Research Service brief similarly notes that fair use is a flexible doctrine that turns on context, including purpose, amount used and impact on the market. In this environment, companies are folding legal review deeper into product development rather than treating it as a final checkpoint.
The broader debate is no longer limited to copyright alone. Publishers and other rights holders are pressing for clearer consent and compensation, while businesses are being asked to address bias, transparency and accountability in the datasets that shape their systems. Surveys suggest public unease about trusting AI, reinforcing the case for stronger governance. The result is a more cautious, more commercial and more legally aware AI industry, where access to data may matter as much as model design itself.
Source Reference Map
Inspired by headline at: [1]
Sources by paragraph:
- Paragraph 1: [2], [3]
- Paragraph 2: [2], [4], [7]
- Paragraph 3: [1], [5], [7]
- Paragraph 4: [4], [5], [7]
- Paragraph 5: [1], [3], [5]
Source: Noah Wire Services