Nine newspaper publishers have sued OpenAI and Microsoft, alleging copyright infringement in the training of large language models. The lawsuit, potentially worth over $10 billion, was filed in the U.S. District Court for the Southern District of New York and targets the use of copyrighted material to build AI-driven products such as ChatGPT and Microsoft’s Copilot.

The nine plaintiffs are regional publishers: California Newspapers Partnership, Prairie Mountain Publishing Company LLP, MNG-BH Acquisition LLC, Hartford Courant Company LLC, The Daily Press LLC, The Morning Call LLC, Virginian-Pilot Media LLC, Los Angeles Daily News Publishing Companies, and the San Diego Union-Tribune LLC. They accuse the defendants of “pilfering” hundreds of thousands of copyrighted works multiple times to train their AI models without permission or compensation.

Central to the complaint is the argument that the outputs generated by these AI systems do not provide users with direct links to the original publishers’ websites. Unlike traditional search engines that offer hyperlinks back to source material, the complaint states, “the synthetic output disguises the results as the work of the GPT system itself.” This omission, the publishers claim, deprives them of crucial web traffic, advertising revenue, and subscription fees.

Further, the lawsuit asserts that the AI models have effectively “memorized” the publishers’ copyrighted articles to generate synthetic responses, and that keeping the models up to date requires continual ingestion of new content, again taken without authorization. This ongoing use is said to threaten the business models of newspapers, which rely on paid subscriptions and online engagement.

This lawsuit joins a growing list of legal actions against AI companies. Earlier filings by other publishers have been consolidated in the same court and have survived initial motions to dismiss. Publications involved in those earlier cases include prominent newspapers such as the New York Daily News and the Chicago Tribune. The cases turn on the tension between rapidly advancing AI technologies and traditional content creators seeking to protect their intellectual property and revenue streams.

Industry observers view this litigation as part of a larger, complex debate about copyright in the digital age, especially regarding the training of AI on vast datasets scraped from the internet. While AI developers argue that the use of such data is transformative and falls under fair use, publishers contend that this circumvents established copyright laws and undermines the economic foundation of journalism.

As the case proceeds, its outcome could have profound implications for AI development practices and the legal boundaries of content usage. It may also influence how AI products attribute content to, and compensate, original creators, shaping the future relationship between technology companies and traditional media outlets.

Source: Noah Wire Services