Manhattan court orders OpenAI to release 20 million ChatGPT logs in landmark copyright case

Friday, 5 December 2025 12:28AM UTC

A Manhattan judge has ordered OpenAI to disclose up to 20 million anonymised ChatGPT chat logs in ongoing copyright litigation with major news organisations, raising pivotal questions about AI training, privacy, and industry regulation.

A federal judge in Manhattan has ordered OpenAI to produce up to 20 million anonymised ChatGPT chat logs in a copyright lawsuit brought by The New York Times and other news organisations. According to Reuters, U.S. Magistrate Judge Ona Wang found the de‑identification measures and protective orders sufficient to address privacy concerns and said the logs are directly relevant to the plaintiffs’ claim that ChatGPT reproduces copyrighted journalism. ^[1]^[2]

The suit, filed in 2023, alleges that OpenAI and Microsoft trained their models on news organisations’ content without permission and that ChatGPT has on occasion regurgitated or closely reproduced that material. The plaintiffs argue the logs are necessary to establish instances of direct copying; OpenAI has maintained that most chats are irrelevant and argued that broad production would erode user trust. Industry reporting frames the order as part of a wider wave of litigation testing how copyright law applies to large‑scale AI training. ^[1]^[2]^[6]

OpenAI has repeatedly sought to limit disclosure, offering summaries or narrower samples and warning that full conversations could expose sensitive information, since each log contains multiple prompt–response exchanges. Ars Technica and Reuters reported that OpenAI said more than 99.99% of the logs are unrelated to the case and called wholesale production at this scale unprecedented. Judge Wang rejected those arguments and, according to Reuters, ordered OpenAI to produce the de-identified logs within seven days. ^[3]^[5]^[2]

Privacy advocates and technical commentators have voiced concerns that de‑identified data can sometimes be re‑identified or reveal sensitive context, even when personal identifiers are removed. Reporting in WebProNews and Ars Technica notes that experts cautioned about the risks from large aggregated datasets, while the court and the plaintiffs say strict protective orders and anonymisation protocols will mitigate that risk. ^[1]^[5]

The ruling comes alongside foreign decisions and parallel litigation: Reuters highlighted recent foreign decisions finding OpenAI liable for reproducing copyrighted material such as song lyrics, and other jurisdictions are advancing similar publisher suits. Media organisations have filed numerous cases alleging unauthorised scraping and use of journalistic content, a trend that legal observers say could force AI firms to change training practices and licensing strategies. ^[1]^[2]

Publishers argue the outcome is a potential win for creators and a route to accountability and compensation; The New York Times and other plaintiffs characterise the case as necessary to prevent AI firms from unfairly deriving value from journalistic labour. Conversely, OpenAI and some technologists warn the order could set a chilling precedent for discovery in AI cases and impede innovation. Reporting by Reuters and other outlets captures both strands of reaction. ^[1]^[2]^[5]

Financially, analysts and trade reporting suggest the stakes are high: settlements or mandated licensing could run into the billions, reshape commercial relationships between AI firms and content owners, and encourage more licensed data partnerships. Coverage notes that OpenAI has already pursued selective content deals, but critics say such arrangements do not resolve systemic questions about the scale and scope of training data. ^[1]^[2]

On the legal and regulatory front, the decision may prompt courts to demand greater access to internal data in future AI disputes and could accelerate legislative and industry moves toward transparency and provenance for training datasets. Reuters reports that OpenAI has already appealed the order, and observers expect further litigation on scope, relevance and privacy, with the case likely to inform US and international approaches to AI governance. ^[2]^[3]^[4]

Ultimately, the order exposes a fault line between demands for evidentiary transparency in copyright enforcement and concerns about user privacy and corporate secrecy. As reporting across outlets shows, the dispute over the ChatGPT logs is far from settled and will probably shape how AI systems are audited, licensed and governed in the years ahead. ^[1]^[2]^[5]

📌 Reference Map:

^[1] (WebProNews) - Paragraph 1, Paragraph 2, Paragraph 4, Paragraph 6, Paragraph 9
^[2] (Reuters, 3 Dec 2025) - Paragraph 1, Paragraph 2, Paragraph 3, Paragraph 6, Paragraph 7, Paragraph 8, Paragraph 9
^[3] (Reuters, 12 Nov 2025) - Paragraph 3, Paragraph 8
^[4] (Reuters, 6 Jun 2025) - Paragraph 8
^[5] (Ars Technica) - Paragraph 3, Paragraph 4, Paragraph 9
^[6] (AP News) - Paragraph 2

Source: Noah Wire Services

More on this

https://www.webpronews.com/judge-orders-openai-to-disclose-20m-chatgpt-logs-in-nyt-copyright-case/ - Please view link - unable to access data
https://www.reuters.com/legal/government/openai-loses-fight-keep-chatgpt-logs-secret-copyright-case-2025-12-03/ - A U.S. federal judge has ruled that OpenAI must hand over 20 million anonymized ChatGPT chat logs in a copyright lawsuit brought by The New York Times and other media outlets. The decision by U.S. Magistrate Judge Ona Wang dismissed OpenAI’s concerns about user privacy, asserting that the de-identification measures and protections in place would sufficiently safeguard private data. The plaintiffs argue the logs are essential to prove that ChatGPT reproduced copyrighted material from their publications, while OpenAI insists the vast majority of chats are irrelevant and claims disclosure could compromise user trust and security. OpenAI has appealed the decision to a higher court, with the company reiterating its concerns via a prior blog post. The case, initiated by the Times in 2023, is part of a broader wave of legal battles confronting AI companies over the unauthorized use of copyrighted content in AI training. MediaNews Group, part of Alden Global Capital, also criticized OpenAI, accusing the firm of benefiting unfairly from journalism content. Judge Wang has ordered OpenAI to produce the de-identified chat logs within seven days.
https://www.reuters.com/business/media-telecom/openai-fights-order-turn-over-millions-chatgpt-conversations-2025-11-12/ - OpenAI has filed a request with a federal judge in New York to overturn an order that mandates it to release 20 million anonymized ChatGPT chat logs, as part of a copyright infringement lawsuit brought by The New York Times and other news organizations. OpenAI contends that disclosing these transcripts would compromise user privacy, arguing that the vast majority of conversations—99.99%—are unrelated to the claims. The lawsuit alleges OpenAI unlawfully used articles from the news outlets to train ChatGPT and seeks access to logs to validate these claims and counter OpenAI’s defense that evidence was artificially created by manipulating the chatbot. Magistrate Judge Ona Wang had previously ruled that privacy concerns would be mitigated through comprehensive anonymization measures, setting a deadline for OpenAI to submit the transcripts by Friday. OpenAI's Chief Information Security Officer emphasized in a blog post that complying with the order could expose highly personal conversations from users irrelevant to the lawsuit. The case is one of several legal challenges AI companies face over the alleged use of copyrighted material to train AI models.
https://www.reuters.com/business/media-telecom/openai-appeal-new-york-times-suit-demand-asking-not-delete-any-user-chats-2025-06-06/ - OpenAI is challenging a court order requiring it to indefinitely preserve all ChatGPT output data in an ongoing copyright lawsuit filed by The New York Times. The company argues that this mandate conflicts with its user privacy commitments. The order, issued last month, calls for the segregation and retention of ChatGPT logs after the Times requested their preservation. OpenAI CEO Sam Altman criticized the request, describing it as inappropriate and a threat to user privacy. On June 3, OpenAI formally asked U.S. District Judge Sidney Stein to vacate the order. The lawsuit, initiated in 2023, accuses OpenAI and Microsoft of using millions of Times articles without authorization to train their AI models. Judge Stein previously ruled that the Times provided enough evidence to support claims of copyright infringement, citing instances where ChatGPT replicated Times' content, and allowed the case to proceed. The Times has yet to comment on the recent appeal.
https://arstechnica.com/tech-policy/2025/11/openai-fights-order-to-hand-over-20-million-private-chatgpt-conversations/ - OpenAI wants a court to reverse a ruling forcing the ChatGPT maker to give 20 million user chats to The New York Times and other news plaintiffs that sued it over alleged copyright infringement. Although OpenAI previously offered 20 million user chats as a counter to the NYT’s demand for 120 million, the AI company says a court order requiring production of the chats is too broad. The logs at issue here are complete conversations: each log in the 20 million sample represents a complete exchange of multiple prompt-output pairs between a user and ChatGPT. OpenAI said today in a filing in US District Court for the Southern District of New York. Disclosure of those logs is thus much more likely to expose private information than individual prompt-output pairs, in the same way that eavesdropping on an entire conversation reveals more private information than a 5-second conversation fragment. OpenAI’s filing said that more than 99.99% of the chats have nothing to do with this case. OpenAI is unaware of any court ordering wholesale production of personal information at this scale, the filing said. This sets a dangerous precedent: it suggests that anyone who files a lawsuit against an AI company can demand production of tens of millions of conversations without first narrowing for relevance. This is not how discovery works in other cases: courts do not allow plaintiffs suing Google to dig through the private emails of tens of millions of Gmail users irrespective of their relevance. And it is not how discovery should work for generative AI tools either. A November 7 order by US Magistrate Judge Ona Wang sided with the NYT, saying that OpenAI must produce the 20 million de-identified Consumer ChatGPT Logs to News Plaintiffs by November 14, 2025, or within 7 days of completing the de-identification process. 
Wang ruled that the production must go forward even though the parties don’t agree on whether the logs must be produced in whole. OpenAI has failed to explain how its consumers’ privacy rights are not adequately protected by: (1) the existing protective order in this multidistrict litigation or (2) OpenAI’s exhaustive de-identification of all of the 20 million Consumer ChatGPT Logs. OpenAI’s filing today said the court order did not acknowledge OpenAI’s sworn witness declaration explaining that the de-identification process is not intended to remove information that is non-identifying but may nonetheless be private, like a Washington Post reporter’s hypothetical use of ChatGPT to assist in the preparation of a news article. The New York Times provided a statement today after being contacted by Ars. The New York Times’s case against OpenAI and Microsoft is about holding these companies accountable for stealing millions of copyrighted works to create products that directly compete with The Times. In another attempt to cover up its illegal conduct, OpenAI’s blog post purposely misleads its users and omits the facts. No ChatGPT user’s privacy is at risk. The court ordered OpenAI to provide a sample of chats, anonymized by OpenAI itself, under a legal protective order. This fear-mongering is all the more dishonest given that OpenAI’s own terms of service permit the company to train its models on users’ chats and turn over chats for litigation.
https://www.apnews.com/article/cc19ef2cf3f23343738e892b60d6d7a6 - A federal judge in New York has ruled that The New York Times and other newspapers can proceed with a copyright lawsuit against OpenAI and Microsoft. The lawsuit accuses them of using journalists' work to train AI systems without permission, which the media companies claim constitutes widespread copyright infringement and harms their business. While U.S. District Judge Sidney Stein dismissed some claims, the core allegations — including copyright violations — will continue. A consolidated lawsuit also includes MediaNews Group and Tribune Publishing. The Times alleges that generative AI, such as ChatGPT, has regurgitated its content verbatim, which threatens its business model. OpenAI responded positively to the partial dismissal, asserting that its data use aligns with fair use and innovation. Microsoft declined to comment on the ruling. The judge has not yet provided detailed reasoning but stated that it will follow soon. Separately, OpenAI has a licensing deal with the Associated Press, allowing access to part of its text archives.

Noah Fact Check Pro

The draft above was created using the information available at the time the story first emerged. We’ve since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.

Freshness check

Score: 10

Notes: The narrative is current, with the latest developments reported on December 3, 2025. The earliest known publication date of substantially similar content is November 12, 2025, indicating recent coverage. The report is based on a court order, which typically warrants a high freshness score. No discrepancies in figures, dates, or quotes were found. No earlier versions show different information. The article includes updated data and does not recycle older material. No republishing across low-quality sites or clickbait networks was identified. No content was found that appeared more than 7 days earlier. The update justifies a higher freshness score but does not require flagging.

Quotes check

Score: 10

Notes: The report includes direct quotes from U.S. Magistrate Judge Ona Wang and OpenAI's Chief Information Security Officer, Dane Stuckey. The earliest known usage of these quotes is from the court order dated December 3, 2025, and OpenAI's blog post dated November 12, 2025. No identical quotes appear in earlier material, indicating original content. No variations in quote wording were found.

Source reliability

Score: 10

Notes: The narrative originates from reputable organisations: Reuters and OpenAI. Reuters is a well-established news agency, and OpenAI is a leading AI research organisation. Both have a strong public presence and legitimate websites, confirming the reliability of the sources.

Plausibility check

Score: 10

Notes: The narrative presents plausible claims, with time-sensitive information verified against recent online data. The claims are covered by multiple reputable outlets, including Reuters and Ars Technica. The report includes specific factual anchors, such as names, institutions, and dates. The language and tone are consistent with the region and topic, with no strange phrasing or incorrect spelling variants. The structure is focused and relevant, with no excessive or off-topic detail. The tone is appropriate for corporate and official language, with no unusual drama or vagueness.

Overall assessment

Verdict (FAIL, OPEN, PASS): PASS

Confidence (LOW, MEDIUM, HIGH): HIGH

Summary: The narrative is current, original, and sourced from reputable organisations. All claims are plausible and supported by specific factual anchors. No credibility risks were identified, and the language and tone are appropriate for the topic.
