The Internet Archive’s Wayback Machine is facing a growing backlash from publishers worried that archived material can be repurposed by AI firms, a shift that could make parts of the web’s memory harder to reach. Reporting by Nieman Lab says 241 news sites across nine countries now explicitly block at least one of the Internet Archive’s crawling bots, with the largest share coming from USA Today Co., formerly known as Gannett.

The dispute reflects a collision between two once-compatible internet ideals: preserving public records and protecting content from unauthorised scraping. According to Nieman Lab, The New York Times has confirmed it is actively blocking the Archive’s crawlers, while The Guardian has taken a more selective approach, keeping some access open while tightening restrictions on its material. The Internet Archive itself has acknowledged taking steps to limit bulk access to parts of its libraries, after earlier incidents in which AI companies were said to have overloaded its systems.

The scale of the restriction is striking. Nieman Lab said 87 per cent of the sites in its sample that block the Archive are owned by USA Today Co., and that most of the affected publishers use the same two blocking rules in their robots.txt files. The report also found that 93 per cent of the publishers studied restrict at least two of the four bots associated with the Archive, while some outlets, including Le Monde and its English-language edition, have gone further and blocked three.
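
For readers unfamiliar with how such blocks work, a robots.txt file lists crawler names ("user agents") alongside the paths each one is asked to stay away from. The sketch below is illustrative only: the sample file, the two bot names (archive.org_bot and ia_archiver) and the helper function blocked_agents are assumptions made for the example, not details drawn from the Nieman Lab report. It uses Python's standard-library robots.txt parser to check which named crawlers a site turns away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind described above. The bot names are
# assumptions for illustration; they are not quoted from the report.
SAMPLE_ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the crawlers in `agents` that may not fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    candidates = ["archive.org_bot", "ia_archiver", "Googlebot"]
    print(blocked_agents(SAMPLE_ROBOTS_TXT, candidates))
```

Under these assumptions the two archive crawlers are reported as blocked while a general search crawler is not. A robots.txt file is a request rather than an enforcement mechanism, so it only restrains crawlers that choose to honour it.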

For defenders of the Wayback Machine, the concern is that journalists, historians and ordinary readers could lose access to an increasingly fragile digital record. The Internet Archive has spent nearly three decades building what is effectively a public memory bank for the web, and critics of the new blocking wave argue that limiting it may solve a short-term AI problem at the cost of long-term access. As Nieman Lab notes, there is no federal requirement forcing websites to preserve their material, which leaves the Archive as one of the few robust backstops for online history.

Source: Noah Wire Services