Amazon Web Services (AWS) has confirmed that its systems have returned to normal after a massive 15-hour outage that disrupted a vast array of internet services worldwide. The failure, centred on AWS's US-EAST-1 region in northern Virginia, triggered chaos across industries, affecting payments, websites, apps, and online platforms that rely heavily on AWS infrastructure.

The incident began early on October 20, 2025, and was fully resolved by 6 p.m. Eastern Time, according to Amazon's health dashboard. Engineers started to see recovery within three hours, but the restoration process remained slow and uneven, with some services continuing to deal with backlog processing after the main fault was fixed. Amazon has promised a detailed explanation of the outage in due course.

The root cause was attributed to a malfunction in a subsystem that monitors the health of AWS's network load balancers inside the internal network of its Elastic Compute Cloud (EC2). This triggered a Domain Name System (DNS) failure that blocked access to DynamoDB’s API, and the fault then cascaded through numerous services that depend on it. The reach of the outage was vast: users reported more than 11 million issues, with disruptions hitting household names like Snapchat, Reddit, Zoom, Venmo, Netflix, Disney+, Robinhood, and Coinbase, as well as Amazon’s own services such as Ring and Alexa. Even educational platforms like Canvas were affected, leaving students at institutions like Ohio State University and the University of California, Riverside unable to access assignments.

In the UK, Lloyds Banking Group customers faced payment difficulties, and the HMRC website was taken offline, highlighting how critical industries were directly impacted. Financial platforms, social media, gaming, streaming, and e-commerce sectors all experienced significant interruptions, underscoring society’s increasing reliance on a concentrated infrastructure dominated by a few cloud providers.

This was not AWS’s first major incident. The US-EAST-1 region has been the site of similar large-scale outages in 2017, 2020, 2021, and June 2023. Previous disruptions have shown how the company’s infrastructure, while vast and technically advanced, remains vulnerable to cascading failures. Industry experts have cautioned that the centralisation of cloud infrastructure creates single points of failure, worsening the impact when problems arise.

Jake Moore, global cybersecurity adviser at ESET, stated that the outage “once again highlights the dependency we have on relatively fragile infrastructures,” a sentiment echoed by BBC technology reporter Shiona McCallum, who noted the increasing pressure on cloud services due to growing demand. Cornell University computer science professor Ken Birman pointed out that many companies relying on AWS have not invested adequately in protection systems or reliable backups, urging firms to bolster their resilience to avoid business paralysis during outages.

The outage also reignited debate over the risks inherent in dependence on a few dominant cloud providers. AWS commands about 30 percent of the global cloud infrastructure services market, far ahead of Microsoft and Google Cloud. Despite the disruption, Amazon’s stock price rose 1.6 percent, suggesting that investor confidence in the company's overall market strength remained intact.

Looking ahead, experts like Bob Venero, CEO of Future Tech Enterprise, warn that outages like this are likely to become more frequent as artificial intelligence workloads grow within enterprises using public clouds. The concern comes as AWS makes multi-billion-dollar investments in AI-focused data centres worldwide, including $20 billion in Pennsylvania and $11 billion in Georgia announced in 2025.

The outage exposed the ripple effects across interconnected services: even those that did not go offline experienced increased latency and elevated error rates, diminishing the user experience. Applications that rely on AWS’s databases, message queues, or serverless functions reported timeouts and errors, highlighting the risks businesses face when critical systems depend too heavily on a single cloud infrastructure.
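One common way applications limit this kind of exposure is to retry a failed call with backoff and then fall back to a copy of the service in another region. The Python sketch below illustrates the idea only; it is not AWS's or any affected company's code. The function name `call_with_failover` and its parameters are hypothetical, and each endpoint is assumed to be a plain callable that raises `TimeoutError` or `ConnectionError` on failure.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    endpoints: Sequence[Callable[[], T]],
    retries_per_endpoint: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each endpoint in order; retry transient errors before moving on.

    `endpoints` is a hypothetical list of zero-argument callables, for
    example one per cloud region, ordered from most to least preferred.
    """
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        for attempt in range(retries_per_endpoint):
            try:
                return endpoint()
            except (TimeoutError, ConnectionError) as exc:
                last_error = exc
                # Exponential backoff with full jitter, so that many
                # clients do not all retry at the same instant.
                sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError("all endpoints failed") from last_error


if __name__ == "__main__":
    def primary_region() -> str:
        # Stand-in for a call to a region that is currently timing out.
        raise TimeoutError("primary region timed out")

    def secondary_region() -> str:
        # Stand-in for a healthy replica in another region.
        return "served from secondary region"

    print(call_with_failover([primary_region, secondary_region]))
```

A fallback of this kind only helps if the second region holds an up-to-date copy of the data and does not itself depend on the failed one, which is the sort of investment in backups the experts quoted above are urging.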

While AWS has worked to restore services promptly, the incident serves as a wake-up call to the digital ecosystem about the vulnerabilities lying beneath its foundational infrastructure and the importance of building more robust, distributed, and resilient systems.

Source: Noah Wire Services