Seattle: The Amazon outage this week was caused by a bug in its automation software, leaving services from Signal to smart beds offline for several hours.
In a detailed post, AWS outlined a cascading series of events that triggered the outage, taking down thousands of websites and applications relying on its cloud infrastructure.
The Amazon outage primarily affected DynamoDB, AWS’s database system where customers store data, due to “a latent defect within the service’s automated DNS [domain name system] management system.”
DynamoDB manages hundreds of thousands of DNS records and relies on automation to keep them updated as capacity is added, hardware fails, and traffic needs to be distributed efficiently.

AWS traced the root cause of the Amazon outage to an empty DNS record in its Virginia-based US-East-1 region. The automation failed to repair the record, and operators had to intervene manually to resolve the issue.
To prevent further problems, AWS disabled the DynamoDB DNS planner and DNS enactor automation worldwide while addressing the underlying conditions and adding extra protections. Other AWS tools were also impacted during the outage.
Platforms and services affected by the Amazon outage included Signal, Snapchat, Roblox, Duolingo, banking sites, and Ring. Downdetector logged more than 8.1 million problem reports from users worldwide, spanning more than 2,000 companies. Although services were restored within hours, the outage caused widespread disruption.
Even connected devices such as Eight Sleep smart beds were impacted. Users were unable to adjust bed temperature or incline via the app during the Amazon outage. CEO Matteo Franceschetti apologized and rolled out an update enabling users to control critical bed functions via Bluetooth in the event of future outages.

Experts said the Amazon outage highlighted how heavily the world depends on a handful of cloud providers.
Dr Suelette Dreyfus, a computing and information systems lecturer at the University of Melbourne, said the outage showed how dependent the world is on single points of failure on the internet.
“That single point isn’t just AWS – they’re the biggest cloud provider with 30 percent or so of the market – but rather the cloud as a whole, which is basically just three companies,” Dr Dreyfus added.
The Amazon outage serves as a reminder of the vulnerability of cloud infrastructure and the far-reaching effects that automation failures can have on businesses, apps, and connected devices worldwide.

