AWS outage has taken down millions of websites, including Amazon.com, Prime Video, Perplexity AI, Canva and more. [...]
# Incident Report: Major AWS Service Outage Disrupts Global Services
## Executive Summary
A high-impact service disruption occurred in the Amazon Web Services (AWS) US-EAST-1 Region on October 20, 2025. The cause was initially traced to a DNS resolution issue affecting the DynamoDB API endpoint. The incident later spread to network load balancers and the internal subsystem that monitors them. The outage severely impacted numerous high-profile services that rely on AWS, including Amazon.com, Prime Video, Fortnite, Canva, and Perplexity AI. It caused widespread login failures and downtime for roughly 45 minutes to an hour before significant recovery began.
## Incident Details
- Discovery Date: October 20, 2025 (AWS confirmed the issue approx. 30 minutes before the 4:24 AM EDT report)
- Incident Date: October 20, 2025
- Affected Organization: Amazon Web Services (AWS) and its customers (including Amazon, Epic Games/Fortnite, Canva, Perplexity AI, etc.)
- Sector: Cloud Computing / Technology / E-commerce / Gaming
- Geography: Global, with reported impact in the United States and Europe; the failure originated in the US-EAST-1 Region.
## Timeline of Events
### Initial Access
- Date/Time: October 20, 2025 (start time unclear; AWS's initial confirmation came approx. 30 minutes before the 4:24 AM EDT report)
- Vector: Internal infrastructure failure (DNS resolution issue).
- Details: AWS confirmed increased error rates and latencies across multiple AWS services in the US-EAST-1 Region.
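Rising error rates and latencies are the kind of signal a customer-side health check can watch for. The following is a minimal sketch of such a check, not a description of how AWS detected the incident. `ErrorRateMonitor` and its window size and thresholds are hypothetical, illustrative values rather than AWS figures.

```python
from __future__ import annotations

import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    ok: bool           # whether the request succeeded
    latency_ms: float  # observed latency in milliseconds


class ErrorRateMonitor:
    """Hypothetical customer-side health check over the last `window` requests.

    The default thresholds are illustrative assumptions, not AWS values.
    """

    def __init__(self, window: int = 100, max_error_rate: float = 0.05,
                 max_p95_ms: float = 500.0) -> None:
        # A bounded deque drops the oldest sample once `window` is reached.
        self.samples: deque[Sample] = deque(maxlen=window)
        self.max_error_rate = max_error_rate
        self.max_p95_ms = max_p95_ms

    def record(self, ok: bool, latency_ms: float) -> None:
        self.samples.append(Sample(ok, latency_ms))

    def error_rate(self) -> float:
        if not self.samples:
            return 0.0
        failures = sum(1 for s in self.samples if not s.ok)
        return failures / len(self.samples)

    def p95_latency(self) -> float:
        if not self.samples:
            return 0.0
        ordered = sorted(s.latency_ms for s in self.samples)
        # Nearest-rank 95th percentile.
        rank = math.ceil(0.95 * len(ordered))
        return ordered[rank - 1]

    def degraded(self) -> bool:
        # Flag the dependency when either signal crosses its threshold.
        return (self.error_rate() > self.max_error_rate
                or self.p95_latency() > self.max_p95_ms)
```

Consider a monitor with `window=10` that receives eight successful 40 ms requests and two failed 2,000 ms requests. It reports a 20% error rate, and `degraded()` returns `True`.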
### Lateral Movement
- Not applicable (This was an infrastructure availability incident, not a cyber intrusion).
### Data Exfiltration/Impact
- Impact: Widespread service unavailability, login failures (e.g., Fortnite, Canvas), and operational disruption for dependent services (Amazon.com, Prime Video, Perplexity, Roblox, Hulu, Robinhood, Grammarly).
### Detection & Response
- Date/Time: Publicly reported around 4:24 AM EDT on October 20, 2025; AWS's initial confirmation came approx. 30 minutes earlier.
- Response actions taken: AWS acknowledged the major disruption and investigated the root cause (initially DNS). It applied mitigation steps for the DNS issue and later applied additional mitigations for the subsystem that monitors network load balancers. Initial recovery was observed approximately 45 minutes after the initial reports.
## Attack Methodology
- Initial Access: Infrastructure Failure (DNS).
- Persistence: N/A
- Privilege Escalation: N/A
- Defense Evasion: N/A
- Credential Access: N/A
- Discovery: N/A
- Lateral Movement: N/A
- Collection: N/A
- Exfiltration: N/A
- Impact: Denial of Service/Service Unavailability due to API endpoint failures and load balancer monitoring issues.
## Impact Assessment
- Financial: Unknown, but likely significant given the scale of the affected platforms (e.g., Amazon e-commerce and AWS cloud customers).
- Data Breach: No data breach was indicated in the report; the incident was operational/availability-focused.
- Operational: Significant disruption to hundreds of dependent services globally, including critical functions like eCommerce, streaming, and educational platforms (Canvas).
- Reputational: Negative impact on AWS's reputation for reliability, as major customers publicly confirmed outages linked to AWS infrastructure failures.
## Indicators of Compromise
- Network indicators: Increased error rates and latencies for multiple AWS services in the US-EAST-1 Region.
- File indicators: N/A
- Behavioral indicators: Failures in DNS resolution for the DynamoDB API endpoint; subsequent issues impacting network load balancer monitoring.
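A customer can check the DNS indicator by periodically resolving the regional DynamoDB endpoint. The following is a minimal sketch under stated assumptions. `check_dns` is a hypothetical helper built on Python's standard `socket.getaddrinfo`. The resolver is injectable, so the check can be tested without network access. A passing check only confirms that the name resolves, not that the service behind it is healthy.

```python
from __future__ import annotations

import socket
from typing import Callable

# Regional DynamoDB endpoint for US-EAST-1.
DYNAMODB_ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

Resolver = Callable[[str, int], list]


def check_dns(host: str = DYNAMODB_ENDPOINT, port: int = 443,
              resolver: Resolver = socket.getaddrinfo) -> tuple[bool, str]:
    """Return (healthy, detail) for a single DNS resolution attempt.

    `resolver` defaults to socket.getaddrinfo; tests can pass a fake.
    """
    try:
        results = resolver(host, port)
    except socket.gaierror as exc:
        # The name could not be resolved at all.
        return False, f"resolution failed: {exc}"
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string for both IPv4 and IPv6.
    addresses = sorted({entry[4][0] for entry in results})
    if not addresses:
        return False, "resolution returned no addresses"
    return True, ", ".join(addresses)
```

With the default resolver, the function performs a real lookup through the operating system, so the result may reflect local resolver caching. A scheduler could run it every minute and alert when it returns `False`.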
## Response Actions
- Containment measures: AWS worked to mitigate the DNS resolution issue in US-EAST-1.
- Eradication steps: AWS identified and applied mitigations for the internal subsystem responsible for monitoring network load balancers.
- Recovery actions: Services began showing recovery approximately 45 minutes after the initial reports. AWS reported that services were restored later that morning, though secondary load balancer issues persisted for a time.
## Lessons Learned
- Key takeaways: The critical dependence of high-profile global services on a single AWS region (US-EAST-1) exposes users to massive, cascading availability risks from underlying infrastructure failures.
- What could have been done better: Faster identification and mitigation of the subsequent load balancer monitoring subsystem failure after resolving the initial DNS issue.
## Recommendations
- Prevention measures for similar incidents: Customers should improve redundancy with a multi-region architecture, especially for mission-critical authentication and transaction services. AWS should review the resilience of core internal monitoring systems on which multiple services depend.
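At the application level, one way to apply the multi-region recommendation is ordered regional failover. The following is a minimal sketch under stated assumptions. `call_with_region_failover`, `AllRegionsFailed`, and the caller-supplied `operation` are hypothetical names. The sketch also assumes that data and dependencies, such as the authentication service, are already available in every listed region. Without that, failing over only moves the error to a different region.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised when the operation failed in every configured region."""


def call_with_region_failover(
    regions: Sequence[str],
    operation: Callable[[str], T],
) -> tuple[str, T]:
    """Try `operation` against each region in order and return the first success.

    `operation` is a hypothetical caller-supplied function that performs the
    request against the given region, e.g. by building a regional client.
    """
    errors: dict[str, Exception] = {}
    for region in regions:
        try:
            return region, operation(region)
        except Exception as exc:  # real code should catch only availability errors
            errors[region] = exc
    raise AllRegionsFailed(errors)
```

The list order expresses preference, so the primary region (here, possibly `us-east-1`) comes first. Catching every exception keeps the sketch short. A production version would fail over only on timeouts, connection errors, and similar availability failures, and would let validation errors propagate.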