Full Report
Microsoft is working to address an ongoing incident preventing customers from setting up multi-factor authentication (MFA) or accessing the My Sign-Ins platform. [...]
Analysis Summary
# Incident Report: Microsoft MFA Setup and My Sign-Ins Service Outage
## Executive Summary
Microsoft experienced a significant service disruption preventing users from configuring multi-factor authentication (MFA) and accessing the My Sign-Ins platform. The incident was triggered by a faulty cache configuration change that led to resource exhaustion during a failover event, resulting in 504 Gateway Timeout errors for users. The issue was resolved by rolling back the configuration change and failing back to the original infrastructure.
## Incident Details
- **Discovery Date:** June 1, 2026
- **Incident Date:** June 1, 2026
- **Affected Organization:** Microsoft (Microsoft 365 / Entra ID)
- **Sector:** Information Technology / Cloud Services
- **Geography:** Global (impact noted during peak EU traffic)
## Timeline of Events
### Initial Access
- **Date/Time:** Approximately 5:00 AM ET, June 1, 2026
- **Vector:** Internal Configuration Change
- **Details:** A recent cache configuration update was deployed to the production environment, which ultimately necessitated a service failover.
### Lateral Movement
- **N/A:** This was a service availability incident, not a compromise or breach, so no lateral movement occurred.
### Data Exfiltration/Impact
- **Functional Impact:** Users were unable to access hxxp[://]mysignins[.]microsoft[.]com or complete MFA registration.
- **Error Codes:** Affected users encountered "504 Gateway Timeout" errors.
### Detection & Response
- **05:00 AM ET:** Microsoft acknowledges the incident and begins investigation.
- **Discovery:** Telemetry identified high CPU and memory utilization on the alternate infrastructure during an EU traffic peak.
- **Response:** Microsoft initially failed over to alternate healthy infrastructure.
- **Final Mitigation:** After identifying the root cause, Microsoft rolled back the cache configuration changes and restored traffic to the original infrastructure.
- **08:41 AM ET:** Microsoft confirms full restoration of services.
## Attack Methodology
*Note: This incident appears to be an operational failure rather than a malicious cyber-attack.*
- **Initial Access:** Authorized administrative configuration change.
- **Persistence:** N/A.
- **Privilege Escalation:** N/A.
- **Defense Evasion:** N/A.
- **Credential Access:** N/A.
- **Discovery:** N/A as an adversary technique; the issue was identified internally through service telemetry and elevated error rates.
- **Lateral Movement:** N/A.
- **Collection:** N/A.
- **Exfiltration:** N/A.
- **Impact:** Resource exhaustion (High CPU/Memory) leading to Denial of Service (DoS) for MFA management features.
## Impact Assessment
- **Financial:** Undisclosed; potential productivity loss for enterprise customers unable to onboard new users.
- **Data Breach:** None reported; no unauthorized access to customer data occurred.
- **Operational:** High; disruption to security synchronization and MFA enrollment workflows.
- **Reputational:** Moderate; follows a series of recent outages affecting Teams and Outlook.
## Indicators of Compromise
- **Network Indicators:** 504 Gateway Timeout errors when accessing hxxp[://]mysignins[.]microsoft[.]com.
- **File Indicators:** N/A.
- **Behavioral Indicators:** High CPU and memory utilization spikes on authentication infrastructure during peak traffic hours.
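
The indicators above can be turned into a simple health check. The following is a minimal sketch, not Microsoft's monitoring logic: the function names and the 5% and 85% thresholds are illustrative assumptions, and the inputs are assumed to come from whatever request logs and host metrics an organization already collects.

```python
from collections import Counter


def gateway_timeout_rate(status_codes):
    """Return the fraction of responses in the window that were HTTP 504."""
    if not status_codes:
        return 0.0
    counts = Counter(status_codes)
    return counts[504] / len(status_codes)


def is_degraded(status_codes, cpu_pct, mem_pct,
                max_504_rate=0.05, max_cpu_pct=85.0, max_mem_pct=85.0):
    """Flag a window as degraded when the 504 rate or resource usage
    exceeds the (assumed, illustrative) thresholds."""
    return (gateway_timeout_rate(status_codes) > max_504_rate
            or cpu_pct > max_cpu_pct
            or mem_pct > max_mem_pct)


# Hypothetical window: 90 successful responses and 10 gateway timeouts.
window = [200] * 90 + [504] * 10
print(gateway_timeout_rate(window))                      # 0.1
print(is_degraded(window, cpu_pct=60.0, mem_pct=55.0))   # True
```

Checking the error rate and resource utilization together means either signal alone is enough to raise an alert, which matters when timeouts appear before CPU or memory graphs look abnormal.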
## Response Actions
- **Containment:** Redirected traffic to alternate infrastructure to isolate the failing nodes.
- **Eradication:** Identification and rollback of the faulty cache configuration change (incident tracked as MO1329260).
- **Recovery:** Restoration of traffic to original infrastructure and monitoring of service telemetry for stability.
## Lessons Learned
- **Scalability of Failover:** Failover infrastructure must be adequately provisioned to handle peak regional traffic (e.g., EU peak hours) to avoid resource exhaustion.
- **Configuration Testing:** Cache configuration changes can have cascading effects on resource utilization that may not be apparent until high-traffic periods.
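
Both lessons come down to capacity arithmetic. The sketch below shows one way to check whether a failover pool has headroom for a regional peak. All figures (12,000 requests per second, 500 and 300 requests per second per node, a 70% utilization target, 40 nodes) are hypothetical and are not taken from the incident.

```python
import math


def nodes_required(peak_rps, per_node_rps, target_utilization=0.7):
    """Nodes needed to serve peak_rps while keeping each node at or
    below target_utilization of its capacity."""
    if per_node_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("per_node_rps must be > 0 and utilization in (0, 1]")
    return math.ceil(peak_rps / (per_node_rps * target_utilization))


def failover_has_headroom(failover_nodes, peak_rps, per_node_rps,
                          target_utilization=0.7):
    """True when the failover pool can absorb the full peak load."""
    return failover_nodes >= nodes_required(
        peak_rps, per_node_rps, target_utilization)


# Hypothetical baseline: each node handles 500 requests per second.
print(nodes_required(12000, 500))             # 35
print(failover_has_headroom(40, 12000, 500))  # True

# Hypothetical effect of a configuration change that lowers effective
# per-node capacity to 300 requests per second (e.g. more cache misses).
print(nodes_required(12000, 300))             # 58
print(failover_has_headroom(40, 12000, 300))  # False
```

The second pair of calls illustrates the cascading effect: a pool that is comfortably sized before a change can fall short once per-node capacity drops, and the shortfall only shows up at peak load.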
## Recommendations
- **Load Testing:** Stress-test failover paths under simulated peak regional load before deploying configuration changes.
- **Staged Rollouts:** Implement a more granular "canary" deployment for configuration changes affecting core identity services to limit the "blast radius" of a failure.
- **MFA Resiliency:** Organizations should maintain documented bypass procedures or Temporary Access Passes (TAP) for emergency user onboarding during MFA service outages.
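
The staged rollout recommendation can be sketched as a loop that widens exposure only while health checks pass. This is an illustrative outline under stated assumptions, not a description of Microsoft's deployment tooling: `staged_rollout`, its callbacks, and the stage fractions are all hypothetical, and the health check here is simulated.

```python
from typing import Callable, Sequence


def staged_rollout(stages: Sequence[float],
                   apply_to_fraction: Callable[[float], None],
                   is_healthy: Callable[[], bool],
                   rollback: Callable[[], None]) -> bool:
    """Apply a change to increasing traffic fractions.

    Rolls back and returns False on the first failed health check;
    returns True if every stage passes.
    """
    for fraction in stages:
        apply_to_fraction(fraction)
        if not is_healthy():
            rollback()
            return False
    return True


# Simulated environment: the fault only appears once 25% of traffic
# is served with the new configuration.
applied = []


def apply_to_fraction(fraction):
    applied.append(fraction)


def is_healthy():
    return applied[-1] < 0.25


def rollback():
    applied.append(0.0)  # 0.0 marks the change as fully withdrawn


ok = staged_rollout([0.01, 0.05, 0.25, 1.0],
                    apply_to_fraction, is_healthy, rollback)
print(ok, applied)  # False [0.01, 0.05, 0.25, 0.0]
```

In this simulation the change never reaches 100% of traffic, which is the intended limit on blast radius. In practice the health check would need to run during a representative load period, since the fault in this incident surfaced under peak traffic.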