Full Report
Understand how an AI agent hacked McKinsey’s internal AI platform ‘Lilli’, and the lessons organizations should take from this exercise. The post How an AI Agent Hacked McKinsey’s AI Platform appeared first on Outpost24.
Analysis Summary
# Incident Report: Autonomous Indirect Prompt Injection against McKinsey ‘Lilli’
## Executive Summary
In March 2026, security researchers from CodeWall demonstrated a successful compromise of ‘Lilli,’ McKinsey’s internal AI platform. By leveraging an autonomous AI agent to perform indirect prompt injection, the researchers bypassed security safeguards to exfiltrate proprietary data and internal meeting transcripts. The incident highlights the critical vulnerability of Large Language Model (LLM) platforms when they are granted access to live web content and internal data stores without rigorous input sanitization.
## Incident Details
- **Discovery Date:** March 9, 2026
- **Incident Date:** March 2026
- **Affected Organization:** McKinsey & Company
- **Sector:** Management Consulting
- **Geography:** Global / United States
## Timeline of Events
### Initial Access
- **Date/Time:** March 2026
- **Vector:** Indirect Prompt Injection via External Content
- **Details:** The researchers hosted a malicious "lure" document on a public website. When the McKinsey AI platform (Lilli) was asked to summarize or interact with this external URL, it ingested hidden instructions embedded within the webpage.
### Lateral Movement
- **Details:** The attack did not require traditional network lateral movement. Instead, the "movement" occurred within the LLM’s context window. The injected instructions directed Lilli to search its internal knowledge base, spanning McKinsey's proprietary research and internal databases, effectively pivoting from an external prompt to internal data access.
### Data Exfiltration/Impact
- **Details:** The AI agent utilized "Indirect Data Exfiltration." It instructed Lilli to encode sensitive internal information into the parameters of a URL (e.g., an image source or a tracking pixel). When Lilli attempted to render the summary for the user, it automatically made a request to the attacker-controlled server, transmitting the stolen data in the background.
### Detection & Response
- **How it was discovered:** Disclosed publicly by researchers at CodeWall after a successful demonstration.
- **Response actions taken:** General industry guidance suggests McKinsey and similar organizations remediate by restricting AI access to external URLs and implementing "human-in-the-loop" confirmations for data transmission.
## Attack Methodology
- **Initial Access:** Indirect Prompt Injection (via malicious web content).
- **Persistence:** Not applicable; the attack was session-based.
- **Privilege Escalation:** Exploited the AI's trusted identity to access restricted internal documents.
- **Defense Evasion:** Used hidden text/instructions within legitimate-looking web content to bypass standard prompt filters.
- **Credential Access:** Not required; utilized the AI's existing authenticated session.
- **Discovery:** Automated reconnaissance of internal databases via the AI’s tool-calling capabilities.
- **Lateral Movement:** Contextual pivoting between public web data and private internal repositories.
- **Collection:** Aggregation of internal meeting transcripts and proprietary strategy documents.
- **Exfiltration:** Image-tag exfiltration (encoding data into outbound HTTP GET requests).
- **Impact:** Unauthorized disclosure of highly sensitive intellectual property.
## Impact Assessment
- **Financial:** Not disclosed; potential loss of competitive advantage.
- **Data Breach:** Compromise of proprietary research and internal meeting records.
- **Operational:** Temporary loss of trust in internal AI tools; required revision of AI security architecture.
- **Reputational:** High; demonstrates that even sophisticated internal tools at top-tier firms are vulnerable to emerging AI-specific attack vectors.
## Indicators of Compromise
- **Network indicators:** Outbound requests to unknown/untrusted domains containing long, encoded strings in URL parameters (e.g., `hxxps[:]//attacker-site[.]com/pixel.png?data=[ENCODED_DATA]`).
- **Behavioral indicators:** AI agents requesting access to external URLs followed immediately by high-volume internal database queries not initiated by the user.
## Response Actions
- **Containment measures:** Restricted the AI agent's ability to fetch live web content.
- **Eradication steps:** Updated system prompts to include stricter "System Message" instructions to ignore formatting commands from external data.
- **Recovery actions:** Auditing of all AI logs to determine if other external prompts had triggered similar behavior.
## Lessons Learned
- **The "Confused Deputy" Problem:** LLMs cannot inherently distinguish between developer instructions and data retrieved from the web.
- **Implicit Trust is Fatal:** Granting AI agents broad read/write access to internal data without granular permissions creates a massive single point of failure.
- **Monitoring Gaps:** Standard EDR/NDR tools may not catch prompt injection; AI-specific logging is required.
## Recommendations
- **Treat AI as a Privileged User:** Apply the principle of least privilege to the AI’s service account.
- **Air-Gap External and Internal Loops:** Do not allow an AI to browse the live web and access sensitive internal databases in the same execution context.
- **Sanitize RAG Inputs:** Treat any data retrieved via Retrieval-Augmented Generation (RAG) as untrusted user input.
- **Implement Content Security Policies (CSP):** Prevent AI interfaces from loading images or resources from unauthorized third-party domains to block "pixel tracking" exfiltration.