Full Report
AI goes off the rails … because of shoddy guardrails Researchers at Pen Test Partners found four flaws in Eurostar's public AI chatbot that, among other security issues, could allow an attacker to inject malicious HTML content or trick the bot into leaking system prompts. Their thank you from the company: being accused of "blackmail."…
Analysis Summary
# Incident Report: Eurostar AI Chatbot Vulnerabilities
## Executive Summary
Pen Test Partners discovered four critical flaws in Eurostar's public AI chatbot stemming from inadequate security guardrails. These vulnerabilities allowed researchers to perform prompt injection to leak system prompts and disclose internal HTML generation logic, and also posed a significant risk of Cross-Site Scripting (XSS) via HTML injection in the chat history. The disclosure process was complicated by delayed responses and an accusation of blackmail from Eurostar security personnel.
## Incident Details
- **Discovery Date:** June 11, 2025 (Initial Report)
- **Incident Date:** Ongoing exploitation simulation beginning prior to June 11, 2025.
- **Affected Organization:** Eurostar (High-speed rail service)
- **Sector:** Transportation / Travel Technology
- **Geography:** Not explicitly stated, implied UK/European operations.
## Timeline of Events
### Initial Access
- **Date/Time:** June 11, 2025
- **Vector:** Vulnerability Disclosure Program (VDP) submission via email.
- **Details:** Researchers submitted a report detailing four major flaws found in the public-facing AI chatbot.
### Follow-up and Escalation
- **Date/Time:** June 18, 2025
- **Vector:** Follow-up via VDP channel.
- **Details:** Researchers followed up after receiving no response to the initial June 11 report. Still no response.
- **Date/Time:** July 7, 2025
- **Vector:** Direct contact via LinkedIn.
- **Details:** Managing partner contacted Eurostar's head of security on LinkedIn.
- **Date/Time:** Approx. July 14, 2025
- **Vector:** Direct communication.
- **Details:** Eurostar security instructed the researchers to use the VDP (which they had already done).
- **Date/Time:** July 31, 2025
- **Vector:** Communication regarding report status.
- **Details:** Researchers were informed there was no record of their initial bug report, suggesting reports were lost during an outsourced VDP transition.
### Remediation and Public Disclosure
- **Date/Time:** Post-July 31, 2025 (Implied)
- **Vector:** Internal patching efforts.
- **Details:** Eurostar located the original submission and patched "some" of the flaws.
- **Date/Time:** Week of December 24, 2025
- **Vector:** Public disclosure.
- **Details:** Pen Test Partners published their findings after the protracted disclosure period. During the process, Eurostar's Head of Security allegedly accused the researchers of "blackmail."
### Lateral Movement
- Not applicable in this context, as testing focused on exploiting the application layer (the chatbot API/frontend interaction).
### Data Exfiltration/Impact
- **Techniques demonstrated:** Prompt Injection used to leak system prompts and reveal internal HTML generation logic. HTML Injection demonstrated as a vector for phishing or malicious code delivery.
- **Potential Impact:** Stored or Shared Cross-Site Scripting (XSS) due to lack of backend verification of Conversation/Message IDs combined with HTML injection.
## Attack Methodology
- **Initial Access:** Direct user interaction with the AI chatbot interface.
- **Persistence:** Not applicable; exploit demonstrated successful single-turn execution of injected commands.
- **Privilege Escalation:** N/A.
- **Defense Evasion:** Leveraging the application logic flaw where only the *newest* message is strictly validated, allowing manipulation of *previous* messages in the transmitted chat history.
- **Credential Access:** Not explicitly achieved, but potential for session hijacking via demonstrated Stored XSS vulnerability.
- **Discovery:** Analysis of how the frontend relays chat history to the API and how the server validates messages.
- **Lateral Movement:** N/A.
- **Collection:** System prompts and internal application response logic were collected via Prompt Injection.
- **Exfiltration:** Demonstration of capacity to inject deceptive content (e.g., phishing links).
- **Impact:** Reputational damage, potential for user session hijacking/phishing if exploited by a malicious actor.
## Impact Assessment
- **Financial:** Not explicitly available, costs associated with remediation and reputational damage.
- **Data Breach:** **Potential** for user data compromise if the chatbot were integrated with PII or account details (as warned by researchers), though no actual PII breach was reported. System prompts were exposed.
- **Operational:** Minimal operational disruption reported, though the VDP process itself was severely disrupted.
- **Reputational:** Significant reputational damage following the public disclosure and the subsequent allegation of "blackmail" leveled against the researchers by Eurostar security staff.
## Indicators of Compromise
- *Note: As this was a controlled research disclosure, definitive malicious IOCs are not provided. Observations relate to vulnerable input methods.*
- **Behavioral indicators:** Successful prompt injection resulting in output that bypasses standard safety responses; receipt of system-level instruction feedback from the bot.
- **Input Vectors:** Messages containing specific HTML tags or specially crafted strings within historical chat context designed to trick the parsing mechanism.
## Response Actions
- **Containment measures:** Eurostar patched "some" of the disclosed flaws.
- **Eradication steps:** Unknown if all four flaws were fully eradicated.
- **Recovery actions:** Unknown.
## Lessons Learned
- **Guardrail Shoddy Implementation:** Relying on client-side or partial validation (only validating the latest message signature) while transmitting the entire, mutable chat history allows for manipulation of prior context.
- **VDP Management Critical:** Outsourcing VDP without robust transition planning led to the loss (or severe delay) of critical security reports.
- **Disclosure Process Failure:** The initial lack of response and the subsequent accusation of "blackmail" significantly damaged the relationship between the responsible discoverers and the organization, potentially delaying fixes and harming reputation.
- **Architecture Flaws:** Lack of verification on conversation/message IDs combined with input injection vectors strongly suggests a path to Stored XSS.
## Recommendations
- Implement strict **server-side validation** on all input fields and conversation history segments, ensuring messages are validated individually and their integrity is maintained against manipulation.
- Establish robust, resilient **Vulnerability Disclosure Program (VDP)** handling protocols, including clear redundancy and escalation paths, especially during vendor transitions.
- Immediately remediate **Stored XSS** risks by ensuring all user-supplied data rendered in the application (especially on shared views) is properly encoded/sanitized, and that message IDs are uniquely signed and validated server-side.
- Adopt a **security-first culture** during disclosure processes to ensure researchers are treated professionally, avoiding accusations that could impede remediation or discourage future good-faith reporting.