Full Report
Researchers at Wiz uncovered a publicly accessible database belonging to Chinese GenAI provider DeepSeek that leaked sensitive data, including chat history.
Analysis Summary
# Incident Report: DeepSeek Exposed Database Vulnerability
## Executive Summary
Cybersecurity firm Wiz discovered a publicly accessible ClickHouse database belonging to AI chatbot provider DeepSeek. The database contained sensitive internal data, including user chat histories, API keys, and operational backend details. After Wiz researchers disclosed the issue, DeepSeek promptly secured the database.
## Incident Details
- **Discovery Date:** January 29, 2025 (Date of Wiz report publication)
- **Incident Date:** Prior to January 29, 2025
- **Affected Organization:** DeepSeek (AI Chatbot Provider)
- **Sector:** Artificial Intelligence, Technology Services
- **Geography:** China (Origin of the company)
## Timeline of Events
### Initial Access
- **Date/Time:** Not explicitly stated, pre-dating January 29, 2025.
- **Vector:** Database misconfiguration that exposed the instance to the public internet.
- **Details:** A ClickHouse database, used for service usage monitoring and data storage, was left exposed to the public internet without adequate access controls.
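The kind of exposure described above can be checked from outside the network. The following is a minimal sketch, assuming the instance listens on ClickHouse's default ports (8123 for the HTTP interface, 9000 for the native protocol). The source does not state which ports were exposed, and the function name `open_clickhouse_ports` is hypothetical.

```python
import socket

# ClickHouse's default ports. The ports involved in this incident are not
# stated in the source, so treating these as the targets is an assumption.
CLICKHOUSE_DEFAULT_PORTS = {8123: "HTTP interface", 9000: "native protocol"}


def open_clickhouse_ports(host, ports=CLICKHOUSE_DEFAULT_PORTS, timeout=2.0):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    reachable = {}
    for port, label in ports.items():
        try:
            # A completed TCP handshake means the port is reachable from here.
            with socket.create_connection((host, port), timeout=timeout):
                reachable[port] = label
        except OSError:
            # Refused, filtered, unresolved, or timed out: not reachable.
            pass
    return reachable
```

Run from a machine outside the private network and only against hosts you are authorized to test. A non-empty result shows the port is reachable, not that it accepts unauthenticated queries, so it is a prompt for further review rather than proof of a leak.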
### Lateral Movement
- *Not explicitly detailed in the source; the focus was on direct access to the exposed database.*
### Data Exfiltration/Impact
- **Data Exposed:** Sensitive data including user chat histories, API keys, and backend operational details.
### Detection & Response
- **Detection:** Discovered by researchers from cloud security firm Wiz.
- **Response actions taken:** Wiz disclosed the findings to DeepSeek, and DeepSeek promptly secured the exposure.
## Attack Methodology
- **Initial Access:** Configuration/Infrastructure vulnerability (Exposed ClickHouse database).
- **Persistence:** N/A (the incident was likely an unauthenticated data leak; no persistence mechanism is described).
- **Privilege Escalation:** N/A
- **Defense Evasion:** N/A
- **Credential Access:** API keys were exposed within the database contents.
- **Discovery:** N/A (Wiz researchers externally identified the exposed resource).
- **Lateral Movement:** N/A
- **Collection:** Direct reading/downloading of data from the open database.
- **Exfiltration:** Direct retrieval of data from the exposed database.
- **Impact:** Data leakage.
## Impact Assessment
- **Financial:** Not specified.
- **Data Breach:** Sensitive internal information, user chat histories, and API keys were exposed.
- **Operational:** Not specified. The only operational effect described is the remediation work, which DeepSeek completed promptly.
- **Reputational:** The exposure of core operational and user data drew rapid scrutiny from cybersecurity experts while DeepSeek was still a new AI provider.
## Indicators of Compromise
- **Network indicators:** None specified. The source identifies only a publicly reachable ClickHouse instance and gives no hosts, addresses, or ports.
- **File indicators:** None specified.
- **Behavioral indicators:** Unauthorized access/read operations against the ClickHouse database instance.
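One way to surface the behavioral indicator above is to compare the client address of each database request against an allowlist of expected networks. The sketch below assumes access records are available as dictionaries with an `address` field. That record shape, the function name `flag_unexpected_clients`, and the example networks are all hypothetical, not details from the source.

```python
import ipaddress


def flag_unexpected_clients(records, allowed_cidrs):
    """Return the records whose client address falls outside every allowed network.

    `records` is an iterable of dicts with an "address" key (assumed shape).
    `allowed_cidrs` is a list of CIDR strings such as "10.0.0.0/8".
    """
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    flagged = []
    for record in records:
        client = ipaddress.ip_address(record["address"])
        # Membership is False when the address belongs to no listed network.
        if not any(client in network for network in networks):
            flagged.append(record)
    return flagged
```

For example, with an allowlist of `["10.0.0.0/8"]`, a record from `203.0.113.9` would be flagged and a record from `10.0.0.5` would not.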
## Response Actions
- **Containment measures:** DeepSeek secured the database exposure immediately following notification.
- **Eradication steps:** Not detailed; remediation presumably involved restricting external access to the ClickHouse instance.
- **Recovery actions:** Not detailed; recovery would typically include verifying the integrity and security of the system after the configuration fix.
## Lessons Learned
- **Key takeaways:** Cloud/database configuration management remains a critical point of failure, even for emerging technology providers. Sensitive information (API keys, user data) must be rigorously protected and never exposed via configuration errors.
- **What could have been done better:** Proactive internal security auditing and adherence to secure-by-default database configurations.
## Recommendations
- Implement strict network access controls (e.g., firewall rules, VPC configurations) to prevent direct public internet exposure of backend database systems like ClickHouse.
- Conduct regular external penetration testing and vulnerability assessments that focus on cloud infrastructure configuration drift.
- Implement automated secret scanning of configuration files and development environments so that API keys do not reach production without proper protection.
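The secret-scanning recommendation can be illustrated with a small pattern-based scanner. This is a minimal sketch: the two patterns (an `sk-` prefixed token and a quoted value assigned to a name such as `api_key`) are illustrative assumptions, not the key formats involved in this incident, and a production setup would normally use a dedicated scanning tool.

```python
import re

# Illustrative patterns only. Real key formats vary by provider, and these
# are assumptions rather than formats taken from the source.
SECRET_PATTERNS = {
    "prefixed_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "key_assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token)\b\s*[:=]\s*['\"]([^'\"]{12,})['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Running `scan_text` over each configuration file in a pre-commit hook or CI job, and failing the run when it returns findings, is one way to apply this. Regex scanning produces both false positives and misses, so findings still need manual review.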