Full Report
Experts for the cloud security firm pulled sensitive data from the service with simple SQL queries. Source: "Wiz researchers find sensitive DeepSeek data exposed to internet," CyberScoop.
Analysis Summary
# Incident Report: DeepSeek Sensitive Data Exposure via Public Database
## Executive Summary
Cloud security firm Wiz discovered a significant data exposure at Chinese AI firm DeepSeek caused by a publicly accessible, unauthenticated ClickHouse database. The database exposed more than a million lines of sensitive data, including user chat histories and API secrets, with records dating back to January 6, 2025. DeepSeek secured the database within hours of being notified by the researchers, but the incident highlights the security risks that accompany rapid growth in the high-stakes AI industry.
## Incident Details
- Discovery Date: January 2025. The source article, published January 30, 2025, says the discovery occurred "earlier this month."
- Incident Date: Data exposure began as early as January 6, 2025.
- Affected Organization: DeepSeek (Chinese AI company)
- Sector: Artificial Intelligence (AI) / Technology
- Geography: China (inferred from the company's origin)
## Timeline of Events
### Initial Access
- **Date/Time:** On or before January 6, 2025.
- **Vector:** Misconfiguration of an internet-facing ClickHouse database.
- **Details:** Researchers found two open ports that are unusual for a public web host (8123 and 9000) leading to an exposed ClickHouse database hosted across two DeepSeek subdomains. These are ClickHouse's default HTTP and native TCP interface ports. **No authentication** was required.
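
The exposure described above can be checked for on infrastructure you own. The following is a minimal sketch, assuming the ClickHouse HTTP interface on port 8123 and its `query` URL parameter; the function names and the example hostname are hypothetical, and this is not a reconstruction of the method Wiz used. A server that answers `SELECT 1` with `1` for a request carrying no credentials is accepting anonymous queries.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_probe_url(host: str, port: int = 8123) -> str:
    """Build a harmless 'SELECT 1' request for the ClickHouse HTTP interface."""
    params = urlencode({"query": "SELECT 1"}, quote_via=quote)
    return f"http://{host}:{port}/?{params}"


def accepts_anonymous_queries(status: int, body: str) -> bool:
    """True when an unauthenticated 'SELECT 1' came back with the result row."""
    return status == 200 and body.strip() == "1"


def probe(host: str, port: int = 8123, timeout: float = 5.0) -> bool:
    """Send the probe without credentials. Run only against hosts you own."""
    try:
        with urlopen(build_probe_url(host, port), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return accepts_anonymous_queries(resp.status, body)
    except OSError:
        # Connection refused, timeout, or an HTTP error status (for example
        # when credentials are required) all count as "not openly queryable".
        return False
```

Calling `probe("clickhouse.example.internal")` (a placeholder hostname) returns `True` only if the server executed the query for an anonymous caller. A `False` result does not prove the service is safe, only that this one request was not answered.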
### Lateral Movement
- **Attack Vector:** Not applicable in the traditional sense. The direct exposure of the database allowed researchers (and theoretically, threat actors) to execute arbitrary SQL queries directly against the exposed data store.
### Data Exfiltration/Impact
- **Data Exposed:** Over a million lines of sensitive internal data, including:
  - Plaintext chat histories between users and DeepSeek’s AI systems.
  - API keys and cryptographic secrets.
  - Server directory structures and operational metadata.
  - References to internal API endpoints.
- **Potential Impact:** Attackers could theoretically use SQL commands to extract files directly from DeepSeek's servers, potentially leading to privilege escalation or corporate espionage.
### Detection & Response
- **Detection:** Discovered by Wiz researchers during routine reconnaissance of DeepSeek’s internet-facing assets.
- **Response Actions:** DeepSeek secured the database within hours of being notified by the researchers.
## Attack Methodology
This incident stemmed from a configuration oversight rather than known malicious exploitation, although a threat actor could have abused the same exposure.
- **Initial Access:** Public exposure of a database management system (ClickHouse) on ports 8123 and 9000 without authentication.
- **Persistence:** Not applicable, as the vulnerability provided immediate access to the data store.
- **Privilege Escalation:** Possible, as arbitrary SQL queries could have been used to pivot deeper into the server environment.
- **Defense Evasion:** Not applicable; the exposure was entirely related to network configuration, not bypassing security controls on endpoints/applications.
- **Credential Access:** Direct access to sensitive credentials (API keys, cryptographic secrets) stored within the database layer.
- **Discovery:** Direct querying of the database via SQL.
- **Lateral Movement:** Direct file extraction from servers was theoretically possible via database commands.
- **Collection:** Extraction of structured and unstructured data via SQL queries.
- **Exfiltration:** Data readily available for extraction once queried.
- **Impact:** Exposure of sensitive operational secrets and user interaction data.
## Impact Assessment
- **Financial:** Not disclosed.
- **Data Breach:** Over one million lines of sensitive data, including user conversations, API secrets, and operational metadata.
- **Operational:** No widely reported operational disruption beyond the necessary remediation work.
- **Reputational:** Negative publicity regarding security posture, especially concerning its rapidly growing, cost-efficient R1 model.
## Indicators of Compromise
*Note: Indicators are based on the nature of the exposure.*
- **Network indicators:** Traffic from external sources to ports 8123 or 9000 on the two affected DeepSeek subdomains (not named in this report).
- **File indicators:** N/A (The compromise was data access, not file-based malware).
- **Behavioral indicators:** Execution of non-standard, high-volume SQL queries against the ClickHouse instance.
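
As an illustration of the network indicator above, the sketch below counts connections to ports 8123 and 9000 that originate outside an internal address range. It assumes connection records are already parsed into dictionaries with hypothetical `src_ip` and `dst_port` fields, and the internal CIDR list is a placeholder to adapt to the environment.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Ports named in this report (ClickHouse HTTP and native TCP interfaces).
CLICKHOUSE_PORTS = {8123, 9000}


def external_hits(records, internal_cidrs=("10.0.0.0/8",)):
    """Count connections per external source IP to the watched ports.

    `records` is an iterable of dicts with hypothetical keys
    'src_ip' (str) and 'dst_port' (int).
    """
    internal = [ip_network(cidr) for cidr in internal_cidrs]
    hits = Counter()
    for rec in records:
        if rec["dst_port"] not in CLICKHOUSE_PORTS:
            continue
        src = ip_address(rec["src_ip"])
        if any(src in net for net in internal):
            continue  # expected internal client
        hits[rec["src_ip"]] += 1
    return hits


sample = [
    {"src_ip": "203.0.113.5", "dst_port": 8123},
    {"src_ip": "203.0.113.5", "dst_port": 9000},
    {"src_ip": "10.1.2.3", "dst_port": 8123},
    {"src_ip": "198.51.100.7", "dst_port": 443},
]
suspicious = external_hits(sample)
```

Sources with unusually high counts would be candidates for closer review against the behavioral indicator, since query volume itself is not visible at the connection level.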
## Response Actions
- **Containment measures:** DeepSeek secured the publicly exposed ClickHouse database within hours of notification.
- **Eradication steps:** Reconfiguration of the database access controls based on Wiz's findings.
- **Recovery actions:** Not detailed; presumably internal rotation and remediation of the exposed secrets.
## Lessons Learned
- Rapid growth in the AI sector can leave security practices lagging behind product deployment, which makes adopting established security frameworks early all the more important.
- Publicly exposing core infrastructure data stores (even internal analytical databases like ClickHouse) without access controls poses a severe, immediate risk.
- Relying on vendors with rapidly deployed infrastructure requires increased scrutiny regarding default security configurations.
## Recommendations
- Implement stringent access control policies for all internal/analytical databases, ensuring no externally addressable interfaces remain unauthenticated.
- Conduct regular external and internal configuration audits that cover every listening port, including database ports such as 8123 and 9000, not only standard web ports.
- Prioritize security framework implementation commensurate with the adoption rate and criticality of the technology being deployed (especially data-handling infrastructure).
- Review and rotate all exposed secrets, API keys, and certificates found within the exposed data sets.
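
To support the secret rotation recommended above, exposed log data can be scanned for credential-like strings to build a rotation inventory. This is a minimal sketch: the two regular expressions are illustrative assumptions, not the formats of any real DeepSeek credential, and a production review would rely on a maintained secret-scanning tool. Matches are redacted so the inventory does not itself leak secrets.

```python
import re

# Hypothetical patterns for illustration only; tune to your own key formats.
SECRET_PATTERNS = {
    "bearer_token": re.compile(r"Bearer\s+([A-Za-z0-9._-]{20,})"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*[=:]\s*([A-Za-z0-9_-]{16,})"),
}


def redact(value: str, keep: int = 4) -> str:
    """Keep the first `keep` characters and mask the rest."""
    return value[:keep] + "*" * (len(value) - keep)


def find_secret_candidates(lines):
    """Return (line_number, pattern_name, redacted_value) for each match."""
    findings = []
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, redact(match.group(1))))
    return findings


sample_lines = [
    "GET /v1/chat api_key=abcd1234efgh5678ijkl",
    "user asked about the weather",
    "Authorization: Bearer abcdefghijklmnopqrstuvwxyz012345",
]
findings = find_secret_candidates(sample_lines)
```

Each finding identifies a line to trace back to the issuing system so the corresponding key can be revoked and reissued.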