Full Report
Anthropic on Monday said it identified "industrial-scale campaigns" mounted by three artificial intelligence (AI) companies, DeepSeek, Moonshot AI, and MiniMax, to illegally extract Claude's capabilities to improve their own models. The distillation attacks generated over 16 million exchanges with its large language model (LLM) through about 24,000 fraudulent accounts in violation of its terms
Analysis Summary
# Incident Report: Industrial-Scale LLM Capability Extraction via Distillation Attacks
## Executive Summary
Anthropic detected "industrial-scale campaigns" orchestrated by three Chinese AI companies (DeepSeek, Moonshot AI, and MiniMax) aimed at illegally extracting the capabilities of its Claude large language model (LLM). The attackers used approximately 24,000 fraudulent accounts to generate over 16 million exchanges with Claude for use in improving their own models, in violation of Anthropic's terms of service. Anthropic attributed the campaigns through request metadata, IP address correlation, and infrastructure indicators, and responded with enhanced detection and access controls to mitigate the ongoing threat.
## Incident Details
- Discovery Date: Not specified; Anthropic publicly disclosed the campaigns on a Monday (exact date not given; inferred from "Anthropic on Monday said it identified...")
- Incident Date: Ongoing/Multiple dates leading up to disclosure (specific start date not provided)
- Affected Organization: Anthropic
- Sector: Artificial Intelligence / Technology
- Geography: Attackers attributed to companies based in China, operating through globally distributed infrastructure.
## Timeline of Events
### Initial Access
- Date/Time: Not specified, occurred over a period associated with "industrial-scale campaigns."
- Vector: Fraudulent accounts and commercial proxy services.
- Details: Attackers amassed approximately 24,000 fraudulent accounts, often relying on "hydra cluster" architectures operated by services that resell API access. These clusters distribute traffic across many accounts and blend malicious queries with legitimate usage.
### Lateral Movement
- *Not explicitly detailed for traditional network movement.* The progression involved escalating the volume of access across accounts to achieve "industrial-scale" data extraction.
### Data Exfiltration/Impact
- Over 16 million exchanges were performed with Claude.
- The goal was to extract specific, differentiated capabilities, including reasoning, rubric-based grading, agentic reasoning, tool use, and coding proficiency.
- Resulted in illicit capability transfer to competitor LLMs (DeepSeek, Moonshot AI, MiniMax).
### Detection & Response
- Detection Method: Anthropic built classifiers and behavioral fingerprinting systems to identify suspicious distillation attack patterns in API traffic. Attribution was achieved via request metadata, IP address correlation, and infrastructure indicators.
- Response Actions: Banned accounts, strengthened verification for specialized accounts (educational, security research), and deployed enhanced safeguards to reduce the efficacy of model outputs for illicit distillation.
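
The attribution step described above (correlating request metadata, IP addresses, and infrastructure indicators) can be illustrated with a small clustering sketch. This is not Anthropic's method, which the source does not describe in detail; it is a minimal union-find example that assumes each log record has been reduced to a hypothetical `(account_id, indicator)` pair, where the indicator is any shared value such as an IP address or a client fingerprint string.

```python
from collections import defaultdict


def cluster_accounts(records):
    """Group account IDs that share at least one infrastructure indicator.

    records: iterable of (account_id, indicator) pairs. Both field names
    are hypothetical; an indicator is any hashable value, such as an IP
    address or a client fingerprint string.
    """
    parent = {}

    def find(x):
        # Return the representative account for x's cluster.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # indicator -> first account observed with it
    for account, indicator in records:
        find(account)  # register the account even if it shares nothing
        if indicator in first_seen:
            union(first_seen[indicator], account)
        else:
            first_seen[indicator] = account

    clusters = defaultdict(set)
    for account in parent:
        clusters[find(account)].add(account)
    # Largest clusters first, then alphabetical for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


records = [
    ("a1", "198.51.100.7"),   # documentation-range IPs, not real indicators
    ("a2", "198.51.100.7"),
    ("a2", "fp-x"),
    ("a3", "fp-x"),
    ("b1", "203.0.113.9"),
]
clusters = cluster_accounts(records)
```

Here `a1`, `a2`, and `a3` end up in one cluster because they are linked transitively through a shared IP and a shared fingerprint, while `b1` stays on its own. Clustering of this kind is one reason banning a whole group can be more effective than banning accounts one at a time.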
## Attack Methodology
- **Initial Access:** Creation and use of ~24,000 fraudulent accounts, leveraged primarily through commercial proxy services.
- **Persistence:** High volume and distribution across proxy networks ("hydra cluster" architectures) made banning individual accounts ineffective due to rapid replacement.
- **Privilege Escalation:** Not applicable in the traditional sense; the method relied on abusing legitimate API access tiers via volume and scale.
- **Defense Evasion:** Mixing distillation traffic with legitimate, unrelated customer requests to obscure malicious activity.
- **Credential Access:** Not applicable; access was gained via bulk creation of fraudulent accounts, not credential theft.
- **Discovery:** Not applicable; specific capabilities were targeted directly.
- **Lateral Movement:** Not applicable; focus was maximizing extraction volume through distributed access points.
- **Collection:** Performing highly structured, targeted prompts designed to elicit high-quality responses demonstrating specific capabilities (e.g., complex reasoning, coding solutions).
- **Exfiltration:** The model's outputs (the distilled knowledge) served as the "exfiltrated data."
- **Impact:** Direct theft of intellectual property embodied in model capabilities. The resulting models may lack the original's safeguards and could potentially be weaponized by authoritarian governments.
## Impact Assessment
- **Financial:** Not quantified, but implied significant cost savings for the attacking firms in R&D time/expense.
- **Data Breach:** No traditional customer data breach, but the intellectual property/proprietary capability set of the Claude model was illicitly copied/extracted.
- **Operational:** Investigating, attributing, and mitigating the ongoing abuse required significant engineering resources.
- **Reputational:** Potentially significant; Anthropic publicly disclosed the abuse, drawing attention to the risk of model capabilities proliferating globally without safeguards.
## Indicators of Compromise
- **Network Indicators:** No specific IP addresses or domains are given in the source. The observable pattern is high-volume request traffic concentrated on eliciting specific capabilities and originating from known commercial proxy IP ranges.
- **File Indicators:** N/A
- **Behavioral Indicators:** Prompt volume, structure, and focus that differ from legitimate usage patterns, specifically targeting differentiating capabilities (agentic reasoning, tool use).
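
As a rough illustration of the behavioral indicator above, the sketch below flags an account when its request volume is high and most of its prompts target a single capability. It assumes each prompt has already been tagged with a capability label by some upstream classifier; the labels, the function names, and both thresholds are hypothetical and not taken from the source. For scale, the reported totals (over 16 million exchanges across about 24,000 accounts) work out to an average of roughly 670 exchanges per account.

```python
from collections import Counter


def capability_concentration(prompt_labels):
    """Return (top_label, share) for a list of per-prompt capability labels."""
    if not prompt_labels:
        return None, 0.0
    counts = Counter(prompt_labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(prompt_labels)


def flag_account(prompt_labels, min_requests=500, min_share=0.8):
    """Flag accounts with high volume focused on one capability.

    Both thresholds are illustrative assumptions, not published values.
    """
    _, share = capability_concentration(prompt_labels)
    return len(prompt_labels) >= min_requests and share >= min_share


focused = ["tool_use"] * 900 + ["chat"] * 100   # 90% on one capability
casual = ["chat"] * 10                           # low volume
mixed = ["coding", "chat"] * 500                 # high volume, no focus

results = (flag_account(focused), flag_account(casual), flag_account(mixed))
```

A single-account heuristic like this would miss traffic that is deliberately spread thin across many accounts, so in practice it would need to be applied to clusters of related accounts as well.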
## Response Actions
- **Containment:** Identification and banning of associated fraudulent accounts; deployed classifiers to flag future suspicious API traffic patterns.
- **Eradication:** Work ongoing to block traffic from proxy networks identified as contributors to the attacks.
- **Recovery:** Implemented enhanced safeguards within the model outputs to reduce their utility for illicit distillation. Strengthened account verification processes.
## Lessons Learned
- LLM capability extraction is an operational threat, executed at industrial scale using sophisticated proxy infrastructure designed for evasion.
- Competitors are actively attempting to "harvest" high-value model capabilities, posing a competitive risk and a potential national security risk if these unprotected models are deployed maliciously.
- Reliance on simple account bans is insufficient against organized proxy networks; behavioral fingerprinting and output modification are critical defenses.
## Recommendations
- Further development and deployment of behavioral fingerprinting systems specifically tuned for capability extraction patterns.
- Mandatory multi-factor verification or strict validation for high-volume or specialized API users, especially those routed through known commercial proxy providers.
- Continue rigorous analysis via metadata and infrastructure correlation to rapidly attribute future, similar distillation campaigns.
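
The second recommendation could be prototyped along the following lines. This is a minimal sketch under stated assumptions: the proxy CIDR ranges are placeholder documentation ranges (RFC 5737), not real proxy providers, and the function names and both volume thresholds are hypothetical values chosen only to show a stricter limit for proxy-routed traffic.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks; a real list
# would come from a maintained proxy-provider feed (assumption).
PROXY_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def from_known_proxy(ip, ranges=PROXY_RANGES):
    """Return True if the address falls inside any listed proxy range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def needs_extra_verification(ip, daily_requests,
                             base_threshold=1000, proxy_threshold=100):
    """Require stricter verification at a lower volume for proxy traffic.

    Both thresholds are illustrative assumptions.
    """
    threshold = proxy_threshold if from_known_proxy(ip) else base_threshold
    return daily_requests >= threshold


checks = (
    needs_extra_verification("192.0.2.10", 150),    # proxy, moderate volume
    needs_extra_verification("203.0.113.5", 150),   # non-proxy, same volume
    needs_extra_verification("203.0.113.5", 1500),  # non-proxy, high volume
)
```

Lowering the threshold for proxy-routed traffic, rather than blocking it outright, avoids penalizing legitimate customers who happen to use the same providers.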