Full Report
Using AI to attack AI (updated): Malefactors are actively attacking internet-facing Ray clusters and abusing the open source AI framework to spread a self-replicating botnet that mines for cryptocurrency, steals data, and launches distributed denial of service (DDoS) attacks.…
Analysis Summary
# Incident Report: ShadowRay 2.0 - AI Cluster Hijacking and Botnet Spreading
## Executive Summary
Threat actors, operating under the name IronErn440, are exploiting an unpatched vulnerability (CVE-2023-48022) in internet-facing Ray clusters to propagate a self-replicating botnet named ShadowRay 2.0. The campaign, active since at least September 2024, leverages Ray's distributed computing framework to conduct cryptojacking, steal sensitive data including AI models and credentials, and launch DDoS attacks. The primary defensive challenge stems from the vendor's stance that the vulnerability is a deployment misconfiguration rather than a core bug, indicating ongoing risk for improperly secured Ray deployments globally.
## Incident Details
- **Discovery Date:** Oligo Security researchers reported the ongoing campaign on Tuesday, November 18, 2025 (reporting date). The campaign has been active since at least September 2024.
- **Incident Date:** Active since at least September 2024.
- **Affected Organization:** Numerous organizations globally, across multiple industries. Organizations named in connection with Ray include Amazon, Apple, and OpenAI, though the targets are end-users with exposed Ray clusters.
- **Sector:** Technology, AI/ML Compute Providers.
- **Geography:** Global (Attacks noted across US, China, and beyond).
## Timeline of Events
### Initial Access
- **Date/Time:** As early as September 2024 (start of campaign).
- **Vector:** Exploitation of **CVE-2023-48022** (CVSS 9.8) in internet-facing Ray dashboard APIs.
- **Details:** Attackers use the open source tool `interact.sh` to scan and identify exploitable Ray dashboard IPs. Exploitation allows for remote code execution (RCE) via exposed, unauthenticated Ray endpoints.
### Lateral Movement
- **Date/Time:** Ongoing post-initial compromise.
- **Vector:** Leveraging compromised clusters to pivot to non-internet-facing nodes on internal networks.
- **Details:** Attackers move laterally to infect additional machines within the victim's internal infrastructure.
### Data Exfiltration/Impact
- **Date/Time:** Ongoing.
- **Impact:** Cryptojacking operations (utilizing 60% of CPU/GPU resources), theft of proprietary assets (AI models, source code, datasets, cloud/database credentials), and launching external DDoS attacks.
### Detection & Response
- **Date/Time:** Oligo Security tracked and reported the activity, culminating in publication in November 2025.
- **Vector:** Security research company monitoring Ray exploitation.
- **Details:** Oligo reported malicious activity, leading to GitLab removing the attacker's repository and account on **November 5**. The attacker quickly pivoted to GitHub. GitHub subsequently blocked the attacker's account on **November 17**, but a new account was created, and activity resumed within two hours on the same day.
## Attack Methodology
- **Initial Access:** Remote Code Execution (RCE) via exploited **CVE-2023-48022** on unauthenticated Ray dashboard APIs.
- **Persistence:** Likely through deployment of self-replicating components and maintaining reverse shells to C2 servers (AWS-hosted).
- **Privilege Escalation:** Not explicitly detailed, but RCE grants high initial access within the cluster management environment.
- **Defense Evasion:** Limiting cryptominer usage to **60 percent** of discovered CPU/GPU capacity to evade detection. Use of region-aware malware adapted via proxies.
- **Credential Access:** Stealing cloud credentials, database credentials, and access tokens from compromised production environments.
- **Discovery:** Using `interact.sh` for external scanning; deploying payloads to discover local CPU/GPU resources (`nvidia-smi` utility).
- **Lateral Movement:** Pivoting to internal, non-internet-facing nodes on the network.
- **Collection:** Gathering proprietary company assets, including Source Code, AI Models, and Datasets (one instance found with 240GB of data).
- **Exfiltration:** Implied via network channels, focused on sensitive intellectual property and credentials.
- **Impact:** Cryptojacking, DDoS attacks, data theft (IP/credentials).
## Impact Assessment
- **Financial:** Significant, with one identified cluster worth an estimated **$4 million annually** in compute capacity being fully utilized by the attacker.
- **Data Breach:** High risk. Theft of proprietary data, including **Source Code, AI Models, Datasets, cloud credentials, and database credentials**.
- **Operational:** Operational degradation due to high resource contention (up to 100% CPU utilization observed during peak mining stages, despite evasion attempts). Risk of widespread cluster infection due to self-replication.
- **Reputational:** Potential reputational damage for organizations hosting complex AI infrastructure that is easily hijacked.
## Indicators of Compromise
- **Network Indicators:** Communication to AWS-hosted command-and-control servers via interactive reverse shells.
- **File Indicators:** Multi-stage Python payloads, employing region-aware logic and AI-generated code characteristics.
- **Behavioral Indicators:** Excessive resource utilization (CPU/GPU) on Ray cluster nodes near 60% capacity; execution of `nvidia-smi` utility by untrusted processes; unusual job submissions matching attacker resource profiles.
- **Platform Indicators:** Malicious repositories previously hosted on GitLab (removed Nov 5) and GitHub (various accounts, last blocked Nov 17).
## Response Actions
- **Containment Measures:** Containment relies on external patches/vendor fixes or internal mitigation. Oligo reported the attacker infrastructure to GitLab and GitHub for takedowns.
- **Eradication Steps:** Organizations must patch **CVE-2023-48022** or immediately decouple Ray clusters from the public internet, as per vendor recommendations.
- **Recovery Actions:** Scanning and rebuilding compromised nodes; rotating all exposed credentials.
## Lessons Learned
- **Supply Chain Risk in Open Source:** Critical infrastructure components like Ray, essential for modern AI workloads, pose significant risk when deployed outside vendor-recommended isolation, especially if security patches are delayed or denied by vendors on policy grounds.
- **Automation Escalation:** Attackers are automating discovery (`interact.sh`) and payload generation (potentially AI-generated), resulting in rapid recovery from platform takedowns (sub-2-hour recovery time on GitHub).
- **Misconfiguration Risk Acceptance:** The ongoing, active exploitation demonstrates that a dependency on strict "correct usage" by end-users (i.e., strict network segmentation) fails when organizations prioritize accessibility over security boundaries.
## Recommendations
- **Mandatory Patching/Segmentation:** Immediately patch **CVE-2023-48022**. If patching is not feasible, ensure **all** internet-facing Ray dashboards/APIs are strictly firewalled and require strong authentication, aligning with the vendor's assertion that Ray should only operate in controlled, internal networks.
- **Resource Monitoring:** Implement rigorous anomaly detection specifically targeting GPU/CPU usage spikes or unusual resource requests within Ray job submissions.
- **Supply Chain Vigilance:** Closely monitor activity surrounding critical, widely adopted open-source AI frameworks.
- **Code Provenance:** Investigate anomalous code structure or comments within deployment artifacts, aligning with the potential use of generative AI by threat actors.
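
The behavioral indicators and the resource-monitoring recommendation above can be combined into a per-node triage check. The following is an added example, not code from the report or from Oligo; the telemetry field names, the trusted-parent allowlist, and the thresholds are assumptions to tune locally.

```python
from statistics import mean, pstdev

# Assumptions (not from the report): field names, allowlist and thresholds.
TRUSTED_PARENTS = {"dcgm-exporter", "node-exporter"}  # hypothetical monitoring agents
PLATEAU_TARGET, PLATEAU_BAND = 60.0, 5.0              # report: miner capped near 60%
MIN_SAMPLES, MIN_IN_BAND, MAX_STDEV = 6, 0.8, 3.0

def untrusted_gpu_probe(events):
    """Nodes where nvidia-smi was launched by a parent outside the allowlist."""
    return {e["node"] for e in events
            if e["exe"].rsplit("/", 1)[-1] == "nvidia-smi"
            and e["parent"] not in TRUSTED_PARENTS}

def capped_plateau(samples):
    """True if utilization sits in a flat band around the 60% cap."""
    if len(samples) < MIN_SAMPLES:
        return False
    in_band = sum(abs(s - PLATEAU_TARGET) <= PLATEAU_BAND for s in samples)
    return in_band / len(samples) >= MIN_IN_BAND and pstdev(samples) <= MAX_STDEV

def triage(events, utilization):
    """List, per node, the behavioral indicators that fired."""
    probed = untrusted_gpu_probe(events)
    report = {}
    for node in sorted(set(utilization) | probed):
        samples = utilization.get(node, [])
        hits = []
        if node in probed:
            hits.append("nvidia-smi by untrusted parent")
        if capped_plateau(samples):
            hits.append(f"flat ~{mean(samples):.0f}% utilization")
        if hits:
            report[node] = hits
    return report

# Constructed sample telemetry (not real incident data).
EVENTS = [
    {"node": "ray-worker-1", "parent": "python3", "exe": "/usr/bin/nvidia-smi"},
    {"node": "ray-worker-2", "parent": "dcgm-exporter", "exe": "/usr/bin/nvidia-smi"},
    {"node": "ray-worker-3", "parent": "python3", "exe": "/usr/bin/nvidia-smi"},
]
UTILIZATION = {
    "ray-worker-1": [59, 61, 60, 60, 62, 58, 60, 61],  # flat plateau near 60%
    "ray-worker-2": [12, 95, 40, 88, 5, 70, 33, 60],   # bursty training load
    "ray-worker-3": [20, 35, 90, 15, 80, 25, 10, 45],  # probe only, no plateau
}

if __name__ == "__main__":
    for node, hits in triage(EVENTS, UTILIZATION).items():
        print(node, "->", "; ".join(hits))
```

A node that trips both indicators is the stronger lead; a lone `nvidia-smi` hit from a Python parent is common in legitimate GPU workloads and needs correlation with job submission records.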