A novel attack abused machine learning models in PyPI packages, using zipped Pickle files to deliver infostealer malware.
# Tool/Technique: Malicious PyPI Packages Utilizing Pickle/PyTorch Models
## Overview
This report describes a software supply chain attack campaign discovered on the Python Package Index (PyPI), in which threat actors published deceptive packages designed to steal sensitive information from developers' machines. The attack exploits the fact that Python's `pickle` format, often used to save machine learning (ML) models such as those produced by PyTorch, can execute arbitrary code when a file is deserialized.
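To illustrate why deserializing a Pickle file amounts to running code, here is a minimal and deliberately harmless sketch. The `Demo` class is hypothetical and is not the campaign's payload, which the text does not reproduce. It only shows the `__reduce__` hook that Pickle-based payloads generally rely on, with `eval` applied to a fixed arithmetic string standing in for an attacker's command.

```python
import pickle


class Demo:
    """Harmless, hypothetical stand-in for a tampered model object."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it at load time.
        # A hostile file would name a dangerous callable here; this sketch
        # uses eval on a fixed arithmetic string, so nothing harmful runs.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Demo())

# Loading does not rebuild a Demo instance. It returns whatever the
# embedded callable produced, which proves the callable was executed.
result = pickle.loads(blob)
print(result)  # 42
```

The receiving side never has to import or call anything explicitly: `pickle.loads` alone triggers the call, which is why loading an untrusted model file is equivalent to running untrusted code.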
## Technical Details
- Type: Malware/Technique (Supply Chain Compromise via Malicious Package)
- Platform: Python/Developer Environments (Likely Windows/Linux execution environments)
- Capabilities: Information stealing; packaging that impersonates a legitimate SDK (similar in style to dependency confusion).
- First Seen: Recent campaign reported by ReversingLabs (article dated May 2025).
## MITRE ATT&CK Mapping
- TA0005 - Defense Evasion
  - T1218 - System Binary Proxy Execution (formerly "Signed Binary Proxy Execution")
  - T1197 - BITS Jobs (unlikely to apply to this campaign; listed only for its related execution context)
- TA0008 - Lateral Movement (possible if credentials or keys are stolen)
- TA0010 - Exfiltration
  - T1041 - Exfiltration Over C2 Channel
- TA0011 - Command and Control
  - T1071 - Application Layer Protocol
- TA0003 - Persistence (implied if the payload runs on every package initialization)
*(Note: The text does not detail the specific techniques listed under the tactics above; they are inferred from the described capabilities. The initial execution via Pickle deserialization maps most directly to Execution, and the delivery method to Supply Chain Compromise, T1195.)*
- TA0002 - Execution
  - T1059 - Command and Scripting Interpreter
    - T1059.006 - Python
## Functionality
### Core Capabilities
- **Masquerading:** Packages were named to appear legitimate, mimicking SDKs for Alibaba’s AI services (e.g., `aliyun-ai-labs-snippets-sdk`, `ai-labs-snippets-sdk`).
- **Code Execution via Deserialization:** Malicious code (the infostealer payload) was embedded in PyTorch models, which are serialized with Pickle, so the payload executed as soon as the package was initialized and the model was loaded (likely via `__init__.py`).
### Advanced Features
- **Targeted Information Theft:** Specifically designed to extract:
  1. User and network information.
  2. Organizational affiliation of the target machine.
  3. Contents of the `.gitconfig` file.
- **Developer Targeting:** Attempted to identify developers associated with the Chinese video conferencing tool AliMeeting, suggesting a narrowly targeted campaign rather than an opportunistic one.
## Indicators of Compromise
- File Hashes: [Not provided in the text]
- Package Names: `aliyun-ai-labs-snippets-sdk`, `ai-labs-snippets-sdk`, `aliyun-ai-labs-sdk` (deceptive packages; no individual file names are given in the text).
- Registry Keys: [Not provided in the text]
- Network Indicators: [Not provided in the text; exfiltration to a C2 server is assumed]
- Behavioral Indicators: Code execution from package initialization scripts when the package is imported; reads of configuration files (`.gitconfig`).
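As a quick triage step, the package names listed above can be checked against what is installed in a given Python environment. The following is a minimal sketch using only the standard library; the helper names (`normalize`, `find_flagged`, `installed_names`) are hypothetical, and the check covers only the interpreter it runs under, not other virtual environments on the machine.

```python
import re
from importlib import metadata

# Package names reported for this campaign (from the indicator list above).
KNOWN_BAD = {
    "aliyun-ai-labs-snippets-sdk",
    "ai-labs-snippets-sdk",
    "aliyun-ai-labs-sdk",
}


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 describes (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_flagged(names) -> list[str]:
    """Return the sorted, normalized names that match the known-bad set."""
    return sorted({normalize(n) for n in names} & KNOWN_BAD)


def installed_names():
    """Yield the names of distributions visible to this interpreter."""
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            yield name


if __name__ == "__main__":
    hits = find_flagged(installed_names())
    print("flagged:", hits if hits else "none")
```

A clean result only means these three names are absent from the current environment; it says nothing about whether a payload already ran there before the package was removed.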
## Associated Threat Actors
- [Not named in the text. The targeting of AliMeeting developers suggests a focused espionage effort, possibly state-sponsored, but this is an inference rather than an attribution.]
## Detection Methods
- Signature-based detection: [Not specified in the text; file hashes could be derived once samples are analyzed.]
- Behavioral detection: Monitor for Python processes that execute code while loading serialized ML model files (Pickle/PyTorch), for attempts to read `.gitconfig`, and for unexpected network connections shortly after an ML package is installed or imported.
- YARA rules: [Not provided in the text]
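One way to approximate this detection statically is to list the imports a Pickle stream would perform without ever loading it. The sketch below walks the opcodes with the standard library's `pickletools.genops` and reports imports from a small denylist. The denylist, the function names, and the `archive/data.pkl` member name in the demo are illustrative assumptions, and the `STACK_GLOBAL` handling is a heuristic that an obfuscated payload could evade, so treat a clean result as inconclusive.

```python
import io
import pickle
import pickletools
import zipfile

# Assumption: an illustrative denylist, neither exhaustive nor official.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "socket", "sys"}
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def pickle_imports(data: bytes) -> set[tuple[str, str]]:
    """List the (module, name) pairs a pickle would import, without loading it."""
    found = set()
    strings = []  # string arguments seen so far, in stream order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPS:
            strings.append(arg)
        elif opcode.name == "GLOBAL":
            module, _, name = arg.partition(" ")
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: the two most recent strings are module and name.
            found.add((strings[-2], strings[-1]))
    return found


def scan_archive(path_or_file) -> dict[str, set[tuple[str, str]]]:
    """Map each pickle-like zip member to its suspicious imports."""
    report = {}
    with zipfile.ZipFile(path_or_file) as zf:
        for member in zf.namelist():
            data = zf.read(member)
            # Protocol 2+ pickles begin with the PROTO opcode, byte 0x80.
            if not (member.endswith(".pkl") or data[:1] == b"\x80"):
                continue
            try:
                imports = pickle_imports(data)
            except Exception:
                continue  # not a parseable pickle stream
            hits = {(m, n) for m, n in imports if m.split(".")[0] in SUSPICIOUS_MODULES}
            if hits:
                report[member] = hits
    return report


class Demo:
    """Harmless stand-in: unpickling it would call eval("6 * 7")."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


# Build a small in-memory zip that mimics a zipped Pickle model file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("archive/data.pkl", pickle.dumps(Demo()))
    zf.writestr("archive/version", "3\n")
buf.seek(0)

report = scan_archive(buf)
print(report)  # {'archive/data.pkl': {('builtins', 'eval')}}
```

Because the scanner only reads opcodes, it is safe to run on untrusted files; a denylist of this kind is best used to raise alerts, with an allowlist applied at load time as the stronger control.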
## Mitigation Strategies
- **Dependency Scanning:** Scan packages pulled from public repositories such as PyPI before use, paying particular attention to newly uploaded or low-reputation packages.
- **Restrict Untrusted Code Execution:** Avoid installing packages from sources not fully vetted, especially when they rely on mechanisms like Pickle that allow arbitrary code execution.
- **Virtual Environments:** Use isolated virtual environments for development and testing to limit the blast radius of potential compromise.
- **Supply Chain Security Posture:** Review pipelines that consume ML models to confirm that untrusted model files are never deserialized with unrestricted Pickle loading.
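For pipelines that cannot avoid Pickle entirely, the Python documentation describes restricting which globals an unpickler may resolve by overriding `Unpickler.find_class`. The sketch below applies that pattern with a tiny allowlist. The `ALLOWED` set, `AllowlistUnpickler`, and `safe_loads` are hypothetical names chosen for illustration, and a real pipeline would need an allowlist matched to its own model format.

```python
import io
import pickle
from collections import OrderedDict

# Assumption: a tiny illustrative allowlist. A real pipeline would list
# exactly the classes its own model format requires.
ALLOWED = {
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not explicitly allowed."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Deserialize data while resolving only allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Demo:
    """Harmless stand-in: unpickling it would call eval("6 * 7")."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


# Plain data built from allowlisted types still loads normally.
weights = OrderedDict(layer=[0.1, 0.2])
restored = safe_loads(pickle.dumps(weights))

# A stream that tries to import anything else is rejected before it runs.
try:
    safe_loads(pickle.dumps(Demo()))
    blocked = None
except pickle.UnpicklingError as exc:
    blocked = str(exc)

print(restored)
print(blocked)  # blocked global: builtins.eval
```

For PyTorch checkpoints specifically, `torch.load` accepts a `weights_only=True` argument with a similar intent, which is likely a better fit than a hand-written allowlist where it is available.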
## Related Tools/Techniques
- **Slopsquatting:** Mentioned in an adjacent link. The tactic registers package names that AI coding assistants hallucinate, so that developers who trust the suggestion install the attacker's package.
- **Pickle Bomb/Pickle Payload:** The specific method of embedding executable code in a Pickle file so that it runs when the file is deserialized.
- **Dependency Confusion/Typosquatting:** Related supply chain attacks that rely on users installing malicious packages whose names match or closely resemble legitimate ones in a registry.