Full Report
A widely used Python module for machine-learning developers can be loaded with malware and bypass detection measures. The post Hugging Face platform continues to be plagued by vulnerable ‘pickles’ appeared first on CyberScoop.
Analysis Summary
# Vulnerability: Malicious Code Execution via Untrusted Pickle Files on Hugging Face
## CVE Details
- **CVE ID:** Not specified in the article (The finding appears to be a general disclosure of ongoing risk rather than a specific, officially numbered CVE).
- **CVSS Score:** Not specified.
- **CWE:** CWE-502: Deserialization of Untrusted Data (Implied, based on the use of Pickle).
## Affected Systems
- **Products:** Hugging Face platform (specifically models serialized with Python's `pickle` module).
- **Versions:** Unspecified. The risk applies to any pickle-serialized model file hosted on the platform that a user downloads and deserializes.
- **Configurations:** Any models uploaded and stored as pickle files that can be deserialized by users.
## Vulnerability Description
The vulnerability stems from a design property of Python's `pickle` module rather than a coding bug: a pickle file is a sequence of instructions for a small stack-based virtual machine, and deserializing it can import and call arbitrary Python callables. Threat actors can therefore embed executable code in a malicious pickle file, and that code runs as soon as a user deserializes (loads) the file. Researchers found at least two ML models on the Hugging Face platform containing such malicious code, which deployed web shells linked to a hardcoded IP address. The models evaded Picklescan, Hugging Face's existing security scanner.
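The mechanism described above can be illustrated with a harmless sketch. It is not the payload found in the reported models; the `Payload` class and the `6 * 7` expression are hypothetical stand-ins chosen so that nothing dangerous runs. The point is only that `pickle.loads` calls whatever callable the file names, here the built-in `eval`.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an attacker-controlled object."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" in the file.
        # An attacker would name something like os.system instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance. It replays the stored call,
# so the attacker-chosen expression is evaluated on the victim's machine.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Note that the victim never has to call a method on the loaded object: the call happens inside `pickle.loads` itself.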
## Exploitation
- **Status:** PoC available (Implied operational PoC demonstrated by the identified malicious models).
- **Complexity:** Low (Pickle is a common and convenient format for sharing and loading ML models, so crafting and delivering a malicious payload is straightforward).
- **Attack Vector:** Network (Attacker uploads malicious file to the platform; victim downloads/loads the model).
## Impact
- **Confidentiality:** High (Execution of code, including potential data theft from the execution environment).
- **Integrity:** High (Arbitrary code execution allows for modification of the execution environment).
- **Availability:** Medium to High (Potential for denial of service or system compromise).
## Remediation
### Patches
- Hugging Face reportedly pulled the two malicious models immediately after being notified on January 20th.
- Changes were made to the Picklescan tool (referencing a pull request to the Picklescan repository) specifically to better identify malicious code in *broken* pickle files.
### Workarounds
- Hugging Face documentation already warns developers about the dangers of loading untrusted pickle files.
- Users should exercise extreme caution and avoid loading or deserializing models from sources they do not trust explicitly.
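Where a pickle from a less-trusted source must be loaded anyway, one partial mitigation is to restrict which globals the unpickler may resolve. The sketch below follows the `find_class` override pattern from the Python standard library documentation. The `SAFE_BUILTINS` allow-list and the `restricted_loads` helper are illustrative assumptions, not something the article or Hugging Face prescribes, and a real ML model would need a carefully reviewed allow-list of its own.

```python
import builtins
import io
import pickle

# Hypothetical allow-list for illustration only.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks to import a global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else (eval, os.system, ...) is refused.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() that enforces the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps([1, 2, range(3)])))
```

This narrows the attack surface but does not make untrusted pickles safe in general; avoiding them remains the stronger option.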
## Detection
- **Indicators of Compromise:** Newly deployed web shells in environments that loaded untrusted models from Hugging Face.
- **Detection Methods and Tools:** Picklescan (Hugging Face's primary tool) checks pickle contents against a blacklist of "dangerous" functions. This attack evaded it, which highlights the limits of static blacklisting, especially when files are compressed or otherwise obfuscated, and the case for dynamic analysis. A necessary future improvement is scanning that interprets opcodes sequentially, the way deserialization does, instead of validating the whole file up front, so that a payload placed ahead of a broken section is still detected.
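A minimal sketch of opcode-level scanning that tolerates broken files, using only the standard library's `pickletools.genops`. This is not Picklescan's implementation: the `SUSPICIOUS` deny-list and the `scan_pickle` helper are hypothetical, and resolving `STACK_GLOBAL` from the two most recent string arguments is a simplifying heuristic that ignores the memo table.

```python
import pickle
import pickletools

# Hypothetical deny-list for illustration; not Picklescan's actual list.
SUSPICIOUS = {
    ("builtins", "eval"), ("builtins", "exec"),
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("subprocess", "Popen"),
}


def scan_pickle(data: bytes):
    """Return (suspicious_globals, stream_is_intact)."""
    found = []
    strings = []  # string arguments seen so far, for STACK_GLOBAL
    intact = True
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name == "GLOBAL":
                # Older protocols: arg is "module name" in one string.
                ref = tuple(arg.split(" ", 1))
            elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
                # Protocol 4+: module and name were pushed as strings.
                ref = (strings[-2], strings[-1])
            else:
                if isinstance(arg, str):
                    strings.append(arg)
                continue
            if ref in SUSPICIOUS:
                found.append(ref)
    except Exception:
        # Malformed or truncated stream: keep what was found before it.
        intact = False
    return found, intact


class Demo:
    def __reduce__(self):
        return (eval, ("6 * 7",))


broken = pickle.dumps(Demo())[:-1]  # drop the final STOP opcode
print(scan_pickle(broken))
```

Because findings are collected as each opcode is read, a reference to `eval` that appears before the damaged part of the stream is still reported, and the `stream_is_intact` flag can itself be treated as a warning sign.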
## References
- Vendor advisory: None cited; the malicious models were reported directly to Hugging Face (Jan 20th).
- Relevant links (defanged):
  - Hugging Face security documentation regarding pickle files: hxxps://huggingface.co/docs/hub/en/security-pickle
  - Picklescan repository changes: hxxps://github.com/mmaitre314/picklescan/pull/33