Full Report
[This is the second in a series of posts on Pickle. Link to part one.] In the previous post I introduced Python’s Pickle mechanism for serializing and deserializing data and provided a bit of background regarding where we came across serialized data, how the virtual machine works and noted that Python intentionally does not perform security checks when unpickling. In this post, we’ll work through a number of examples that depict exactly why unpickling untrusted data is a dangerous operation. Since we’re going to handcraft Pickle streams, it helps to have an opcode reference handy; here are the opcodes we’ll use:
Analysis Summary
The provided article excerpt focuses on an educational discussion about the dangers of Python's `pickle` mechanism, specifically demonstrating why unpickling untrusted data is risky by working through handcrafted Pickle stream examples. It does not detail a specific, named malware family, a pre-existing attack tool/framework, or specific TTPs used by known threat actors beyond the fundamental vulnerability being exploited.
Therefore, the summary will focus on the **technique** of Pickle deserialization abuse.
# Tool/Technique: Python Pickle Deserialization Abuse
## Overview
This technique involves exploiting Python applications that accept serialized data via the `pickle` module from untrusted sources and subsequently deserialize (unpickle) that data. Since Python's `pickle` mechanism intentionally lacks built-in security checks during unpickling, an attacker can construct malicious pickle streams that, upon deserialization, execute arbitrary code or reconstruct dangerous objects.
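As a minimal, deliberately harmless sketch of this behaviour: a pickled object can name the callable that the unpickler will invoke to "rebuild" it. The class name `Payload` is hypothetical, and the benign built-in `sorted` stands in for whatever callable an attacker would actually choose.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an attacker-controlled object."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])".
        # A real attack would name a dangerous callable here instead.
        return (sorted, ([3, 1, 2],))


stream = pickle.dumps(Payload())

# Loading the stream calls sorted(...) and returns its result.
# No Payload instance is ever reconstructed.
result = pickle.loads(stream)
print(type(result).__name__, result)
```

The stream references only the callable and its arguments, so the receiving application does not need the `Payload` class to be importable for the call to happen.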
## Technical Details
- Type: Technique
- Platform: Python environments (cross-platform, depending on the hosted application)
- Capabilities: Remote Code Execution (RCE) or arbitrary object instantiation upon deserialization of untrusted data.
- First Seen: Not stated in the source article. The risk is long-standing and well documented; the Python documentation for `pickle` itself warns against unpickling data from untrusted sources.
## MITRE ATT&CK Mapping
Since this is an exploitation technique leveraging a language feature rather than a specific tool or malware, the mapping focuses on the resulting execution:
- **TA0002 - Execution**
  - **T1059 - Command and Scripting Interpreter**
    - T1059.006 - Python
- **TA0004 - Privilege Escalation** (if the application runs with elevated privileges)
  - **T1055 - Process Injection** (applicable only if the payload goes on to inject code into another process; spawning a shell or running commands is already covered by T1059)
## Functionality
### Core Capabilities
- Serialization and deserialization of Python objects.
- Code execution when an untrusted stream is unpickled: the `GLOBAL` opcode imports and pushes a callable named by module and attribute, and the `REDUCE` opcode calls it with an argument tuple taken from the stream.
### Advanced Features
- The technique relies on crafting a sequence of opcodes within the serialized stream that makes the pickle virtual machine invoke an arbitrary callable (e.g., `subprocess.Popen` or `os.system`) while the stream is being loaded.
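The opcode sequence described above can be sketched with a handcrafted protocol 0 stream. This example is an illustration rather than the article's own payload: it calls the harmless built-in `len` where an attacker would name something like `subprocess.Popen`.

```python
import pickle
import pickletools

# Handcrafted protocol 0 stream, one opcode at a time:
#   c builtins\n len\n   GLOBAL : push the callable builtins.len
#   (                    MARK   : start collecting arguments
#   S'hello'\n           STRING : push the string 'hello'
#   t                    TUPLE  : build ('hello',) from everything after MARK
#   R                    REDUCE : pop callable and tuple, push len('hello')
#   .                    STOP   : return the top of the stack
stream = b"cbuiltins\nlen\n(S'hello'\ntR."

# genops only parses the stream; it does not execute anything.
names = [opcode.name for opcode, arg, pos in pickletools.genops(stream)]
print(names)

# Loading the stream performs the call.
result = pickle.loads(stream)
print(result)
```

Nothing in the stream describes an object to restore; the value returned by `pickle.loads` is simply whatever the named callable returned.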
## Indicators of Compromise
As this summary pertains to a proof-of-concept technique involving handcrafted streams, standard IoCs like hashes or C2s are not applicable unless a specific payload was delivered.
- File Hashes: N/A (Applies to the stream data, not the application itself)
- File Names: N/A
- Registry Keys: N/A
- Network Indicators: N/A (The initial payload is the stream data itself)
- Behavioral Indicators: An application process that reads byte streams from an unvalidated source and passes them to `pickle.load()` or `pickle.loads()`. Indicators include unexpected process spawning or file manipulation immediately following deserialization.
## Associated Threat Actors
This is a generic exploitation technique demonstrable against any insecure Python application. It is not attributed exclusively to one specific threat actor group, but is a known staple in generic penetration testing and vulnerability research.
## Detection Methods
- Signature-based detection: Difficult against handcrafted streams unless specific, predictable opcode patterns for known exploits are captured.
- Behavioral detection: Monitoring Python processes for unusual subprocess creation or system calls immediately after reading or receiving input data destined for `pickle.load()`.
- YARA rules: Potentially usable to flag streams containing known malicious pickle opcodes/structures if signatures are developed for common known exploits targeting this vulnerability.
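A rough sketch of static opcode inspection, using the standard library's `pickletools.genops`, which parses a stream without executing it. The function name `flag_opcodes` and the `SUSPICIOUS` set are assumptions made for illustration. This is a heuristic and not a security boundary: legitimate pickles of custom classes also contain these opcodes.

```python
import pickle
import pickletools

# Opcodes that import a global or call/instantiate something.
# Illustrative selection, not an exhaustive or authoritative list.
SUSPICIOUS = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE",
    "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def flag_opcodes(stream):
    """Return (position, opcode name, argument) for each flagged opcode."""
    findings = []
    for opcode, arg, pos in pickletools.genops(stream):
        if opcode.name in SUSPICIOUS:
            findings.append((pos, opcode.name, arg))
    return findings


# Plain containers of primitives need no imports or calls.
benign = pickle.dumps({"a": [1, 2, 3]})

# Handcrafted stream that calls builtins.len('hello') when loaded.
crafted = b"cbuiltins\nlen\n(S'hello'\ntR."

print(flag_opcodes(benign))
print(flag_opcodes(crafted))
```

For `GLOBAL`, the reported argument is the module and attribute name separated by a space, which is the part worth comparing against a list of expected classes.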
## Mitigation Strategies
- **Prevention:** **Never** call `pickle.load()` or `pickle.loads()` on data received from untrusted or unauthenticated sources.
- **Hardening Recommendations:** If object serialization is absolutely required, use safer, language-agnostic formats like JSON or XML, which describe data only and do not invoke arbitrary callables during deserialization.
- If Pickle must be used, restrict which globals may be loaded by subclassing `pickle.Unpickler` and overriding `find_class()` with an allowlist of safe classes, or inspect the stream opcodes (for example with `pickletools`) before loading.
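The allowlist approach can be sketched as follows, along the lines of the "restricting globals" pattern in the Python documentation. The contents of `ALLOWED` and the helper name `restricted_loads` are assumptions chosen for illustration; a real application would list only the classes it actually expects.

```python
import io
import pickle

# Illustrative allowlist of (module, name) pairs that may be resolved.
ALLOWED = {
    ("builtins", "range"),
    ("builtins", "set"),
    ("builtins", "frozenset"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a global (GLOBAL / STACK_GLOBAL).
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but refuses globals outside ALLOWED."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# An allowlisted global loads normally.
print(restricted_loads(pickle.dumps(range(3))))

# A handcrafted stream that asks for builtins.len is rejected before any call.
try:
    restricted_loads(b"cbuiltins\nlen\n(S'hello'\ntR.")
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

An allowlist narrows what a stream can reach, but every permitted callable still runs with stream-supplied arguments, so the list should stay as small as possible.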
## Related Tools/Techniques
- General Deserialization Exploitation (e.g., Java, PHP, YAML deserialization flaws)
- Python pickle payload generators and proof-of-concept scripts that automate the creation of malicious pickles (not explicitly mentioned in the article). Note that the standard library's `pickletools` module is an analysis aid for disassembling streams, not an exploit framework.