Full Report
Cybersecurity researchers have uncovered critical remote code execution vulnerabilities impacting major artificial intelligence (AI) inference engines, including those from Meta, Nvidia, Microsoft, and open-source PyTorch projects such as vLLM and SGLang. "These vulnerabilities all traced back to the same root cause: the overlooked unsafe use of ZeroMQ (ZMQ) and Python's pickle deserialization," the researchers said.
Analysis Summary
# Vulnerability: ShadowMQ: Insecure Deserialization in AI Inference Engines via ZMQ/Pickle
## CVE Details
- CVE ID: CVE-2024-50050 (Meta Llama, related precedent), CVE-2025-30165 (vLLM), CVE-2025-23254 (NVIDIA TensorRT-LLM), CVE-2025-60455 (Modular Max Server)
- CVSS Score: 6.3/9.3 (CVE-2024-50050, precedent), 8.0 (vLLM), 8.8 (NVIDIA TensorRT-LLM), N/A (Modular Max Server)
- CWE: CWE-502: Deserialization of Untrusted Data
## Affected Systems
- Products: Meta Llama LLM framework (precedent), NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, vLLM, SGLang.
- Versions:
  - NVIDIA TensorRT-LLM: Prior to v0.18.2
  - vLLM: Affected, implicitly mitigated by switching to V1 engine by default (as of the report).
  - SGLang: Implemented incomplete fixes.
  - Sarathi-Serve: Remains unpatched (as of the report).
- Configurations: Any deployment that uses ZeroMQ (ZMQ) for inter-process or inter-service communication and calls methods such as `recv_pyobj()`, which deserialize incoming data with Python's `pickle` module, on sockets reachable over unauthenticated TCP.
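
To help check whether a deployment matches the configuration described above, the following is a minimal sketch of an endpoint audit helper. The function name `is_network_exposed` and the sample endpoints are hypothetical. The helper only inspects the bind string; it does not check firewalls or ZMQ authentication settings.

```python
import ipaddress

# Transports that never leave the local host.
LOCAL_SCHEMES = {"ipc", "inproc"}


def is_network_exposed(endpoint: str) -> bool:
    """Return True if a ZMQ bind endpoint may be reachable from other hosts."""
    scheme, sep, rest = endpoint.partition("://")
    if not sep:
        raise ValueError(f"not a ZMQ endpoint: {endpoint!r}")
    if scheme in LOCAL_SCHEMES:
        return False
    if scheme != "tcp":
        # Conservative assumption: treat any other transport as exposed.
        return True
    # Drop the port and any IPv6 brackets, e.g. "[::1]:5555" -> "::1".
    host = rest.rsplit(":", 1)[0].strip("[]")
    if host == "*":
        return True  # wildcard bind listens on every interface
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostname or interface name; only "localhost" is assumed to be safe.
        return host != "localhost"


for ep in ("tcp://0.0.0.0:5555", "tcp://127.0.0.1:5555", "ipc:///tmp/engine.sock"):
    print(ep, is_network_exposed(ep))
```

Endpoints flagged by a check like this are candidates for rebinding to loopback or an `ipc://` path, or for firewall restrictions.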
## Vulnerability Description
The vulnerabilities stem from a recurring pattern dubbed "ShadowMQ," rooted in insecure deserialization practices originating from an older flaw in Meta's Llama framework (CVE-2024-50050). The core issue is the unsafe use of the Python `pickle` module to deserialize untrusted data received via ZeroMQ (ZMQ) sockets exposed over the network (TCP). Since `pickle` deserialization can execute arbitrary Python code embedded in the serialized payload, an attacker sending a maliciously crafted message over the unauthenticated ZMQ socket can achieve Remote Code Execution (RCE) on the host running the AI inference engine. This pattern has propagated across multiple major AI frameworks due to direct code reuse.
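
The reason `pickle` is unsafe for untrusted input can be shown with a deliberately harmless sketch. The class name `Harmless` is hypothetical, and `os.getpid` stands in for whatever callable an attacker would choose. The point is that the callable runs inside the process that deserializes the bytes, before any application code has a chance to inspect the message.

```python
import os
import pickle


class Harmless:
    """Hypothetical class: __reduce__ tells pickle what to call on load."""

    def __reduce__(self):
        # On unpickling, pickle calls os.getpid() in the receiving process.
        # An attacker would substitute a dangerous callable here.
        return (os.getpid, ())


payload = pickle.dumps(Harmless())   # bytes an attacker could send over a socket
result = pickle.loads(payload)       # the receiver runs the embedded call

# The result is the return value of the call, not a Harmless instance.
print(type(result).__name__, result == os.getpid())
```

Because `recv_pyobj()` applies `pickle` deserialization to whatever bytes arrive, any peer that can reach the socket controls which callable runs.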
## Exploitation
- Status: PoC available (based on the precedent research; similar techniques likely apply)
- Complexity: Low (if ZMQ sockets are exposed over the network without authentication)
- Attack Vector: Network
## Impact
- Confidentiality: High (Potential disclosure of cluster configuration, models, and proprietary data)
- Integrity: High (Arbitrary code execution allows modification of application state or system files)
- Availability: High (System compromise leading to service disruption, data destruction, or resource hijacking)
## Remediation
### Patches
- **NVIDIA TensorRT-LLM:** Upgrade to version **0.18.2** or later.
- **vLLM:** The issue has been addressed, reportedly by **switching to the V1 engine by default**. Users should ensure they are on the latest stable release.
- **Modular Max Server:** Patched (Refer to vendor commit: `10620059fb5c47fb0c30e5d21a8ff3b8d622fba4`).
- **Meta Llama Framework (Precedent):** Patched in October 2024 (CVE-2024-50050).
- **pyzmq library:** Keep pyzmq current on every host, but note that `recv_pyobj()` uses `pickle` by design, so a library upgrade alone does not make it safe to call on untrusted input.
### Workarounds
1. **Disable/Restrict Network Exposure:** Ensure ZMQ sockets are not exposed over untrusted networks. If network communication is required, restrict access via firewalls (ACLs).
2. **Avoid `recv_pyobj()`:** Modify the codebase to use serialization formats that cannot execute code on deserialization (e.g., JSON or Protobuf), or use pyzmq methods that do not invoke `pickle`, such as `recv()` or `recv_json()`.
3. **SGLang:** The fixes noted so far are incomplete. Track the upstream issue and apply workarounds 1 and 2 until comprehensive fixes are available.
## Detection
- **Indicators of Compromise:** Elevated resource utilization (CPU/GPU spikes, particularly from non-AI workloads such as cryptocurrency mining), unexpected shell execution output, or new, unauthorized processes spawned as children of the inference service process.
- **Detection Methods and Tools:** Network monitoring to detect suspicious traffic patterns on known ZMQ ports. Static code analysis to flag calls to pyzmq's `Socket.recv_pyobj()`, especially on sockets bound to network interfaces. Runtime security monitoring (e.g., syscall monitoring) to detect code execution attempts originating from the Python deserialization path.
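
The static-analysis idea above can be sketched with Python's standard `ast` module. The function name `find_unsafe_calls` and the sample source are hypothetical. The check matches on the method name only, so it cannot tell whether the socket is network-facing and may report false positives.

```python
import ast

UNSAFE_METHODS = {"recv_pyobj"}


def find_unsafe_calls(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, method) pairs for calls such as sock.recv_pyobj()."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in UNSAFE_METHODS
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


# The sample is parsed, never executed, so pyzmq does not need to be installed.
sample = (
    "import zmq\n"
    "sock = zmq.Context().socket(zmq.PULL)\n"
    "obj = sock.recv_pyobj()\n"
)
print(find_unsafe_calls(sample))  # [(3, 'recv_pyobj')]
```

Running a check like this across a codebase gives a starting list of call sites to review alongside the socket bind addresses.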
## References
- Vendor Advisory (NVIDIA): hxxps://nvidia.custhelp.com/app/answers/detail/a_id/5648/~/security-bulletin%3A-nvidia-tensorrt-llm---april-2025
- vLLM Advisory: hxxps://github.com/vllm-project/vllm/security/advisories/GHSA-9pcc-gvx5-r5wm
- Modular Commit: hxxps://github.com/modular/modular/commit/10620059fb5c47fb0c30e5d21a8ff3b8d622fba4
- SGLang Issue Tracker: hxxps://github.com/sgl-project/sglang/issues/5569
- Root Cause Research: hxxps://www.oligo.security/blog/shadowmq-how-code-reuse-spread-critical-vulnerabilities-across-the-ai-ecosystem