A new critical vulnerability (CVSS 9.0) poses a systemic risk to the AI ecosystem, with widespread implications for AI infrastructure.
# Vulnerability: Critical Container Escape in NVIDIA Container Toolkit (#NVIDIAScape)
## CVE Details
- CVE ID: CVE-2025-23266
- CVSS Score: 9.0 (Critical)
- CWE: Not explicitly stated; the flaw stems from a misconfiguration in OCI hook handling.
## Affected Systems
- Products: NVIDIA Container Toolkit (NCT), NVIDIA GPU Operator.
- Versions:
  - NVIDIA Container Toolkit (NCT): All versions up to and including v1.17.7 (for versions prior to v1.17.5, only CDI mode is affected).
  - NVIDIA GPU Operator: All versions up to and including 25.3.1.
- Configurations: Risk is most acute in managed AI cloud services that let customers run custom AI containers on shared GPU infrastructure.
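The affected ranges above reduce to a simple version comparison. The sketch below is a minimal, hypothetical inventory helper, not an NVIDIA tool. It assumes plain dotted numeric version strings (optionally prefixed with `v`), and it reads "CDI mode only for versions prior to 1.17.5" as meaning that older toolkits are exposed only when running in CDI mode. The names `parse_version`, `is_affected`, and the `mode` strings are illustrative.

```python
# Hypothetical helper: classify a version against the affected ranges in this report.
# Assumes versions look like "1.17.7" or "v1.17.7" (no "-rc" or packaging suffixes).


def parse_version(text: str) -> tuple:
    """Turn 'v1.17.7' or '25.3.1' into a comparable tuple such as (1, 17, 7)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


# Last affected release per product, as listed in this report.
LAST_AFFECTED = {
    "nvidia-container-toolkit": parse_version("1.17.7"),
    "gpu-operator": parse_version("25.3.1"),
}

# Below this toolkit version, only CDI mode is affected (per the report).
NCT_CDI_ONLY_BELOW = parse_version("1.17.5")


def is_affected(product: str, version: str, mode: str = "cdi") -> bool:
    """Return True if the product/version (and, for NCT, runtime mode) is in range."""
    v = parse_version(version)
    if v > LAST_AFFECTED[product]:
        return False  # newer than the last affected release
    if product == "nvidia-container-toolkit" and v < NCT_CDI_ONLY_BELOW:
        return mode == "cdi"  # older toolkits are exposed only in CDI mode
    return True  # `mode` is ignored for the GPU Operator
```

For example, `is_affected("nvidia-container-toolkit", "1.17.4", mode="legacy")` returns `False`, while the same version in CDI mode returns `True`. Tuple comparison handles multi-digit components correctly (`(1, 17, 10) > (1, 17, 7)`), which naive string comparison would not.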
## Vulnerability Description
The vulnerability, dubbed #NVIDIAScape, is a critical container escape flaw caused by a misconfiguration in how the NVIDIA Container Toolkit handles OCI hooks, specifically the `enable-cuda-compat` hook. Successful exploitation lets a malicious container bypass isolation measures and gain full root access to the host machine.
## Exploitation
- Status: PoC available (exploit demonstrated with a simple three-line Dockerfile).
- Complexity: Low (implied by the simplicity of the required Dockerfile).
- Attack Vector: Local (requires the ability to run a container image on the target system).
## Impact
- Confidentiality: High (ability to access or steal the sensitive data and proprietary models of all other customers on the same shared hardware).
- Integrity: High (ability to manipulate those data and models).
- Availability: High (potential to take down the underlying host infrastructure).
## Remediation
### Patches
- **NVIDIA Container Toolkit:** Upgrade to the latest version. v1.17.8 or newer is implied as the minimum for users of older GPU Operator versions; all other users should install the most recent release.
- **NVIDIA GPU Operator:** Upgrade to version 25.3.2 or later (implied by the NCT patch version guidance).
### Workarounds
Systems that cannot be immediately upgraded should disable the source of the exposure, the `enable-cuda-compat` hook.
**For NVIDIA Container Runtime (Legacy Mode):**
1. Edit `/etc/nvidia-container-toolkit/config.toml`.
2. Set the `features.disable-cuda-compat-lib-hook` flag to `true`.
**For NVIDIA GPU Operator:**
1. Disable the hook by adding `disable-cuda-compat-lib-hook` to the `NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES` environment variable when installing or upgrading with Helm. If other feature flags are already set, add it as one more entry in the comma-separated list.
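Because `NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES` may already hold other flags, the value has to be merged rather than overwritten. The sketch below shows one way to do that; the function names are hypothetical. It also covers one Helm detail: when a value is passed with `helm --set`, unescaped commas are treated as separators between assignments, so commas inside the value must be written as `\,`. The exact chart key that sets this environment variable is not stated in this report and is not assumed here.

```python
# Hypothetical helpers for building the opt-in feature list passed to the GPU Operator.

FLAG = "disable-cuda-compat-lib-hook"


def add_opt_in_feature(current: str = "", flag: str = FLAG) -> str:
    """Append `flag` to a comma-separated feature list, dropping blanks and duplicates."""
    features = [f.strip() for f in current.split(",") if f.strip()]
    if flag not in features:
        features.append(flag)
    return ",".join(features)


def escape_for_helm_set(value: str) -> str:
    """Escape commas so `helm --set key=value` keeps the list as a single value."""
    return value.replace(",", "\\,")
```

For instance, merging into an existing value of `foo,bar` yields `foo,bar,disable-cuda-compat-lib-hook`, and calling the function again leaves that value unchanged, so the step is safe to repeat on every upgrade.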
## Detection
- **Indicators of Compromise:** Look for abnormal process execution or unexpected privilege escalation originating from within containerized environments, followed by attempts to pivot to the host filesystem or affect other tenant data.
- **Detection Methods and Tools:** Wiz customers can use a pre-built query in the Wiz Threat Intel Center to find instances running vulnerable versions of the toolkit. Prioritization guidance suggests focusing on hosts running containers built from untrusted or public images. Runtime validation can help focus patching efforts where the toolkit is actively in use.
## References
- Vendor Advisory: [https://nvidia.custhelp.com/app/answers/detail/a_id/5659](https://nvidia.custhelp.com/app/answers/detail/a_id/5659)
- Exploit Demonstration Video: [https://youtu.be/TkDsnzlPJAg](https://youtu.be/TkDsnzlPJAg)
- Additional Context (PEACH Framework): [https://www.wiz.io/blog/the-peach-framework-for-cloud-native-application-security](https://www.wiz.io/blog/the-peach-framework-for-cloud-native-application-security)