Full Report
The report discusses three vulnerabilities found in runc, the low-level container runtime used by Docker and Podman. All of them allow writing to the /proc file system to escape the container.

runc masks several sensitive files. In practice, this means the container's own /dev/null is bind-mounted over each of them. However, there is a race condition around this: while the bind-mount is being created, an attacker can swap /dev/null for a symlink, so runc bind-mounts the symlink's target instead. The ol' switcheroo! By getting read/write access to /proc/sys/kernel/core_pattern via this trick, it's possible to escape the container through the kernel's privileged coredump upcall. There was a second variant of this issue. If /dev/null is deleted in the container, runc ignores the error and the masking becomes a no-op. In practice, this means that an attacker could read the masked /proc files. This was found after the first one and was also fixed.

The second full issue is similar to the first: a TOCTOU issue with /dev/console bind-mounts. While runc is setting up the bind mount for /dev/pts/$n, an attacker can replace /dev/pts/$n with a symlink. Naturally, this allows for writing to files on the host machine. This bug triggers after pivot_root, but the core_pattern trick from above can still be used. The author also found some issues around os.Create() that were stress-inducing. Although these were not directly exploitable, they provided fixes anyway, and added extra protections against race conditions on /dev/pts/$n writes. A single bug should really trigger a large set of security improvements while you are there.

The final vulnerability is a more sophisticated variant of CVE-2019-16884. Linux Security Modules (LSMs) attach labels, or metadata, to every process and file on the system. The original vulnerability tricked runc into writing these labels to a dummy tmpfs instead of the correct location, which bypassed the protections the labels were meant to enforce. The trick was to have the image's startup instructions mount a tmpfs over /proc. The patch for the original vulnerability ensured that the LSM label write went to a real procfs file system. The new variant instead redirects the write to a real procfs file where it is effectively a no-op, for instance /proc/self/sched instead of the proper file. This was done via a symlink: runc thinks it is writing to /proc/self/attr/exec, but it writes to another file instead. Besides turning the label write into a no-op, an attacker could also redirect the write to a malicious target on the host system. Using this file write, a container escape is likely possible.

The development team was concerned that other write operations might be redirected in this way, so they conducted further analysis to determine whether this was possible. They hope to write some custom linters in the future to help prevent this class of bug. youki, LXC, and crun were found to have very similar flaws, requiring patch coordination between all of them. Interestingly enough, LXC doesn't consider these attacks part of its threat model because it regards non-user-namespaced containers as fundamentally insecure. All of these attacks require startup-time exploitation, as opposed to being launched from an already-running container. Overall, a great set of bugs!
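
The symlink redirection at the heart of the last bug can be illustrated with a small, self-contained Python sketch. It is not runc's code (runc is written in Go) and it does not touch /proc: the file names `attr_exec` and `sched` are stand-ins created in a temporary directory. It shows a path-based write silently landing in the symlink's target, and the same write being refused when the file is opened with `O_NOFOLLOW`. Note that `O_NOFOLLOW` only guards the final path component and reports `ELOOP` on Linux, so this is a minimal illustration of the bug class rather than a complete defence.

```python
import errno
import os
import tempfile


def naive_write(path: str, data: bytes) -> None:
    """Path-based write: silently follows a symlink at `path`."""
    with open(path, "wb") as f:
        f.write(data)


def nofollow_write(path: str, data: bytes) -> bool:
    """Write only if the final path component is not a symlink.

    Returns True if the write happened, False if a symlink was detected.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NOFOLLOW)
    except OSError as e:
        if e.errno == errno.ELOOP:  # Linux: final component is a symlink
            return False
        raise
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    return True


def demo() -> dict:
    with tempfile.TemporaryDirectory() as d:
        intended = os.path.join(d, "attr_exec")  # stand-in for the intended file
        decoy = os.path.join(d, "sched")         # stand-in for the no-op target
        open(decoy, "wb").close()
        os.symlink(decoy, intended)              # attacker plants the symlink

        naive_write(intended, b"label")          # lands in the decoy
        with open(decoy, "rb") as f:
            redirected = f.read()

        blocked = not nofollow_write(intended, b"label2")
        with open(decoy, "rb") as f:
            after = f.read()

        return {"redirected": redirected, "blocked": blocked, "after": after}


if __name__ == "__main__":
    print(demo())
```
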
Analysis Summary
# Vulnerability: runc Container Escape via Arbitrary /proc Writes
## CVE Details
- **CVE ID:** CVE-2025-31133, CVE-2025-52565, CVE-2025-52881
- **CVSS Score:** 7.3 (High) - *Note: Scores may be higher in Docker/Kubernetes environments.*
- **CVSS Vector:** CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H
- **CWE:** CWE-367 (TOCTOU), CWE-59 (Symlink Follow), CWE-61 (UNIX Symbolic Link Following)
## Affected Systems
- **Products:** runc (and related runtimes: youki, LXC, crun)
- **Versions:** All versions before the fixed releases: v1.2.8 (1.2.x line), v1.3.3 (1.3.x line), and v1.4.0-rc.3 (1.4.x line).
- **Configurations:** Systems running untrusted container images or building untrusted Dockerfiles (via `RUN --mount`).
## Vulnerability Description
Three distinct vulnerabilities allow an attacker to bypass container restrictions and write to arbitrary `/proc` files on the host, leading to container escape:
1. **CVE-2025-31133 (Masked Path Abuse):** A race condition during the "masking" process. When runc attempts to bind-mount `/dev/null` over sensitive `/proc` files, an attacker can replace `/dev/null` with a symlink. This tricks runc into mounting the target `/proc` file as read-write.
2. **CVE-2025-52565 (Console Bind-Mount TOCTOU):** A Time-of-Check to Time-of-Use flaw during the setup of `/dev/console`. An attacker can replace the target `/dev/pts/$n` with a symlink to host files, allowing direct writes to the host.
3. **CVE-2025-52881 (LSM Label Bypass):** A sophisticated variant of CVE-2019-16884. An attacker can use symlinks and custom mount configurations (such as mounting a `tmpfs` over `/proc`) to trick runc into writing Linux Security Module (LSM) labels to the wrong file, either turning the protection into a no-op or redirecting the write to sensitive host files.
## Exploitation
- **Status:** PoC templates available; coordinated disclosure by multiple researchers.
- **Complexity:** Medium (Requires winning specific race conditions during container startup/mount phase).
- **Attack Vector:** Local (Exploited via malicious container images or Dockerfiles).
## Impact
- **Confidentiality:** High (Ability to read sensitive host-level `/proc` files like `/proc/kcore`).
- **Integrity:** High (Container escape via `/proc/sys/kernel/core_pattern` allows arbitrary command execution on the host).
- **Availability:** High (Ability to trigger host crashes via `/proc/sysrq-trigger`).
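
Because the escape route described above runs through `/proc/sys/kernel/core_pattern`, it can be useful to inspect that file's contents on the host. On Linux, a pattern beginning with `|` tells the kernel to pipe core dumps to the named program, which is what makes the file so sensitive. The Python sketch below classifies a pattern string; the function name and the allowlist are hypothetical and would need adjusting to whatever crash handler a given host legitimately uses.

```python
def classify_core_pattern(pattern: str, allowed_handlers=()) -> str:
    """Classify the contents of /proc/sys/kernel/core_pattern.

    Returns "file" for an ordinary file-name pattern, "allowed-pipe" for a
    pipe handler on the allowlist, and "unexpected-pipe" otherwise.
    """
    p = pattern.strip()
    if not p.startswith("|"):
        return "file"
    parts = p[1:].split()
    handler = parts[0] if parts else ""
    return "allowed-pipe" if handler in allowed_handlers else "unexpected-pipe"


if __name__ == "__main__":
    # Example allowlist entry; an assumption, adjust to your distribution.
    allowed = ("/usr/lib/systemd/systemd-coredump",)
    with open("/proc/sys/kernel/core_pattern") as f:
        print(classify_core_pattern(f.read(), allowed))
```
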
## Remediation
### Patches
Update to the following versions immediately:
- **runc v1.4.0-rc.3**
- **runc v1.3.3**
- **runc v1.2.8**
*Note: Patches for `youki`, `LXC`, and `crun` are also required due to shared logic flaws.*
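
To check a fleet against the fixed releases listed above, a version comparison along these lines may help. This is a minimal Python sketch with assumptions stated up front: the helper names are hypothetical, it accepts version strings shaped like `1.3.2`, `v1.3.3`, or `1.4.0-rc.3`, it treats release lines older than 1.2 as unpatched, and it assumes any release line newer than 1.4 already contains the fixes. How the version string is obtained is left to the caller.

```python
import math
import re


def parse_version(version: str) -> tuple:
    """Parse 'v1.4.0-rc.3' style strings into a sortable tuple.

    A final release sorts after every release candidate of the same version.
    """
    m = re.match(r"v?(\d+)\.(\d+)\.(\d+)(?:-rc\.?(\d+))?", version.strip())
    if not m:
        raise ValueError(f"unrecognised version: {version!r}")
    major, minor, patch = (int(m.group(i)) for i in (1, 2, 3))
    rc = int(m.group(4)) if m.group(4) else math.inf
    return (major, minor, patch, rc)


# First fixed release on each release line, taken from the list above.
FIXED = {
    (1, 2): parse_version("1.2.8"),
    (1, 3): parse_version("1.3.3"),
    (1, 4): parse_version("1.4.0-rc.3"),
}


def is_patched(version: str) -> bool:
    key = parse_version(version)
    line = key[:2]
    if line in FIXED:
        return key >= FIXED[line]
    # Assumption: lines newer than 1.4 include the fixes, older ones do not.
    return line > max(FIXED)


if __name__ == "__main__":
    for v in ("1.2.7", "1.2.8", "1.3.2", "1.4.0-rc.2", "1.4.0-rc.3"):
        print(v, is_patched(v))
```
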
### Workarounds
- **User Namespaces:** Configure containers to use user namespaces where the host root user is not mapped.
- **Rootless Operation:** Do not run containers as the root user.
- **AppArmor/SELinux:** Use the default profiles provided by Docker/Podman, which block unexpected writes to `/proc` and `/sys` (though these may still be bypassed by certain variants).
- **Trusted Images:** Only run verified container images and avoid building untrusted Dockerfiles.
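
The user-namespace workaround can be sanity-checked from inside a container by reading `/proc/self/uid_map`, where each line holds three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. The Python sketch below reports whether outside UID 0 falls within any mapped range. The function name is hypothetical, and the check assumes a single level of user-namespace nesting, since the "outside" IDs are expressed relative to the parent namespace rather than necessarily the host.

```python
def host_root_mapped(uid_map_text: str) -> bool:
    """Return True if outside UID 0 appears in any mapped range.

    `uid_map_text` is the content of /proc/<pid>/uid_map.
    """
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _inside, outside, length = (int(x) for x in fields)
        if outside <= 0 < outside + length:
            return True
    return False


if __name__ == "__main__":
    with open("/proc/self/uid_map") as f:
        print("host root mapped:", host_root_mapped(f.read()))
```

An identity mapping such as `0 0 4294967295` means no user namespace isolation is in effect, while a mapping like `0 100000 65536` keeps container root away from outside UID 0.
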
## Detection
- **Indicators of Compromise:** Unexpected symlinks in place of `/dev/null` or `/dev/pts/*` within a container's layer. Logs showing unexpected `pivot_root` or mount failures during container initialization.
- **Detection methods:** Audit system calls related to `mount` and `symlink` during container creation phases.
## References
- **Mailing List:** hxxps://seclists[.]org/oss-sec/2025/q4/138
- **Official Repository:** hxxps://github[.]com/opencontainers/runc
- **Author Site:** hxxps://www[.]cyphar[.]com/