Full Report
The SFQ Qdisc in Linux shares packet throughput fairly between different network flows. If the number of queued packets exceeds the configured limit, packets are dropped. With a limit of 1, a complex interaction of 3 packets eventually produces a type confusion. The type confusion leads to an integer underflow on an index, which in turn leads to an out-of-bounds write of the value 0x0000. In practice, the write lands 262636 bytes (roughly 4 * 0xFFFF) after the vulnerable Qdisc object. The initial vulnerability was patched by rejecting a Qdisc limit of 1. However, the limit can still be set to 1 by a min operation performed later. This was discovered by Google's syzkaller fuzzer.

Is a two-byte write at an uncontrolled location with an uncontrolled value even useful as a primitive? Memory corruption is a powerful beast! The focus of the article is on exploiting this issue.

The first goal was to reduce the number of crashes. Right after the OOB write, two invalid pointer accesses occur. One of them can be handled by spraying valid objects containing pointers into the right slab. The other was trickier: they solved it with application-level setup that ensures the path leading to the crash is never taken. Now the OOB write is stable!

The offsets where the write could land were very limited. They had previously built a tool that exposes Linux kernel structures through a Python-queryable interface, and used it to search for all fields within these ranges that could be useful. After a lot of review, they came across the pipe_inode_info.files field in the kmalloc-192 slab. From reading the code, they saw that setting this counter to 0 triggers a page-level use after free! With a page-level write, the author figured out how to overwrite the process credentials with zeros to get root privileges. The exploit worked about 35% of the time. To make it more reliable, there are likely side channels that could work around one of the main crashes.
On the mitigation instance, they found that guard pages substantially hampered exploitation. Overall, a super technical and excellent blog post on exploiting the vulnerability. At first glance the bug seems unexploitable, but this just proves how powerful memory corruption can be. I found the section on keeping the aftermath from crashing super interesting as well.
Analysis Summary
# Vulnerability: Linux Kernel SFQ Qdisc Out-of-Bounds Write
## CVE Details
- **CVE ID:** CVE-2025-37752
- **CVSS Score:** Not explicitly listed, but estimated High (Local Privilege Escalation)
- **CWE:** CWE-787 (Out-of-bounds Write) / CWE-843 (Type Confusion)
## Affected Systems
- **Products:** Linux Kernel
- **Versions:** All versions prior to the maintainer's fix (Specific versions tested: LTS 6.6.84 and 6.6.86).
- **Configurations:** Systems with `CONFIG_NET_SCH_SFQ` and `CONFIG_USER_NS` enabled.
## Vulnerability Description
The vulnerability exists in the Stochastic Fairness Queueing (SFQ) network packet scheduler. While the `sfq_change()` function has an initial check to prevent the `limit` parameter from being set to 1, this check can be bypassed through subsequent `min_t` operations during parameter updates.
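
The check-then-clamp ordering described above can be modeled in a few lines. This is a simplified sketch, not the kernel code: `change_limit_buggy` and `clamp_bound` are hypothetical names, and the real clamp in `sfq_change()` derives its bound from other Qdisc parameters.

```python
def min_t_u32(a: int, b: int) -> int:
    """Stand-in for the kernel's min_t(u32, a, b): compare as unsigned 32-bit."""
    return min(a & 0xFFFFFFFF, b & 0xFFFFFFFF)


def change_limit_buggy(requested_limit: int, clamp_bound: int) -> int:
    """Toy model of the flawed order: validate first, clamp afterwards.

    `clamp_bound` is a hypothetical placeholder for whatever upper bound
    the later min_t operation applies.
    """
    # Early validation only looks at the value the user supplied.
    if requested_limit == 1:
        raise ValueError("invalid limit")
    # The clamp runs after the check, so its result is never re-validated.
    return min_t_u32(requested_limit, clamp_bound)


# A requested limit of 2 passes the check, then gets clamped down to 1.
effective_limit = change_limit_buggy(2, clamp_bound=1)
```

Here the caller never asks for a limit of 1 directly, yet the stored value ends up as 1, which is the state the initial patch was meant to rule out.
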
When the limit is successfully set to 1 and used in conjunction with a Token Bucket Filter (TBF) Qdisc, a complex interaction between the two during a packet burst leads to a type confusion. This causes the `sfq_dec()` function to underflow the queue length (`qlen`) from zero. This underflowed value is then used as an index in the `dep` array within `sfq_link()`, resulting in a two-byte `0x0000` (NULL) value being written approximately 256KB (262,636 bytes) out of bounds at a misaligned offset.
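
The arithmetic behind the underflow can be checked with a short sketch. It assumes a 16-bit queue length and 4-byte `dep` entries, which is what the "4 * 0xFFFF" figure in the report suggests; the constant names are illustrative only.

```python
import ctypes

ENTRY_SIZE = 4            # assumed size in bytes of one `dep` entry
REPORTED_OFFSET = 262636  # distance of the write from the Qdisc object, per the report


def dec_u16(value: int) -> int:
    """Decrement with 16-bit unsigned wraparound, like a u16 in C."""
    return ctypes.c_uint16(value - 1).value


index = dec_u16(0)                  # qlen underflows from 0 to 0xFFFF
array_offset = index * ENTRY_SIZE   # byte offset of that entry inside the array
remainder = REPORTED_OFFSET - array_offset

print(hex(index), array_offset, remainder)
```

Under these assumptions the index alone accounts for 262140 bytes. The remaining 496 bytes would then be where the array sits relative to the start of the object, which is an inference from the numbers and not something the report states.
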
## Exploitation
- **Status:** PoC available; successfully exploited against Google kernelCTF instances (COS and LTS).
- **Complexity:** High (Requires complex heap spraying, pointer alignment, and application-level state manipulation to prevent kernel crashes).
- **Attack Vector:** Local (Requires the ability to configure network disciplines, typically via `CAP_NET_ADMIN` or within a user namespace).
## Impact
- **Confidentiality:** High (Full system access)
- **Integrity:** High (Unauthorized modification of kernel memory/credentials)
- **Availability:** High (Can lead to immediate kernel panic/DoS if improperly triggered)
## Remediation
### Patches
The vulnerability has been addressed in the Linux upstream kernel by the following commits:
- `8c0cea59d40cf6dd13c2950437631dd614fbade6` (Implements temporary area for Qdisc parameter processing)
- `b3bf8f63e6179076b57c9de660c9f80b5abefe70` (Moves the limit check to the end of the input validation)
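
The idea behind the two commits, as summarized above, can be sketched as "compute into a temporary, validate last, then commit". This is a toy model under that reading, not the actual patch: `change_limit_fixed` and `clamp_bound` are hypothetical names.

```python
def change_limit_fixed(current_limit: int, requested_limit: int, clamp_bound: int) -> int:
    """Toy model of the fixed order: work on a temporary, validate at the end.

    Returns the limit to commit, or raises without touching the current value.
    """
    # Work on a temporary copy so a rejected update leaves no partial state.
    new_limit = current_limit
    if requested_limit:
        new_limit = min(requested_limit, clamp_bound)
    # The validation now sees the value that would actually be stored.
    if new_limit == 1:
        raise ValueError("invalid limit")
    return new_limit


try:
    change_limit_fixed(current_limit=127, requested_limit=2, clamp_bound=1)
    rejected = False
except ValueError:
    rejected = True
```

With the check placed after the clamp, the same input that slipped through before is rejected, and the previous limit stays in effect.
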
### Workarounds
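- Disable unprivileged user namespaces (`kernel.unprivileged_userns_clone = 0` on Debian- and Ubuntu-derived kernels; this sysctl is not present on every distribution).
- Blacklist the `sch_sfq` module via `modprobe` configuration if it is not required.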
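
A quick way to see whether these workarounds are in effect on a host is to inspect `/proc`. The sketch below is a minimal example with hypothetical helper names; it assumes the usual `/proc/modules` line format (module name as the first field) and treats a missing sysctl file as "unknown".

```python
from pathlib import Path
from typing import Optional


def module_loaded(proc_modules_text: str, name: str = "sch_sfq") -> bool:
    """Return True if `name` appears as a module name in /proc/modules content."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def read_sysctl(path: str) -> Optional[str]:
    """Return the sysctl value at `path`, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


# Example input in the /proc/modules format (illustrative values).
sample_modules = "sch_sfq 24576 0 - Live 0x0000000000000000\nsch_tbf 16384 0 - Live 0x0000000000000000\n"
sfq_is_loaded = module_loaded(sample_modules)
```

On a real system one would pass `Path("/proc/modules").read_text()` and `read_sysctl("/proc/sys/kernel/unprivileged_userns_clone")`. Note that a kernel with SFQ built in (rather than built as a module) will not list `sch_sfq` in `/proc/modules`, and blacklisting has no effect there.
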
## Detection
- **Indicators of Compromise:** Unexpected kernel crashes in `net/sched/sch_sfq.c`. Evidence of `tc` (traffic control) commands being used to set SFQ limits to 1.
- **Detection Methods:** Audit logs for network configuration changes; use of memory forensic tools to detect corrupted `pipe_inode_info` or credential (`cred`) structures.
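
As a rough illustration of the first indicator, the sketch below scans text in the style of `tc qdisc show` output for SFQ Qdiscs whose limit is 1. The sample lines and the exact output format are assumptions; real `tc` output may differ between iproute2 versions, so the pattern should be adjusted to what the host actually prints.

```python
import re

# Matches lines such as "qdisc sfq 8001: ... limit 1p ..." (assumed format).
SFQ_LINE_RE = re.compile(r"^qdisc sfq\s+(\S+).*?\blimit\s+(\d+)", re.MULTILINE)


def sfq_limits(tc_output: str) -> list[tuple[str, int]]:
    """Return (handle, limit) pairs for every SFQ line found in the text."""
    return [(m.group(1), int(m.group(2))) for m in SFQ_LINE_RE.finditer(tc_output)]


def suspicious_handles(tc_output: str) -> list[str]:
    """Return handles of SFQ Qdiscs configured with a limit of 1."""
    return [handle for handle, limit in sfq_limits(tc_output) if limit == 1]


# Illustrative sample, not captured from a real system.
sample = (
    "qdisc tbf 1: dev lo root refcnt 2 rate 8bit burst 100b lat 0us\n"
    "qdisc sfq 8001: dev lo parent 1: limit 1p quantum 1514b depth 1 divisor 1024\n"
    "qdisc sfq 8002: dev eth0 root refcnt 2 limit 127p quantum 1514b depth 127 divisor 1024\n"
)
flagged = suspicious_handles(sample)
```

A hit is only a hint that the vulnerable configuration was set up, not proof of exploitation, and an attacker working inside a fresh network namespace would not show up in the host's view.
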
## References
- [syst3mfailure Blog Post](https://syst3mfailure.io/sfq-out-of-bounds)
- [Linux Kernel Git Commit 1](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c0cea59d40cf6dd13c2950437631dd614fbade6)
- [Linux Kernel Git Commit 2](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3bf8f63e6179076b57c9de660c9f80b5abefe70)
- [Google Project Zero: Exploiting via packet sockets](https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html)