Full Report
Signals in Linux are a mechanism for telling a process to do something; put simply, they are a common form of inter-process communication (IPC). Notably, a signal can pause the code at an arbitrary point and trigger some other code to run. This matters because the point at which the program was interrupted could leave it in an inconsistent state, similar to reentrancy attacks in web3.

The authors were looking at signal handlers in SSH when they noticed that the SIGALRM handler called code that is not async-signal-safe, such as syslog(), when closing a packet after a timeout. This was a regression of a vulnerability from 2006, reintroduced by a 2020 change to SSH.

Their initial idea for exploiting this was to use free(). In particular, the path would be to trigger SIGALRM while in the middle of free(), then get the handler to go into malloc with the heap in an inconsistent state. They started by trying to exploit this on a 2006 system, which doesn't have ASLR, NX or the glibc malloc unlink protections in place. The goal of unlink is to remove a chunk from the linked list so that it can be consolidated with the space next to it. If glibc is interrupted at the point where the chunk above has been marked free but has not quite been added to the linked list yet, then the unlink will be attempted on attacker-controlled data. This gives an arbitrary 4-byte write, which the authors decided to aim at the __free_hook of malloc to redirect code execution.

Sadly, this didn't work right away; the race window was just too tight. So, they decided to increase their chances. The DSA parsing code for public keys has 4 places where free is called, and sshd allows for six user authentications at a time, giving 24 free calls that could be interrupted at the perfect time by SIGALRM. After a month, it worked, but they wanted further optimization. They started to measure when the alarm fired relative to the target: one kind of failure showed it was being triggered too early, while another showed it was too late.
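To make the "arbitrary 4-byte write" from the unlink step concrete, here is a minimal sketch of the classic pre-hardening unlink logic, simulated over a Python dictionary instead of real memory. The 32-bit chunk layout (fd at offset 8, bk at offset 12) is the traditional glibc one; every address and name below is a made-up placeholder for illustration, not something taken from the original write-up.

```python
# Toy model of the old, unchecked unlink: FD->bk = BK; BK->fd = FD.
# "Memory" is a dict mapping 32-bit addresses to 32-bit words.
FD_OFFSET = 8    # offset of the fd pointer in a 32-bit malloc chunk
BK_OFFSET = 12   # offset of the bk pointer in a 32-bit malloc chunk


def unsafe_unlink(mem, chunk):
    """Unlink `chunk` from its doubly linked list with no sanity checks."""
    fd = mem[chunk + FD_OFFSET]
    bk = mem[chunk + BK_OFFSET]
    mem[fd + BK_OFFSET] = bk   # FD->bk = BK
    mem[bk + FD_OFFSET] = fd   # BK->fd = FD


# Hypothetical addresses, chosen only for the demonstration.
FREE_HOOK = 0x0804A000    # stands in for the location of __free_hook
PAYLOAD = 0x0805B000      # stands in for attacker-controlled data
FAKE_CHUNK = 0x0806C000   # chunk whose fd/bk the attacker controls

mem = {FREE_HOOK: 0}
mem[FAKE_CHUNK + FD_OFFSET] = FREE_HOOK - BK_OFFSET  # so FD->bk lands on the hook
mem[FAKE_CHUNK + BK_OFFSET] = PAYLOAD                # value that gets written

unsafe_unlink(mem, FAKE_CHUNK)
hook_value = mem[FREE_HOOK]
```

After the call, `hook_value` equals `PAYLOAD`. The second assignment in `unsafe_unlink` also writes into `PAYLOAD + 8` as a side effect, which is why the target value has to point at writable memory in this style of attack.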
The original post mentioned an exploit in Kerberos, which made them interested in PAM, an authentication module. They found a spot where a cleanup function is not yet initialized but will be soon. So, if they could interrupt the code with an alarm while this initialization was happening, then uninitialized data from the heap could be used to control the function pointer. Although this didn't work, they found a similar missing initialization with leftover heap data that could lead to an arbitrary free vulnerability. They decided to use the House of Mind to overwrite the _exit() entry to point to shellcode on the stack. Pretty neat!

Now, to 2024! The only interesting code to hit was syslog(), which calls into malloc. My first thought was "isn't there a lock on this?" However, glibc removed the lock for single-threaded code, which makes it not async-signal-safe. Within libc, they used the same leftover trick from before. When splitting a chunk into multiple parts, the FREE chunk is added back to the linked list BEFORE the new size is set. Since the memory can be controlled from a previous call (it's not cleared), we can overlap this chunk with our addresses! This is sick because it's relative and doesn't require any knowledge of ASLR.

Their goal was to corrupt a FILE vtable pointer by using the async alarm to corrupt a function pointer within it. Since there are many, many protections on FILE pointers from the years of abuse, this took some pretty crazy object faking but was doable. It also took some pretty crazy heap grooming and timing to get this VERY specific case to happen at a VERY specific time. This exploit takes about 8 hours to win the race because of ASLR on 32 bit and the timing window. There is no exploit on 64 bit at this time.

Awesome blog post on an RCE in SSH of all things. A fuzzer could have never found this. To me, there are a few takeaways. Add regression tests for previous vulnerabilities.
If a fix was written in the past, the bug is likely to come back via a developer who doesn't understand why the code exists. Primitives are hard to find but are there! Taking the time to understand the constraints of your bug and working around them can still lead to big results. Esoteric knowledge leads to esoteric bugs.
Analysis Summary
# Vulnerability: regreSSHion - Remote Code Execution in OpenSSH Server
## CVE Details
- **CVE ID:** CVE-2024-6387
- **CVSS Score:** 8.1 (High) - *Note: While some assessments vary, the impact is critical due to root-level RCE.*
- **CWE:** CWE-364 (Signal Handler Race Condition), CWE-479 (Signal Handler Calls Function that is not Async-Signal-Safe)
## Affected Systems
- **Products:** OpenSSH Server (sshd) on glibc-based Linux systems.
- **Versions:**
  - OpenSSH versions 8.5p1 through 9.7p1 (inclusive).
  - *Note:* Versions older than 4.4p1 are also vulnerable unless patched for CVE-2006-5051.
  - *Note:* Versions from 4.4p1 up to, but not including, 8.5p1 are **not** vulnerable due to a previous fix that was later regressed.
- **Configurations:** Default configuration (specifically where `LoginGraceTime` is enabled/non-zero).
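
The version ranges above can be turned into a quick first-pass check. The sketch below is illustrative only: the function names and return strings are invented here, and it compares only the upstream major and minor numbers. Because distributions backport fixes without changing the upstream version string, a "vulnerable" result should be treated as a prompt to check the package changelog, not as a verdict.

```python
import re

# Matches "OpenSSH_9.6p1", "SSH-2.0-OpenSSH_8.4p1 Debian", or a bare "9.8p1".
_VERSION_RE = re.compile(r"(?:OpenSSH_|^)(\d+)\.(\d+)")


def parse_openssh_version(banner):
    """Return (major, minor) parsed from a banner or bare version string."""
    match = _VERSION_RE.search(banner.strip())
    if match is None:
        raise ValueError(f"no OpenSSH version found in {banner!r}")
    return int(match.group(1)), int(match.group(2))


def classify(banner):
    """Map an upstream OpenSSH version onto the ranges listed above."""
    version = parse_openssh_version(banner)
    if version < (4, 4):
        return "vulnerable unless patched for CVE-2006-5051"
    if version < (8, 5):
        return "not vulnerable"
    if version <= (9, 7):
        return "vulnerable"
    return "fixed upstream"


results = {
    banner: classify(banner)
    for banner in ("OpenSSH_4.3p2", "SSH-2.0-OpenSSH_8.4p1 Debian",
                   "OpenSSH_9.6p1", "9.8p1")
}
```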
## Vulnerability Description
This is a regression of CVE-2006-5051. The flaw resides in `sshd`'s `SIGALRM` handler. If a client fails to authenticate within the allotted `LoginGraceTime` (default 120s), a signal is raised. The handler calls functions that are not async-signal-safe, specifically `syslog()`.
In modern glibc versions, `syslog()` may internally call `malloc()` or `free()`. If the signal arrives while the interrupted code is in the middle of a `malloc` or `free` operation, the heap's internal structures can be left in an inconsistent state. When the signal handler then attempts its own memory operation via `syslog()`, it operates on that half-updated state, which can lead to heap corruption and arbitrary code execution.
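
The underlying problem, a handler observing bookkeeping that is only half updated, can be shown with a deliberately simplified model. The following is a toy sketch, not glibc's allocator: `ToyHeap`, its fields, and the `on_interrupt` callback (which stands in for a signal arriving between two steps) are all hypothetical names made up for this illustration.

```python
class ToyHeap:
    """Toy allocator state: a list of free block sizes plus a cached byte total."""

    def __init__(self, sizes):
        self.free_blocks = list(sizes)
        self.free_bytes = sum(sizes)

    def is_consistent(self):
        """The cached total should always match the list of free blocks."""
        return self.free_bytes == sum(self.free_blocks)

    def allocate(self, on_interrupt=None):
        """Take one block in two steps; a 'signal' may land between them."""
        size = self.free_blocks.pop()   # step 1: remove the block from the list
        if on_interrupt is not None:
            on_interrupt(self)          # handler runs on half-updated state
        self.free_bytes -= size         # step 2: update the accounting
        return size


observations = []


def handler(heap):
    """Stand-in for a signal handler that touches the allocator's state."""
    observations.append(heap.is_consistent())


heap = ToyHeap([16, 32, 64])
heap.allocate(on_interrupt=handler)
```

Inside the handler the consistency check fails, even though the state is consistent again once `allocate` returns. An async-signal-safe handler avoids the problem by never touching such shared state at all.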
## Exploitation
- **Status:** PoC available for 32-bit (x86) systems; 64-bit (x64) exploitation is theoretically possible but significantly more difficult and time-consuming.
- **Complexity:** High (Requires precise timing/race condition winning and extensive "heap grooming").
- **Attack Vector:** Network (Remote, unauthenticated).
## Impact
- **Confidentiality:** Total (Attacker gains root access).
- **Integrity:** Total.
- **Availability:** Total.
## Remediation
### Patches
- **OpenSSH 9.8p1:** Released on July 1, 2024, specifically to address this vulnerability.
- **Distro Patches:** Linux distributions (Debian, Ubuntu, RHEL, etc.) have released backported security patches for their respective versions of OpenSSH.
### Workarounds
- **Configuration Change:** Set `LoginGraceTime 0` in `sshd_config`. This prevents the timer-based signal handler from triggering, effectively closing the attack vector.
  - *Caution:* This may make the server vulnerable to Denial of Service (DoS), because unauthenticated connections are no longer timed out and can exhaust the `MaxStartups` connection limit.
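
To audit whether the workaround is in place, a configuration file can be checked for its effective `LoginGraceTime`. The sketch below makes several simplifying assumptions: it ignores `Include` directives and `Match` blocks, takes the first occurrence of the keyword, and falls back to the 120-second default mentioned in the description above. The helper names are hypothetical.

```python
import re

DEFAULT_GRACE_SECONDS = 120  # default cited in the vulnerability description
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_time(value):
    """Convert values such as '0', '120', '2m' or '1h30m' to seconds."""
    text = value.lower()
    parts = re.findall(r"(\d+)([smhdw]?)", text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognised time value: {value!r}")
    return sum(int(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def login_grace_seconds(config_text):
    """Return the first LoginGraceTime found, or the default if none is set."""
    for line in config_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenise
        if len(fields) >= 2 and fields[0].lower() == "logingracetime":
            return parse_time(fields[1])
    return DEFAULT_GRACE_SECONDS


sample_config = "# LoginGraceTime 2m\nPort 22\nLoginGraceTime 0\n"
workaround_applied = login_grace_seconds(sample_config) == 0
```

A value of 0 means the timer is disabled; any other value means the `SIGALRM` path can still be reached.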
## Detection
- **Indicators of Compromise:** Many connection attempts from the same IP that never complete authentication, often sustained for hours (due to the time required to win the race condition). Large numbers of "Timeout before authentication" entries in syslog.
- **Detection Methods:** Vulnerability scanners (Nessus, OpenVAS) can identify vulnerable version strings. Monitor for unusual `sshd` child process crashes (Segfaults) which may indicate failed exploitation attempts.
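
As a rough illustration of the log-based indicator, the sketch below counts "Timeout before authentication" entries per source address. The exact line layout (`... for <address> port <n>`), the sample lines, and the threshold are assumptions made for the example; real log formats vary by version and syslog configuration, so the pattern should be adjusted to match local logs.

```python
import re
from collections import Counter

# Assumed message shape; adjust to the format seen in your own logs.
TIMEOUT_RE = re.compile(r"Timeout before authentication for (\S+) port \d+")


def timeout_counts(log_lines):
    """Count pre-authentication timeouts per source address."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def suspicious_sources(log_lines, threshold):
    """Return addresses with at least `threshold` timeouts, sorted."""
    counts = timeout_counts(log_lines)
    return sorted(addr for addr, n in counts.items() if n >= threshold)


# Fabricated sample lines using documentation-only address ranges.
sample_log = [
    "sshd[101]: fatal: Timeout before authentication for 192.0.2.10 port 40001",
    "sshd[102]: fatal: Timeout before authentication for 192.0.2.10 port 40002",
    "sshd[103]: Accepted publickey for alice from 198.51.100.7 port 51000",
    "sshd[104]: fatal: Timeout before authentication for 198.51.100.7 port 51001",
    "sshd[105]: fatal: Timeout before authentication for 192.0.2.10 port 40003",
]
flagged = suspicious_sources(sample_log, threshold=3)
```

With the sample data, only the address that timed out three times is flagged. In practice the threshold would need tuning against normal background noise from scanners.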
## References
- **Vendor Advisory:** hxxps://www.openssh[.]com/txt/release-9.8
- **Qualys Security Advisory:** hxxps://www.qualys[.]com/2024/07/01/cve-2024-6387/regresshion.txt
- **GitHub Commit:** hxxps://github[.]com/openssh/openssh-portable/commit/81c1099d22b81ebfd20a334ce986c4f753b0db29