Full Report
The Apache HTTP Server is built from modules: 136 are listed in the documentation, and about half of them are in normal use. To the author, this was a bad code smell. A giant request_rec structure is passed around to every module, and if two modules understand one of its fields differently, things can go badly wrong. That mismatch is what this research is about.
The structure contains a field called filename that is meant to hold a filesystem path. However, some modules treat it as a URL, which can lead to security issues. In particular, a ? in the path can be used to truncate it, because everything after the ? is treated as a query string. For instance, mod_rewrite lets sysadmins easily rewrite a path pattern with the RewriteRule directive. If the attacker-controlled part of the path contains a question mark (sent URL-encoded as %3F), the rewritten path is truncated at that point, so the server accesses a different file than the rule intended. Truncation is also useful against a RewriteRule that matches on the path itself: the rule's pattern sees the full path, while the server only uses the truncated part.
The other interesting consequence of the filename confusion is an ACL bypass. It's common to use the Files directive to require authentication for access to a particular file. Using the URL-encoded question mark, we can get one path checked while another is actually used. For instance, with admin.php%3Fooo.php the access check sees a filename ending in ooo.php, so the rule protecting admin.php does not apply, while the PHP handler truncates at the ? and runs admin.php.
The next bug is crazy. When httpd processes a request rewritten by certain RewriteRules, it first looks for the rewritten path at that exact spot on the filesystem, and only then falls back to prefixing the configured DocumentRoot. Most of the time, that path doesn't exist at the filesystem root, so this doesn't matter. However, it means that if the beginning of a RewriteRule's target is attacker-controlled, the entire filesystem can be accessed! Well, sort of. Because the rewrite rule usually appends a suffix (like .html), we can only reach files that end that way. Additionally, Apache has built-in protection that blocks access to some files. The first primitive lets us truncate the path before the suffix, though, and the combination creates a super primitive.
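The lookup order described above, combined with truncation, can be modeled in a few lines. This is a simplified Python sketch, not Apache's C code. It assumes a hypothetical rule RewriteRule "^/html/(.*)$" "/$1.html" and a hypothetical DocumentRoot of /var/www/html. It approximates the existence check by looking only at the path's first segment (for example /etc) and ignores the access-control checks that normally apply afterwards.

```python
import os
from typing import Callable
from urllib.parse import unquote

DOCUMENT_ROOT = "/var/www/html"  # assumed DocumentRoot for this sketch


def first_segment_exists(path: str) -> bool:
    """Approximate check: does the path's first segment (e.g. /etc) exist at "/"?"""
    first = "/" + path.lstrip("/").split("/", 1)[0]
    return os.path.exists(first)


def resolve_rewritten(target: str,
                      exists: Callable[[str], bool] = first_segment_exists) -> str:
    # Everything after the first "?" is treated as a query string (truncation).
    path = target.split("?", 1)[0]
    # If the path already looks like a real filesystem location, use it as-is...
    if exists(path):
        return path
    # ...otherwise fall back to prefixing the DocumentRoot.
    return DOCUMENT_ROOT + path


def apply_html_rule(captured: str,
                    exists: Callable[[str], bool] = first_segment_exists) -> str:
    # Hypothetical rule: RewriteRule "^/html/(.*)$" "/$1.html"
    # $1 comes from the URL-decoded path, so "%3F" arrives as a literal "?".
    return resolve_rewritten("/" + unquote(captured) + ".html", exists)


if __name__ == "__main__":
    for part in ("about", "etc/passwd", "etc/passwd%3F"):
        print(part, "->", apply_html_rule(part))
```

The existence check is injectable so the behavior can be tested without depending on the local filesystem. With "etc/passwd%3F", the .html suffix is cut off and, because /etc exists at the root, no DocumentRoot is prepended.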
Using this bug, the author found they could disclose arbitrary source code. Even though the default configuration restricts which locations can be accessed, we can use gadgets in the locations it does allow (Debian and Ubuntu's default configuration, for example, grants access to /usr/share). The LibreOffice file at /usr/share/libreoffice/help/help.html contains an XSS, and files shipped with some packages, such as WordPress plugins, include tutorial code that can be used for LFI. The author mentions a few other ways to exploit this, including abusing symbolic links.
In Apache, there are two directives that can achieve the same thing: AddHandler and AddType. Under the hood, some magic from 1996 makes both work: when the handler field is empty, the content_type field is used as the module handler. This gives a new primitive: the ability to overwrite the handler.
The first instance of this being exploited involved mod_security. When an error occurred while processing a path, it wasn't handled correctly and the Content-Type was overwritten. As a result, the wrong handler was executed, and the PHP source code was returned instead of the output of the PHP script. This technique could be combined with other Content-Type changes as well.
Next, if an attacker can control the Content-Type header in the response, they can invoke ANY handler. Even though this header is only processed after the response is received, server-side redirects make this exploitable against any CGI implementation on the server. The author mentions an SSRF with controlled headers or CRLF injection as potential ways to achieve this.
How does this become exploitable? Getting an image file processed as a PHP script quickly leads to RCE. Invoking mod_proxy gives a full SSRF, including direct access to Unix sockets. Finally, they found that the PEAR.php shipped in PHP Docker images can be used to get RCE. At the end of the article, the author says this area is promising for more research: they focused on only a few impactful fields, and other fields may cause just as much havoc.
The more complex a code base is, the more unique vulnerabilities are likely lurking in it. Amazing research, as always, by Orange Tsai :)
Analysis Summary
# Research: Confusion Around `request_rec`: Vulnerabilities in Apache HTTP Server
## Metadata
- **Authors**: Orange Tsai
- **Institution**: DEVCORE
- **Publication**: Black Hat USA / Orange Tsai’s Technical Blog
- **Date**: August 2024 (approximate, based on the presentation timeline)
## Abstract
This research investigates the architectural complexities of the Apache HTTP Server (httpd), specifically focusing on the `request_rec` structure. By identifying "impedance mismatches" in how various modules interpret shared fields—most notably the confusion between filesystem paths and URLs—the author demonstrates a series of high-impact vulnerabilities. These range from ACL bypasses and source code disclosure to Server-Side Request Forgery (SSRF) and Remote Code Execution (RCE).
## Research Objective
The study aims to answer: Can the modular overhead and legacy code of a complex system like Apache lead to security-critical "semantic gaps" between disparate modules sharing the same internal data structures?
## Methodology
### Approach
- **Code Audit**: A deep dive into the Apache C source code, particularly the interaction between the core and modules like `mod_rewrite`, `mod_proxy`, and `mod_security`.
- **Differential Analysis**: Comparing how different modules treat the `filename` field within the `request_rec` structure (e.g., as a local path vs. a URI).
- **Primitive Chaining**: Combining minor logic flaws (like path truncation) with legacy features (like 1996-era handler logic) to escalate impact.
### Dataset/Environment
- Apache HTTP Server (httpd) core and standard modular distributions.
- Common web environments including PHP-FPM, Dockerized PEAR, and WordPress installations.
### Tools & Technologies
- C Source Code Analysis.
- HTTP Fuzzing and manual payload crafting.
- Debugging tools (gdb) to trace `request_rec` transformations.
## Key Findings
### Primary Results
1. **Filename Confusion**: Modules inconsistently treat the `filename` field, leading to path truncation using characters like `?`.
2. **ACL Bypass**: Using URL-encoded characters (`%3F`) to trick access control modules into validating one file while the server executes another.
3. **Handler Overwrites**: Exploiting legacy logic from 1996 that allows an attacker to hijack the execution handler via the `Content-Type` field.
4. **Information Disclosure & RCE**: Utilizing filesystem "gadgets" and SSRF to achieve code execution or leak sensitive source code.
### Supporting Evidence
- Demonstrated that requesting `admin.php%3Fooo.php` evades an ACL protecting `admin.php`: the access check sees a filename ending in `ooo.php`, while the PHP handler truncates at the `?` and executes `admin.php`.
- Discovered that an error in `mod_security` could leak PHP source code by incorrectly handling `Content-Type` during error states.
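The `admin.php%3Fooo.php` case can be illustrated with a toy model of the two disagreeing components. In this simplified Python sketch (not Apache's code), a `<Files "admin.php">`-style check matches the last path component of the decoded filename, while a handler that treats the filename as a URL cuts it at the first `?`. The DocumentRoot and the protected file name are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Tuple
from urllib.parse import unquote

DOCUMENT_ROOT = "/var/www/html"  # hypothetical DocumentRoot
PROTECTED = "admin.php"          # hypothetical <Files "admin.php"> requiring auth


def acl_requires_auth(filename: str) -> bool:
    # Files-style check: compare the last path component, which still
    # contains anything after a decoded "?".
    basename = filename.rsplit("/", 1)[-1]
    return fnmatchcase(basename, PROTECTED)


def script_executed(filename: str) -> str:
    # URL-minded handler: the first "?" starts a query string and is dropped.
    return filename.split("?", 1)[0]


def handle(request_path: str) -> Tuple[bool, str]:
    # Both components read the same field, built from the decoded request path.
    filename = DOCUMENT_ROOT + unquote(request_path)
    return acl_requires_auth(filename), script_executed(filename)
```

For example, `handle("/admin.php%3Fooo.php")` reports that no authentication is required, yet the script it returns is `/var/www/html/admin.php`.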
### Novel Contributions
- Identified the "Super Primitive": Combining a path-controllable `RewriteRule` with the `?` truncation bug to access files outside the intended web root.
- Uncovered legacy behavior in Apache where `AddHandler` and `AddType` are interchangeable under specific conditions (the `content_type`-as-handler fallback).
## Technical Details
The heart of the research lies in the `request_rec` structure. When `mod_rewrite` processes a rule, it updates the `filename`. If an attacker introduces a `?` (sent URL-encoded as `%3F`, so that it is decoded into the path instead of starting the real query string), some modules treat everything after the `?` as a query string (URL-style), effectively truncating the filesystem path.
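A minimal sketch of this truncation, assuming a hypothetical rule `RewriteRule "^/user/(.+)$" "/var/user/$1/profile.yml"`. It models the behavior in Python rather than reproducing mod_rewrite's C implementation. The pattern is matched against the decoded path, and the first `?` in the substitution result splits it into a filename and a query string.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote

# Hypothetical rule: RewriteRule "^/user/(.+)$" "/var/user/$1/profile.yml"
PATTERN = re.compile(r"^/user/(.+)$")
SUBSTITUTION = "/var/user/{}/profile.yml"


def rewrite(raw_path: str) -> Tuple[str, Optional[str]]:
    """Return (filename, query_string) after applying the rule."""
    decoded = unquote(raw_path)          # "%3F" becomes a literal "?"
    match = PATTERN.match(decoded)
    if match is None:
        return decoded, None             # rule does not apply
    result = SUBSTITUTION.format(match.group(1))
    # Treat the result like a URL: the first "?" starts a query string.
    filename, sep, query = result.partition("?")
    return filename, (query if sep else None)
```

With `/user/orange%3F`, the fixed `/profile.yml` suffix is pushed into the query string, so the filename becomes `/var/user/orange` instead of the intended profile file.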
Furthermore, the research highlights a "handler confusion." In Apache, if the `handler` field is empty, the server looks at the `content_type` field to decide how to process the request. If an attacker can control the response `Content-Type` (via SSRF or CRLF injection), they can force the server to pass a file to a dangerous handler (like `proxy-server` or a CGI script), even if that wasn't the original intent.
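The fallback can be sketched as follows. This is a simplified Python model of the logic described above, not the actual C code (as we understand it, the real logic lives in Apache's `ap_invoke_handler`). The default-handler placeholder is an assumption of the sketch.

```python
from typing import Optional

DEFAULT_HANDLER = "<default>"  # placeholder name used only in this sketch


def select_handler(handler: Optional[str], content_type: Optional[str]) -> str:
    # An explicitly set handler (e.g. from AddHandler/SetHandler) always wins.
    if handler is not None:
        return handler
    # Legacy fallback: reuse the Content-Type as the handler name,
    # dropping media-type parameters such as "; charset=UTF-8".
    if content_type is not None:
        if ";" in content_type:
            return content_type.split(";", 1)[0].rstrip(" ")
        return content_type
    return DEFAULT_HANDLER
```

`select_handler(None, "application/x-httpd-php")` shows why `AddType application/x-httpd-php .php` still executes PHP. The same path is the danger: if an attacker controls the response `Content-Type`, a value such as `server-status` is taken verbatim as a handler name.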
## Practical Implications
### For Security Practitioners
- **Standardized Parsing**: This research highlights that even "battle-tested" software suffers from inconsistent parsing logic.
- **SSRF Escalation**: SSRF is no longer just about internal scanning; in Apache, it can be used to overwrite handlers and gain RCE.
### For Defenders
- **Review Rewrite Rules**: Ensure `RewriteRule` targets are not partially user-controllable.
- **Update Apache**: Upgrade to Apache HTTP Server 2.4.60 or later, which patched the CVEs Orange Tsai reported from this research.
- **Hardening**: Disable unnecessary modules (e.g., `mod_proxy`, `mod_rewrite`) to reduce the attack surface of the `request_rec` structure.
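For the rewrite-rule review above, a rough first pass can be automated. The heuristic below is a hypothetical helper, not an official tool. It flags substitutions whose first path segment contains a backreference (`$N` or `%N`), which risks the DocumentRoot confusion, and substitutions where a backreference is followed by more text, which a decoded `?` could truncate. It will miss cases and produce false positives (for example, on URL-encoded `%2F` sequences).

```python
import re
from typing import List

# Backreference to a RewriteRule ($0-$9) or RewriteCond (%0-%9) group.
BACKREF = re.compile(r"[$%]\d")
# A backreference followed by at least one more character.
BACKREF_WITH_SUFFIX = re.compile(r"[$%]\d.")


def audit_substitution(substitution: str) -> List[str]:
    """Return heuristic warnings for a RewriteRule substitution string."""
    warnings: List[str] = []
    if not substitution.startswith("/"):
        return warnings  # relative paths and full URLs are out of scope here
    first_segment = substitution[1:].split("/", 1)[0]
    if BACKREF.search(first_segment):
        warnings.append("prefix")       # attacker may pick a root-level directory
    if BACKREF_WITH_SUFFIX.search(substitution):
        warnings.append("truncation")   # a decoded "?" could cut off the suffix
    return warnings
```

A target such as `/$1.html` gets both warnings, while `/var/www/html/$1` gets none, because the attacker controls neither its prefix nor any suffix.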
### For Researchers
- **Data Structure Auditing**: Focus on large, shared structures in modular software (like Nginx or IIS) to find similar semantic gaps.
## Limitations
- **Configuration Dependent**: Many of the vulnerabilities require specific directives (like `RewriteRule`) or specific filesystem structures to be exploitable.
- **Default Permissions**: Apache often has "Require all denied" on the root directory, which may mitigate some of the "entire filesystem access" primitives unless misconfigured.
## Comparison to Prior Work
While previous research into Apache has focused on buffer overflows or memory safety, this work shifts the focus to **logic and semantic vulnerabilities**—a trend in modern web security research. It builds on the concept of "Request Smuggling" but applies it internally between modules.
## Real-world Applications
- **Source Code Disclosure**: Bypassing PHP handlers to read the raw `.php` files.
- **Cloud Metadata Theft**: Using the `mod_proxy` SSRF to reach internal cloud metadata endpoints.
- **RCE**: Using the PEAR.php shipped in PHP Docker images, or image-to-PHP handler confusion, to run commands.
## Future Work
- **Unexplored Fields**: The `request_rec` structure has dozens of other fields (e.g., `args`, `uri`, `user`) that have yet to be fully audited for similar confusion.
- **Other Proxies**: Applying this "structure confusion" methodology to other reverse proxies or load balancers.
## References
- Orange Tsai, "Confusion at the Core: Bypassing Barriers in the Apache HTTP Server."
- [Link to Black Hat presentation - defanged] hxxps[://]www[.]blackhat[.]com/us-24/briefings/schedule/#confusion-at-the-core-bypassing-barriers-in-the-apache-http-server-39339