The minefield between syntaxes: exploiting syntax confusi...

Full Report

The author discusses how different parsers interpreting the same syntax differently can lead to security issues. URLs, URIs, Content-Disposition headers, and Unicode are great examples of this. In Python, for instance, the urlopen function can read local files when given a file:// URL. CVE-2023-24329 showed that a space at the beginning of a URL could bypass a blocklist and lead to SSRF. The point is that parser differentials can lead to horrible security issues.

They have several examples from their bug bounty work. The first was a cache poisoning issue where only the URL port was being cached. When sending specific ports, like 80 or 443, the application removed the port. When using a huge port number, though, the port was kept on the domain. The goal was to get the server-side parser to treat the port as invalid before normalization while the client/browser saw it as valid. They noticed that leading zeros on the port had some weird effects. For instance, the server would use http://example.com:000123:443, parse out http://example.com:000123, and the browser would then interpret this as http://example.com:123. The differential here was between the browser and the PHP backend.

The next vulnerability took three months of work to exploit. They had control over a URL, and the application would return the response from a PHP cURL request. They learned that providing the @ character and a path that started with /tmp allowed them to read files from the file system in the file upload code. However, the read was blind, since the file contents were being added to the $_FILES global variable. If sent with multipart/form-data, the contents go into the $_POST variable, but with no control of the file name. They experimented with the Content-Disposition header to make this possible. They had the source code for this application, so they were able to see the sinks. The confusion happens in the second request: adding a double quote to the name in the request causes it to read the contents of /etc/passwd.
Since the username parameter was the closest thing to the file contents, the file was added to that variable and returned by PHP. The rest of the data is effectively ignored because the parser is very forgiving. This would eventually return the contents of /etc/passwd to the user, demonstrating a full file read via SSRF. The key was bypassing the $_FILES variable restriction to inject the file contents directly into the $_POST parameter.

To mitigate these types of issues, they had a few suggestions. First, have a single consistent parser for handling input. Realistically, this is impossible to do: some companies may use Python for one thing and Node.js for another, and the parsing will be different. Any time a check and a use happen in different components, it's really hard to get right. Another suggestion is to simply error out when parsing fails. Things should NOT fail open; if the syntax is wrong, a failure should occur. A final good one is input validation: if you have a file name, only allow alphabetic characters and an extension, nothing else. Good post!
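
As a rough illustration of the "fail closed" and input validation suggestions, here is a minimal Python sketch. It is not taken from the original post. The allowlists, the Target type, and the function names validate_url and validate_filename are all hypothetical, and the rules are stricter than most applications would need. The idea is to reject anything that is not already in canonical form (leading whitespace, a second colon, leading zeros in the port) instead of trying to normalize it.

```python
import re
from typing import NamedTuple
from urllib.parse import urlsplit

# Hypothetical policy values, chosen only for illustration.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"example.com"}

# host[:port] only: no userinfo, no second colon, no leading zeros in the port.
NETLOC_RE = re.compile(r"(?P<host>[a-z0-9.-]+)(?::(?P<port>[1-9][0-9]{0,4}))?")

# Alphabetic base name plus one alphabetic extension, nothing else.
FILENAME_RE = re.compile(r"[A-Za-z]+\.[A-Za-z]+")


class Target(NamedTuple):
    scheme: str
    host: str
    port: int
    path: str  # query and fragment are dropped in this sketch


def validate_url(raw: str) -> Target:
    """Return the parsed components, or raise ValueError on anything unusual."""
    # Reject whitespace and control characters instead of stripping them,
    # so a leading space cannot slip past a later check.
    if any(ch <= " " or ch == "\x7f" for ch in raw):
        raise ValueError("whitespace or control character in URL")

    parts = urlsplit(raw)  # may itself raise ValueError on malformed input
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError("scheme not allowed")

    # fullmatch: the whole authority must be canonical, not just a prefix.
    match = NETLOC_RE.fullmatch(parts.netloc)
    if match is None:
        raise ValueError("authority is not in canonical host[:port] form")

    host = match["host"]
    if host not in ALLOWED_HOSTS:
        raise ValueError("host not allowed")

    default_port = 443 if parts.scheme == "https" else 80
    port = int(match["port"]) if match["port"] else default_port
    if port > 65535:
        raise ValueError("port out of range")

    return Target(parts.scheme, host, port, parts.path or "/")


def validate_filename(name: str) -> str:
    """Accept only names like 'report.pdf'; raise ValueError otherwise."""
    if FILENAME_RE.fullmatch(name) is None:
        raise ValueError("file name not allowed")
    return name
```

With these rules, inputs such as " https://example.com/", "http://example.com:000123:443", and "file:///etc/passwd" all raise ValueError, as does a file name containing a double quote. The caller would then build the outgoing request from the returned Target fields rather than handing the raw string to a second parser, which keeps the check and the use on the same interpretation.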

Analysis Summary