On the Coming Industrialisation of Exploit Generation with LLMs

Full Report

The author of this post wanted to test how well Opus 4.5 and GPT-5.2 could exploit a new vulnerability in the QuickJS JavaScript interpreter. The author set up many different challenges, varying the exploit mitigations in place and the goal the exploit had to achieve. Across the 40 distinct exploits, GPT-5.2 solved every scenario and Opus 4.5 solved all but 2.

The vulnerability itself was documented at the beginning. Very quickly, both agents turned the QuickJS vulnerability into a read/write primitive API, making exploitation easier. From there, they leveraged publicly known weaknesses to build an exploit chain. The hardest test included everything you could think of: fine-grained CFI, a shadow stack, a seccomp sandbox, and more. On that challenge, GPT-5.2 created a chain of 7 function calls through glibc's exit handler to pop a shell, using 50M tokens and $150. The author found the vulnerability with an AI agent and then wrote an exploit using an agent as well.

So, now what? The industrialization of exploitation. The ability of an organization to complete a task will now be limited by the number of tokens it can afford, NOT by the number of people it employs. According to the author, exploit development is a perfect fit for industrialization: the environment is easy to construct, the tools are well understood, and verification is straightforward. The information is out there, and people know how to do this. The limit for a human tends to be how many things they can try and how many hours they have; the computer has neither limit.

The experiment shows that LLMs can exploit new security issues by drawing on their broad knowledge of existing exploitation techniques. The author also published the source code for the agents.
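The "tokens, not people" framing can be made concrete with a little arithmetic. The sketch below is only an illustration: it takes the 50M-token, $150 figures quoted for the hardest challenge at face value, treats the price as a single blended rate (real pricing usually differs for input and output tokens), and uses a purely hypothetical $10,000 budget. The function names are made up for this example.

```python
def blended_cost_per_million(total_cost_usd: float, total_tokens: int) -> float:
    """Average cost in USD per one million tokens for a single agent run."""
    return total_cost_usd / (total_tokens / 1_000_000)


def runs_affordable(budget_usd: float, cost_per_run_usd: float) -> int:
    """Whole number of agent runs that fit inside a fixed budget."""
    return int(budget_usd // cost_per_run_usd)


# Figures quoted in the summary for the hardest challenge.
RUN_TOKENS = 50_000_000
RUN_COST_USD = 150.0

# Hypothetical budget, chosen only for illustration.
BUDGET_USD = 10_000.0

rate = blended_cost_per_million(RUN_COST_USD, RUN_TOKENS)
runs = runs_affordable(BUDGET_USD, RUN_COST_USD)

print(f"Blended rate: ${rate:.2f} per million tokens")
print(f"Runs of that size within ${BUDGET_USD:,.0f}: {runs}")
```

Under these assumptions, the number of attempts scales with budget alone, which is the point of the argument: throughput depends on spend rather than on how many exploit developers are available.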

Analysis Summary