Full Report
In mid-2025, Sean Heelan published a post describing how he used an LLM to find a use-after-free (UAF) vulnerability in the Linux kernel that was similar to an existing bug. In the post, Sean explains how the LLM found the bug and where LLMs currently fall short in security reviews.
The author of Slice wanted to rediscover this vulnerability consistently using LLM tooling, without prior knowledge of the codebase and without building or compiling anything. Conveniently, CodeQL had just launched a mode that can analyze code without building it. They also did not want to use an agentic framework.
Initially, they used a simple cross-function UAF detector in CodeQL, which generated 1700 possible UAFs. They then used Tree-Sitter as a plugin to CodeQL to enrich each result. This also allowed them to set a maximum call depth, which cut the findings to 217. The target was the entire /fs/smb/server/ implementation in the Linux kernel, about 26.4K lines of code.
With these 217 findings, they came up with a two-step plan. The first step, triage, uses a smaller model to identify the "use" downstream of the free(), so that the larger model's tokens are not burned on weak candidates. The second step, analyze, asks deeper questions: how the freed memory might be accessed, whether the bug is exploitable, and how the code could be fixed. To do this, they created a tool called Slice, which takes SAST output and feeds it to an LLM.
A full run cost $1.75 for triage and $1.64 for analysis ($3.39 in total) and took 7 minutes. Running on GPT-5, it found the vulnerability 10 out of 10 times. They found the bug they were looking for but do not mention how many false positives came along with it. From reading the JSON output, it may have been zero. LLMs are going to revolutionize many things.
By providing the LLM with the proper data and a specific task (finding UAFs in SMB Linux server code), it performed pretty well. In the context of bug bounty, where you want to find a bug and don't need to find every bug, this seems very useful to me. This tool appears fantastic, and it's something I'll likely use in the future for my own needs.
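
The triage-then-analyze flow described in the report can be sketched roughly as follows. This is a minimal illustration, not Slice's actual code: the `Candidate` and `Finding` field names, the prompt wording, and the idea of passing models in as plain callables are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List

# A "model" is any callable that maps a prompt string to a reply string.
# Real use would wrap an LLM API client; that wiring is left out on purpose.
Model = Callable[[str], str]


@dataclass
class Candidate:
    """One SAST hit (field names are hypothetical, not Slice's schema)."""
    obj: str
    free_location: str
    use_location: str
    snippet: str


@dataclass
class Finding:
    candidate: Candidate
    analysis: str


def triage(candidate: Candidate, small_model: Model) -> bool:
    """Cheap first pass: keep the candidate only if the model answers YES."""
    prompt = (
        f"Object `{candidate.obj}` is freed at {candidate.free_location}. "
        f"Is it plausibly used afterwards at {candidate.use_location}? "
        "Answer YES or NO.\n\n" + candidate.snippet
    )
    return small_model(prompt).strip().upper().startswith("YES")


def analyze(candidate: Candidate, large_model: Model) -> Finding:
    """Expensive second pass: ask about access path, exploitability, fix."""
    prompt = (
        "Explain how the freed memory could be accessed, whether this is "
        "exploitable, and how the code could be fixed.\n\n" + candidate.snippet
    )
    return Finding(candidate, large_model(prompt).strip())


def run_pipeline(
    candidates: List[Candidate], small_model: Model, large_model: Model
) -> List[Finding]:
    """Only candidates that survive triage reach the larger model."""
    survivors = [c for c in candidates if triage(c, small_model)]
    return [analyze(c, large_model) for c in survivors]


def to_json(findings: List[Finding]) -> str:
    """Serialize findings as structured JSON for later review."""
    return json.dumps([asdict(f) for f in findings], indent=2)
```

The point of the split is cost control: the large model is called once per surviving candidate rather than once per SAST hit.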
Analysis Summary
# Tool/Technique: Slice (SAST + LLM Interprocedural Context Extractor)
## Overview
**Slice** is an open-source security research tool designed to automate the discovery of complex vulnerabilities (specifically Use-After-Free) by combining Static Application Security Testing (SAST) with Large Language Models (LLMs). Its primary purpose is to filter massive amounts of codebase data into high-signal "slices" of code that an LLM can analyze for exploitability without exceeding context windows or incurring excessive API costs.
## Technical Details
- **Type:** Research & Vulnerability Discovery Tool / Framework
- **Platform:** Linux Kernel (specifically tested on SMB server implementations/`ksmbd`), but extensible to any C/C++ codebase.
- **Capabilities:** Build-free code analysis, interprocedural context extraction (call graph depth up to 3), automated triage, and deep vulnerability analysis.
- **First Seen:** August 20, 2025 (Public release/blog post).
## MITRE ATT&CK Mapping
*Note: As a vulnerability discovery tool, Slice maps most closely to the "Reconnaissance" and "Resource Development" tactics.*
- **[TA0043 - Reconnaissance]**
- **[T1592 - Gather Victim Host Information]**
- **[T1592.002 - Software]** (Identifying software that may contain 0-day or unpatched flaws)
- **[TA0042 - Resource Development]**
- **[T1588.006 - Obtain Capabilities: Vulnerabilities]** (Automated generation of vulnerability reports for exploit dev)
## Functionality
### Core Capabilities
- **Build-Free Scanning:** Leverages CodeQL’s build-free capabilities to analyze C/C++ code without requiring a functional build environment or preprocessor macro resolution.
- **Intelligent Triage:** Uses a two-step process to save costs: first, a smaller/cheaper model identifies the "use" relative to a "free" location; then, a larger model (GPT-5) performs deep analysis.
- **Tree-Sitter Integration:** Uses Tree-Sitter as a plugin to CodeQL to improve AST (Abstract Syntax Tree) parsing and limit call depth traversal.
- **Context Slicing:** Extracts only the relevant code paths (upstream/downstream functions) related to a potential bug, keeping the LLM prompt lean and relevant.
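
To illustrate how a call-depth limit keeps a slice small, here is a minimal sketch of depth-bounded traversal over a call graph. The dictionary representation and the function name `slice_context` are assumptions for the example; the tool itself derives this information from CodeQL and Tree-Sitter rather than from a hand-built graph.

```python
from collections import deque
from typing import Dict, List


def slice_context(
    call_graph: Dict[str, List[str]], start: str, max_depth: int = 3
) -> Dict[str, int]:
    """Return every function reachable from `start` within `max_depth`
    calls, mapped to the depth at which it was first reached."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        if depths[fn] == max_depth:
            continue  # do not expand past the depth limit
        for callee in call_graph.get(fn, []):
            if callee not in depths:  # also guards against recursion cycles
                depths[callee] = depths[fn] + 1
                queue.append(callee)
    return depths


# Hypothetical call chain; the names are placeholders, not kernel symbols.
graph = {"entry": ["parse"], "parse": ["lookup"], "lookup": ["release"],
         "release": ["cleanup"]}
context = slice_context(graph, "entry", max_depth=3)
```

With a limit of 3, `cleanup` (four calls away from `entry`) is excluded from the slice, so its source would never be sent to the model.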
### Advanced Features
- **Cross-Function UAF Detection:** Capable of tracking freed objects across multiple function calls (interprocedural analysis).
- **Macro Evasion:** Automatically replaces `#ifdef` with `#ifndef` via `sed` to force the static analyzer to exercise code paths that might otherwise be hidden during build-free scanning.
- **Automated JSON Output:** Produces structured data detailing the affected object, the "free" location, and the "use" location with code snippets.
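
The `#ifdef` to `#ifndef` rewrite is described above as a `sed` step. As a rough Python equivalent, the sketch below applies the same substitution with a regular expression. The exact pattern the tool uses is not given in the source, so this pattern and the macro name `CONFIG_EXAMPLE` are assumptions.

```python
import re

# Match "#ifdef" at the start of a line, allowing indentation and
# whitespace after the "#". "#ifndef" and "#if defined(...)" are untouched.
_IFDEF = re.compile(r"^([ \t]*#[ \t]*)ifdef\b", re.MULTILINE)


def flip_ifdef(source: str) -> str:
    """Rewrite every `#ifdef` directive in `source` to `#ifndef`."""
    return _IFDEF.sub(r"\1ifndef", source)


# CONFIG_EXAMPLE is a placeholder macro name used only for illustration.
original = "#ifdef CONFIG_EXAMPLE\nint guarded;\n#endif\n"
flipped = flip_ifdef(original)
```

Note that flipping the directive inverts the condition, so any `#else` branch that was previously active becomes the hidden one. It trades one blind spot for another and is only useful when the guarded code is the part under review.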
## Indicators of Compromise
*As Slice is a legitimate research tool used by analysts, "compromise" indicators are not applicable in the traditional sense. However, its usage in an environment might be identified by:*
- **File Names:** `slice` (CLI tool), `transport_rdma.c` (targeted file in Linux tests).
- **Behavioral Indicators:** High volume of API calls to LLM providers (OpenAI/Anthropic) containing snippets of proprietary or kernel source code. Large-scale execution of `codeql` queries against uncompiled source trees.
## Associated Threat Actors
- **Security Researchers:** Caleb Gross (noperator), Sean Heelan.
- **Potential Use:** While developed for defensive bug hunting and bug bounties, similar frameworks could be leveraged by advanced persistent threats (APTs) for automated 0-day discovery.
## Detection Methods
- **Behavioral Detection:** Monitoring for unauthorized exfiltration of source code to LLM API endpoints.
- **Tooling Audit:** Detecting the installation of the `slice` CLI tool or its dependencies (such as the CodeQL CLI) on research workstations.
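
As one possible take on the behavioral detection above, the sketch below flags outbound requests to LLM API hosts whose bodies look like C source. The record schema (`host` and `body` keys), the hint patterns, and the `min_hits` threshold are all assumptions for illustration; a real deployment would adapt them to its own proxy or DLP logs.

```python
import re
from typing import Dict, Iterable, List

# Public API hostnames for the two providers named in this report.
LLM_API_HOSTS = {"api.openai.com", "api.anthropic.com"}

# Crude hints that a payload contains C or kernel source code.
C_SOURCE_HINTS = re.compile(
    r"#include\s*[<\"]|\bkfree\s*\(|\bstruct\s+\w+\s*\*"
)


def flag_source_uploads(
    records: Iterable[Dict[str, str]], min_hits: int = 2
) -> List[Dict[str, str]]:
    """Return records sent to an LLM API host whose body matches at least
    `min_hits` C-source hints."""
    flagged = []
    for rec in records:
        if rec.get("host") not in LLM_API_HOSTS:
            continue
        hits = len(C_SOURCE_HINTS.findall(rec.get("body", "")))
        if hits >= min_hits:
            flagged.append(rec)
    return flagged
```

A heuristic like this will produce both false positives and false negatives; it is meant as a starting point for triage, not as a control on its own.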
## Mitigation Strategies
- **Static Analysis Security:** Ensuring that internal source code is not accessible to unauthorized automated scraping or LLM-triage tools.
- **Code Hardening:** Proactive use of memory-safe wrappers and automated SAST/DAST pipelines to find UAF vulnerabilities before external researchers do.
- **API Monitoring:** Rate-limiting or auditing the flow of internal code snippets to third-party AI providers.
## Related Tools/Techniques
- **CodeQL:** The underlying SAST engine used for initial bug candidate generation.
- **Tree-Sitter:** Parser used alongside CodeQL to extract code context and limit call depth traversal.
- **GPT-5 / Claude 3.7:** The LLM backends used for the "Analysis" phase.
- **o3:** The model Sean Heelan originally used to find CVE-2025-37899.