Can AI speed up writing vulnerability checks without sacrificing quality? Intruder put it to the test. Their researchers found where AI helps, where it falls short, and why human oversight is still critical. See what they discovered in practice. [...]
# Research: Can We Trust AI To Write Vulnerability Checks? Here's What We Found
## Metadata
- Authors: Not stated; presumably the security team at Intruder, based on the text
- Institution: Intruder
- Publication: BleepingComputer (Sponsored Post/Technical Analysis)
- Date: September 29, 2025
## Abstract
This research investigates whether Large Language Models (LLMs) can speed up the creation of high-quality vulnerability check templates for the Nuclei scanning tool. The study compares simple one-shot prompting against an agentic approach augmented with curated data and rules. The findings suggest that basic prompting yields poor results, whereas an agentic system guided by established rules and examples can significantly boost productivity while maintaining the required quality standards. The practical conclusion is that AI works best here as an engineering assistant under human oversight, not as a fully automated generator.
## Research Objective
The primary objective was to determine if AI could be utilized to generate new vulnerability checks faster than traditional manual methods without compromising the quality and accuracy of the detections (i.e., avoiding excessive false positives or false negatives).
## Methodology
### Approach
The team conducted a comparative study between two main AI interaction paradigms:
1. **One-shot Approach (Chatbot):** Using standard LLM interfaces (ChatGPT, Claude, Gemini) to directly prompt for Nuclei template generation.
2. **Agentic Approach:** Using an AI agent (the article names Cursor's agent) that can use tools, reference external material, and follow specific rulesets. The team indexed a curated repository of existing, high-quality Nuclei templates to guide the agent.
### Dataset/Environment
The environment involved generating Nuclei templates for real-world scenarios, including:
* General attack surface checks (e.g., detecting exposed admin panels).
* Specific vulnerability checks (e.g., detecting unsecured Elasticsearch instances).
The "dataset" used to ground the agentic workflow included engineer-provided inputs (pages to hit, matcher type, data extraction needs) and a repository of known-good Nuclei templates.
### Tools & Technologies
* **Target Output Format:** Nuclei vulnerability scan templates.
* **AI Models/Platforms:** ChatGPT, Claude, Gemini (for initial attempts).
* **Agent Platform:** Cursor's agent.
* **Vulnerability Scanner Context:** Nuclei.
## Key Findings
### Primary Results
1. **One-shot/Chatbot Failure:** Simple prompting across major LLMs (ChatGPT, Claude, Gemini) resulted in messy, low-quality outputs, referencing non-existent features, using invalid syntax, and employing weak matching logic.
2. **Agentic Success:** The agentic approach, when provided with rules and exemplary templates, showed immediate and significant quality improvement, producing outputs far closer to those written by human engineers.
3. **Human Oversight Required:** Full automation ("set-and-forget") was not viable; the agent still required human course correction, shifting the goal to using AI as a productivity multiplier rather than a replacement for engineers.
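
Some of the failure modes in finding 1 can be caught mechanically before a human reviews a generated template. The sketch below is a minimal illustration, not Intruder's tooling: it assumes the template has already been parsed into a Python dictionary, and `lint_template` and `KNOWN_MATCHER_TYPES` are hypothetical names. The matcher-type set reflects common Nuclei usage and is not meant to be exhaustive.

```python
from typing import Any

# Matcher types this sketch accepts. An assumption based on common Nuclei
# usage, not an exhaustive list taken from the Nuclei documentation.
KNOWN_MATCHER_TYPES = {"word", "regex", "status", "binary", "dsl", "size", "xpath"}


def lint_template(template: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed template."""
    problems: list[str] = []

    # Structural check: the top-level keys an HTTP template is expected to have.
    for key in ("id", "info", "http"):
        if key not in template:
            problems.append(f"missing top-level key: {key}")

    for index, request in enumerate(template.get("http", [])):
        matchers = request.get("matchers", [])
        if not matchers:
            problems.append(f"http[{index}]: no matchers")
            continue

        types = [matcher.get("type") for matcher in matchers]

        # Hallucinated features often show up as matcher types that do not exist.
        for matcher_type in types:
            if matcher_type not in KNOWN_MATCHER_TYPES:
                problems.append(f"http[{index}]: unknown matcher type {matcher_type!r}")

        # Weak logic: a bare status-code match fires on almost any live host.
        if set(types) == {"status"}:
            problems.append(f"http[{index}]: status-only matching is weak")

        # Several matchers without an explicit "and" are combined with OR.
        if len(matchers) > 1 and request.get("matchers-condition") != "and":
            problems.append(f"http[{index}]: multiple matchers are OR-ed by default")

    return problems
```

A check like this only filters out obvious defects; it does not replace the engineer's review described in finding 3.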
### Supporting Evidence
* The quality jump was "immediate" upon switching to the agentic workflow.
* The final agent-generated templates, with clear prompting, looked like they had been written manually.
* **Use Case Validation (Exposed Admin Panels):** The agentic workflow proved highly effective at creating checks, quickly and at scale, for attack surfaces that previously had no coverage, addressing gaps in major public scanners.
* **Use Case Validation (Unsecured Elasticsearch):** The agent successfully generated a refined Nuclei template to detect instances left "wide open" (read access unrestricted), surpassing the capabilities of a pre-existing public template.
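
The Elasticsearch case is a two-step check: fingerprint the service first, then confirm that data can be read without credentials. The sketch below shows that logic in Python under stated assumptions. The source does not say which endpoints the final template uses, so the root banner and `/_cat/indices?format=json` are illustrative choices, and `check_open_elasticsearch` and `Fetcher` are hypothetical names. The HTTP call is injected as a function so the logic can be exercised without a network.

```python
import json
from typing import Callable, Optional, Tuple

# A fetcher takes a URL and returns (status_code, body_text).
Fetcher = Callable[[str], Tuple[int, str]]


def parse_json(body: str):
    """Parse JSON, returning None instead of raising on invalid input."""
    try:
        return json.loads(body)
    except ValueError:
        return None


def check_open_elasticsearch(base_url: str, fetch: Fetcher) -> Optional[list]:
    """Return readable index names, or None if the instance is not confirmed open."""
    # Step 1: fingerprint. The root endpoint of Elasticsearch returns a JSON
    # banner containing this tagline.
    status, body = fetch(base_url + "/")
    banner = parse_json(body)
    if status != 200 or not isinstance(banner, dict):
        return None
    if banner.get("tagline") != "You Know, for Search":
        return None

    # Step 2: confirmation. Listing indices without credentials shows that
    # read access is unrestricted (assumed endpoint, see lead-in).
    status, body = fetch(base_url + "/_cat/indices?format=json")
    indices = parse_json(body)
    if status != 200 or not isinstance(indices, list):
        return None
    return [row.get("index") for row in indices if isinstance(row, dict)]


def fake_fetch(url: str) -> Tuple[int, str]:
    """Stand-in for a real HTTP client, simulating a wide-open instance."""
    if url.endswith("/_cat/indices?format=json"):
        return 200, '[{"index": "customers"}]'
    return 200, '{"tagline": "You Know, for Search"}'


open_indices = check_open_elasticsearch("http://es.example", fake_fetch)
```

Requiring both steps to succeed is what separates "Elasticsearch is present" from "Elasticsearch is readable by anyone", which is the distinction the refined template is described as making.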
### Novel Contributions
* Demonstration of the critical difference between basic LLM interaction and **tool-augmented, context-aware agentic workflows** for generating specialized security code (Nuclei templates).
* Establishment of a practical, rule-based methodology for leveraging AI to augment security engineering workflows for vulnerability check creation, focusing on speed *without* sacrificing quality assurance.
* Identification of specific security tasks (like scalable attack surface mapping) where AI augmentation provides disproportionate value because manual generation is tedious.
## Technical Details
The agentic flow involved:
1. **Input Gathering:** The engineer supplies necessary technical constraints (endpoints, matcher type, expected extraction data).
2. **Contextual Grounding:** The agent utilizes a curated repository of existing, high-quality Nuclei templates as a reference library to adhere to syntax, structure, and best practices.
3. **Process Iteration:** The agent iterates based on explicit instructions (e.g., "detect X, check endpoint A, then check endpoint B for confirmation of vulnerability").
4. **Refinement:** Human engineers provide minimal prompts and rules to guide the agent toward the desired level of fidelity, particularly for complex multi-step checks like the Elasticsearch validation.
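
The first three steps amount to packaging the engineer's inputs, the rules, and pointers to reference templates into one instruction for the agent. A minimal sketch of that packaging follows. It assumes nothing about Cursor's actual interfaces: `CheckSpec`, `build_prompt`, the field names, and the example file paths are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CheckSpec:
    """Engineer-supplied inputs for one check (hypothetical field names)."""
    goal: str
    endpoints: list[str]
    matcher_type: str
    extract: list[str] = field(default_factory=list)


def build_prompt(spec: CheckSpec, rules: list[str], examples: list[str]) -> str:
    """Assemble a plain-text instruction from inputs, rules, and references."""
    lines = [f"Task: write a Nuclei template that will {spec.goal}.", "", "Rules:"]
    lines += [f"- {rule}" for rule in rules]

    # Ordered requests capture "check endpoint A, then endpoint B" instructions.
    lines += ["", "Requests, in order:"]
    lines += [f"{n}. {path}" for n, path in enumerate(spec.endpoints, start=1)]

    lines += ["", f"Matcher type: {spec.matcher_type}"]
    if spec.extract:
        lines.append("Extract: " + ", ".join(spec.extract))

    # Paths into the curated repository that the agent should imitate.
    lines += ["", "Reference templates:"]
    lines += [f"- {path}" for path in examples]
    return "\n".join(lines)


spec = CheckSpec(
    goal="detect an exposed admin panel",
    endpoints=["/admin/login"],
    matcher_type="word",
    extract=["version"],
)
prompt = build_prompt(
    spec,
    rules=["Combine matchers with matchers-condition: and"],
    examples=["templates/panels/example-panel.yaml"],
)
```

Keeping the inputs in a structured form makes it easier to see which part of the instruction a human corrected during the refinement step.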
## Practical Implications
### For Security Practitioners
* AI (specifically agentic frameworks) can drastically reduce the time spent writing boilerplate or routine vulnerability checks, accelerating the coverage update cycle.
* It provides a scalable solution for discovering and checking for vulnerabilities on niche or proprietary systems where public scanner coverage is often lacking.
### For Defenders
* Organizations can gain faster detection coverage against emerging threats or custom exposure vectors (like internal-facing admin interfaces) by quickly generating bespoke Nuclei checks.
* The efficiency gained translates to a smaller window between vulnerability introduction and detection capability.
### For Researchers
* Confirms that the success of LLMs in technical domains is highly dependent on providing them with executable tools, curated reference materials, and structured, iterative feedback loops (agentic design).
* Suggests that "trust" can be established in AI-generated security artifacts if the generation process is tightly constrained by human-defined verification standards and context.
## Limitations
* Output quality still depends critically on the quality of the provided examples and rules.
* The process is not fully autonomous; human engineers are still required for final validation and course correction.
## Comparison to Prior Work
Earlier attempts with basic chatbots produced fundamentally unusable code, consistent with general observations about the limits of "vibe coding" for critical infrastructure tooling. This research shows that pairing an agent with domain-specific, high-quality reference code (the curated Nuclei repository) can lift output quality past the threshold of practical engineering utility.
## Real-world Applications
* **Rapid Template Development:** Quickly spinning up bespoke detectors for proprietary software or newly discovered misconfigurations.
* **Attack Surface Reduction:** Mass-generating simple HTTP checks to find services like management panels or databases inadvertently exposed to the internet.
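
Mass generation of simple HTTP checks is largely a matter of filling one template shape from a table of targets. The sketch below illustrates the idea with Python dictionaries that mirror the structure of a Nuclei HTTP template. The product names, paths, and marker strings are invented for illustration, `panel_template` is a hypothetical helper, and serializing the dictionaries to YAML files is left to a YAML library.

```python
def panel_template(slug: str, name: str, path: str, words: list[str]) -> dict:
    """Build one panel-detection template as a dictionary."""
    return {
        "id": f"{slug}-panel-detect",
        "info": {
            "name": f"{name} Admin Panel Detection",
            "author": "example",
            "severity": "info",
        },
        "http": [
            {
                "method": "GET",
                "path": ["{{BaseURL}}" + path],
                # Require both the body markers and the status code to match.
                "matchers-condition": "and",
                "matchers": [
                    {"type": "word", "part": "body", "words": words, "condition": "and"},
                    {"type": "status", "status": [200]},
                ],
            }
        ],
    }


# Hypothetical targets: (slug, display name, login path, body markers).
PANELS = [
    ("acme-router", "Acme Router", "/admin/login", ["Acme Router", "Sign in"]),
    ("widgetdb", "WidgetDB", "/console/", ["WidgetDB Console"]),
]

templates = [panel_template(*row) for row in PANELS]
```

Each new panel then costs one table row plus a human check that the marker strings are specific enough to avoid false positives.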
## Future Work
* Determining the optimal framework and ruleset that maximizes the agent's efficiency before human intervention is required.
* Exploring how these agentic workflows can be applied to other critical security code generation tasks beyond Nuclei template writing.