Single-tool LLM analysis produces reports that look authoritative but aren't. A serial consensus pipeline catches artifacts and hallucinations at source.
# Research: Building an Adversarial Consensus Engine | Multi-Agent LLMs for Automated Malware Analysis
## Metadata
- **Authors:** Phil Stokes
- **Institution:** SentinelOne (S1Labs)
- **Publication:** SentinelOne Blog / AI Research
- **Date:** March 19, 2026
## Abstract
This research introduces a multi-agent architectural framework designed to overcome the unreliability of single-tool LLM malware analysis. While standard LLMs often hallucinate or misinterpret decompiler artifacts, the proposed **Serial Consensus Pipeline** utilizes multiple reverse engineering tools (Radare2, Ghidra, Binary Ninja, and IDA Pro) as independent, skeptical analysts. By forcing subsequent agents to verify or reject the findings of their predecessors within a structured "Shared Context," the system significantly reduces noise and ensures capability claims are anchored to verifiable binary data.
## Research Objective
The research addresses the high false-positive and hallucination rates in LLM-generated malware reports. Specifically, it seeks to solve the problem of "noisy data" (decompiler artifacts, compiler stubs, and mangled strings) being treated as ground truth by LLMs, leading to authoritative-looking but technically inaccurate reports.
## Methodology
### Approach
The researchers developed a **Serial Consensus Pipeline** operating in three distinct phases:
1. **Discovery:** Four tool-specific subagents run in sequence, passing a "Shared Context" table that accumulates findings.
2. **The Gauntlet:** An adversarial peer-review phase where subagents (in a different order) are explicitly tasked with debunking or verifying previous assertions.
3. **Synthesis:** A final report-writer agent converts the verified "Shared Context" into a final document.
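The three phases above can be sketched roughly as follows. This is a minimal illustration, not the published implementation: the `Finding` class, `run_pipeline`, `synthesize`, the two stub agents, and the acceptance rule (a finding survives only if no reviewer rejects it) are all assumptions made for the example. The real subagents are LLM calls backed by four tools; here they are plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Finding:
    claim: str
    address: int      # virtual address the claim is anchored to
    found_by: str     # subagent that raised the claim
    verdicts: Dict[str, bool] = field(default_factory=dict)


# Hypothetical agent signatures: a discoverer reads the shared context and
# returns new findings; a reviewer returns True (verified) or False (rejected).
Discoverer = Callable[[List[Finding]], List[Finding]]
Reviewer = Callable[[Finding], bool]


def run_pipeline(discoverers: Dict[str, Discoverer],
                 reviewers: Dict[str, Reviewer],
                 gauntlet_order: List[str]) -> List[Finding]:
    shared: List[Finding] = []
    # Phase 1 (Discovery): agents run in sequence and accumulate findings.
    for discover in discoverers.values():
        shared.extend(discover(shared))
    # Phase 2 (The Gauntlet): agents, in a different order, review every
    # finding they did not raise themselves.
    for name in gauntlet_order:
        for finding in shared:
            if finding.found_by != name:
                finding.verdicts[name] = reviewers[name](finding)
    # Assumed acceptance rule: at least one review, and no rejections.
    return [f for f in shared if f.verdicts and all(f.verdicts.values())]


def synthesize(verified: List[Finding]) -> str:
    # Phase 3 (Synthesis): turn the verified context into report lines.
    return "\n".join(f"- {f.claim} @ {f.address:#x}" for f in verified)


# Stub agents standing in for the tool-specific LLM subagents.
STUB_ADDR = 0x100001000
discoverers = {
    "radare2": lambda ctx: [Finding("spawns /bin/sh", 0x100003F20, "radare2")],
    "ghidra": lambda ctx: [Finding("compiler stub is a backdoor", STUB_ADDR, "ghidra")],
}
reviewers = {name: (lambda f: f.address != STUB_ADDR) for name in discoverers}

verified = run_pipeline(discoverers, reviewers, gauntlet_order=["ghidra", "radare2"])
report = synthesize(verified)
```

In this toy run the claim about the compiler stub is rejected by the other agent and never reaches the report, which is the behaviour the rejection mandate is meant to produce.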
### Dataset/Environment
- **Targets:** macOS malware samples (e.g., WizardUpdate, FinderRAT, SysJoker, Go Infostealer).
- **Architecture:** Agentic workflow managed by an Orchestrator.
### Tools & Technologies
- **Framework:** OpenClaw (open-source agent framework).
- **LLMs:** Anthropic Claude 3.5/4.6 Opus (Orchestrator/Reporter) and Sonnet (Subagents); Qwen2.5 32b (fallback).
- **Security Tools:** radare2, Ghidra, Binary Ninja, IDA Pro.
- **Integration:** Custom deterministic bridge scripts (Python-to-Tool) rather than Model Context Protocol (MCP).
## Key Findings
### Primary Results
1. **Reliability via Redundancy:** Accuracy is derived from the pipeline structure and "rejection mandate" rather than the inherent reasoning of a single model.
2. **Tool Divergence:** Different tools (e.g., Ghidra vs. IDA) often disagree on compiler stubs; the consensus model successfully filtered these as "non-malicious artifacts."
3. **Architecture over Scale:** High-quality results were achieved without fine-tuning or vector databases (RAG), relying instead on the "Shared Context" held in the LLM's context window.
### Supporting Evidence
- Successful identification and cross-validation of capabilities in known malware samples (e.g., WizardUpdate).
- Significant reduction in hallucinations compared to "single-shot" LLM prompts by requiring virtual address anchoring for every claim.
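One way to enforce virtual address anchoring is a mechanical check that runs before any claim enters the shared context. The sketch below is an assumption about how such a check might look, not the author's code: `ADDRESS_RE`, `is_anchored`, and the example segment range are hypothetical. A claim passes only if it cites a hex address that falls inside a known segment of the binary.

```python
import re
from typing import List, Tuple

# Matches hex addresses such as 0x100003f20 (4 to 16 hex digits).
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{4,16}\b")


def is_anchored(claim: str, segments: List[Tuple[int, int]]) -> bool:
    """Return True if the claim cites an address inside a known segment."""
    for match in ADDRESS_RE.finditer(claim):
        addr = int(match.group(0), 16)
        if any(start <= addr < end for start, end in segments):
            return True
    return False


# Hypothetical segment range (start inclusive, end exclusive).
segments = [(0x100000000, 0x100008000)]

claims = [
    "Spawns a shell at 0x100003f20",     # address inside the segment
    "Probably exfiltrates user data",    # no address at all
    "Contacts C2 from 0xdeadbeef",       # address outside the segment
]
results = [is_anchored(c, segments) for c in claims]
```

A check like this cannot confirm that a claim is true, only that it points at something a reviewer can look up, which is what makes the later cross-verification possible.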
### Novel Contributions
- **Adversarial Peer Review:** The "Gauntlet" phase introduces a formal mechanism for LLM agents to reject the work of other agents.
- **Deterministic Bridging:** A departure from the trend of using LLM-native protocols (like MCP) in favor of rigid Python bridge scripts to ensure data integrity.
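A deterministic bridge might look something like the sketch below, shown for radare2 only. The source does not publish its bridge scripts, so the action names, the `R2_ACTIONS` allowlist, and both functions are hypothetical. The radare2 pieces used are the `-q` and `-c` flags and the `izj`, `iij`, `aaa`, and `aflj` commands. The point is that the LLM chooses an action name from a fixed list and never writes the command line itself.

```python
import json
import subprocess
from typing import Any, List

# Hypothetical allowlist: action name -> fixed radare2 command string.
R2_ACTIONS = {
    "list_strings": "izj",         # strings in data sections, as JSON
    "list_imports": "iij",         # imports, as JSON
    "list_functions": "aaa;aflj",  # analyze, then list functions as JSON
}


def build_r2_argv(action: str, binary_path: str) -> List[str]:
    """Map an allowlisted action to an exact radare2 argument vector."""
    if action not in R2_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # -q: quiet mode, quit after running the -c commands.
    return ["r2", "-q", "-c", R2_ACTIONS[action], binary_path]


def run_bridge(action: str, binary_path: str, timeout: int = 120) -> Any:
    """Run the tool and return parsed JSON.

    Assumes r2 is on PATH and that stdout carries only the JSON output.
    """
    argv = build_r2_argv(action, binary_path)
    proc = subprocess.run(argv, capture_output=True, text=True,
                          timeout=timeout, check=True)
    return json.loads(proc.stdout)


argv = build_r2_argv("list_strings", "/tmp/sample")
```

Because the argument vector is passed as a list and not through a shell, a malformed or hostile request from the model fails at the allowlist lookup instead of reaching the tool.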
## Technical Details
The system uses a **Shared Context** table, an in-memory construct passed between agents. Instead of using a database, the Orchestrator injects this text block into the system prompt of the next agent. The bridge scripts are critical: they convert LLM requests into precise tool commands (CLI/API) and return structured output. This prevents the LLM from "guessing" command syntax, which often leads to execution failures.
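As an illustration of how a Shared Context table could be injected into the next agent's prompt, the sketch below renders rows as a plain-text table and prepends role instructions. The column names, the `status` values, and the prompt wording are assumptions for the example; the source does not specify the table layout.

```python
from typing import Dict, List


def render_shared_context(rows: List[Dict[str, str]]) -> str:
    """Render the accumulated findings as a pipe-delimited text table."""
    lines = ["| Address | Claim | Found by | Status |",
             "|---|---|---|---|"]
    for row in rows:
        lines.append(f"| {row['address']} | {row['claim']} "
                     f"| {row['found_by']} | {row['status']} |")
    return "\n".join(lines)


def build_system_prompt(role: str, rows: List[Dict[str, str]]) -> str:
    """Prepend role instructions to the table so the next agent sees both."""
    return (f"You are the {role} analyst. Verify or reject each row below "
            "and cite a virtual address for every claim.\n\n"
            "SHARED CONTEXT\n" + render_shared_context(rows))


# Hypothetical rows accumulated by earlier agents.
rows = [
    {"address": "0x100003f20", "claim": "spawns /bin/sh",
     "found_by": "radare2", "status": "unverified"},
]
prompt = build_system_prompt("Ghidra", rows)
```

Keeping the table as plain text means the whole state of the analysis travels with each request, at the cost of consuming context window on every hop.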
## Practical Implications
### For Security Practitioners
- Automated reporting becomes viable when findings are mapped to specific virtual addresses and cross-verified by multiple engines.
- Reduced manual "sanity checking" of LLM outputs by shifting the verification burden to the multi-agent pipeline.
### For Defenders
- Faster triage of unknown binaries.
- Capability to distinguish between genuine malicious logic and standard library code elided by different decompilers.
### For Researchers
- Evidence that agentic "skepticism" can be a more effective safety check than sophisticated prompting for technical tasks.
## Limitations
- **Model Degradation:** Performance drops significantly when falling back to smaller models (e.g., Qwen2.5 32b) due to context compaction issues.
- **Latency:** The serial nature of the pipeline (Phase 1 -> Phase 2) is slower than parallel processing.
- **Mac-Centric:** The current implementation focuses heavily on macOS malware analysis tools.
## Comparison to Prior Work
Unlike prior research that focuses on "Better Prompting" or RAG (Retrieval-Augmented Generation), this work focuses on **process-driven consensus**. It moves away from treating the LLM as a lone expert and instead treats it as a manager of existing, highly precise legacy security tools.
## Real-world Applications
- **Automated SOC Triage:** Rapidly generating high-confidence malware summaries.
- **Threat Intel:** Scaling the analysis of thousands of samples where human reverse-engineers are a bottleneck.
- **Validation:** Using the "Gauntlet" phase as a standalone rigorous check for human-written reports.
## Future Work
- Optimizing for lower latency without sacrificing the "Gauntlet" verification phase.
- Expanding the toolset to include dynamic analysis (sandboxing) artifacts in the consensus pipeline.
- Testing the resilience of the pipeline against LLM-aware obfuscation in malware.
## References
- **Framework:** [https://github.com/nichochar/openclaw](https://github.com/nichochar/openclaw)
- **Analyzed Samples:**
  - WizardUpdate: `60c8128c48aac890a6d01448d1829a6edcdce0d2`
  - FinderRAT: `ad7d2eb98ea4ddc7700db786aadb796b286da04`