Full Report
The OODA loop—for observe, orient, decide, act—is a framework to understand decision-making in adversarial situations. We apply the same framework to artificial intelligence agents, who have to make their decisions with untrustworthy observations and orientation. To solve this problem, we need new systems of input, processing, and output integrity. Many decades ago, U.S. Air Force Colonel John Boyd introduced the concept of the “OODA loop,” for Observe, Orient, Decide, and Act. These are the four steps of real-time continuous decision-making. Boyd developed it for fighter pilots, but it’s long been applied in artificial intelligence (AI) and robotics. An AI agent, like a pilot, executes the loop over and over, accomplishing its goals iteratively within an ever-changing environment. This is Anthropic’s definition: “Agents are models using tools in a loop.”...
Analysis Summary
# Research: Agentic AI’s OODA Loop Problem
## Metadata
- Authors: Bruce Schneier (with Barath Raghavan)
- Institution: Schneier on Security (Essay published in IEEE Security & Privacy)
- Publication: Schneier on Security / IEEE Security & Privacy
- Date: October 20, 2025
## Abstract
This analysis re-examines the decision-making cycle of agentic AI through the lens of John Boyd's OODA loop (Observe, Orient, Decide, Act). The core argument is that deploying AI agents, especially those interacting with the internet and using tools (like LLMs with RAG or tool-calling APIs), fundamentally compromises the integrity assumptions underlying the traditional OODA framework. Because AI environments are inherently adversarial—involving untrusted inputs from the web, poisoned training data, and vulnerable tool definitions—the OODA loop becomes susceptible to pervasive compromise, rendering simple fixes like addressing hallucination insufficient.
## Research Objective
To analyze the security vulnerabilities inherent in the continuous decision-making process of modern agentic Artificial Intelligence systems by applying and critiquing the OODA loop framework in adversarial, web-enabled contexts.
## Methodology
### Approach
Conceptual analysis and re-framing of agentic AI operation using the established military/cybernetic framework of the OODA loop. The work identifies specific points of failure within each stage of the loop when applied to current Large Language Model (LLM) agents.
### Dataset/Environment
The analysis focuses on the operational environment of contemporary agentic AI systems, specifically:
1. Web-enabled LLMs querying external sources.
2. Systems employing Retrieval-Augmented Generation (RAG).
3. AI agents using tool-calling APIs.
The "environment" is defined as the entire, inherently adversarial Internet.
### Tools & Technologies
The paper explicitly references vulnerabilities related to:
- Large Language Models (LLMs)
- Prompt Injection
- Retrieval-Augmented Generation (RAG)
- Tool-Calling APIs and Model Context Protocol (MCP)
## Key Findings
### Primary Results
1. **Untrusted Inputs Compromise the Core Loop:** Traditional OODA frameworks assume trusted sensors; agentic AI integrates untrusted, adversary-controlled sources (like adversarial web content or prompt inputs) directly into the decision cycle.
2. **Architectural Flaw in Unifying Data and Control:** The power source of modern AI—treating all inputs (instructions, context, data) uniformly—is simultaneously its primary vulnerability, enabling prompt injection by collapsing necessary privilege separation.
3. **Compounding and Persistent Risk:** Compromises persist through state accumulation (chat history, caches) and agent nesting (agents using tools that have their own OODA loops), leading to "security debt" that is frozen into the model during training.
### Supporting Evidence
- Reference to Simon Willison's identification of prompt injection as an **architectural** problem, not merely a filtering issue.
- Explicit identification of risks across all OODA stages: adversarial examples/spoofing (Observe), data poisoning/context manipulation (Orient), and vulnerability in semantic understanding of tool functionality (Decide/Act).
### Novel Contributions
- The structural application of the OODA loop specificity to agentic AI security, highlighting that AI security risks are *structural consequences* of using AI ubiquitously.
- Identification of **temporal asymmetry** where training data poisoning creates non-auditable, frozen vulnerabilities exploitable years after deployment.
- Highlighting the risk of **semantic confusion**: AI verifies tool syntax but not semantics, allowing corrupted instructions (e.g., "Submit SQL query") to translate to malicious actions (e.g., "exfiltrate database").
## Technical Details
The analysis focuses on the breakdown in integrity across the integrity chain:
- **Observation Integrity:** Lacking authentication; susceptible to adversarial examples (e.g., visual obfuscation) and text-based prompt injection.
- **Orientation Integrity:** The model's worldview is derived from training data (poisoning risk) and current context (manipulation risk). Integrity violations are locked into the weights.
- **Decision/Action Integrity:** Agents cannot reliably verify the *intent* (semantics) of the tools they call, only the format (syntax). This allows malicious instructions embedded in the training/context to hijack tool execution into adversarial actions. Nested loops (agent using tools) create complex, interactive attack surfaces.
## Practical Implications
### For Security Practitioners
1. **Beyond Hallucination:** Practitioners must recognize that fixing accuracy (hallucination) is insufficient; an AI can be perfectly accurate based on *corrupted* input or a *compromised* internal state.
2. **Trust Boundary Redefinition:** The traditional security boundaries rooted in physical sensors or controlled networks collapse when the AI sensor is the entire Internet.
### For Defenders
- **Input and State Validation are Crucial:** Need new systems to enforce input integrity, context provenance, and persistent state validation across interactions, moving beyond stateless prompt filtering.
- **Privilege and Path Separation:** Efforts must focus on reintroducing architectural separation between untrusted data paths and trusted control paths within the AI framework, counteracting the uniform input treatment.
### For Researchers
- **Integrity Protocol Development:** The primary challenge is developing protocols (like MCP mentioned) that can guarantee integrity across observation, processing, and action layers.
- **Temporal Auditability:** Research is needed into mechanisms to audit and verify the integrity of model weights post-training against potential poisoning that might yield latent vulnerabilities.
## Limitations
The paper provides a high-level conceptual framework and problem definition rather than proposing a single, concrete, deployed mitigation solution. It focuses on identifying the *structural* nature of the problem stemming from architectural choices.
## Comparison to Prior Work
This work builds upon previous theoretical work concerning trust and supply-chain security (e.g., Thompson's "Reflections on Trusting Trust") and applies those deep-rooted principles to the novel processing architecture of LLM agents. It extends simple prompt injection discussions by framing the vulnerability within the continuous, stateful, and tool-enabled structure of agentic AI loops.
## Real-world Applications
- **Secure Agent Development:** Guiding the design of new AI agents to incorporate verifiable integrity checks at every stage of the OODA cycle.
- **Policy Formulation:** Informing regulatory standards regarding the required robustness and auditability of AI systems deployed in critical roles that interact with external or untrusted data sources.
## Future Work
- Developing verifiable, authenticated input mechanisms for AI observation layers.
- Creating architectures that enforce privilege separation between model instructions and external data context.
- Investigating methods for dynamically verifying tool semantics during execution rather than relying solely on syntax checks.
## References
- [1] Anthropic reference on Agent definition.
- [2] Simon Willison on prompt injection.
- [3] Thompson, 1984 - Reflections on Trusting Trust (Defanged URL: `https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf`)
- [4] B. Schneier, “The age of integrity,” IEEE Security & Privacy, vol. 23, no. 3, p. 96, May/Jun. 2025. (Defanged URL: `https://www.computer.org/csdl/magazine/sp/2025/03/11038984/27COaJtjDOM`)