Full Report
This is a list of AI hacking techniques. Some of these are prompt injection methods, while others are ways to trick the system. They are broken down into four categories: intents, techniques, evasions, and attacker-controlled inputs.
Analysis Summary
# Tool/Technique: Arcanum Prompt Injection Taxonomy (v1.5)
## Overview
The Arcanum Prompt Injection Taxonomy is a comprehensive open-source framework and classification system designed to catalog attack vectors against Large Language Models (LLMs). Its purpose is to provide security researchers and red teamers with a standardized language and methodology for identifying, testing, and documenting prompt injection vulnerabilities and their associated evasion techniques.
## Technical Details
- **Type**: Attack Framework / Taxonomy
- **Platform**: Large Language Model (LLM) Architectures (e.g., GPT, Claude, Llama), RAG Pipelines, and AI-integrated applications.
- **Capabilities**: Classification of attack intents, documentation of injection methods, cataloging of obfuscation/evasion tradecraft, and mapping of attack surfaces.
- **First Seen**: Initial release early 2024; Version 1.5 released December 12, 2025.
## MITRE ATT&CK Mapping
*Note: While LLM-specific techniques are emerging, they map to the following enterprise tactics:*
- **[TA0001 - Initial Access]**
- [T1566 - Phishing] (Via indirect prompt injection in emails/documents)
- **[TA0002 - Execution]**
- [T1204 - User Execution] (Tricking a user into inputting a malicious prompt)
- **[TA0010 - Exfiltration]**
- [T1567 - Exfiltration Over Web Service] (Using LLMs to send sensitive data to attacker-controlled endpoints)
- **[TA0005 - Defense Evasion]**
- [T1027 - Obfuscated Files or Information] (Encoding payloads in Base64, ciphers, or non-standard languages)
## Functionality
### Core Capabilities
- **Attack Intent Classification**: Categorizes the ultimate goal of the adversary, such as data exfiltration, jailbreaking (bypassing safety guardrails), or output manipulation for social engineering.
- **Injection Techniques**: Documents methods of delivery, including:
- **Direct Injection**: User-provided malicious instructions.
- **Indirect Injection**: Malicious instructions embedded in third-party data sources (websites, files, emails) processed by the LLM.
- **Attack Surface Mapping (Inputs)**: Identifies the specific vectors where injections occur, such as system prompts, user queries, and external API integrations.
### Advanced Features
- **Evasion and Obfuscation Catalog**: A detailed library of techniques used to bypass "Input/Output" filters:
- **Encoding**: Use of Base64, Hex, or Emoji-based encoding.
- **Ciphers**: Substitution ciphers and rot13.
- **Linguistic Redirection**: Utilizing fictional languages, translation-based obfuscation, or complex roleplay scenarios to hide malicious intent.
- **Multi-stage Attack Documentation**: Support for tracking novel tool-based injection vectors and complex RAG (Retrieval-Augmented Generation) poisoning.
## Indicators of Compromise
- **File Hashes**: N/A (Methodology-based; no static malware binary).
- **Behavioral Indicators**:
- Unusual LLM output formats (e.g., unexpected JSON or code blocks).
- AI-generated requests to outbound webhooks or unknown domains [hxxp://attacker-domain[.]tld].
- Repetitive, non-sensical, or highly structured input patterns designed to test filter boundaries.
- High frequency of encoding-related keywords in user prompts (e.g., "base64", "decode", "ignore previous instructions").
## Associated Threat Actors
- **Red Teamers and Security Researchers**: Primary users for testing AI safety.
- **General Practitioners**: Threat actors targeting AI-integrated business applications for data theft.
## Detection Methods
- **Behavioral Detection**: Monitoring for "Prompt Leakage" patterns (e.g., the model outputting its own system instructions).
- **Heuristic Analysis**: Identifying common injection keywords ("Ignore all previous instructions," "Developer Mode," "System Override").
- **Anomaly Detection**: Flagging anomalous use of multiple languages or encoding schemes within a single prompt.
## Mitigation Strategies
- **Input Sanitization**: Implementing robust filtering for known injection patterns and encoding schemes.
- **Prompt Isolation**: Using "Delimiters" to clearly separate user-provided content from system-level instructions.
- **Hardening recommendations**:
- Implement a "Human-in-the-Loop" for critical LLM-driven actions.
- Follow the **OWASP Top 10 for LLM Applications** (specifically LLM01: Prompt Injection).
- Use secondary LLMs as "Guardrails" to inspect inputs and outputs for malicious intent.
## Related Tools/Techniques
- **OWASP Top 10 for LLM Applications**
- **Garak**: LLM vulnerability scanner.
- **PyRIT**: Python Risk Identification Tool for generative AI.
- **Indirect Prompt Injection**: The process of influencing an LLM via external data sources.