Full Report
A top technologist at the U.K.’s National Cyber Security Centre said “there’s a good chance” that prompt injection attacks against AI will never be eliminated, and he warned of the related risks of embedding generative AI into digital systems globally.
Analysis Summary
Based on the provided context, the article discusses a strategic threat concern regarding Large Language Models (LLMs) rather than a specific, singular software vulnerability with a traditional CVE identifier. Prompt Injection is framed as a fundamental architectural risk. Therefore, many sections requiring specific CVE/CVSS data cannot be populated.
***
# Vulnerability: Inherent Risk of Prompt Injection in Generative AI Models
## CVE Details
- CVE ID: N/A (This is a class of architectural design flaw in LLMs, not a single dated vulnerability)
- CVSS Score: N/A (No specific scoring applies as it is a systemic design risk)
- CWE: CWE-1938: Improper Neutralization of Special Elements in an Input Sequence for a Large Language Model used in an Autonomous System (Conceptual mapping, not formal designation)
## Affected Systems
- Products: General Large Language Models (LLMs), Generative AI applications, and digital systems embedding generative AI.
- Versions: Applicable across generations of LLMs that process user-supplied text inputs as sequential tokens.
- Configurations: Any configuration where user-supplied text (data) is treated identically to system instructions (commands) by the model.
## Vulnerability Description
Prompt Injection is a threat where an attacker manipulates an AI system, typically an LLM, by crafting malicious input text designed to override or confuse the model's original system instructions. The core issue stems from the LLM's fundamental design, where it treats all input (both intended instructions and external data) as a sequential stream of tokens to predict the next logical step, thus failing to strictly differentiate between **data** and **command**. This similarity to a "Confused Deputy" vulnerability arises because the LLM executes user-supplied content as if it were a trustworthy, internal instruction.
## Exploitation
- Status: Real-world examples documented (e.g., affecting Microsoft's New Bing search engine, stealing secrets). PoC is inherent to the input mechanism.
- Complexity: Low (Requires crafting specific natural language or structured text inputs).
- Attack Vector: Input/Data Channel (Injected via user prompts or embedded data documents).
## Impact
- Confidentiality: High (If the model is tricked into revealing system prompts, sensitive data it processed, or performing unauthorized data retrieval).
- Integrity: High (If the model is forced to execute unintended actions, such as approving a misqualified job application, as cited in the example).
- Availability: Moderate (Potential for service disruption or misalignment depending on the resulting actions).
## Remediation
### Patches
- **None definitive:** The NCSC advises that prompt injection cannot be **fully mitigated** with a traditional product or appliance, unlike SQL injection which benefits from parameterized queries. Mitigation requires fundamental architectural redesign.
### Workarounds
- **Careful Design, Build, and Operation:** Risk management must be prioritized through system design.
- **Limiting Capabilities:** Restricting the scope and functionality of AI agents to minimize potential harm if an injection occurs.
- **Instruction vs. Data Layering:** Researchers are attempting to overlay mechanisms to differentiate instruction from data, but this is acknowledged as an incomplete solution.
## Detection
- **Indicators of Compromise:** Unexpected outputs, refusal to adhere to initial safety guidelines, data leakage in responses, or execution of seemingly random internal commands within user-facing outputs.
- **Detection Methods and Tools:** Current research focuses on input sanitization and model training to internally flag or reject prompts that exhibit command-like behavior when they should be treated as data (though this difficulty is explicitly noted).
## References
- NCSC Blog Post: hxxps://www.ncsc.gov.uk/blog-post/prompt-injection-is-not-sql-injection
- Related Concept: OWASP Confused Deputy Attack: hxxps://cornucopia.owasp.org/taxonomy/attacks/confused-deputy-attack