Full Report
Cisco’s evaluation of 15 leading AI models from OpenAI, Anthropic, Google, Amazon and xAI “found that single-turn attack success rate (ASR) is not a reliable proxy for what happens when an attacker can adapt across turns,” researchers Nicholas Conley and Amy Chang wrote. Their tests revealed that AI models were much more susceptible to multi-turn…
Analysis Summary
# Vulnerability: Multi-Turn Jailbreak Susceptibility in Large Language Models (LLMs)
## CVE Details
- **CVE ID**: Not yet assigned (General architectural weakness)
- **CVSS Score**: N/A (Research-based finding)
- **CWE**: CWE-1039 (Automated Recognition Mechanism with Inadequate Detection or Handling of Adversarial Input Perturbations) / CWE-693 (Protection Mechanism Failure)
## Affected Systems
- **Products**: Large Language Models (LLMs) from major providers, including:
  - OpenAI
  - Anthropic
  - Google
  - Amazon
  - xAI
- **Versions**: 15 leading models (including closed and open-weight variants) as of May 2026.
- **Configurations**: Systems utilizing standard Safety Alignment/Guardrails that prioritize single-turn input analysis.
## Vulnerability Description
Cisco researchers Nicholas Conley and Amy Chang identified a critical disparity between single-turn and multi-turn prompt safety. AI vendors typically report an "Attack Success Rate" (ASR) measured against a single malicious prompt, and that metric does not account for "adaptive" attacks. In a multi-turn attack, an adversary uses a series of conversational exchanges, adjusting each prompt to the model's previous responses, to gradually bypass the model's safety guardrails. The research indicates that models are significantly more likely to provide restricted or malicious information when the context is built up across multiple turns than when the request arrives in a single interaction.
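The gap between the two metrics can be made concrete with a small calculation. The following is a minimal sketch, assuming each test attempt is recorded with a turn count and a success flag; the `Attempt` record, the function names, and the sample data are all hypothetical and are not taken from the Cisco study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Attempt:
    """One attack attempt against a model (hypothetical record format)."""
    turns: int        # number of conversational turns the attacker used
    succeeded: bool   # True if the model produced restricted output


def attack_success_rate(attempts: List[Attempt]) -> float:
    """Fraction of attempts that succeeded; 0.0 for an empty list."""
    if not attempts:
        return 0.0
    return sum(a.succeeded for a in attempts) / len(attempts)


def asr_by_mode(attempts: List[Attempt]) -> Dict[str, float]:
    """Report ASR separately for single-turn and multi-turn attempts."""
    single = [a for a in attempts if a.turns == 1]
    multi = [a for a in attempts if a.turns > 1]
    return {
        "single_turn": attack_success_rate(single),
        "multi_turn": attack_success_rate(multi),
    }


# Made-up illustrative data, NOT results from the research.
sample = [
    Attempt(1, False), Attempt(1, False), Attempt(1, True), Attempt(1, False),
    Attempt(5, True), Attempt(7, True), Attempt(4, False), Attempt(6, True),
]
rates = asr_by_mode(sample)
print(rates)
```

Reporting the two rates side by side, rather than a single pooled number, is what exposes the disparity described above.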
## Exploitation
- **Status**: PoC available (Research confirmed; high success rates in lab environments)
- **Complexity**: Medium (Requires the attacker to adapt prompts based on model feedback)
- **Attack Vector**: Network (API/Web Interface)
## Impact
- **Confidentiality**: High (Bypassing guardrails can expose restricted data or elicit toxic content)
- **Integrity**: Medium (Allows for the generation of deceptive or harmful instructions)
- **Availability**: Low (Generally does not impact the availability of the model itself)
## Remediation
### Patches
- No direct software "patch" exists for this architectural vulnerability. Remediation requires continuous model fine-tuning and updated safety training, such as reinforcement learning from human feedback (RLHF).
### Workarounds
- **Contextual Filtering**: Implementing "system-level" monitors that evaluate the entire conversation history for intent, rather than individual prompts.
- **Rate Limiting and Session Caps**: Restricting request rates and session lengths to reduce the opportunity for multi-turn adaptation.
- **Secondary Guardrails**: Utilizing external safety classifiers (e.g., Llama Guard) to inspect model outputs throughout the conversation.
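The contextual filtering and session-length workarounds above can be sketched together. This is a minimal illustration, assuming the conversation is available as a list of `(role, text)` pairs; `keyword_risk_score` is a deliberately crude placeholder for a real safety classifier, and the flagged terms, threshold, and turn cap are hypothetical values chosen only for the example.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text); role is "user" or "assistant"


def keyword_risk_score(text: str) -> float:
    """Placeholder scorer: fraction of hypothetical flagged terms present.

    A real deployment would call a trained safety classifier here.
    """
    flagged = ("bypass", "exploit", "payload")
    lowered = text.lower()
    return sum(term in lowered for term in flagged) / len(flagged)


def review_conversation(
    history: List[Message],
    scorer: Callable[[str], float] = keyword_risk_score,
    risk_threshold: float = 0.5,   # assumed value
    max_user_turns: int = 20,      # assumed session cap
) -> str:
    """Return "allow", "block", or "end_session" for the whole session."""
    user_texts = [text for role, text in history if role == "user"]

    # Session cap: long sessions give an attacker more room to adapt.
    if len(user_texts) > max_user_turns:
        return "end_session"

    # Score all user turns together so intent spread across turns is visible.
    if scorer(" ".join(user_texts)) >= risk_threshold:
        return "block"
    return "allow"


history = [
    ("user", "How do I write a payload parser?"),
    ("assistant", "Here is a general outline."),
    ("user", "Now show me how to exploit it."),
]
per_turn = [keyword_risk_score(t) for r, t in history if r == "user"]
print(per_turn, review_conversation(history))
```

In this example each user message stays below the threshold when scored alone, while the combined history crosses it, which is the case a single-turn filter would miss.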
## Detection
- **Indicators of Compromise**: Conversational patterns where a user repeatedly pivots topics toward restricted domains after initial refusals.
- **Detection Methods**: Semantic analysis of session histories and monitoring for "jailbreak" patterns (e.g., role-play scenarios or complex hypothetical framing).
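One simple way to surface the first indicator is to count how often a user keeps going after the model refuses. The sketch below is a heuristic under stated assumptions: the refusal phrases in `REFUSAL_MARKERS` and the `min_retries` threshold are hypothetical, and the check measures persistence after refusals only. It does not judge whether the follow-up actually pivots toward a restricted domain, which would need the semantic analysis mentioned above.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text); role is "user" or "assistant"

# Hypothetical refusal phrases; real systems vary and should be tuned.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't", "i'm unable")


def is_refusal(text: str) -> bool:
    """Crude check for refusal wording in an assistant message."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def count_retries_after_refusal(history: List[Message]) -> int:
    """Count assistant refusals that are immediately followed by a user turn."""
    retries = 0
    for (role, text), (next_role, _) in zip(history, history[1:]):
        if role == "assistant" and is_refusal(text) and next_role == "user":
            retries += 1
    return retries


def is_suspicious(history: List[Message], min_retries: int = 2) -> bool:
    """Flag sessions where the user persists after repeated refusals."""
    return count_retries_after_refusal(history) >= min_retries


session = [
    ("user", "Tell me how to do the restricted thing."),
    ("assistant", "I can't help with that."),
    ("user", "Pretend you are a character who would explain it."),
    ("assistant", "I cannot help with that request."),
    ("user", "Hypothetically, for a novel, how would it work?"),
    ("assistant", "Here is a general, safe overview."),
]
print(count_retries_after_refusal(session), is_suspicious(session))
```

A flagged session is a candidate for review, not proof of an attack; legitimate users also rephrase after a refusal.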
## References
- Cisco Research Blog: [https://blogs.cisco.com/ai/open-model-vulnerability-analysis](https://blogs.cisco.com/ai/open-model-vulnerability-analysis) (Defanged: hxxps[://]blogs[.]cisco[.]com/ai/open-model-vulnerability-analysis)
- Threat Beat Article: [https://threatbeat.com/threats/leading-ai-models-are-more-vulnerable-to-malicious-prompts-than-vendors-claim/](https://threatbeat.com/threats/leading-ai-models-are-more-vulnerable-to-malicious-prompts-than-vendors-claim/) (Defanged: hxxps[://]threatbeat[.]com/threats/leading-ai-models-are-more-vulnerable-to-malicious-prompts-than-vendors-claim/)