Full Report
Researchers have developed a novel attack that steals user data by injecting malicious prompts in images processed by AI systems before delivering them to a large language model. [...]
Analysis Summary
# Tool/Technique: Image Scaling Attack (Weaponizing Image Resampling for Malicious Prompts)
## Overview
This technique involves crafting specific high-resolution images that contain malicious, data-theft prompts which become visible and interpretable by an AI system only after the image is downscaled using common resampling algorithms (like nearest neighbor, bilinear, or bicubic interpolation). The resulting artifacts are then mistakenly processed by the LLM as part of the user's instructions, leading to unauthorized actions like data exfiltration.
## Technical Details
- Type: Technique / Attack Vector
- Platform: AI Processing Systems (LLMs interpreting multi-modal input, specifically images)
- Capabilities: Hiding adversarial instructions within an image, leveraging aliasing artifacts introduced during downscaling, and executing hidden data exfiltration commands through tool-enabled AI agents.
- First Seen: Building upon a theory from a 2020 USENIX paper; demonstrated publicly in August 2025 by Trail of Bits researchers.
## MITRE ATT&CK Mapping
* T1562 - Impair Defenses (potentially relevant: the hidden prompt coerces the system into unintended actions, such as triggering tool calls)
* T1548 - Abuse Elevation Control Mechanism (if the hidden instruction leverages existing legitimate tool access through a trusted integration such as tool calling)
* T1059 - Command and Scripting Interpreter (the hidden text effectively acts as an injected command when the LLM interprets it as an instruction)
## Functionality
### Core Capabilities
- **Adversarial Image Creation:** Generating input images specifically tailored so that standard downscaling algorithms introduce visible patterns (e.g., specific colors emerging from dark areas) that form textual commands.
- **Prompt Injection via Multi-Modality:** Injecting instructions that the LLM combines with the user's legitimate input, bypassing typical adversarial filtering applied to text inputs.
- **Execution via Tool Calls:** Leveraging LLM capabilities that allow tool invocation (e.g., Zapier MCP) to perform actions like data retrieval or exfiltration based on the injected prompt.
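
The aliasing effect behind these capabilities can be illustrated with a toy sketch. It is not the researchers' method: it assumes integer-factor nearest-neighbor downscaling that keeps the top-left pixel of each block, and a grayscale image stored as nested lists. The function names `downscale_nearest` and `embed_at_sample_points` are hypothetical. Real resamplers (including bilinear and bicubic) weight neighboring pixels differently, so a real crafted image requires algorithm-specific tuning.

```python
# Toy illustration only: integer-factor nearest-neighbor downscaling on a
# grayscale grid stored as nested lists. Real pipelines use library resamplers.

def downscale_nearest(pixels, factor):
    """Keep one pixel per factor x factor block (the block's top-left pixel)."""
    return [row[::factor] for row in pixels[::factor]]


def embed_at_sample_points(hidden, factor, background=255):
    """Build a large image whose sampled pixels carry `hidden`.

    Every other pixel is set to `background`, so at most 1 in factor**2
    pixels of the full-size image differs from the background.
    """
    height, width = len(hidden), len(hidden[0])
    big = [[background] * (width * factor) for _ in range(height * factor)]
    for y in range(height):
        for x in range(width):
            big[y * factor][x * factor] = hidden[y][x]
    return big


# A 2x3 "hidden" pattern (0 = dark, 255 = light).
hidden = [
    [0, 255, 0],
    [255, 0, 255],
]
big = embed_at_sample_points(hidden, factor=4)   # 8 rows x 12 columns
small = downscale_nearest(big, factor=4)         # what the model would receive
```

In this toy case the full-size image is almost entirely background, while the downscaled image consists only of the embedded pattern, which is why the content the model receives can differ from what the user sees.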
### Advanced Features
- **Algorithm Specificity:** The attack must be customized (or "adjusted") based on the specific downscaling algorithm (nearest neighbor, bilinear, bicubic) used by the target AI system.
- **Demonstrated Exfiltration:** Successfully used in a proof-of-concept to exfiltrate Google Calendar data to an arbitrary email address when paired with an AI environment whose tool integration was configured with `trust=True`, which approves tool calls without user confirmation.
## Indicators of Compromise
- File Hashes: N/A (Specific to crafted image files, not static malware)
- File Names: N/A
- Registry Keys: N/A
- Network Indicators: Successful exfiltration of data (e.g., calendar data) to external email addresses (specific indicators depend on the target action).
- Behavioral Indicators: Uploading an image followed immediately by unauthorized outbound data transfer initiated by the LLM’s integrated tools (e.g., unauthorized API calls or email sending).
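
The behavioral indicator above could be checked against an audit log roughly as follows. This is a minimal sketch under stated assumptions: the `Event` schema, the tool names in `SENSITIVE_TOOLS`, and the 60-second window are all hypothetical and would need to match the real logging format and deployment.

```python
from dataclasses import dataclass

# Hypothetical tool names; replace with the sensitive tools actually exposed.
SENSITIVE_TOOLS = {"send_email", "http_post"}


@dataclass
class Event:
    timestamp: float   # seconds since epoch
    session: str       # identifier of the user session
    kind: str          # "image_upload" or "tool_call"
    tool: str = ""     # tool name, set only for tool calls


def flag_suspicious(events, window_seconds=60.0):
    """Return sensitive tool calls made within `window_seconds` after an
    image upload in the same session."""
    last_upload = {}   # session -> timestamp of the most recent image upload
    flagged = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "image_upload":
            last_upload[event.session] = event.timestamp
        elif event.kind == "tool_call" and event.tool in SENSITIVE_TOOLS:
            uploaded_at = last_upload.get(event.session)
            if uploaded_at is not None and event.timestamp - uploaded_at <= window_seconds:
                flagged.append(event)
    return flagged
```

A flagged event is only a lead for review, since legitimate workflows may also send data shortly after an image upload.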
## Associated Threat Actors
- Researchers at Trail of Bits (Kikimora Morozova and Suha Sabi Hussain) demonstrated the feasibility of this attack. It is currently presented as a novel research finding, not explicitly linked to known APT groups.
## Detection Methods
- Signature-based detection: Ineffective against the image itself unless signatures are created per-crafted image.
- Behavioral detection: Monitor for the LLM or its integrated tools abruptly executing sensitive actions (such as external data pushes) right after processing a user-submitted image, and for unexpected outbound network traffic initiated by AI services.
- YARA rules: Not applicable in a standard sense; might require specialized image analysis rules focused on artifact patterns. Detection would rely more on analyzing model execution flows.
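
One possible form of a specialized image analysis rule is to downscale the same image with two different algorithms and compare the results. The sketch below is an assumed heuristic, not a method described in the research: it works on grayscale nested lists whose dimensions are multiples of `factor`, and the `threshold` value is a hypothetical placeholder that would need calibration. Natural images with fine texture may also exceed it.

```python
def downscale_nearest(pixels, factor):
    """Keep the top-left pixel of each factor x factor block."""
    return [row[::factor] for row in pixels[::factor]]


def downscale_box(pixels, factor):
    """Average all pixels in each factor x factor block."""
    out = []
    for y in range(0, len(pixels), factor):
        row = []
        for x in range(0, len(pixels[0]), factor):
            block = [pixels[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def mean_abs_difference(a, b):
    """Mean absolute per-pixel difference between two same-sized grids."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def looks_scaling_sensitive(pixels, factor, threshold=64.0):
    """Flag images whose downscaled form depends strongly on the algorithm."""
    nearest = downscale_nearest(pixels, factor)
    box = downscale_box(pixels, factor)
    return mean_abs_difference(nearest, box) > threshold
```

A large divergence indicates that what the model receives depends heavily on the resampling algorithm, which is the property the attack relies on.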
## Mitigation Strategies
- **Input Dimension Restriction:** Implement strict dimension limits on uploaded images, reducing or eliminating the need for downscaling.
- **Previewing LLM Input:** If downscaling is necessary, show the user a preview of the exact downscaled image that is delivered to the LLM.
- **Sensitive Tool Call Confirmation:** Require explicit user confirmation for sensitive tool calls, especially when text is detected in an image.
- **Secure Design Patterns:** Implement systematic defenses against prompt injection that extend beyond traditional text-based input filtering, addressing potential multi-modal vectors.
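
Two of the mitigations above, dimension restriction and tool call confirmation, might look like the following sketch. It assumes PNG uploads only, and the `MAX_SIDE` limit, the tool names in `SENSITIVE_TOOLS`, and the `confirm` callback are hypothetical placeholders rather than part of any specific product.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_SIDE = 1024  # hypothetical limit; ideally the model's native input size
SENSITIVE_TOOLS = {"send_email", "http_post"}  # hypothetical tool names


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk of a PNG byte string."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", data[16:24])


def accept_upload(data, max_side=MAX_SIDE):
    """Reject images large enough to require downscaling before the model."""
    width, height = png_dimensions(data)
    return width <= max_side and height <= max_side


def run_tool(name, action, confirm):
    """Run `action` unless the tool is sensitive and the user declines.

    `confirm` is a callback that presents the request to the user and
    returns True to approve or False to deny.
    """
    if name in SENSITIVE_TOOLS and not confirm(name):
        return None
    return action()
```

Keeping the confirmation check in the code path that dispatches tool calls, rather than in the prompt, means an injected instruction cannot talk the model out of it.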
## Related Tools/Techniques
- **Anamorpher:** An open-source tool (currently in beta) developed by Trail of Bits to create images specifically designed to trigger this attack across various downscaling methods.
- **Prompt Injection:** The broader class of attacks where malicious instructions manipulate the behavior of LLMs.
- **Multi-modal Input Attacks:** Attacks targeting AI systems that process multiple data types simultaneously (text, images, audio).