Full Report
A CoPilot Studio Customer Support Management service by McKinsey sparked some interest in hacking. The system contains a service inbox that listens for inquiries, looks up previous engagements, and then responds via email to the request. To test this out, they created their own version of the bot using Microsoft CoPilot Studio with a custom knowledge source with the customer information and a "get records" tool to access the company's CRM. Using a prompt injection, they were able to confuse the bot to send an email to the attacker's email address instead of the proper one. The new prompt includes disclosing the knowledge sources and tools as well. With this information, they are ready to tackle the service further. This serves more as a recon step. It appears that this is somewhat of an access control issue letting all email contact the bot. In the next article, they simply ask it to leak the entire file for the customer information using prompt injection. The same thing can be done via a prompt injection on the CRM as well. Sadly, there is no complete "fix" for all of this besides restricting the email from which this can be received. Where a bot can access sensitive data, a user can also steal it through prompt injection. The tone of the article is a little condescending, which I don't appreciate though. The goal is to make the world more secure and have folks be more interested in getting security help; tones like this create a gap between developers and security imo.
Analysis Summary
# Tool/Technique: AgentFlayer (Discovery Phase) / AI Agent Prompt Injection
## Overview
This technique involves the use of Indirect Prompt Injection against autonomous AI agents (specifically those built on Microsoft Copilot Studio) to perform reconnaissance and discovery. Attackers send malicious instructions via common communication channels—such as an email inbox monitored by the agent—to trick the LLM into revealing its internal system prompts, knowledge sources, and available toolsets.
## Technical Details
- **Type:** Technique (Prompt Injection / Indirect Injection)
- **Platform:** Microsoft Copilot Studio, LLM-based Autonomous Agents
- **Capabilities:** Information disclosure, system prompt leakage, tool inventory discovery, knowledge source enumeration.
- **First Seen:** Reported February 2024; Publicly disclosed June 2025.
## MITRE ATT&CK Mapping
- **[TA0007 - Discovery]**
- **[T1082 - System Information Discovery]**: Identifying the agent's internal configuration and "System Instructions."
- **[T1580 - Cloud Infrastructure Discovery]**: Identifying connected tools and plugins (e.g., Salesforce CRM connectors).
- **[TA0001 - Initial Access]**
- **[T1566 - Phishing]**: Using a "phishing" style email containing a payload to trigger the agent’s autonomous processing.
## Functionality
### Core Capabilities
- **Knowledge Source Enumeration:** Forcing the agent to list and describe the files or databases it uses for context (e.g., customer lists, policy documents).
- **Tool Inventory Disclosure:** Tricking the agent into revealing the specific APIs and functions it can execute, such as `Send-an-email-V2` or `Get-records`.
- **System Prompt Leakage:** Bypassing security boundaries to retrieve the developer’s specific instructions that define the agent's logic and behavior.
### Advanced Features
- **Internal Framework Revelation:** Disclosing platform-specific internal tools, such as the `UniversalSearchTool` used by Copilot Studio for RAG (Retrieval-Augmented Generation).
- **Control Flow Hijacking:** Redirecting output (e.g., an email response) to an attacker-controlled address instead of the intended internal recipient or legitimate customer.
## Indicators of Compromise
- **File Names:** Evidence of discovery may be seen in logs referencing `UniversalSearchTool` or CRM query logs (e.g., Salesforce `get records` calls) triggered by anomalous external inputs.
- **Network Indicators:** Not directly applicable as this is a prompt-level attack; however, monitor for emails sent to internal/external addresses that were NOT the original sender of the inquiry.
- **Behavioral Indicators:**
- LLM "hallucination-like" responses that include structured lists of its own instructions.
- Large-scale data retrieval from CRM tools immediately following a single inbound email.
- Agent responding to an email with content that does not match the original inquiry's intent.
## Associated Threat Actors
- **Security Researchers:** Zenity Labs (Initial Discovery).
- **General Threat Actors:** This technique is accessible to any actor capable of drafting a text-based email; it requires no specialized malware.
## Detection Methods
- **Behavioral Detection:** Implement LLM firewalls or "Guardrails" designed to detect "jailbreak" patterns or requests for "system instructions" in inbound user prompts.
- **Anomaly Detection:** Monitor for agents sending sensitive knowledge source content to external email domains that do not match the organization’s verified customer list.
- **Log Analysis:** Review Copilot Studio conversation transcripts for keywords like "Ignore previous instructions" or "List your tools."
## Mitigation Strategies
- **Input Filtering:** Sanitize and scrub inbound triggers (emails, chat messages) for prompt injection patterns before they reach the LLM logic.
- **Access Control:** Restrict the "listening" capability of the agent to specific, verified email addresses or domains rather than an entire public-facing inbox.
- **Human-in-the-Loop (HITL):** Require human approval for actions involving sensitive data exfiltration or external tool execution (e.g., sending emails to new addresses).
- **Output Validation:** Use secondary LLM checks to verify that the outgoing response does not contain sensitive configuration data or system prompts.
## Related Tools/Techniques
- **Indirect Prompt Injection:** The broader category of this attack.
- **RAG (Retrieval-Augmented Generation) Poisoning:** Injecting malicious data into knowledge sources.
- **Agentic Workflows:** The architectural framework that makes these agents vulnerable to autonomous tool misuse.