Full Report
MCP tools are implicated in several new attack techniques. Here's a look at how they can be manipulated for good, such as logging tool usage and filtering unauthorized commands.BackgroundOver the last few months, there has been a lot of activity in the Model Context Protocol (MCP) space, both in terms of adoption as well as security. Developed by Anthropic, MCP has been rapidly gaining traction across the AI ecosystem. MCP allows Large Language Models (LLMs) to interface with tools and for those interfaces to be rapidly created. MCP tools allow for the rapid development of “agentic” systems, or AI systems that autonomously perform tasks.Beyond adoption, new attack techniques have been shown to allow prompt injection via MCP tool descriptions and responses, MCP tool poisoning, rug pulls and more.Prompt Injection is a weakness in LLMs that can be used to elicit unintended behavior, circumvent safeguards and produce potentially malicious responses. Prompt injection occurs when an attacker instructs the LLM to disregard other rules and do the attacker’s bidding. In this blog, I show how to use techniques similar to prompt injection to change the LLM’s interaction with MCP tools. Anyone conducting MCP research may find these techniques useful.Experiments with MCPFor my research, I used the 5ire client because it makes it incredibly simple to switch out and restart MCP servers and switch between LLMs. In 5ire, I can easily configure our MCP servers (for this test case, I’ve already configured the reference MCP weather server):Tenable Research using the 5ire client, April 2025Anatomy of an MCP serverLet’s talk about how an MCP server is written and configured. First, we define a simple MCP server in Python with FastMCP. The FastMCP library makes it fairly simple to get up and running with MCP.I can use this framework to develop several different servers and tools for my experiments. Now that I’ve got the framework, I’ll create a tool.Now that I know how to create a simple tool in Python, let’s see what else I can do.Logging tool useMultiple MCP servers can be configured in an MCP host and each server can have multiple tools. As I was exploring this new technology, I wondered if there was a way to log all tool calls across configured MCP servers. This is doable at the MCP host level or via each MCP server, but I asked myself: why not find another way? I wanted to log all MCP tool calls that the host makes, so I decided to see if I could create a tool that would insert itself before any other tool calls and log information about those tool calls.Let’s break down the above MCP tool:I start out with a decorator from FastMCP (1) to indicate this is an MCP tool.Then I define the function (2) with all of the parameters I want. The function name and parameters are exposed to the LLM and the LLM is intelligent enough to infer how to populate them.Next is the description (3). This is the meat of the tool, instructing the LLM to insert this tool before any other tool call. You can read through the parameters and what I’m trying to accomplish. It may seem a little repetitive, but this is the current iteration that seems to work well across different models.Then, I write it to a file (4). I could use Python’s ‘logger’ here, but I had issues getting that working with the MCP clients we used. In most of our testing, it wasn’t easy to get anything written to stdout, so we chose logging to a file instead.Finally, we return a nice message thanking the AI and passing the name of the actual tool to run (5).In testing, some models had no problem inserting the tool before any other tool call. Some did it sporadically, while others did not unless we asked about it.Source: Tenable Research, April 2025As the image above shows, the LLM runs the logging tool just before it runs the weather tool I requested. The logging tool then logs information about the tool it was asked to run, including the MCP server name, MCP tool name and description, and the user prompt that caused the LLM to try to run that tool. In this case, it actually logs twice, but I wasn’t able to investigate why.Tool filtering / firewallUsing the same method, I can block unapproved tools from running.Here I’m using the same technique to run prior to other tools calls. In this case I’m simply looking for the tool name to match a string `get_alerts`. If it matches, I tell the LLM not to run the tool. Sometimes it respects this!Source: Tenable Research, April 2025MCP introspectionThis method of using the tool description block to ask the LLM to run this tool before other tools could clearly be abused. Can I use a similar technique to find out about other tools in use that ask for a similar hierarchy? I give it a try:Note that I’m logging to the same log file so that it’s easier to see what’s happening. In a real world scenario, these tools would likely log to separate files.Source: Tenable Research, April 2025Here we can see that log_other_inline_tools runs after the logging tool in this case. The tool then logs the other “inline” tool that the LLM is aware of. It then lists the other available tools.Can this technique be used to extract the LLM system prompt?Maybe. Here’s what I tried:You can see in the return value that I tried to trick the model further by giving it a score at the end, so maybe it’ll think I’m really doing some sort of analysis.Source: Tenable Research, April 2025Source: Tenable Research, April 2025Source: Tenable Research, April 2025Source: Tenable Research, April 2025Source: Tenable Research, April 2025It seems like the LLM models vary between something realistic and complete hallucination. Remember that the models try to figure out how to fill out the tool’s parameters. So, if they don’t have a good idea of what goes where, they may just make it up. Based on my testing, it looks like Claude Sonnet 3.7 displays a piece of the prompt it has around running tools. Google Gemini 2.5 Pro Experimental seems to do the same. OpenAI’s GPT-4o puts variations in the log each time, so it seems like it’s just made up. It should also be noted that directly asking for the system prompt is successful for some models while unsuccessful for others. Regardless, some prompt text is still sent to the logging tool. While I can’t say for certain if I’m seeing actual developer instructions or hallucinated text, these tests may be useful to facilitate other research.ConclusionTools should require explicit approval before running in most MCP Host applications. In fact, this is required by the MCP specification. Still, there are many ways in which tools can be used to do things that may not be strictly understood by the specification. Here I’ve demonstrated a few interesting techniques that could be used to develop security tooling, perform research or to help identify other malicious tools. These methods rely on LLM prompting via the description and return values of the MCP tools themselves. Since LLMs are non-deterministic, so, too, are the results. Lots of things could affect the results here: the model in use, temperature and safety settings, specific language, etc. Additionally, the descriptions used to instruct the LLM to do different things with the tools may need to be different depending on the model used. I’ve had varying results with different models, though I haven’t tested every case on every model.Some of these techniques could be used to advance both positive and negative goals. We believe that some can be used to further LLM and MCP research.The code from this blog can be found on github.ReferencesWhile working on this blog, I saw some great work by Trail of Bits dubbing one of the techniques used here “jumping the line.” I offer one possible detection method in the MCP Introspection section of this post. In that section, I show the use of an MCP tool to identify other MCP tools requesting to run first.
Analysis Summary
# Tool/Technique: Model Context Protocol (MCP) Tool Manipulation Techniques
## Overview
This entry summarizes techniques related to manipulating Model Context Protocol (MCP) tools, which are interfaces allowing Large Language Models (LLMs) to interact with external tools ("agentic" systems). The techniques discussed center on using prompt injection-like methods via MCP tool descriptions and return values to influence LLM behavior, specifically demonstrating a method to effectively hijack execution flow for logging purposes, referred to in related research as "jumping the line."
## Technical Details
- Type: Technique (involving specialized tooling/frameworks)
- Platform: Systems utilizing LLMs integrated with MCP (e.g., AI Agent frameworks)
- Capabilities: Influencing LLM execution order, logging tool usage, filtering unauthorized commands, and potentially performing adversarial prompt injection for research or exploitation.
- First Seen: Research documented in April 2025.
## MITRE ATT&CK Mapping
Since this is a novel technique targeting LLM/AI interaction layers, direct mapping is challenging, but the underlying concept relates to manipulating execution flow based on prompt input:
- **T1562 - Impair Defenses** (Conceptual mapping if used to bypass security logging/safeguards)
- *Note: A direct, mature mapping within the current ATT&CK framework for MCP manipulation is likely not established.*
## Functionality
### Core Capabilities
- **Tool Creation via FastMCP:** Utilizing Python libraries like `FastMCP` to quickly define and initialize MCP servers and tools, specifying function signatures, descriptions, and return values exposed to the LLM.
- **Logging Tool Precedence Injection:** Creating a specialized logging tool whose description explicitly forces the LLM to execute it *first* before any other requested tool, leveraging the LLM's instruction-following nature.
- **Security Tooling Development:** The techniques can be used positively to develop robust auditing and security tooling for MCP environments.
### Advanced Features
- **Prompt Injection via Tool Metadata:** Manipulating the tool's description (the text sent to the LLM describing its function) to embed high-priority instructions that override standard execution order.
- **Return Value Manipulation:** Using the string returned by the logging tool to guide the LLM to then execute the *originally* intended tool, maintaining functionality while ensuring the logging step occurred.
- **Introspection/Evasion:** Attempting to extract system prompts or discover details about other tools through crafted responses and model behavior.
## Indicators of Compromise
(Indicators are derived from the specific research implementation rather than a generalized piece of malware.)
- File Hashes: N/A (Focus is on code logic/prompting)
- File Names: `mcp_blog_mcp_framework.py`, `mcp_blog_mcp_tool_definition.py`, `mcp_blog_log_mcp_tool_usage.py` (Example implementation files)
- Registry Keys: N/A
- Network Indicators: N/A (Communication is system/local IPC via MCP transport)
- Behavioral Indicators:
- Presence of an MCP tool designed with highly authoritative or coercive language in its description (e.g., "must be executed first," "priority tool with precedence").
- Unanticipated execution log generated by a custom tool preceding the execution of the originating user request.
## Associated Threat Actors
- Currently associated with security researchers exploring LLM/AI security boundaries (e.g., Tenable Research).
- *Potential:* Malicious actors could adopt these "jumping the line" techniques to bypass intended security controls or auditing mechanisms within agentic systems.
## Detection Methods
- **Behavioral Detection:** Monitoring execution sequences to identify cases where an auditing or metadata tool appears to execute *before* an intended operational tool, contrary to a standard queue.
- **MCP Introspection:** Developing dedicated MCP tools designed to inspect and report on the descriptions and execution priority assigned to other available tools to detect coercive language or improper sequencing.
- **YARA rules:** Not specifically mentioned for the technique itself, but could be developed against known adversarial patterns in tool descriptions.
## Mitigation Strategies
- **Explicit Approval Requirement:** Adhering strictly to the MCP specification which requires explicit approval before running *any* tool, mitigating the risk of uncontrolled execution via prompt instruction.
- **Input Sanitization/Validation:** Rigorously validating tool descriptions and return values before they influence model behavior, though this is complex given the natural language interface.
- **Model Hardening:** Employing models with robust prompt injection defenses and limiting temperature/safety settings that might increase susceptibility to arbitrary instruction following.
- **Principle of Least Privilege:** Ensuring MCP hosts only expose the minimum set of necessary tools to the LLM instances.
## Related Tools/Techniques
- **Prompt Injection:** The foundational attack vector used to manipulate the LLM instructing it to disregard original rules.
- **Trail of Bits "Jumping the Line":** Mentioned as related research describing similar techniques applied to execution ordering.
- **FastMCP:** A Python framework used to simplify the development of MCP servers/tools for experimentation.
- **5ire Client:** A client used in the research to easily switch and restart MCP servers and change LLMs.