Full Report
Google on Monday announced a set of new security features in Chrome, following the company's addition of agentic artificial intelligence (AI) capabilities to the web browser. To that end, the tech giant said it has implemented layered defenses to make it harder for bad actors to exploit indirect prompt injections that arise as a result of exposure to untrusted web content and inflict harm. Chief
Analysis Summary
# Tool/Technique: User Alignment Critic (AC)
## Overview
The User Alignment Critic (AC) is a security feature implemented in Google Chrome, specifically designed to provide layered defenses against vulnerabilities arising from agentic Artificial Intelligence (AI) capabilities exposed to untrusted web content. Its purpose is to ensure that the AI agent's planned actions align with the user's stated goals, independent of potentially malicious prompts embedded in web content.
## Technical Details
- Type: Technique / Security Feature (Component of Chrome's Agentic AI Security)
- Platform: Google Chrome (Agentic AI context)
- Capabilities: Independent evaluation of proposed AI agent actions, vetoing misaligned actions, and providing feedback for plan reformulation.
- First Seen: December 2025 (Based on article publication date)
## MITRE ATT&CK Mapping
*Note: The User Alignment Critic is a vendor defense (Google), not an attack technique, so the mappings below are conceptual. They describe the attack being defended against: manipulation of LLM/agent behavior.*
- **TA0001 - Initial Access** (Relevant if the injection leads to unauthorized access/action)
- **T1566 - Phishing** (Indirectly, if the injection relies on user interaction with compromised content)
- **TA0005 - Defense Evasion** (Relevant as the attack attempts to bypass existing safety mechanisms)
- **T1490 - Inhibit System Recovery** (Loose conceptual analogy for hijacking control flow; in ATT&CK this is an Impact technique, and T1590 is Gather Victim Network Information)
- **TA0011 - Command and Control** (If the agent is tricked into sending stolen data)
- **T1071 - Application Layer Protocol** (Network communication initiated by the compromised agent)
*The most direct mapping relates to the underlying attack vector: **Prompt Injection**.*
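- **AML.T0051 - LLM Prompt Injection** (MITRE ATLAS; ATT&CK Enterprise has no prompt injection technique, and T1531 is Account Access Removal)
- **AML.T0051.001 - LLM Prompt Injection: Indirect** (MITRE ATLAS; the specific variant this feature defends against)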
## Functionality
### Core Capabilities
- **Action Evaluation:** The AC runs *after* the AI agent's planning phase is complete to scrutinize each proposed action.
- **Task Alignment Check:** Its primary function is determining if the proposed action serves the user's explicitly stated goal.
- **Veto Mechanism:** If an action is deemed misaligned with the user's goal (likely due to malicious instruction poisoning), the AC will veto the action.
### Advanced Features
- **Isolated Context:** The Alignment Critic views *only* metadata about the proposed action and is explicitly prevented from accessing untrustworthy web content that might contain malicious prompts, preventing its own poisoning.
- **Feedback Loop:** When an action is rejected, the Critic provides specific feedback to the planning model, forcing it to re-formulate its plan.
- **Escalation Path:** If repeated failures occur after reformulation attempts, the planner can return control entirely to the user.
- **Complementary Security:** It complements existing techniques like "spotlighting" (instructing the model to prioritize system instructions over web page instructions).
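
The evaluate, veto, and reformulate flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Chrome's implementation: `ProposedAction`, `Verdict`, `critic`, `run_step`, the two toy planners, and `MAX_ATTEMPTS` are all hypothetical names, and the keyword-overlap check stands in for what the source describes as a model-based judgment.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumed retry budget; the source does not give a number


@dataclass(frozen=True)
class ProposedAction:
    """Metadata only: the critic never receives raw page content."""
    kind: str            # e.g. "click", "type", "navigate"
    target_origin: str   # origin the action would touch
    summary: str         # planner's short description of the action


@dataclass(frozen=True)
class Verdict:
    aligned: bool
    feedback: str = ""


def critic(user_goal: str, action: ProposedAction) -> Verdict:
    """Toy stand-in for the alignment check (hypothetical keyword overlap)."""
    goal_words = set(user_goal.lower().split())
    if goal_words & set(action.summary.lower().split()):
        return Verdict(True)
    return Verdict(False, f"'{action.summary}' does not serve goal '{user_goal}'")


Planner = Callable[[str, str], ProposedAction]  # (goal, feedback) -> action


def run_step(user_goal: str, planner: Planner) -> Optional[ProposedAction]:
    """Return an approved action, or None to hand control back to the user."""
    feedback = ""
    for _ in range(MAX_ATTEMPTS):
        action = planner(user_goal, feedback)
        verdict = critic(user_goal, action)
        if verdict.aligned:
            return action
        feedback = verdict.feedback  # veto: the planner must reformulate
    return None  # repeated failures: escalate to the user


def toy_planner(goal: str, feedback: str) -> ProposedAction:
    """Hypothetical planner: poisoned on the first try, corrected after feedback."""
    if not feedback:
        return ProposedAction("navigate", "https://evil.example", "send cookies elsewhere")
    return ProposedAction("click", "https://shop.example", "add shoes to cart")


def stubborn_planner(goal: str, feedback: str) -> ProposedAction:
    """Hypothetical planner that never recovers, forcing escalation."""
    return ProposedAction("navigate", "https://evil.example", "send cookies elsewhere")
```

With the goal `"buy running shoes"`, `run_step` vetoes `toy_planner`'s first proposal, passes the feedback back, and approves the second; with `stubborn_planner` it exhausts the retry budget and returns `None`, which models returning control to the user. Note that the critic's inputs are the user goal and action metadata only, mirroring the isolated-context property.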
## Indicators of Compromise
*Note: This summary describes a defense implemented in Chrome, not a malware sample or an attacker-generated IOC, so standard IOCs do not apply to the User Alignment Critic itself.*
- File Hashes: N/A (Feature within Chrome binary)
- File Names: N/A
- Registry Keys: N/A
- Network Indicators: N/A
- Behavioral Indicators: N/A
## Associated Threat Actors
The source names no specific threat actors. The defense targets any actor attempting to abuse agentic AI through prompt injection, including for data exfiltration or goal hijacking.
## Detection Methods
*Note: This section describes a detection/prevention mechanism, not artifacts to detect.*
- Signature-based detection: N/A
- Behavioral detection: The mechanism itself is a form of behavioral analysis on the AI planner's output.
- YARA rules: N/A
## Mitigation Strategies
The User Alignment Critic is itself a mitigation strategy, layered alongside other defenses:
- **User Alignment Critic:** Vetoing misaligned actions based on metadata review.
- **Agent Origin Sets (AOS):** Enforcing strict data access boundaries (Read-only vs. Read-writable) based on task relevance to prevent cross-origin data leaks.
- **Gating Function:** A component used by AOS that prevents the planner from adding new origins without the gating function's explicit approval.
- **Transparency and User Control:** Requiring the agent to create work logs and seek user approval before navigating to sensitive sites, permitting sign-ins via Google Password Manager, or completing financial/communication actions.
- **Prompt Injection Classifier:** Running in parallel to the planning model to block actions based on content attempting to subvert the model.
- **Existing Browser Defenses:** Operating alongside Safe Browsing and on-device scam detection.
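
The "Transparency and User Control" item above can be illustrated with a small approval gate. This is a sketch under assumptions: the action names in `SENSITIVE_ACTIONS`, the `ask_user` callback, and the work-log format are hypothetical and merely paraphrase the categories listed above; the source does not describe how Chrome classifies or records actions.

```python
from typing import Callable, List

# Assumed category names, paraphrasing the list above (not Chrome identifiers).
SENSITIVE_ACTIONS = {
    "navigate_sensitive_site",
    "password_manager_sign_in",
    "purchase",
    "payment",
    "send_message",
}


def needs_approval(action_kind: str) -> bool:
    """True when the action falls in one of the assumed sensitive categories."""
    return action_kind in SENSITIVE_ACTIONS


def execute_with_gate(
    action_kind: str,
    ask_user: Callable[[str], bool],
    work_log: List[str],
) -> bool:
    """Run an action only if it is routine or the user approves it.

    Every decision is appended to work_log so the user can review it later.
    """
    if needs_approval(action_kind):
        approved = ask_user(f"Allow the agent to perform '{action_kind}'?")
        work_log.append(f"approval {'granted' if approved else 'denied'}: {action_kind}")
        if not approved:
            return False
    work_log.append(f"executed: {action_kind}")
    return True
```

For example, `execute_with_gate("payment", lambda prompt: False, log)` returns `False` and records the denial, while a routine action such as `"scroll"` runs without prompting.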
## Related Tools/Techniques
- Spotlighting (An existing technique used alongside the AC)
- Agent Origin Sets (AOS) (A related data access governance technique)
- Prompt Injection Classifier (A parallel defense mechanism)
---
# Tool/Technique: Agent Origin Sets (AOS)
## Overview
Agent Origin Sets is a security mechanism implemented in Chrome designed to restrict the data access scope of the agentic AI model. It enforces that the agent can only interact with data and origins relevant to the current user task or data explicitly shared by the user, mitigating site isolation bypasses used for data exfiltration.
## Technical Details
- Type: Technique / Security Feature (Component of Chrome's Agentic AI Security)
- Platform: Google Chrome (Agentic AI context)
- Capabilities: Categorizing and limiting origin access to Read-only or Read-writable sets based on task relevance.
- First Seen: December 2025 (Based on article publication date)
## MITRE ATT&CK Mapping
*The primary threat mitigated here is unauthorized data interaction/exfiltration.*
- **TA0010 - Exfiltration**
- **T1041 - Exfiltration Over C2 Channel** (Restricting where the agent can send data)
- **TA0003 - Persistence** (If the agent were tricked into maintaining broad access)
- **T1537 - Transfer Data to Cloud Account** (Conceptual, if data gathered from one site is exfiltrated via another; this is an Exfiltration technique, whereas Screen Capture is T1113)
## Functionality
### Core Capabilities
- **Origin Categorization:** A gating function analyzes the task context and divides website origins into two defined sets: Read-only and Read-writable.
- **Data Flow Limitation:** The Gemini AI model may consume content only from origins in the task's origin set. It can read from Read-only origins but not act on them, and it can interact (type or click) only with Read-writable origins, which it can also read.
- **Bounding the Threat Vector:** Ensures that only data from a limited set of origins is available to the agent, minimizing the vector for cross-origin data leaks.
### Advanced Features
- **Gating Function Independence:** Like the AC, the gating function is not exposed to untrusted web content.
- **Approval for New Origins:** The planner must obtain the gating function's explicit approval before adding any new origins, although it can use context from explicitly user-shared web pages in the current session.
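
The two-set model and the approval step can be sketched as below. This is a minimal illustration, not Chrome's data structure: `AgentOriginSet`, `origin_of`, `task_gate`, and the example origins are hypothetical, the gate is reduced to a fixed allowlist, and the sketch assumes that Read-writable origins are also readable.

```python
from dataclasses import dataclass, field
from typing import Callable, Set
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Reduce a URL to scheme://host[:port]."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


Gate = Callable[[str, bool], bool]  # (origin, wants_write) -> approved?


@dataclass
class AgentOriginSet:
    read_only: Set[str] = field(default_factory=set)
    read_writable: Set[str] = field(default_factory=set)

    def can_read(self, url: str) -> bool:
        origin = origin_of(url)
        return origin in self.read_only or origin in self.read_writable

    def can_act(self, url: str) -> bool:
        """Typing or clicking is allowed only on read-writable origins."""
        return origin_of(url) in self.read_writable

    def request_origin(self, url: str, writable: bool, gate: Gate) -> bool:
        """Add an origin only if the gate approves; the planner cannot bypass it."""
        origin = origin_of(url)
        if not gate(origin, writable):
            return False
        (self.read_writable if writable else self.read_only).add(origin)
        return True


# Hypothetical task scope; the gate sees origins and the task, never page content.
READABLE_FOR_TASK = {"https://shop.example", "https://reviews.example"}
WRITABLE_FOR_TASK = {"https://shop.example"}


def task_gate(origin: str, writable: bool) -> bool:
    allowed = WRITABLE_FOR_TASK if writable else READABLE_FOR_TASK
    return origin in allowed
```

In this sketch a request to add `https://evil.example` is refused by the gate, so content from that origin never becomes readable and the agent cannot act on it, which is the bounding property the section describes.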
## Indicators of Compromise
N/A (Defensive mechanism)
## Associated Threat Actors
Threat actors aiming to leverage compromised agents to perform unauthorized lateral movement or data exfiltration across sites where the user is logged in.
## Detection Methods
N/A (Defensive mechanism)
## Mitigation Strategies
The implementation of AOS is a mitigation strategy focusing on strict data access control for AI agents.
- **Task-Relevant Scoping:** Limiting agent functionality strictly to data related to the current task.
- **Explicit Write Permissions:** Requiring distinct permission levels for reading versus writing/interacting.
- **Gating Function Enforcement:** Utilizing a protected mechanism to approve all origin additions.
## Related Tools/Techniques
- User Alignment Critic (Works in concert to ensure proposed actions are safe)
- Shadow AI in the Browser (The environment susceptible to this type of control)
---
# Tool/Technique: Indirect Prompt Injection (via Untrusted Content)
## Overview
Indirect Prompt Injection is an attack technique wherein an attacker places malicious instructions within content (e.g., a website, a document, an email) that a generative AI agent—operating in the user's context—will ingest or process, leading the agent to perform actions against the user's explicit intent or for the attacker's benefit.
## Technical Details
- Type: Technique
- Platform: Agentic AI systems integrated into web browsers (e.g., Chrome Agent)
- Capabilities: Causing an AI agent to execute unauthorized actions, exfiltrate sensitive data, or hijack intended goals by reading compromised instructions embedded in untrusted sources.
- First Seen: Varies. Indirect prompt injection predates agentic browsers; the source treats its exploitation against browser agents as an emerging threat and gives no specific date.
## MITRE ATT&CK Mapping
*This maps directly to attacks designed to manipulate AI models.*
- **TA0001 - Initial Access**
- **T1566.001 - Spearphishing Attachment** (If the untrusted content is delivered via a document preview)
- **TA0005 - Defense Evasion**
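- **T1490 - Inhibit System Recovery** (Loose conceptual analogy, if the injection bypasses standard controls; in ATT&CK this is an Impact technique, and T1590 is Gather Victim Network Information)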
- **TA0011 - Command and Control**
- **T1071 - Application Layer Protocol** (Exfiltration via agent action)
## Functionality
### Core Capabilities
- **Content Poisoning:** Embedding instructions designed to manipulate the AI model's output or subsequent actions within data that the model will process.
- **Goal Hijacking:** Forcing the agent to deviate from the user's intended goals to serve the attacker's objective.
- **Silent Execution:** Carrying out rogue actions without explicit user confirmation (if controls fail).
### Advanced Features
- **Bypassing Mitigations:** Successfully executed injections bypass existing security mitigations designed to prevent direct, prompt-based manipulation.
- **Data Exfiltration Path:** Creating unauthorized paths for sensitive data to leave the user's environment via the compromised agent's execution capabilities.
## Indicators of Compromise
IOCs are generally specific to the content placed on the web page or document, not a persistent malware artifact.
- Behavioral Indicators: Agent exhibiting actions contradictory to user task goals (e.g., navigating to an unknown external site, attempting to copy session data).
## Associated Threat Actors
Any threat actor seeking to use the new capabilities of agentic AI to steal user data or breach the organizational security boundaries within which these agents operate.
## Detection Methods
Detection relies on runtime analysis of AI behavior and the content being processed.
- **Prompt Injection Classifier:** Real-time detection of known or anomalous injection patterns in processed content.
- **Behavioral Monitoring:** Watching for unauthorized changes in data access scope (AOS) or unexpected command execution.
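
As a rough illustration of the behavioral-monitoring idea, the snippet below flags agent actions whose target falls outside the origins approved for the task. The record format (`kind` and `url` keys) and the function name are hypothetical; the source does not describe a Chrome log schema, so treat this as a sketch of the check rather than a parser for real telemetry.

```python
from typing import Dict, Iterable, List, Set
from urllib.parse import urlsplit


def out_of_scope_actions(
    records: Iterable[Dict[str, str]],
    approved_origins: Set[str],
) -> List[Dict[str, str]]:
    """Return the records whose URL origin is not in the approved set."""
    flagged = []
    for record in records:
        parts = urlsplit(record["url"])
        origin = f"{parts.scheme}://{parts.netloc}"
        if origin not in approved_origins:
            flagged.append(record)
    return flagged


# Hypothetical action records for a shopping task.
sample_records = [
    {"kind": "click", "url": "https://shop.example/cart"},
    {"kind": "navigate", "url": "https://unknown.example/collect?d=session"},
]
suspicious = out_of_scope_actions(sample_records, {"https://shop.example"})
```

Here `suspicious` contains only the navigation to `https://unknown.example`, matching the "navigating to an unknown external site" indicator listed earlier.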
## Mitigation Strategies
Google's layered defenses are the primary mitigation:
1. User Alignment Critic review of proposed actions.
2. Agent Origin Sets restricting data interaction.
3. Prompt Injection Classifier running in parallel.
4. User approval gates for sensitive actions.
## Related Tools/Techniques
- Claude-Style Attacks (Reference to similar AI manipulation concepts)
- Shadow AI (The general domain where these actors operate)