New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes

[email protected] (The Hacker News) 2025-06-12T19:22:00+00:00 View Original

Full Report

Cybersecurity researchers have discovered a novel attack technique called TokenBreak that can be used to bypass a large language model's (LLM) safety and content moderation guardrails with just a single character change. "The TokenBreak attack targets a text classification model's tokenization strategy to induce false negatives, leaving end targets vulnerable to attacks that the implemented

Analysis Summary