Full Report
The author of this post was writing a Go implementation of ML-DSA, the post-quantum signature algorithm NIST standardized last summer. After four days of work, the implementation was rejecting some valid signatures, and several hours of debugging did not resolve the issue. So they asked Claude Code to look into it and stepped away from the computer for a bit. The prompt explained what the code does and the problem they were seeing. They let Claude Code read the source, run the tests, and make changes, and they added "ultrathink" to make it reason harder about the problem. To their surprise, it found the bug. AI excels at well-scoped tasks like this one.

The bug was subtle and lay in the math. The author had merged HighBits and w1Encode into a single function for use in Sign. That function was also called in Verify(), where the values had already been reduced to their high bits (by UseHint), so Verify was effectively taking the high bits twice. Claude found the issue immediately, without any exploratory tool use!

Was this a fluke? The author also tried it on two earlier bugs that had taken them an hour to debug: one involved incorrectly hardcoded constants, the other an encoding that was 32 bits instead of 32 bytes. In both cases, Claude Code identified the issue, though only after extensive debugging and multiple runs. Still, it was faster than the author had been.

I love seeing use cases of AI and the prompts behind them; it helps me use the tooling better. Thanks for the article!
Analysis Summary
# Tool/Technique: Claude Code (with Opus 4.1 & Ultrathink)
## Overview
Claude Code is an agentic AI developer tool used for automated software engineering tasks, including code exploration, debugging, and testing. In the documented context, it demonstrated high proficiency in identifying subtle, low-level logic and mathematical errors in a cryptographic implementation (ML-DSA) that are hard to spot by hand; the author had spent several hours debugging manually without finding the bug.
## Technical Details
- **Type:** Agentic Development/Debugging Tool
- **Platform:** Cross-platform (CLI-based interaction with local source code)
- **Capabilities:** Autonomous file reading, test execution, "ultrathink" (extended reasoning), printf-based debugging, and automated patch generation.
- **First Seen:** Usage documented circa November 2025.
## MITRE ATT&CK Mapping
*While this is a legitimate developer tool, its capabilities map to how an attacker might automate the discovery of vulnerabilities or "bugs" in security software.*
- **[TA0007 - Discovery]**
- **[T1083 - File and Directory Discovery]**: Identifying and reading local source code files to understand logic.
- **[TA0040 - Impact]**
- **[T1495 - Firmware Corruption]**: (Loose analogy only; no firmware was involved.) In this context, the tool identifies "corrupted" logic in low-level cryptographic primitives.
- **[Technique - Vulnerability Research]**
- Automated analysis of code to find logic flaws that can lead to security bypasses, or to failures like the one in this case, where valid signatures were rejected.
## Functionality
### Core Capabilities
- **Autonomous Debugging:** Analyzes failing test vectors and traces logic through complex mathematical functions without human intervention.
- **Context Awareness:** Can ingest entire local codebases (e.g., Go standard library internal FIPS modules) to understand function dependencies.
- **Test-Driven Refactoring:** Runs local commands (e.g., `go test`) to check whether its proposed fixes resolve the reported issues.
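As an example of the kind of property such a test run can enforce, the following Go sketch exhaustively checks the FIPS 204 Decompose invariants under assumed ML-DSA-44 parameters: every r in [0, q) is reconstructed from its high and low parts, the low part stays within [−γ2, γ2], and the high part stays within [0, 43]. It is an illustration, not a test from the author's repository; a regression test for the double-HighBits bug itself would more likely be a sign-then-verify round trip.

```go
package main

import "fmt"

const (
	q      = 8380417
	gamma2 = (q - 1) / 88 // ML-DSA-44; ML-DSA-65 and ML-DSA-87 use (q-1)/32
)

// decompose follows FIPS 204 Decompose (same definition as in the ML-DSA spec).
func decompose(r int32) (r1, r0 int32) {
	rPlus := ((r % q) + q) % q
	r0 = rPlus % (2 * gamma2)
	if r0 > gamma2 {
		r0 -= 2 * gamma2
	}
	if rPlus-r0 == q-1 {
		return 0, r0 - 1
	}
	return (rPlus - r0) / (2 * gamma2), r0
}

// checkDecompose returns the first r in [0, q) that violates an invariant, or -1.
func checkDecompose() int32 {
	const maxR1 = (q-1)/(2*gamma2) - 1 // 43 for ML-DSA-44
	for r := int32(0); r < q; r++ {
		r1, r0 := decompose(r)
		// Reconstruct r mod q from its high and low parts.
		back := ((r1*2*gamma2+r0)%q + q) % q
		if back != r || r0 < -gamma2 || r0 > gamma2 || r1 < 0 || r1 > maxR1 {
			return r
		}
	}
	return -1
}

func main() {
	fmt.Println(checkDecompose()) // -1 means every input satisfied the invariants
}
```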
### Advanced Features
- **Ultrathink Mode:** A prompt keyword that gives the model its largest extended-thinking budget, so it reasons through complex problems (like post-quantum cryptography) at length before suggesting code changes.
- **Tool Use Integration:** Ability to use a shell to run debuggers, print statements, and environment-specific commands to verify internal variable states.
## Indicators of Compromise
*Note: As this is a legitimate tool, these indicators refer to the presence of AI-assisted code modifications.*
- **File Names:** `mldsa.go`, `mldsa_test.go` (files modified during the documented session).
- **Behavioral Indicators:** Rapid generation of localized "printf" debugging statements followed by immediate removal and replacement with a functional patch.
- **Network Indicators:** Outbound HTTPS traffic to `api.anthropic.com` or associated AI provider endpoints (defanged: hxxps[://]api[.]anthropic[.]com).
## Associated Threat Actors
- **Red Teams/Security Researchers:** Used for rapid prototyping and vulnerability discovery in complex math-heavy code.
- **Legitimate Developers:** Proactive bug hunting in FIPS-compliant cryptographic modules.
## Detection Methods
- **Behavioral Detection:** Monitoring shell history for repeated, rapid execution of testing commands (e.g., `go test`) followed by instant source code modifications.
- **Code Review:** Detecting "AI-style" fixes that may resolve the immediate bug but deviate from established codebase styling (the author described the initial fix as "mediocre" and refactored it manually).
## Mitigation Strategies
- **Manual Review of AI Patches:** Always refactor or manually verify AI-generated cryptographic fixes to ensure they meet performance and clarity standards.
- **Environment Isolation:** Run agentic tools in containers or restricted environments to prevent unintended shell command execution during the debugging process.
## Related Tools/Techniques
- **GitHub Copilot Workspace:** Similar agentic coding environment.
- **Fuzzing:** While Claude Code reasons about the code directly, fuzzing provides a complementary brute-force approach to finding the same kinds of encoding errors.
- **ML-DSA (Dilithium):** The specific post-quantum signature algorithm being targeted/debugged in this case.