Full Report
This article demonstrates how AI can be used to modify and help detect JavaScript malware. We boosted our detection rates 10% with retraining. The post Now You See Me, Now You Don’t: Using LLMs to Obfuscate Malicious JavaScript appeared first on Unit 42.
Analysis Summary
# Tool/Technique: LLM-Assisted Adversarial Code Rewriting
## Overview
This technique uses Large Language Models (LLMs) to automatically generate rewritten, obfuscated variants of existing malicious JavaScript code. The goal is to evade signature-based and static analysis detection while preserving the original malware's malicious behavior.
## Technical Details
- Type: Technique (Adversarial ML/Code Obfuscation)
- Platform: JavaScript environments (Web browsers, script execution contexts)
- Capabilities: Automated code transformation, obfuscation via natural-looking changes, preservation of functionality during transformation.
- First Seen: Not specified in the source. Systematic LLM-driven rewriting is a recent development that depends on current LLM capabilities.
## MITRE ATT&CK Mapping
This technique primarily targets detection models and aims for evasion.
- **TA0005 - Defense Evasion**
  - T1027 - Obfuscated Files or Information
  - No T1027 sub-technique is mapped here: T1027.006 is HTML Smuggling in ATT&CK, not a deobfuscation technique, and the rewritten output is designed to defeat static analysis by classifiers rather than to deobfuscate anything at run time.
- **TA0011 - Command and Control** (applies only if the underlying malware includes this stage; rewriting can then help its C2 code evade static detection)
  - T1071 - Application Layer Protocol (only if network indicators are rewritten)
## Functionality
### Core Capabilities
- **Variable Renaming:** Automatically renaming variables to patterns that bypass static analysis detection.
- **Dead Code Insertion:** Introducing non-functional code sections to disrupt pattern matching.
- **Whitespace Removal/Modification:** Altering formatting in ways that are structurally valid but change the signature.
- **Step-by-Step Transformation:** Iteratively applying rewrites to maximize evasion effectiveness against classifiers.
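
To illustrate why renaming and whitespace changes defeat exact-match signatures, here is a minimal sketch using two harmless snippets with the same logic. The helper names `normalize` and `fingerprint`, the keyword subset, and the regex tokenizer are assumptions made for this example; a production system would use a real JavaScript parser, and nothing here is taken from the article's tooling.

```python
import hashlib
import re

# Harmless stand-in snippets: same logic, different names and whitespace.
ORIGINAL = "var total = 0; for (var i = 0; i < 3; i++) { total += i; }"
REWRITTEN = "var acc=0;for(var k=0;k<3;k++){acc+=k;}"

# Small keyword subset, enough for the snippets above (assumption).
JS_KEYWORDS = {"var", "let", "const", "for", "if", "else",
               "function", "return", "while"}
# Crude tokenizer: identifiers, integer literals, or any single
# non-space character. Not a real JavaScript lexer.
TOKEN_RE = re.compile(r"[A-Za-z_$][\w$]*|\d+|\S")


def sha256_hex(text: str) -> str:
    """Exact-match 'signature': a hash of the raw text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalize(source: str) -> str:
    """Drop whitespace and replace each distinct identifier with a
    placeholder numbered by first appearance (ID0, ID1, ...)."""
    names = {}
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if re.match(r"[A-Za-z_$]", tok) and tok not in JS_KEYWORDS:
            tok = names.setdefault(tok, f"ID{len(names)}")
        tokens.append(tok)
    return " ".join(tokens)


def fingerprint(source: str) -> str:
    """Hash of the normalized token stream instead of the raw text."""
    return sha256_hex(normalize(source))


# The raw hashes differ, while the normalized fingerprints match.
print(sha256_hex(ORIGINAL) == sha256_hex(REWRITTEN))
print(fingerprint(ORIGINAL) == fingerprint(REWRITTEN))
```

This normalization only absorbs renaming and whitespace changes. Dead code insertion changes the token stream itself and would still alter the fingerprint, which is one reason static matching alone struggles against iterated rewrites.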
### Advanced Features
- **Natural-Looking Transformations:** Unlike traditional obfuscators that use predefined patterns, LLMs generate transformations that appear more human-written, making them harder for ML classifiers to flag as systematically obfuscated.
- **Behavior Preservation Verification:** Running a behavioral analysis tool alongside the rewriting to confirm that each modified variant retains the original malicious functionality despite the transformation layers.
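
The behavior preservation check can be sketched as a comparison of behavior traces recorded for the original and the rewritten script. The trace format (an ordered list of `(api_name, key_argument)` pairs), the function name `first_divergence`, and the example events are all assumptions for illustration; the source does not describe the behavioral analysis tool or its output.

```python
from typing import List, Optional, Tuple

# Assumed trace format: ordered (api_name, key_argument) events
# as a sandbox might record them.
Trace = List[Tuple[str, str]]


def first_divergence(original: Trace, rewritten: Trace) -> Optional[int]:
    """Return the index of the first event where the traces differ,
    or None if the two traces are identical."""
    for index, (a, b) in enumerate(zip(original, rewritten)):
        if a != b:
            return index
    if len(original) != len(rewritten):
        # One trace is a strict prefix of the other.
        return min(len(original), len(rewritten))
    return None


def preserves_behavior(original: Trace, rewritten: Trace) -> bool:
    return first_divergence(original, rewritten) is None


original_trace = [
    ("document.createElement", "iframe"),
    ("fetch", "https://example.test/a"),
]
identical_trace = list(original_trace)
truncated_trace = [("document.createElement", "iframe")]

print(preserves_behavior(original_trace, identical_trace))
print(first_divergence(original_trace, truncated_trace))
```

Exact sequence equality is the strictest possible criterion. A real pipeline might need to tolerate benign reordering or timing differences, which this sketch does not attempt.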
## Indicators of Compromise
The focus is on the *technique*, not a specific deployed malware instance, so any IOCs relate to the output of the *generation process* rather than to known fixed hashes.
- File Hashes: N/A (thousands of unique variants are generated, so fixed hashes do not generalize)
- File Names: N/A
- Registry Keys: N/A
- Network Indicators: N/A (these depend on the underlying malware payload; the technique itself does not involve any specific C2)
- Behavioral Indicators: Scripts whose structure varies widely between samples while their run-time behavior stays the same (e.g., the same malicious API calls despite major structural changes).
## Associated Threat Actors
Adversaries leveraging LLMs for malware generation and obfuscation. No specific groups are named in the source; use by sophisticated criminal or state actors is an inference, not a reported finding.
## Detection Methods
The article highlights the failure of existing methods and the solution developed:
- **Signature-based detection:** Significantly reduced effectiveness against LLM-rewritten samples.
- **Behavioral analysis:** Used during the generation process to confirm that rewritten samples keep their functionality. The static ML classifiers, by contrast, were evaded by the rewritten samples.
- **Classifiers Retrained on LLM-Rewritten Samples:** Retraining malicious JavaScript classifiers on adversarially generated, LLM-rewritten samples improves detection rates (the article cites a 10% improvement).
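
The retraining idea can be sketched on synthetic data. Everything in this example is an assumption made for illustration: the feature names, the toy samples, and the deliberately tiny count-based scorer, which stands in for a real classifier. It is not the model described in the article, and its numbers say nothing about the cited 10% figure.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Sample = Tuple[Set[str], int]  # (feature set, label), 1 = malicious


def train(samples: Iterable[Sample]) -> Counter:
    """Toy scorer: feature weight = malicious count - benign count."""
    weights = Counter()
    for features, label in samples:
        for feature in features:
            weights[feature] += 1 if label == 1 else -1
    return weights


def is_malicious(weights: Counter, features: Set[str]) -> bool:
    # Unseen features contribute 0 (Counter returns 0 for missing keys).
    return sum(weights[f] for f in features) > 0


def detection_rate(weights: Counter, feature_sets: List[Set[str]]) -> float:
    flagged = sum(is_malicious(weights, fs) for fs in feature_sets)
    return flagged / len(feature_sets)


# Synthetic, hypothetical feature sets.
BENIGN: List[Sample] = [
    ({"readable_names", "dom_listener"}, 0),
    ({"readable_names", "dom_listener", "fetch_call"}, 0),
    ({"readable_names", "dom_listener", "fetch_call", "timer_call"}, 0),
]
ORIGINAL_MALICIOUS: List[Sample] = [
    ({"hex_names", "eval_call", "long_string_literal"}, 1),
    ({"hex_names", "eval_call"}, 1),
    ({"hex_names", "long_string_literal", "fetch_call"}, 1),
]
# Rewritten variants added to the training set during retraining.
REWRITTEN_TRAIN: List[Sample] = [
    ({"readable_names", "eval_call", "dead_code"}, 1),
    ({"readable_names", "dead_code", "unusual_spacing"}, 1),
    ({"readable_names", "eval_call", "unusual_spacing", "fetch_call"}, 1),
]
# Held-out rewritten variants used only for evaluation.
REWRITTEN_TEST: List[Set[str]] = [
    {"readable_names", "eval_call", "dom_listener"},
    {"readable_names", "dead_code", "fetch_call"},
]

baseline = train(BENIGN + ORIGINAL_MALICIOUS)
retrained = train(BENIGN + ORIGINAL_MALICIOUS + REWRITTEN_TRAIN)

before = detection_rate(baseline, REWRITTEN_TEST)
after = detection_rate(retrained, REWRITTEN_TEST)
print(before, after)
```

In this toy setup the baseline learns that readable names indicate benign code, so natural-looking rewrites slip past it. Adding rewritten samples to the training set removes that shortcut, which is the intuition behind retraining on adversarial variants.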
## Mitigation Strategies
- **Retraining/Diversifying ML Models:** Retrain existing detection models frequently using LLM-generated adversarial samples to build robustness against novel obfuscation patterns.
- **Adversarial Testing:** Continuously test detection engines against LLM-generated variations.
- **Behavioral Focus:** Increase reliance on execution monitoring and behavioral indicators over purely static heuristics, especially for common scripting languages like JavaScript.
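
Continuous adversarial testing can be sketched as a small harness that reports the detection rate per transformation type and lists the types that fall below a threshold. The `toy_detector`, the 0.9 threshold, the variant labels, and the harmless sample strings are assumptions for illustration, not a description of any real detection engine.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Variant = Tuple[str, str]  # (transformation label, script source)


def detection_by_transformation(
    detector: Callable[[str], bool], variants: Iterable[Variant]
) -> Dict[str, float]:
    """Fraction of variants flagged by the detector, per label."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for label, source in variants:
        totals[label] += 1
        if detector(source):
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


def weak_spots(rates: Dict[str, float], minimum: float = 0.9) -> List[str]:
    """Labels whose detection rate falls below the chosen minimum."""
    return sorted(label for label, rate in rates.items() if rate < minimum)


def toy_detector(source: str) -> bool:
    # Hypothetical stand-in for a static signature: literal "eval(" only.
    return "eval(" in source


# Harmless placeholder strings; `payload` is never defined or executed.
variants: List[Variant] = [
    ("original", "eval(payload);"),
    ("variable_renaming", "eval(p);"),
    ("dead_code", "if (false) { noop(); } eval(payload);"),
    ("whitespace", "eval (payload);"),
    ("whitespace", "eval(\n  payload);"),
]

rates = detection_by_transformation(toy_detector, variants)
print(rates)
print(weak_spots(rates))
```

Grouping results by transformation type shows which kind of rewrite a detector is weakest against, which in turn suggests what kind of samples to prioritize when retraining.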
## Related Tools/Techniques
- Traditional obfuscation tools (which produce predefined patterns)
- Generative Adversarial Networks (GANs) used in adversarial ML contexts.