Full Report
If you’ve ever cracked a hash with hashcat, you’ll know that sometimes it will give you a $HEX[0011223344] style clear. This is done to preserve the raw byte value of the clear when the encoding isn’t known (or there’s a colon “:” character).

Investigation

Driven by an inability to crack the majority of a certain set of hashes I suspected were in a foreign charset, I decided to have a closer look at what was going on. Let’s take a look at the following examples:
Analysis Summary
# Tool/Technique: Hashcat Hex Output Decoding/Encoding Issue
## Overview
This summary focuses on the challenges encountered when decoding plaintext recovered from password hashes cracked by **Hashcat**, specifically when the output is presented in `$HEX[0011223344]` format. This format is used by Hashcat to preserve raw byte values when the original encoding is unknown (or when characters like colons are present in the plaintext). The core issue investigated is determining the correct character encoding (like UTF-8, UTF-16LE, or Windows Code Pages) used during the original password storage or hashing process to successfully recover the cleartext.
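As a minimal sketch of what the `$HEX[...]` wrapper contains, the following Python helpers split a potfile-style `hash:clear` line and turn a `$HEX[...]` clear back into raw bytes. The function names and the sample hash value are hypothetical, and splitting on the last colon assumes that any clear containing a colon has already been hex-wrapped, as described above.

```python
import re
from typing import Optional, Tuple

# Matches a complete $HEX[...] clear and captures an even number of hex digits.
HEX_CLEAR = re.compile(r"^\$HEX\[((?:[0-9a-fA-F]{2})*)\]$")


def split_potfile_line(line: str) -> Tuple[str, str]:
    """Split 'hash:clear' on the last colon (assumes the clear holds no colon)."""
    hash_part, sep, clear = line.rstrip("\r\n").rpartition(":")
    if not sep:
        raise ValueError("no ':' separator found")
    return hash_part, clear


def clear_to_bytes(clear: str) -> Optional[bytes]:
    """Return the raw bytes of a $HEX[...] clear, or None for an ordinary clear."""
    match = HEX_CLEAR.match(clear)
    return bytes.fromhex(match.group(1)) if match else None


if __name__ == "__main__":
    # "deadbeef" is a placeholder, not a real hash.
    _, clear = split_potfile_line("deadbeef:$HEX[0011223344]")
    print(clear_to_bytes(clear))
```

The result is deliberately left as `bytes`: choosing an encoding is a separate step, and that step is where the difficulty lies.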
## Technical Details
- Type: Technique (Investigative/Decoding Procedure)
- Platform: Multi-platform (Focus on Windows NT Hashing environment implications)
- Capabilities: Tool usage revolving around decoding hex representations of potentially encoded passwords, involving shell scripting and external utilities for byte manipulation and character set conversion.
- First Seen: Not explicitly stated, but the context is related to Hashcat functionality observed up to August 2020.
## MITRE ATT&CK Mapping
Since this is focused on analysis tools and decoding methods rather than an adversary TTP itself, direct mapping is difficult. However, the underlying activity relates to credential harvesting analysis:
- **T1110.002 - Brute Force: Password Cracking**: Recovering cleartext from hashes offline is the activity that the decoding work supports.
- **T1003 - OS Credential Dumping**: (Indirectly, as the hashes must be obtained before they can be cracked)
- **T1003.001 - LSASS Memory**: (Indirectly, as hashes are often obtained this way before cracking)
## Functionality
### Core Capabilities
- **Hex String Extraction**: Using `cut` and `tr` in shell scripts to isolate the hexadecimal chunk from Hashcat's `$HEX[...]` output in the potfile.
- **Hex-to-Byte Conversion**: Using `xxd -ps -r` (reverse mode on a plain hex dump) to convert the extracted hexadecimal string back into raw bytes.
- **Encoding Trial**: Iteratively attempting to decode these raw bytes using common charsets like `utf8`, `utf-16le`, and `utf-32` via the `iconv` utility to find the correct plaintext.
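The encoding trial can be sketched in Python as below. This is an illustration under stated assumptions, not the article's script: the function name is hypothetical, `utf-32-le` is used instead of plain `utf-32` so the byte order is fixed, and the sample bytes are the UTF-8 encoding of the Russian word "пароль", chosen for illustration only.

```python
from typing import Dict, Optional, Sequence


def try_encodings(
    raw: bytes,
    encodings: Sequence[str] = ("utf-8", "utf-16-le", "utf-32-le"),
) -> Dict[str, Optional[str]]:
    """Decode raw bytes with each candidate encoding; None marks a failure."""
    results: Dict[str, Optional[str]] = {}
    for encoding in encodings:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


if __name__ == "__main__":
    raw = bytes.fromhex("d0bfd0b0d180d0bed0bbd18c")
    for encoding, text in try_encodings(raw).items():
        print(encoding, repr(text))
```

A successful decode is not proof of the right encoding. In this sample the UTF-16LE attempt also succeeds, but it yields unrelated characters, so the candidates still need to be judged for plausibility.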
### Advanced Features
- **Python Decoding Automation**: Development of a Python script to automate the trial-and-error decoding process across standard encodings (UTF-8, UTF-16, UTF-32) and error handling.
- **Code Page Investigation**: Exploration of Windows Code Pages (specifically CP1251 and CP1252) using `iconv` for decoding, revealing cases where bytes assumed to be UTF-8 were in fact encoded in a locale-specific code page (e.g., non-Japanese characters resulting from incorrect CP1251 decoding).
- **UTF-8 Byte Space Bruteforcing**: Understanding and replicating complex Hashcat commands using `--hex-charset` and defining specific byte ranges (e.g., Arabic UTF-8 space `d880-ddbf`) for targeted password cracking attempts on complex character sets, referencing external guides.
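The code page investigation can be illustrated with a short Python sketch. Single-byte code pages accept almost any byte sequence, so the sketch also reports which scripts each decoding produces as a rough plausibility hint. The function names, the script heuristic, and the sample bytes (the CP1251 encoding of "пароль") are assumptions made for illustration and do not come from the article.

```python
import unicodedata
from typing import Dict, Optional, Sequence, Set

CODE_PAGES = ("cp1251", "cp1252", "cp1256")


def decode_with_code_pages(
    raw: bytes, code_pages: Sequence[str] = CODE_PAGES
) -> Dict[str, Optional[str]]:
    """Decode raw bytes under each code page; None if a byte is unmapped."""
    results: Dict[str, Optional[str]] = {}
    for code_page in code_pages:
        try:
            results[code_page] = raw.decode(code_page)
        except UnicodeDecodeError:
            results[code_page] = None
    return results


def scripts_used(text: str) -> Set[str]:
    """First word of each letter's Unicode name, e.g. 'CYRILLIC' or 'LATIN'."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


if __name__ == "__main__":
    raw = bytes.fromhex("efe0f0eeebfc")
    for code_page, text in decode_with_code_pages(raw).items():
        scripts = sorted(scripts_used(text)) if text is not None else []
        print(code_page, repr(text), scripts)
```

A decoding that stays within one script is usually a better candidate than one that mixes several, although this is only a heuristic and does not replace reading the result.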
## Indicators of Compromise
The article does not list specific IOCs associated with malware, but focuses on artifacts from the cracking process:
- File Hashes: N/A (Focus is on cracked password recovery artifacts)
- File Names: `potfile` (file containing recovered cracked hashes/results)
- Registry Keys: N/A
- Network Indicators: N/A
- Behavioral Indicators: Using Hashcat in targeted modes (`-a 3`, `-m #type`) with `--hex-charset` and defining custom character sets (`?1`, `?2`).
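To show where the custom byte ranges come from, the sketch below derives the UTF-8 lead and continuation bytes for a range of two-byte code points. Under the assumption that the `d880-ddbf` range mentioned earlier means U+0600 through U+077F (these endpoints encode to `d8 80` and `dd bf`), it prints two hex strings that could be supplied as custom charsets with `--hex-charset`. The function name is hypothetical, and how the strings are passed to Hashcat is left to the reader.

```python
from typing import Tuple


def utf8_two_byte_charsets(first_cp: int, last_cp: int) -> Tuple[str, str]:
    """Return (lead bytes, continuation bytes) as hex strings for a range
    of code points that each encode to exactly two UTF-8 bytes."""
    if not 0x80 <= first_cp <= last_cp <= 0x7FF:
        raise ValueError("range must lie within U+0080..U+07FF")
    encoded = [chr(cp).encode("utf-8") for cp in range(first_cp, last_cp + 1)]
    leads = sorted({pair[0] for pair in encoded})
    continuations = sorted({pair[1] for pair in encoded})
    return bytes(leads).hex(), bytes(continuations).hex()


if __name__ == "__main__":
    lead_hex, cont_hex = utf8_two_byte_charsets(0x0600, 0x077F)
    print("lead bytes (e.g. for ?1):        ", lead_hex)
    print("continuation bytes (e.g. for ?2):", cont_hex)
```

Each character then costs two mask positions (one lead byte followed by one continuation byte), which is why masks for such character sets grow twice as fast as the password length.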
## Associated Threat Actors
- Analysis author: Dominic White (SensePost)
- Referenced external work by Netmux.
- No specific threat actor groups are explicitly named as using this *decoding technique*, but the context relates to password recovery efforts on Windows systems (NT Hashing context).
## Detection Methods
Detection focuses on identifying misuse of analysis tools and attempts to reconstruct original passwords from hex-encoded cracking output:
- **Signature-based detection**: Monitoring for execution patterns of `iconv`, `xxd`, and complex `hashcat` command lines involving `--hex-charset` and custom byte ranges.
- **Behavioral detection**: Observing scripts or processes that systematically read output files (like potfiles) and attempt multiple encoding conversions on binary-looking data.
- **YARA rules**: N/A
## Mitigation Strategies
Mitigation targets the source of the issue: proper handling and storage of credentials:
- **Prevention measures**: Ensuring systems use modern, standardized, and well-defined character sets for password storage (preferably strong hashing algorithms that don't rely on simple encoding output representation).
- **Hardening recommendations**: Understanding modern system encoding standards (like UTF-8 being dominant) and ensuring login mechanisms (like GINA/Winlogon) adhere to consistent standards, reducing reliance on legacy code pages that Hashcat may need to brute-force byte by byte. For targeted brute-forcing, using Hashcat's `.hcchr` charset files is preferable to generic exploration of a full code page.
## Related Tools/Techniques
- **Hashcat**: The primary tool whose output format is being investigated.
- **iconv**: Utility used for character set conversion.
- **xxd**: Utility used for hex/byte manipulation.
- **Windows Code Pages (e.g., CP1251, CP1252, CP1256)**: The underlying character encodings discovered to be the source of the initial decoding failures.
- **NT Hashing (NTLM)**: The hash type implied by the context of Windows password analysis. The NT hash is MD4 computed over the UTF-16LE encoding of the password; the older LM hash is a separate algorithm.
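Going in the opposite direction from decoding, a known or guessed password can be wrapped in the `$HEX[...]` form under a chosen encoding, which makes it easy to see how the same word produces different bytes under each of the encodings listed above. This is a minimal sketch with a hypothetical function name; the sample word is illustrative only.

```python
def to_hex_clear(password: str, encoding: str) -> str:
    """Encode a password and wrap the resulting bytes as $HEX[...]."""
    return "$HEX[" + password.encode(encoding).hex() + "]"


if __name__ == "__main__":
    word = "\u043f\u0430\u0440\u043e\u043b\u044c"  # the Russian word "пароль"
    for encoding in ("utf-8", "utf-16-le", "cp1251"):
        print(encoding, to_hex_clear(word, encoding))
```

Comparing these strings against the `$HEX[...]` clears in a potfile is a quick way to check which encoding a cracked password was actually stored in.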