Full Report
Slashdot picked up on the blog post from Light Blue TouchPaper commenting on the fact that a researcher was surprised to discover that simply putting an MD5 hash into Google returned a hit mapping it to the original word. This is an interesting concept. A while back, we decided to fiddle with the idea of using Google's indexing and spidering as a new take on the time/space trade-off for password cracking.
Analysis Summary
# Tool/Technique: Google as an MD5 Cracker (Proof of Concept)
## Overview
This proof-of-concept technique uses Google's indexing and spidering as a pre-computed hash lookup service. It mimics the time/space trade-off used in password cracking (as with rainbow tables), but stores the results in an external, public index. The researcher set up a CGI script that generates strings and their corresponding MD5 hashes, hoping Google would index the pairs. A subsequent Google search for an MD5 hash would then return the indexed page that pairs it with the original plaintext string.
## Technical Details
- Type: Technique / Proof of Concept Tool
- Platform: Web Service/Server hosting the CGI script (Targeting Google Spider)
- Capabilities: Generates MD5 hashes for strings in the range 'a' through 'ZZZZZ' and publishes each hash alongside its plaintext on indexable web pages. Enables lookup of the plaintext for a given MD5 hash, provided that hash has been indexed by Google.
- First Seen: Information published November 21, 2007.
## MITRE ATT&CK Mapping
This technique relates to credential access via offline password cracking, adapted to use external search indexing.
- **TA0006 - Credential Access**
- **T1110.002 - Brute Force: Password Cracking** (Closest match: recovering plaintext from hashes that have already been obtained)
- *Note: ATT&CK does not define a technique specific to using a public search index for hash lookups; the underlying goal is credential disclosure.*
## Functionality
### Core Capabilities
- **Hash Generation:** A CGI script generates sequential character strings (from 'a' through 'ZZZZZ') and calculates the MD5 hash of each.
- **Seeding Content:** The script writes each plaintext and its corresponding hash to crawlable pages, which can then be found with a site-restricted Google query (e.g., `site:secure.sensepost.com + adog`).
- **Index Reliance:** The entire process relies on Google's web crawler indexing the generated content, making the resulting indexed pages function as a lightweight, distributed rainbow table.
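
The generation step described above can be sketched roughly as follows. This is a minimal illustration, not the researcher's script: the alphabet (lowercase then uppercase letters), the one-pair-per-line page format, and the function names `candidates`, `md5_hex`, and `render_pairs` are all assumptions made for the example.

```python
import hashlib
import itertools
import string

# Assumed alphabet: lowercase then uppercase letters, so the sequence
# runs from 'a' up to 'ZZZZZ' when max_len is 5.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase


def candidates(max_len=5, alphabet=ALPHABET):
    """Yield every string of length 1..max_len over the alphabet, shortest first."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            yield "".join(combo)


def md5_hex(text):
    """Return the hexadecimal MD5 digest of an ASCII string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def render_pairs(words):
    """Hypothetical page body: one 'plaintext hash' pair per line."""
    return "\n".join(f"{word} {md5_hex(word)}" for word in words)


if __name__ == "__main__":
    # Print the pairs for the first three candidates only.
    print(render_pairs(itertools.islice(candidates(), 3)))
```

With a 52-character alphabet and a maximum length of five, the full keyspace is a little over 387 million strings, which is why the output has to be spread across many pages rather than served as one document.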
### Advanced Features
- **Crawler-Friendly Rewriting:** URL rewriting was used to make the CGI script appear less like a script and more appealing to crawlers.
- **Progressive/Delegated Generation:** Once a page reaches the size limit for indexing (around 100k characters per document), the script emits a self-referencing link whose URL parameters tell it where to resume. Generation can thus continue indefinitely in chunks across multiple pages, and could potentially be delegated to other crawlers.
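
One way the chunked, self-linking generation could work is sketched below. It is an assumption-laden illustration rather than the original CGI script: the `start` query parameter, the `/hashes` path, the `index_to_word` numbering scheme, and the choice to count only the hash pairs against the character budget are all hypothetical.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase  # assumed character set


def index_to_word(n, alphabet=ALPHABET):
    """Map 0 -> 'a', 1 -> 'b', ..., 52 -> 'aa' (bijective base-N numbering)."""
    base = len(alphabet)
    n += 1
    chars = []
    while n > 0:
        n, rem = divmod(n - 1, base)
        chars.append(alphabet[rem])
    return "".join(reversed(chars))


def render_page(start, max_chars=100_000, path="/hashes"):
    """Build one page of 'plaintext hash' lines beginning at index `start`.

    Stops before the pairs exceed max_chars, then appends a link whose
    (hypothetical) `start` parameter tells the script where to resume.
    Returns the page text and the index of the next unprocessed string.
    """
    lines = []
    size = 0
    n = start
    while True:
        word = index_to_word(n)
        line = f"{word} {hashlib.md5(word.encode('ascii')).hexdigest()}"
        if size + len(line) + 1 > max_chars:  # +1 for the newline
            break
        lines.append(line)
        size += len(line) + 1
        n += 1
    lines.append(f'<a href="{path}?start={n}">next</a>')
    return "\n".join(lines), n


if __name__ == "__main__":
    page, next_start = render_page(0, max_chars=200)
    print(page)
```

Because each page is a pure function of its `start` value, a crawler that follows the trailing link drives the generation forward on its own, and no server-side state is needed between requests.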
## Indicators of Compromise
Since this is a conceptual test environment, IoCs are specific to the researcher's setup:
- File Hashes: N/A (The technique uses existing infrastructure/Google indexing)
- File Names: CGI Script (specific implementation details not fully detailed, likely generic names)
- Registry Keys: N/A
- Network Indicators: `secure.sensepost.com` (Defanged: `secure[.]sensepost[.]com`)
- Behavioral Indicators: High volume of generated web pages containing plaintext/hash pairs being indexed by a public search engine.
## Associated Threat Actors
- This specific activity was conducted by researchers (Haroon Meer at SensePost, referencing Light Blue TouchPaper). No known malicious threat actors are associated with employing this exact Google-indexing technique for production attacks based on this context.
## Detection Methods
- **Signature-based detection:** Low effectiveness, since the technique relies on legitimate search engine indexing services.
- **Behavioral detection:** Monitoring for web servers generating a massive volume of pages that systematically pair plaintext strings with their cryptographic hashes.
- **YARA rules:** N/A
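
The behavioral check above could be approximated with a simple content heuristic: count how many whitespace-separated token pairs on a page consist of a string followed by its own MD5 digest. The sketch below assumes that page layout; the function names and the threshold of 50 are illustrative choices, not tuned values.

```python
import hashlib
import re

# A non-whitespace token followed by whitespace and a 32-character hex string.
PAIR_RE = re.compile(r"(\S+)\s+([0-9a-fA-F]{32})\b")


def count_hash_pairs(text):
    """Count 'token hash' pairs where the hash really is the MD5 of the token."""
    hits = 0
    for word, digest in PAIR_RE.findall(text):
        if hashlib.md5(word.encode("utf-8")).hexdigest() == digest.lower():
            hits += 1
    return hits


def looks_like_hash_table(text, threshold=50):
    """Flag a page whose verified plaintext/hash pairs reach the threshold."""
    return count_hash_pairs(text) >= threshold


if __name__ == "__main__":
    sample = "\n".join(
        f"{w} {hashlib.md5(w.encode('utf-8')).hexdigest()}" for w in ("a", "b", "c")
    )
    print(count_hash_pairs(sample))
```

Verifying each digest against its neighboring token keeps false positives low, because pages that merely mention hashes (changelogs, checksums for downloads) rarely place them next to their own preimage.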
## Mitigation Strategies
- **Prevention measures:** Ensure that password hashes, and any plaintext recovered from them, are never exposed in publicly indexable web content generated by internal services.
- **Hardening recommendations:** Limit the depth and scope of public search engine indexing for application or service URLs that might be manipulated to reveal internal data structures (e.g., using `robots.txt` or appropriate HTTP headers to restrict indexing).
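
As a small illustration of the `robots.txt` recommendation, the sketch below uses Python's standard `urllib.robotparser` to confirm that a given rule set would keep a crawler away from a path. The `/internal/` path, the `example.com` URLs, and the helper name `is_crawlable` are hypothetical. Note that `robots.txt` only asks crawlers not to fetch a URL; keeping an already-known URL out of search results generally also requires a `noindex` directive, for example via the `X-Robots-Tag` response header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: keep all crawlers out of /internal/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/internal/hashes",
        "https://example.com/public/index.html",
    ):
        print(url, is_crawlable(ROBOTS_TXT, url))
```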
## Related Tools/Techniques
- Rainbow Tables (Classic pre-computation attack)
- Brute-forcing (Direct computational comparison)
- Credential Dumping (A typical source of the hashes that the cracking effort targets)