# Tool/Technique: Acoustic Side Channel Attack on Keystrokes via Deep Learning
## Overview
This summary details a technique where deep learning models are used to analyze audio captured during video conferencing calls (specifically Zoom) to accurately infer the keystrokes being typed by the target user. The researchers claim an accuracy of up to 93%.
## Technical Details
- Type: Technique (Acoustic Side Channel Attack)
- Platform: Laptops targeted during VoIP/Video conferencing sessions (tested on Zoom). Requires an attacker to record audio during a session.
- Capabilities: High-accuracy inference of typed characters based solely on the sound profile of the keyboard input transmitted over an audio channel.
- First Seen: Acoustic side-channel attacks (ASCA) in general are older, but applying modern deep learning, including self-attention layers, to audio from a ubiquitous platform like Zoom appears to be novel to this research (paper published August 2023).
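
To illustrate what a self-attention layer computes, here is a minimal NumPy sketch of single-head scaled dot-product self-attention over a sequence of audio feature frames. It is a generic textbook formulation, not the architecture used in the paper; the frame count, feature size, and random weights are hypothetical placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (T, d_in) sequence of feature frames.
    w_q, w_k, w_v: (d_in, d_k) projection matrices.
    Returns the attended sequence (T, d_k) and the attention weights (T, T).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Hypothetical input: 16 time frames with 64 spectral features each.
rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 64))
w_q, w_k, w_v = (rng.normal(size=(64, 32)) * 0.1 for _ in range(3))
out, weights = self_attention(frames, w_q, w_k, w_v)
```

Each output frame is a weighted mix of all input frames, which lets a model relate different parts of a keystroke's sound to one another.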
## MITRE ATT&CK Mapping
MITRE does not map this specific research to an offensive tool, but the underlying concept fits collection and credential-access behavior:
- **TA0009 - Collection** (capturing the victim's typed input via audio)
- **T1123 - Audio Capture** (recording microphone or call audio during the session)
- **T1056.001 - Input Capture: Keylogging** (closest conceptual match: recovering keystrokes, here from sound rather than a software hook)
- **TA0006 - Credential Access** (if the recovered keystrokes include passwords)
- **TA0042 - Resource Development** (developing the capability, such as training the model)
*Note: T1123 and T1056.001 describe activity on the victim's system; an attacker who only records call audio remotely is a conceptual match, not an exact one.*
## Functionality
### Core Capabilities
- **Audio Capture Analysis:** Training a deep learning model to recognize the unique sound profile associated with striking individual keys on a target keyboard.
- **High Accuracy Inference:** Achieving upwards of 90% accuracy (claimed up to 93% in tests) on identifying keystrokes recorded over Zoom audio.
- **Leveraging Modern ML:** Utilizing contemporary neural network architectures, including self-attention layers, to enhance side-channel analysis compared to prior attempts.
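
Before a model can classify individual keys, the recording has to be split into per-keystroke clips. The sketch below shows one simple way to do that with a short-time energy threshold. It assumes this preprocessing step for illustration only; the summary does not describe the researchers' exact pipeline, and `find_keystroke_onsets`, its thresholds, and the synthetic audio are hypothetical.

```python
import numpy as np

def find_keystroke_onsets(signal, sample_rate, frame_ms=10,
                          threshold_ratio=0.3, min_gap_ms=100):
    """Return sample offsets of frames whose energy crosses a threshold.

    The threshold is a fraction of the loudest frame's energy, and
    detections closer together than min_gap_ms are merged into one.
    """
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)
    threshold = threshold_ratio * energy.max()
    min_gap = max(1, min_gap_ms // frame_ms)
    onsets, last = [], -min_gap
    for i, e in enumerate(energy):
        if e >= threshold and i - last >= min_gap:
            onsets.append(i * frame_len)
            last = i
    return onsets

# Synthetic stand-in for a recording: faint noise plus three decaying clicks.
sample_rate = 16000
rng = np.random.default_rng(1)
audio = 0.001 * rng.normal(size=sample_rate)
n = 20 * sample_rate // 1000                      # 20 ms click
t = np.arange(n) / sample_rate
click = np.sin(2 * np.pi * 2000 * t) * np.exp(-t / 0.004)
for start_ms in (100, 400, 700):
    start = start_ms * sample_rate // 1000
    audio[start:start + n] += click

onsets = find_keystroke_onsets(audio, sample_rate)
```

Each detected onset would then be cut into a fixed-length clip and converted to a spectrogram-like feature before classification.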
### Advanced Features
- **Platform Specificity:** The attack is well suited to modern laptops: they are often used in shared spaces such as coffee shops or offices where typing can be overheard, and a given laptop model ships with the same non-modular keyboard, so its sound profile is uniform.
- **Post-processing Correction:** The classifier's output can be passed through a language model (prior studies used Hidden Markov Models) to correct likely transcription errors and raise effective accuracy. In prior work on printer attacks, this kind of correction raised accuracy from 72% to 95%.
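
The post-processing idea can be sketched with Viterbi decoding over a small HMM: per-keystroke classifier probabilities act as emissions, and a character bigram model supplies the transitions. This is a toy illustration under assumed numbers, not the correction method of any cited study; the three-letter alphabet and all probabilities are hypothetical.

```python
import numpy as np

def viterbi(emission_logp, transition_logp, initial_logp):
    """Most likely key sequence given per-keystroke scores.

    emission_logp: (T, K) log-probability of each key at each keystroke.
    transition_logp: (K, K) log P(next key | previous key).
    initial_logp: (K,) log-probability of the first key.
    """
    T, K = emission_logp.shape
    score = initial_logp + emission_logp[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transition_logp   # rows: previous, cols: next
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emission_logp[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

KEYS = "aht"  # hypothetical 3-key alphabet
# Classifier output for the typed word "hat"; the middle key is misheard.
emission = np.log(np.array([
    [0.10, 0.80, 0.10],
    [0.40, 0.45, 0.15],
    [0.10, 0.10, 0.80],
]))
# Toy bigram model: "h" is usually followed by "a", "a" by "t".
transition = np.log(np.array([
    [0.20, 0.20, 0.60],   # after "a"
    [0.80, 0.05, 0.15],   # after "h"
    [0.50, 0.30, 0.20],   # after "t"
]))
initial = np.log(np.full(3, 1 / 3))

greedy = "".join(KEYS[i] for i in emission.argmax(axis=1))
decoded = "".join(KEYS[i] for i in viterbi(emission, transition, initial))
```

Here the per-keystroke argmax yields "hht", while the decoder recovers "hat" because the bigram model makes "h" followed by "h" unlikely.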
## Indicators of Compromise
This is a purely procedural attack technique, not malware, so traditional IOCs like hashes or C2s are not applicable.
- File Hashes: N/A
- File Names: N/A
- Registry Keys: N/A
- Network Indicators: N/A (The attack relies on audio already transmitted in the victim's call, not on new C2 connections.)
- Behavioral Indicators: Unexpected recording of audio during a video conference session, either on the victim's endpoint or from a co-opted session stream.
## Associated Threat Actors
The paper's authors (Joshua Harrison, Ehsan Toreini, and Maryam Mehrnezhad) are academic researchers at UK universities, including Durham University, and the paper was published through IEEE. No threat actor group is known to be associated with this academic finding.
## Detection Methods
Detection focuses on denying the conditions the attack requires and on monitoring for unauthorized audio access during sensitive operations.
- Signature-based detection: Not applicable unless the model's inference tooling is present on a monitored system.
- Behavioral detection: Monitoring for unexpected applications or processes that record or analyze microphone input, particularly during remote sessions or while sensitive data is being typed.
- YARA rules: Not applicable.
## Mitigation Strategies
- Mitigating factors: Typing style, multi-case passwords, and uncommon or non-standard keyboards are all mentioned as reducing the attack's effectiveness.
- Prevention measures: Keeping background noise high during sensitive typing sessions, so keystroke sounds are harder to isolate.
- Hardening recommendations: Entering highly sensitive data (such as passwords) through an on-screen software keyboard or a separate hardware keyboard rather than the laptop's built-in one. Avoiding typing sensitive data during high-gain audio sessions. Using physical barriers or acoustic dampening where possible in quiet environments.
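
One way to make "high background noise" measurable is the signal-to-noise ratio (SNR) between keystroke sound and masking noise. The sketch below scales a noise track to reach a chosen SNR. The summary does not state what noise level defeats the classifier, so the 0 dB target is a hypothetical value, and `snr_db` and `scale_noise_to_snr` are illustrative helper names.

```python
import numpy as np

def snr_db(signal, noise):
    # Ratio of mean signal power to mean noise power, in decibels.
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

def scale_noise_to_snr(signal, noise, target_snr_db):
    """Return the noise rescaled so that snr_db(signal, result) hits the target."""
    current = snr_db(signal, noise)
    gain = 10 ** ((current - target_snr_db) / 20)  # amplitude gain
    return noise * gain

# Synthetic stand-ins for a keystroke clip and quiet room noise.
rng = np.random.default_rng(2)
keystroke = rng.normal(scale=0.5, size=4000)
room_noise = rng.normal(scale=0.01, size=4000)

before = snr_db(keystroke, room_noise)
masking = scale_noise_to_snr(keystroke, room_noise, target_snr_db=0.0)
after = snr_db(keystroke, masking)
```

A lower SNR means the keystrokes are more deeply buried in noise; how low it must go to degrade a given model would have to be measured empirically.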
## Related Tools/Techniques
- Prior Acoustic Side Channel Attacks: Previous VoIP keylogging attacks achieving 91.7% top-5 accuracy over Skype (2017) and 74.3% accuracy on VoIP calls (2018).
- Acoustic Side Channel Attacks on Dot-Matrix Printers (using HMMs for correction).
The key differentiator of this technique is its use of modern deep learning architectures (self-attention), which yields markedly higher accuracy on audio carried over standard VoIP channels such as Zoom.
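
The prior-work figures mix metrics: a top-5 result counts a prediction as correct if the true key is among the model's five best guesses, which is a looser standard than top-1 accuracy. The sketch below computes both on hypothetical classifier outputs (with k=2 standing in for top-5 on a three-key toy example).

```python
import numpy as np

def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical scores for 4 keystrokes over 3 candidate keys.
probs = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
])
labels = np.array([0, 2, 2, 2])

top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
```

On this toy data the top-1 accuracy is 0.5 and the top-2 accuracy is 0.75, which shows why a top-5 figure should not be compared directly with a top-1 figure.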