Full Report
0. Introduction John Lambert, a distinguished researcher specializing in threat intelligence at Microsoft, once said these words that changed perspectives:... The post McAfee ATR Thinks in Graphs appeared first on McAfee Blog.
Analysis Summary
# Research: Analyzing Threat Intelligence using Graph Representations to Identify Actor Trends and Technique Usage Patterns
## Metadata
- Authors: [Not explicitly listed in the provided excerpt; likely researchers affiliated with McAfee Advanced Threat Research (ATR)]
- Institution: McAfee Advanced Threat Research (ATR)
- Publication: McAfee Blog (post titled "McAfee ATR Thinks in Graphs")
- Date: [Not explicitly listed in the provided excerpt]
## Abstract
This research addresses the difficulty of deriving meaningful patterns from large, disparate sets of structured threat intelligence data, which is cumbersome to analyze in traditional list-based formats. The authors advocate and describe a methodology built on graph representations, in line with John Lambert's observation that "attackers think in graphs." The core objective is to transform relational threat data (campaigns, tools, actors, and targets) into graph structures to which graph algorithms can be applied, uncovering trends in threat actor behavior and the frequency and popularity of specific MITRE ATT&CK techniques.
## Research Objective
The primary goal is to develop a methodology for quickly connecting and visualizing threat intelligence information stored in a large relational database to identify systematic patterns, specifically focusing on:
1. **Frequency:** Determining the most observed techniques and the most active threat actors.
2. **Popularity:** Identifying the most common techniques across various actors.
3. **Patterns:** Identifying clusters of actors who use similar sets of techniques or employ techniques in similar sequences.
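The first two questions can be made concrete with a minimal sketch. It assumes the data can be flattened into `(event, actor, technique)` observations; the actor names, event IDs, and function names below are hypothetical and are not taken from the original post. Frequency is counted per event, while popularity counts distinct actors, which corresponds to the degree of a technique node in an actor-technique graph.

```python
from collections import Counter, defaultdict

# Hypothetical observations: (event_id, actor, ATT&CK technique ID).
# Actors and events are invented for illustration only.
OBSERVATIONS = [
    ("e1", "actor-a", "T1566"),
    ("e1", "actor-a", "T1059"),
    ("e2", "actor-a", "T1566"),
    ("e3", "actor-b", "T1566"),
    ("e3", "actor-b", "T1027"),
    ("e4", "actor-c", "T1059"),
]

def technique_frequency(observations):
    """Count how many distinct events each technique was observed in."""
    seen = {(event, technique) for event, _, technique in observations}
    return Counter(technique for _, technique in seen)

def actor_activity(observations):
    """Count how many distinct events each actor is linked to."""
    seen = {(event, actor) for event, actor, _ in observations}
    return Counter(actor for _, actor in seen)

def technique_popularity(observations):
    """Count how many distinct actors use each technique."""
    actors_by_technique = defaultdict(set)
    for _, actor, technique in observations:
        actors_by_technique[technique].add(actor)
    return {t: len(actors) for t, actors in actors_by_technique.items()}
```

In this toy data a technique can be frequent without being popular: a single very active actor can drive up the event count of a technique that no other actor uses, which is why the two measures are kept separate.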
## Methodology
### Approach
The methodology involves translating existing, normalized threat intelligence data from a relational database into graph structures. This process requires careful consideration of the underlying data structure and the chosen perspective (e.g., event-centric vs. actor-centric) to ensure the resultant graphs are analytically useful. Validation hinges on successfully applying graph algorithms to answer the specific research questions regarding frequency, popularity, and patterns.
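As a rough illustration of how the chosen perspective changes the resulting graph, the sketch below builds both an event-centric and an actor-centric view from the same rows. The table name, column names, and sample rows are assumptions made for this example; the source does not describe the actual schema or the node and edge types used by the authors.

```python
import sqlite3
from collections import defaultdict

def load_rows():
    """Build a tiny in-memory stand-in for the relational store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_technique (event_id TEXT, actor TEXT, technique TEXT)"
    )
    conn.executemany(
        "INSERT INTO event_technique VALUES (?, ?, ?)",
        [
            ("e1", "actor-a", "T1566"),
            ("e1", "actor-a", "T1059"),
            ("e2", "actor-b", "T1566"),
        ],
    )
    rows = conn.execute(
        "SELECT event_id, actor, technique FROM event_technique"
    ).fetchall()
    conn.close()
    return rows

def add_edge(graph, u, v):
    """Add an undirected edge between two (type, name) nodes."""
    graph[u].add(v)
    graph[v].add(u)

def event_centric(rows):
    """Events sit in the middle: event -- actor and event -- technique."""
    graph = defaultdict(set)
    for event, actor, technique in rows:
        add_edge(graph, ("event", event), ("actor", actor))
        add_edge(graph, ("event", event), ("technique", technique))
    return graph

def actor_centric(rows):
    """Events are collapsed away: actor -- technique directly."""
    graph = defaultdict(set)
    for _, actor, technique in rows:
        add_edge(graph, ("actor", actor), ("technique", technique))
    return graph
```

The event-centric view preserves which techniques co-occurred in a given event, while the actor-centric view is smaller and makes shared techniques between actors directly visible at the cost of that per-event detail.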
### Dataset/Environment
The dataset is derived from threat intelligence collected by and shared with the McAfee ATR team, initially piped into an internal [MISP instance](https://www.misp-project.org/). The data covers ongoing campaigns, crimeware, and nation-state activity, and is structured around the questions of what, who, where, and how (e.g., actors, targeted sectors, leveraged techniques).
### Tools & Technologies
- **Data Ingestion/Storage:** MISP instance, scalable relational database instance.
- **Visualization/Analysis:** A proprietary tool designated as the "Graph Playground," which focuses on client-side, browser-based graph generation for rapid rendering, even for graphs with thousands of nodes, to avoid server-side performance bottlenecks.
## Key Findings
### Primary Results
1. Graph-based analysis successfully transforms scattered relational intelligence into interconnected structures, enabling pattern recognition that is difficult to discern from simple lists.
2. The methodology indicates that translating threat intelligence datasets into suitable graph representations yields analytical gains over list-based approaches, although the provided text does not quantify them.
3. Client-side graph generation (via the Graph Playground) is effective in maintaining high performance during visualization, even with large datasets.
### Supporting Evidence
- The research confirms the premise that meaningful insights can be extracted by applying graph algorithms to the threat intelligence dataset. (Specific statistical results were omitted in the provided text.)
### Novel Contributions
- Establishing a practical methodology that lets threat intelligence analysts systematically build graph representations from enterprise threat data and make full use of them.
- Development of a client-side visualization engine ("Graph Playground") optimized for rendering large, complex threat intelligence graphs quickly.
## Technical Details
The research emphasizes the transition from a "redundant, scalable relational database" to a graph structure. The specific architecture of the generated graphs (node types, edge definitions, and the mapping from relational attributes) is not fully detailed, but the text implies that it must be flexible enough to switch between views, such as event-centric and actor-centric. A critical technical consideration is density: if the resulting graph is too highly connected, it becomes unusable for analysis.
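One simple way to check whether a candidate graph is becoming too dense is to compute its density, i.e. the fraction of possible edges that are actually present. The sketch below assumes an undirected simple graph stored as a dictionary of neighbour sets, with every node present as a key. The source does not state which measure or threshold the authors used, so both the function and any cut-off applied to its result are assumptions.

```python
def density(graph):
    """Density of an undirected simple graph given as {node: set(neighbours)}.

    Returns 2E / (N * (N - 1)), which is 0.0 for an empty graph
    and 1.0 for a complete graph.
    """
    n = len(graph)
    if n < 2:
        return 0.0
    # Each undirected edge appears in two neighbour sets, so halve the total.
    edges = sum(len(neighbours) for neighbours in graph.values()) / 2
    return 2 * edges / (n * (n - 1))

# Hypothetical toy graphs for illustration.
TRIANGLE = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
PATH = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

Comparing this value across alternative views of the same data (for example event-centric versus actor-centric) gives a quick indication of which view is likely to remain readable.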
## Practical Implications
### For Security Practitioners
- Provides a compelling case for shifting analysis from flat threat lists (IOC and TTP lists) to interconnected graph models that expose the **relationships** between threat elements.
### For Defenders
- Allows for quicker identification of emerging trends in actor behavior or shifts in favored attack techniques by visualizing actor clusters.
- Helps in assessing the **popularity** and **frequency** of specific TTPs in a quantifiable, visual manner.
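To illustrate what an actor cluster could look like in practice, the sketch below groups actors whose technique sets overlap strongly. It uses Jaccard similarity followed by connected components, which is a stand-in chosen for this example: the source does not say which graph algorithm the authors applied. The actor names, technique sets, and the `0.5` threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical technique sets per actor, for illustration only.
TECHNIQUES_BY_ACTOR = {
    "actor-a": {"T1566", "T1059", "T1027"},
    "actor-b": {"T1566", "T1059"},
    "actor-c": {"T1486"},
}

def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_actor_groups(techniques_by_actor, threshold=0.5):
    """Group actors linked by Jaccard similarity >= threshold.

    Actors are nodes, an edge joins two actors whose technique sets
    are similar enough, and each connected component is one group.
    """
    neighbours = {actor: set() for actor in techniques_by_actor}
    for a, b in combinations(techniques_by_actor, 2):
        if jaccard(techniques_by_actor[a], techniques_by_actor[b]) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in techniques_by_actor:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups
```

With these toy inputs, `actor-a` and `actor-b` share two of three techniques and fall into one group, while `actor-c` stays on its own. Two actors with identical technique sets would always be merged, which is the granularity problem noted under Limitations.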
### For Researchers
- Offers a structured approach for validating the utility of graph analysis in cybersecurity threat modeling.
- Highlights the need for future research into refining granularity to better distinguish between subtly different threat actors.
## Limitations
- **Granularity Issues:** The current model lacks the necessary level of granularity to sufficiently differentiate between threat actors whose observed behavior, based on the available data, appears superficially similar.
- **Visualization Performance:** Although the Graph Playground is optimized for client-side rendering, large graphs can still be slow to render, depending on the overhead of the visualization engine.
## Comparison to Prior Work
This work builds directly on John Lambert's observation that attackers think in graphs while defenders think in lists. Unlike systems that only map known campaigns onto static graphs, this research applies **graph algorithms** to the *entire dataset* to discover previously unknown, evolving patterns, and in doing so tests whether graph-based analysis is a better workflow for consuming threat intelligence.
## Future Work
- **Enhanced Granularity:** Incorporating analysis results from Endpoint Detection and Response (EDR) engines to pinpoint the exact context (e.g., process lineage associated with an ATT&CK technique) where a technique was used, thus differentiating otherwise similar actors.
- **Contextual Expansion:** Shifting the focus to include geography (country) and sector information, which requires excluding sector/country-agnostic crimeware events to ensure the resulting visualizations remain relevant.
- **Dataset Fusion:** Merging the proprietary dataset with external, open-source datasets, such as the Intezer OST Map, to enrich the graph analysis.
## References
- John Lambert's statement: [Defenders think in lists. Attackers think in graphs. As long as this is true, attackers will win most of the time.](https://github.com/JohnLaTwC/Shared/blob/master/Defenders%20think%20in%20lists.%20Attackers%20think%20in%20graphs.%20As%20long%20as%20this%20is%20true%2C%20attackers%20win.md)
- MITRE ATT&CK techniques: [https://attack.mitre.org/](https://attack.mitre.org/)
- MISP Project: [https://www.misp-project.org/](https://www.misp-project.org/)
- Intezer OST Map (referenced for future work): [https://www.intezer.com/ost-map/](https://www.intezer.com/ost-map/)