Full Report
This white paper presents a concrete case study showing how a crafted DICOM file can trigger a heap overflow in the software that parses it.
Analysis Summary
# Research: DICOM, Pydicom, GDCM, and Orthanc: A technical tour of what really happens in the heap
## Metadata
- **Authors:** Emmanuel Tacheau
- **Institution:** Cisco Talos Intelligence Group
- **Publication:** Talos Intelligence Blog / White Paper
- **Date:** May 28, 2026
## Abstract
This research provides a deep-dive technical analysis of the DICOM (Digital Imaging and Communications in Medicine) file format's attack surface. By examining the interaction between the Orthanc PACS server and its underlying decoding libraries (GDCM and Pydicom), the author demonstrates a concrete heap overflow vulnerability. The paper serves as a case study on how the complexity of medical imaging standards can lead to critical memory corruption when processed by automated ingest systems.
## Research Objective
The research aims to answer: How can the inherent complexity of the DICOM file format be weaponized to trigger out-of-bounds memory writes in modern medical imaging infrastructure? Specifically, it investigates the vulnerability of the Orthanc server during the image upload and parsing process.
## Methodology
### Approach
The researcher used a "follow-the-data" approach, tracing an uploaded DICOM file from the network interface through the application layer (Orthanc) down to the low-level parsing libraries (GDCM/Pydicom). The study combined manual code auditing with dynamic analysis to identify edge cases in length-field processing.
### Dataset/Environment
- **Software:** Orthanc Server (an open-source DICOM server).
- **Libraries:** GDCM (Grassroots DICOM) and Pydicom.
- **Protocol:** DICOM network DIMSE services and REST API upload mechanisms.
### Tools & Technologies
- Debuggers (GDB/LLDB) for monitoring heap allocations.
- Memory-error detection (AddressSanitizer) to catch out-of-bounds writes.
- Custom DICOM hex editors and scriptable fuzzer-like generators to craft malformed tags.
## Key Findings
### Primary Results
1. **Parser Discrepancy:** Exploitation is possible because different layers of the software stack interpret DICOM "Sequence" tags and their associated lengths differently.
2. **Heap Overflow:** A specifically crafted DICOM file can bypass length validation checks, leading to a heap-based out-of-bounds write during the pixel data decompression phase.
3. **Automated Risk:** Because PACS systems often ingest data automatically from trusted and semi-trusted endpoints, the vulnerability can be triggered without direct user interaction with the malformed file.
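
As an illustration of the kind of length validation that the second finding describes being bypassed, the sketch below derives the expected size of uncompressed pixel data from the image attributes and refuses decoded output of any other size. The function names are hypothetical and the check is a generic defensive pattern, not the logic used by Orthanc or GDCM.

```python
def expected_pixel_bytes(rows, columns, samples_per_pixel, bits_allocated, frames=1):
    """Size in bytes of uncompressed pixel data implied by the image attributes."""
    fields = (rows, columns, samples_per_pixel, bits_allocated, frames)
    if any(value <= 0 for value in fields):
        raise ValueError("image attributes must be positive")
    total_bits = rows * columns * samples_per_pixel * bits_allocated * frames
    return (total_bits + 7) // 8  # round up to whole bytes


def check_decoded_size(decoded, rows, columns, samples_per_pixel, bits_allocated, frames=1):
    """Reject decoder output whose size disagrees with the declared geometry."""
    expected = expected_pixel_bytes(rows, columns, samples_per_pixel, bits_allocated, frames)
    if len(decoded) != expected:
        raise ValueError(f"decoded {len(decoded)} bytes, expected {expected}")
    return expected


# A 512 x 512 single-sample image with 16 bits allocated per sample.
size = expected_pixel_bytes(512, 512, 1, 16)
```

Python integers do not overflow, so the multiplication here is exact; a C or C++ implementation of the same check would also need to guard the multiplication itself against wrapping.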
### Supporting Evidence
- The research provides a proof-of-concept (PoC) demonstrating a crash in the Orthanc service when processing an imaging file with inconsistent Value Multiplicity (VM) and Value Representation (VR) fields.
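
To make "inconsistent VM and VR" concrete, here is a minimal sketch of how Value Multiplicity follows from the Value Representation and the value bytes. For fixed-width binary VRs, the value length must be a whole multiple of the element width; for most string VRs, values are separated by a backslash. The helper name and the small VR table are assumptions chosen for illustration, and the exact inconsistency used in the PoC is not specified in the source.

```python
# Byte width of a single value for some fixed-width binary VRs.
FIXED_WIDTH = {"US": 2, "SS": 2, "UL": 4, "SL": 4, "FL": 4, "FD": 8, "AT": 4}


def value_multiplicity(vr, value):
    """Return the number of values encoded in `value` for the given VR.

    Raises ValueError when the byte length cannot hold a whole number of
    fixed-width values, which is one simple form of VM/VR inconsistency.
    """
    if vr in FIXED_WIDTH:
        width = FIXED_WIDTH[vr]
        if len(value) % width != 0:
            raise ValueError(f"{len(value)} bytes is not a multiple of {width} for VR {vr}")
        return len(value) // width
    if not value:
        return 0
    # Simplification: treats every other VR as backslash-delimited text.
    # Free-text VRs such as LT, ST, and UT are always single-valued.
    return value.count(b"\\") + 1


vm_binary = value_multiplicity("US", b"\x01\x00\x02\x00")
vm_text = value_multiplicity("CS", b"ORIGINAL\\PRIMARY\\AXIAL")
```

A parser that sizes a destination array from the declared multiplicity but copies according to the raw byte length (or the reverse) is exposed when the two disagree.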
### Novel Contributions
- This work bridges the gap between theoretical file-format fuzzing and practical exploitation of medical workflows, specifically targeting the "ingest" phase of a PACS server.
## Technical Details
The vulnerability centers on the **DICOM Data Element** structure. DICOM encodes each element in a Tag-Length-Value (TLV) style; in explicit-VR transfer syntaxes, a Value Representation (VR) code sits between the tag and the length. The research highlights a flaw triggered by manipulating the "Pixel Data" attribute (7FE0,0010) or "Sequence" tags. When Orthanc calls the GDCM library to parse these elements, the length declared in the element header can disagree with the size of the buffer actually allocated. By supplying an "Undefined Length" (0xFFFFFFFF) in a context where the parser expects a finite size, an attacker can cause the heap allocator to return a buffer smaller than necessary, which the parser then overflows with the file's payload.
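
The element layout described above can be sketched in a few lines of Python. The reader below assumes the Explicit VR Little Endian encoding, where some VRs carry a 2-byte length and the rest carry two reserved bytes followed by a 4-byte length. The `naive_alloc_size` function is purely hypothetical: it models one common way a 32-bit size computation can wrap when fed 0xFFFFFFFF, and is not taken from GDCM or Orthanc, whose exact faulty arithmetic the source does not describe.

```python
import struct

UNDEFINED_LENGTH = 0xFFFFFFFF

# VRs that use the short header form (2-byte length) in explicit VR encoding.
SHORT_VRS = {
    "AE", "AS", "AT", "CS", "DA", "DS", "DT", "FL", "FD", "IS", "LO",
    "LT", "PN", "SH", "SL", "SS", "ST", "TM", "UI", "UL", "US",
}


def read_element_header(buf, offset=0):
    """Parse one Explicit VR Little Endian element header.

    Returns ((group, element), vr, declared_length, header_size).
    """
    group, element = struct.unpack_from("<HH", buf, offset)
    vr = buf[offset + 4:offset + 6].decode("ascii")
    if vr in SHORT_VRS:
        (length,) = struct.unpack_from("<H", buf, offset + 6)
        return (group, element), vr, length, 8
    # Long form: 2 reserved bytes, then a 4-byte length.
    (length,) = struct.unpack_from("<I", buf, offset + 8)
    return (group, element), vr, length, 12


def naive_alloc_size(declared_length, padding=16):
    """Hypothetical buggy size computation using 32-bit unsigned arithmetic."""
    return (declared_length + padding) & 0xFFFFFFFF


# Pixel Data (7FE0,0010) with VR OB and an undefined length.
header = struct.pack("<HH2sHI", 0x7FE0, 0x0010, b"OB", 0, UNDEFINED_LENGTH)
tag, vr, length, header_size = read_element_header(header)
wrapped = naive_alloc_size(length)
```

Under these assumptions, `wrapped` evaluates to 15: a parser that treated the sentinel as a real size and added padding would request a tiny buffer and then copy far more data into it.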
## Practical Implications
### For Security Practitioners
- DICOM parsers should be treated as high-risk entry points. Any system performing "auto-routing" or "auto-ingest" of medical images is potentially at risk of Remote Code Execution (RCE).
### For Defenders
- Implement strict "DICOM conformance" checking at the network perimeter.
- Use sandboxing for the GDCM/Pydicom parsing processes so that a heap overflow does not compromise the entire PACS database or host system.
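
A minimal sketch of what a pre-ingest length check might look like is given below. It walks a flat run of Explicit VR Little Endian data elements and reports any element whose declared length exceeds the bytes remaining, or that uses an undefined length. The function name, the policy on which VRs may carry an undefined length, and the decision to stop at the first problem are all assumptions made for illustration; a real conformance filter would also handle the file preamble, other transfer syntaxes, and nested sequences.

```python
import struct

UNDEFINED_LENGTH = 0xFFFFFFFF
SHORT_VRS = {
    "AE", "AS", "AT", "CS", "DA", "DS", "DT", "FL", "FD", "IS", "LO",
    "LT", "PN", "SH", "SL", "SS", "ST", "TM", "UI", "UL", "US",
}
# Simplified policy (assumption): only these VRs may declare an undefined length.
MAY_BE_UNDEFINED = {"SQ", "UN", "OB", "OW"}


def check_lengths(data):
    """Return a list of problems found in the declared element lengths."""
    problems = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 8:
            problems.append(f"truncated header at offset {offset}")
            break
        group, element = struct.unpack_from("<HH", data, offset)
        tag = f"({group:04X},{element:04X})"
        vr = data[offset + 4:offset + 6].decode("ascii", errors="replace")
        if vr in SHORT_VRS:
            (length,) = struct.unpack_from("<H", data, offset + 6)
            header_size = 8
        else:
            if len(data) - offset < 12:
                problems.append(f"{tag}: truncated long-form header")
                break
            (length,) = struct.unpack_from("<I", data, offset + 8)
            header_size = 12
        if length == UNDEFINED_LENGTH:
            if vr in MAY_BE_UNDEFINED:
                problems.append(f"{tag}: undefined length, needs delimiter-aware parsing")
            else:
                problems.append(f"{tag}: undefined length not allowed for VR {vr}")
            break  # the end of this element cannot be found without delimiters
        remaining = len(data) - offset - header_size
        if length > remaining:
            problems.append(f"{tag}: declared length {length} exceeds {remaining} remaining bytes")
            break
        offset += header_size + length
    return problems


good = struct.pack("<HH2sH", 0x0010, 0x0010, b"PN", 4) + b"DOE^"
bad = struct.pack("<HH2sH", 0x0010, 0x0010, b"PN", 64) + b"DOE^"
```

Running such a filter in a separate, unprivileged process from the PACS itself keeps the check consistent with the sandboxing recommendation: the filter is also a parser and should not be trusted more than the one it protects.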
### For Researchers
- The "complexity" of DICOM (over 3,000 pages of specification) remains a fertile ground for discovering logic flaws in how different vendors implement "standard" parsing.
## Limitations
- The research focuses on specific versions of GDCM and Orthanc; newer versions may have implemented different heap hardening measures or updated their parsing logic.
- Exploitation success can be highly dependent on the specific heap state (fragmentation) of the server at the time of upload.
## Comparison to Prior Work
Previous research has largely focused on the privacy aspects of DICOM, such as anonymization flaws. This work instead follows the tradition of memory corruption research and applies it to the niche parsers used in healthcare, which often lack the security scrutiny applied to web browsers or OS kernels.
## Real-world Applications
- **Use Case:** Security auditing of hospital Radiology Information Systems (RIS).
- **Implementation:** Integrating the findings into vulnerability scanners to identify unpatched Orthanc instances.
## Future Work
- Investigation into other medical formats like HL7 or FHIR for similar parsing discrepancies.
- Developing more robust, hardened DICOM parsers using memory-safe languages like Rust to replace legacy C++ implementations.
## References
- Talos Intelligence Blog: `https://blog.talosintelligence.com/`
- Orthanc Project: `https://www.orthanc-server.com/`
- GDCM GitHub Repository: `https://github.com/malaterre/GDCM`