Full Report
NVIDIA is warning users to activate System Level Error-Correcting Code (ECC) mitigation to protect against Rowhammer attacks on graphics processors with GDDR6 memory. [...]
Analysis Summary
# Vulnerability: GPU GDDR6 Rowhammer Attack Guidance
## CVE Details
- CVE ID: Not provided in the source report.
- CVSS Score: Not provided in the source report.
- CWE: Not assigned in the source. The weakness is hardware-level memory corruption through fault injection (bit flips induced in DRAM cells), analogous to classic Rowhammer attacks on system DRAM.
## Affected Systems
- Products: NVIDIA GPUs utilizing GDDR6 memory.
- Versions: Not specified; the guidance applies to systems whose GDDR6 memory is susceptible to this attack vector.
- Configurations: Systems where error correction is not fully enabled or configured, particularly in multi-tenant environments like cloud servers.
## Vulnerability Description
NVIDIA has published guidance for defending GPUs with GDDR6 memory against Rowhammer attacks. Rowhammer exploits a physical property of DRAM: repeatedly accessing ("hammering") a memory row can flip bits in physically adjacent rows. A successful attack could corrupt data or undermine memory integrity, potentially enabling privileged code execution. The risk is especially significant in cloud environments where multiple tenants share the same hardware.
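To make the mechanism concrete, here is a deliberately simplified simulation; it is not a model of real GDDR6 behavior. Each row counts its activations, and once a row is opened a hypothetical `HAMMER_THRESHOLD` times within one refresh window, one bit flips in each adjacent row. The class name, the threshold, and the refresh interval are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: real per-row disturbance limits are device-specific.
HAMMER_THRESHOLD = 10_000


@dataclass
class ToyDRAMBank:
    """Toy DRAM bank: each row is one integer; no real timing is modeled."""
    rows: list[int]
    threshold: int = HAMMER_THRESHOLD
    activations: dict[int, int] = field(default_factory=dict)

    def activate(self, row: int) -> None:
        """Open a row; when its count hits the threshold, disturb both neighbors."""
        count = self.activations.get(row, 0) + 1
        self.activations[row] = count
        if count == self.threshold:
            for victim in (row - 1, row + 1):
                if 0 <= victim < len(self.rows):
                    self.rows[victim] ^= 1  # flip bit 0 of the adjacent row

    def refresh(self) -> None:
        """Refresh resets disturbance counts; bits already flipped stay flipped."""
        self.activations.clear()


# Hammer row 3 within a single refresh window: both neighbors flip.
bank = ToyDRAMBank(rows=[0] * 8)
for _ in range(HAMMER_THRESHOLD):
    bank.activate(3)
print(bank.rows)  # [0, 0, 1, 0, 1, 0, 0, 0]

# Same number of accesses, but a refresh every 5,000 activations: no flips.
bank2 = ToyDRAMBank(rows=[0] * 8)
for i in range(HAMMER_THRESHOLD):
    if i % 5_000 == 0:
        bank2.refresh()
    bank2.activate(3)
print(bank2.rows)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

The second run hints at why the attack demands high access rates: the activations on an aggressor row must outpace the refresh that resets the disturbance.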
## Exploitation
- Status: No exploitation in the wild is reported, but the guidance indicates an active research area and a potential threat.
- Complexity: Attack is described as **difficult to execute** reliably, requiring specific conditions, high access rates, and precise control.
- Attack Vector: Likely **Local**: the attacker needs process-level access to the GPU (for example, as a co-located tenant in a multi-tenant environment) with enough control to generate the required memory access patterns.
## Impact
- Confidentiality: Potential compromise due to memory manipulation.
- Integrity: Risk of data corruption or unauthorized data modification.
- Availability: Potential for stability issues or denial of service due to memory corruption.
## Remediation
### Patches
- **Newer GPUs (Blackwell: RTX 50 Series, GB200, B200, B100; Hopper: H100, H200, H20, GH200):** These models have built-in on-die ECC protection, which mitigates the issue without user intervention.
- **Older/Other GPUs:** Remediation relies on ensuring proper configuration of System Level ECC.
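Both on-die ECC and System Level ECC rely on error-correcting codes. The source does not describe the codes NVIDIA actually uses, so the sketch below substitutes a textbook Hamming(7,4) code as a simplified stand-in. It shows why a single flipped bit can be located and corrected, and why several flips in the same codeword can defeat a single-error-correcting code (SEC-DED variants add one more parity bit so that double flips are at least detected).

```python
def encode(nibble: int) -> list[int]:
    """Hamming(7,4): return codeword bits at positions 1..7 (index 0 unused)."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [0, p1, p2, d1, p4, d2, d3, d4]


def decode(code: list[int]) -> tuple[int, int]:
    """Return (data nibble, syndrome); a nonzero syndrome names the bit it flips back."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        c[syndrome] ^= 1  # assume a single-bit error and correct it
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return nibble, syndrome


data = 0b1011

word = encode(data)
word[6] ^= 1  # one induced bit flip
print(decode(word))  # (11, 6): error located and corrected

word = encode(data)
word[3] ^= 1  # two induced bit flips in the same codeword
word[5] ^= 1
print(decode(word))  # (12, 6): miscorrected, data silently wrong
```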
### Workarounds
- For existing systems: Ensure System Level ECC is enabled. This can often be verified or configured via:
1. **Out-of-band method:** Query the system's BMC (Baseboard Management Controller) through the **Redfish API** and check the `ECCModeEnabled` status.
2. **In-band method:** Run the **`nvidia-smi`** command-line utility on the host CPU to check ECC status and enable it where supported.
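The out-of-band check comes down to reading `ECCModeEnabled` from the BMC's Redfish responses. The sketch below does not talk to a BMC; it assumes a Redfish JSON document has already been retrieved (for example with an authenticated HTTPS GET) and searches it recursively for `ECCModeEnabled`, because the exact resource path varies by platform and schema version. The sample payload, its URI, and the helper names are hypothetical.

```python
import json
from typing import Any, Iterator, Optional


def find_ecc_mode(node: Any, path: str = "") -> Iterator[tuple[str, Optional[bool]]]:
    """Yield (JSON path, value) for every ECCModeEnabled property in a payload."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "ECCModeEnabled":
                yield child, value  # True, False, or None (null) as reported
            else:
                yield from find_ecc_mode(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_ecc_mode(item, f"{path}/{i}")


def describe(value: Optional[bool]) -> str:
    """Map the reported value to a status string; anything but a bool is unknown."""
    if value is True:
        return "enabled"
    if value is False:
        return "NOT enabled"
    return "unknown"


# Hypothetical, trimmed GPU Processor resource; the URI and where the
# property sits depend on the BMC firmware and Redfish schema version.
sample = json.loads("""
{
  "@odata.id": "/redfish/v1/Systems/System_0/Processors/GPU_0",
  "ProcessorType": "GPU",
  "MemorySummary": {"ECCModeEnabled": false}
}
""")

for where, value in find_ecc_mode(sample):
    print(f"{where}: ECC {describe(value)}")
# /MemorySummary/ECCModeEnabled: ECC NOT enabled
```

Searching the whole document instead of hard-coding a path keeps the check usable across BMC implementations, at the cost of possibly reporting the property from several nested resources.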
## Detection
- Indicators of compromise are highly specific to the exploit implementation.
- Detection methods center on verifying system configuration status:
  - Check `ECCModeEnabled` status via the **Redfish API** (out-of-band).
  - Check and enable ECC using **`nvidia-smi`** (in-band).
  - Tools such as NSM Type 3 and NVIDIA SMBPBI can also be used (access may require additional permissions).
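An in-band audit can be scripted around `nvidia-smi`. The sketch below assumes the `ecc.mode.current` and `ecc.mode.pending` query fields (listed by `nvidia-smi --help-query-gpu`) and flags any GPU whose current ECC mode is not `Enabled`, which also catches GPUs that report a value such as `[N/A]`. The helper names and the sample output are hypothetical; only `query_ecc()` actually invokes the tool.

```python
import csv
import io
import subprocess

# Query fields documented by `nvidia-smi --help-query-gpu`.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,ecc.mode.current,ecc.mode.pending",
    "--format=csv,noheader",
]


def parse_ecc_csv(text: str) -> list[dict[str, str]]:
    """Turn nvidia-smi CSV output into one dict per GPU."""
    fields = ["index", "name", "current", "pending"]
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, row)) for row in reader if row]


def query_ecc() -> list[dict[str, str]]:
    """Run nvidia-smi on the host (requires the NVIDIA driver to be installed)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_ecc_csv(result.stdout)


def gpus_without_ecc(gpus: list[dict[str, str]]) -> list[str]:
    """Indices of GPUs whose ECC mode is not currently active."""
    return [g["index"] for g in gpus if g["current"] != "Enabled"]


# Hypothetical output: GPU 0 has ECC enabled as pending (awaiting a reboot).
sample = "0, NVIDIA RTX A6000, Disabled, Enabled\n1, NVIDIA RTX A6000, Enabled, Enabled\n"
print(gpus_without_ecc(parse_ecc_csv(sample)))  # ['0']
```

On a GPU that supports it, `sudo nvidia-smi -e 1` requests ECC; the new mode only becomes current after a reboot, which is why the sample shows a pending value that differs from the current one.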
## References
- Vendor guidance discussed in articles like: bleepingcomputer dot com/news/security/nvidia-shares-guidance-to-defend-gddr6-gpus-against-rowhammer-attacks/