Full Report
Google Dataproc is a managed service that runs Apache Spark and Hadoop clusters for data analytics workloads. By default, a new cluster has no internet access, but machines in the same VPC can reach the service without restriction. The Dataproc cluster exposes a YARN ResourceManager on port 8088 and HDFS on port 9870, and neither requires authentication. If an attacker gains access to a vulnerable compute instance in that VPC via an RCE bug, they can then reach the Dataproc clusters. From the HDFS endpoint, they can browse the file system and obtain sensitive data. The authors' key takeaway, that hosting an OSS project without considering the security consequences is risky, is a good callout. To me, though, the issue is on Google for deploying it this way. To fix this, I'd personally ship better default network permissions to prevent this from happening. The authors are right: shells happen, and if the public instance doesn't need access to the cluster, then it shouldn't have network access to it.
Analysis Summary
# Vulnerability: Unauthenticated Internal Access to GCP Dataproc Clusters
## CVE Details
- **CVE ID**: N/A (Classified by Google as an "Abuse Risk" / Design Flaw)
- **CVSS Score**: Estimated 7.5 - 8.1 (High)
- **CWE**: CWE-306 (Missing Authentication for Critical Function), CWE-668 (Exposure of Resource to Wrong Control Sphere)
## Affected Systems
- **Products**: Google Cloud Platform (GCP) Dataproc
- **Versions**: All current managed versions utilizing Apache Hadoop and Apache Spark.
- **Configurations**: Dataproc clusters deployed using the "default" VPC or any VPC where the internal firewall allows traffic between the cluster and other potentially vulnerable Compute Engine instances.
## Vulnerability Description
GCP Dataproc is a managed service for Apache Spark and Hadoop. The web interfaces of the underlying open-source software (OSS) components, specifically the **YARN ResourceManager (Port 8088)** and **HDFS NameNode (Port 9870)**, do not require authentication as deployed. Hadoop does support Kerberos-based security, but it is not enabled by default.
The security flaw arises from the firewall rules of Google’s default VPC network, which allow all internal traffic across all ports. If an attacker gains an initial foothold (via RCE or compromised credentials) on *any* other Compute Engine instance within the same VPC, they can communicate with the Dataproc master node over the internal network. Because the web interfaces require no authentication, the attacker can then gain full control over the Hadoop Distributed File System (HDFS).
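To make "no authentication" concrete, here is a minimal Python sketch for auditing your own cluster from another instance in the same VPC. It assumes the stock Hadoop REST paths (the YARN cluster apps endpoint and WebHDFS `LISTSTATUS`) are served on the ports named above. The master address and the helper names are hypothetical and not taken from the original research.

```python
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode

YARN_PORT = 8088  # YARN ResourceManager web/REST port
HDFS_PORT = 9870  # HDFS NameNode web/WebHDFS port


def yarn_apps_url(host: str) -> str:
    """URL of the YARN REST endpoint that lists cluster applications."""
    return f"http://{host}:{YARN_PORT}/ws/v1/cluster/apps"


def webhdfs_list_url(host: str, path: str = "/") -> str:
    """URL of the WebHDFS call that lists a directory."""
    query = urlencode({"op": "LISTSTATUS"})
    return f"http://{host}:{HDFS_PORT}/webhdfs/v1{quote(path)}?{query}"


def responds_without_auth(url: str, timeout: float = 3.0) -> bool:
    """Return True if a plain GET with no credentials gets HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    master = "10.128.0.2"  # hypothetical internal IP of the Dataproc master
    for url in (yarn_apps_url(master), webhdfs_list_url(master, "/")):
        state = "open" if responds_without_auth(url) else "not reachable"
        print(f"{url}: {state}")
```

If either URL reports `open` from an instance that has no business talking to the cluster, the firewall is allowing the access path described above.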
## Exploitation
- **Status**: PoC available/Demonstrated by researchers (Orca Security).
- **Complexity**: Low
- **Attack Vector**: Adjacent (Internal VPC network) / Network (if firewall is misconfigured for public access).
## Impact
- **Confidentiality**: High (Full access to browse, read, and download sensitive data stored in HDFS).
- **Integrity**: High (Ability to modify or delete data, or manipulate YARN jobs).
- **Availability**: High (Ability to shut down or corrupt cluster operations).
## Remediation
### Patches
- No official patch currently exists from Google, as this is considered an "Abuse Risk" stemming from the design of the integrated OSS components.
### Workarounds
- **VPC Segmentation**: Do not deploy Dataproc clusters in the "default" VPC. Use custom VPCs with strict isolation.
- **Firewall Hardening**: Explicitly deny internal traffic to ports 8088 and 9870 from any instance that does not require access to the Dataproc cluster.
- **Component Gateway**: Enable the [Dataproc Component Gateway](https://cloud.google.com/dataproc/docs/concepts/accessing/cluster-web-interfaces#component_gateway) to provide secure, authenticated access to web interfaces via Google IAM.
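The firewall hardening workaround can be sketched as a pair of `gcloud compute firewall-rules create` commands, generated here in Python. This is one possible layout and not an official recommendation. The rule names, network name, and tags are hypothetical, and it assumes the cluster nodes carry a network tag. Cluster nodes are kept in the allow rule on the assumption that they may need to reach each other on these ports. Both priorities are numerically lower than the 65534 used by the default network's `default-allow-internal` rule, so they take precedence over it.

```python
import shlex


def build_firewall_commands(network, cluster_tag, admin_tag, ports=(8088, 9870)):
    """Return two gcloud argv lists: an allow rule, then a broader deny rule."""
    rules = ",".join(f"tcp:{port}" for port in ports)
    base = ["gcloud", "compute", "firewall-rules", "create"]
    common = [
        f"--network={network}",
        "--direction=INGRESS",
        f"--rules={rules}",
        f"--target-tags={cluster_tag}",
    ]
    # Priority 900 wins over the deny rule below (lower number = higher priority).
    allow = base + ["dataproc-ui-allow-admin"] + common + [
        "--action=ALLOW",
        f"--source-tags={admin_tag},{cluster_tag}",
        "--priority=900",
    ]
    # Priority 1000 still wins over default-allow-internal (65534).
    deny = base + ["dataproc-ui-deny-other"] + common + [
        "--action=DENY",
        "--source-ranges=0.0.0.0/0",
        "--priority=1000",
    ]
    return [allow, deny]


if __name__ == "__main__":
    # All three names are placeholders for illustration.
    for argv in build_firewall_commands("analytics-vpc", "dataproc-node", "dataproc-admin"):
        print(shlex.join(argv))
```

Printing the commands instead of running them leaves room to review the rules before applying them to a real project.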
## Detection
- **Indicators of Compromise**: Unexpected internal traffic volumes to ports 8088 and 9870 originating from non-admin Compute instances.
- **Detection Methods**:
  - Use VPC Flow Logs to monitor for lateral movement between web-facing instances and Dataproc nodes.
  - Cloud security posture management (CSPM) tools can be used to identify Dataproc clusters residing on the default VPC.
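As a rough illustration of the flow-log approach, the Python sketch below filters exported VPC Flow Log entries for connections to ports 8088 and 9870 on cluster nodes from sources outside an allow list. It assumes entries are already loaded as dictionaries with the `jsonPayload.connection` fields (`src_ip`, `dest_ip`, `dest_port`) used by VPC Flow Logs. The function name, IP addresses, and allow list are hypothetical.

```python
WATCHED_PORTS = {8088, 9870}  # YARN ResourceManager and HDFS NameNode UIs


def suspicious_flows(records, cluster_ips, allowed_sources):
    """Return (src_ip, dest_ip, dest_port) for unexpected UI connections."""
    hits = []
    for record in records:
        conn = record.get("jsonPayload", {}).get("connection", {})
        try:
            port = int(conn.get("dest_port"))
        except (TypeError, ValueError):
            continue  # entry has no usable destination port
        src, dest = conn.get("src_ip"), conn.get("dest_ip")
        if port in WATCHED_PORTS and dest in cluster_ips and src not in allowed_sources:
            hits.append((src, dest, port))
    return hits


if __name__ == "__main__":
    sample = [
        {"jsonPayload": {"connection": {
            "src_ip": "10.128.0.9", "dest_ip": "10.128.0.2", "dest_port": 9870}}},
        {"jsonPayload": {"connection": {
            "src_ip": "10.128.0.5", "dest_ip": "10.128.0.2", "dest_port": 8088}}},
    ]
    # 10.128.0.5 stands in for an admin host that is expected to use the UIs.
    for hit in suspicious_flows(sample, {"10.128.0.2"}, {"10.128.0.5"}):
        print(hit)
```

In practice the same filter would more likely be written as a log query or alerting rule. The point here is only the matching logic: watched port, cluster destination, unexpected source.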
## References
- [Orca Security: Unauthenticated Access to GCP Dataproc](https://orca[.]security/resources/blog/unauthenticated-access-to-gcp-dataproc-can-lead-to-data-leak/)
- [Google Cloud: Dataproc Cluster Web Interfaces](https://cloud[.]google[.]com/dataproc/docs/concepts/accessing/cluster-web-interfaces)
- [Google Cloud: Shared Responsibility Model](https://cloud[.]google[.]com/architecture/framework/security/shared-responsibility-shared-fate)