How secure are top private AI companies? Find out from our scans and disclosures.
# Incident Report: Widespread Secret Leakage in Top AI Companies via GitHub Exposure
## Executive Summary
Wiz analysts reviewed the security of 50 leading private AI companies listed in the Forbes AI 50 index and found that 65% of them had exposed verified secrets (such as API keys and tokens) on GitHub. The exposure stems from insecure developer practices that placed secrets directly in public repositories, forks, and supporting materials like gists, which traditional security scanners often overlook. The source does not detail whether any secrets were actually exploited, but the potential impact includes unauthorized access to organizational structures, sensitive training data, and private AI models. Remediation requires deeper code scanning that examines commit history and less obvious repositories.
## Incident Details
- Discovery Date: Shortly prior to November 10, 2025 (Date of report publication)
- Incident Date: Ongoing exposure dating back to various commit histories or accidental pushes.
- Affected Organization: 65% of the 50 analyzed private AI companies from the Forbes AI 50 list.
- Sector: Artificial Intelligence / Technology Startups (AI R&D)
- Geography: Global (Based on the international nature of Forbes AI 50 constituents analyzed on GitHub)
## Timeline of Events
### Initial Access
- Date/Time: Not precisely timestamped; involves cumulative exposure over time.
- Vector: Developer misconfiguration and inadvertent publishing of secrets in source code repositories.
- Details: Secrets were found buried in public resources, including deleted forks, gists, and deep within commit histories, well beyond the organizations' main repositories.
### Lateral Movement
- Not detailed in the source. The primary threat is direct access to secrets, which could *facilitate* lateral movement if still-valid credentials were used against the corresponding services.
### Data Exfiltration/Impact
- Potential Impact: Compromise of organizational structures, access to proprietary training data, or exposure of private AI models. Verified secrets (API keys, tokens) were leaked.
### Detection & Response
- Detection: Proactive security scanning by Wiz, using enhanced techniques (deep history, forks, gists) that go beyond traditional CI/CD and organization-level scanning.
- Response Actions: The article implies disclosure to the affected parties or the public upon verification of the secrets, encouraging immediate remediation (though specific company responses are not detailed).
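
The deep-history scanning mentioned above can be illustrated with a minimal sketch, assuming a local clone and the `git` CLI on the PATH. The rule names, the regular expressions in `SECRET_PATTERNS`, and both function names are hypothetical and are not taken from Wiz's tooling. The sketch only covers commits reachable from refs in the clone, so deleted forks, workflow logs, and gists would need separate handling.

```python
import re
import subprocess

# Illustrative rules only (an assumption); real scanners ship far larger rule sets
# and verify each candidate against the issuing service.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "generic_assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|token)\s*[:=]\s*['\"]([A-Za-z0-9_\-]{20,})['\"]"
    ),
}


def scan_patch_text(patch_text):
    """Return (commit, rule, line) for every added line that matches a rule."""
    findings = []
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            # Header line emitted by `git log`, e.g. "commit <sha>"
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append((commit, rule, line[1:].strip()))
    return findings


def scan_repo_history(repo_path):
    """Scan patches of all commits reachable from any ref, not just the current tree."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--no-color"],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return scan_patch_text(result.stdout)
```

Calling `scan_repo_history(".")` inside a clone returns a list of findings. Because the scan reads added lines from every patch, a secret that was committed and later deleted still shows up under the commit that introduced it.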
## Attack Methodology
This section describes the *exposure vector* rather than a traditional attacker kill chain:
- Initial Access: Inadvertent publishing of production secrets (API keys, tokens) into developer codebases hosted on GitHub.
- Persistence: N/A (This is an exposure event, not a persistent intrusion).
- Privilege Escalation: N/A (An exposed secret directly grants whatever access level it was issued with).
- Defense Evasion: The exposure bypassed standard defenses because the secrets sat in "deleted forks, workflow logs, and gists," which default tooling often ignores; detecting them required scanning in greater depth.
- Credential Access: Direct exposure of valid credentials.
- Discovery: N/A (The researchers were looking for exposures).
- Lateral Movement: N/A (Movement is predicated on the successful use of the exposed secrets).
- Collection: Direct collection of sensitive configuration data (secrets) from repositories.
- Exfiltration: The secrets were collected and verified by the scanning researchers (Wiz). A malicious actor who found the same keys could use them to exfiltrate the sensitive data they protect.
- Impact: Unauthorized access potential across various cloud/service environments tied to those credentials.
## Impact Assessment
- Financial: Not quantified. Likely costs include incident response, key rotation, and potential data loss.
- Data Breach: Verified API keys, tokens, and sensitive credentials were leaked. Potential exposure of training data and private models.
- Operational: Potential operational disruption if exposed keys lead to service manipulation or data modification/exfiltration.
- Reputational: Significant reputational risk for the 65% of analyzed AI companies, given widely publicized security lapses that touch core intellectual property.
## Indicators of Compromise
As this is an *exposure* report, IoCs provided are abstract representations of the exposure source:
- Network indicators: N/A (Focus is on source code, not network traffic).
- File indicators: Specific configuration files containing secrets, keys, and tokens associated with AI platforms (e.g., Perplexity, WeightsAndBiases, Groq, NVIDIA API, Tavily, Langchain, Pinecone, etc.).
- Behavioral indicators: Developers checking secrets into public GitHub history, forks, or gists.
## Response Actions
Specific actions taken by the *affected* companies are not detailed, but the recommended response framework based on the findings includes:
- Containment measures: Immediate revocation and rotation of all verified leaked secrets (API keys, tokens).
- Eradication steps: Removing the leaked secrets from the commit history of all affected repositories, forks, and gists on GitHub.
- Recovery actions: Auditing all systems reachable with the leaked credentials to confirm that no exploitation occurred before rotation.
## Lessons Learned
- Speed, while crucial for AI development, is not an excuse for security gaps; security *must* keep pace with development.
- Traditional secrets scanning tools are insufficient: they focus on the tip of the iceberg (main repos) and ignore what lies below the surface (commit history, deleted forks, gists).
- Expanding the discovery perimeter beyond the main organization account (for example, by scanning members' public repos) is critical for a comprehensive security posture.
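
One way to widen the discovery perimeter is to enumerate the public repositories and gists of an organization's members. The following is a rough sketch, assuming the public GitHub REST API endpoints for public members, user repos, and user gists. The function names and the injectable `fetch` parameter are hypothetical. For brevity it reads only the first page of each listing and sees only members whose membership is public.

```python
import json
import urllib.request

API = "https://api.github.com"


def fetch_json(url):
    """Default fetcher: unauthenticated GET of a single page (rate limits apply)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def member_scan_targets(org, fetch=fetch_json):
    """Collect clone URLs for public repos and gists owned by an org's public members."""
    targets = []
    for member in fetch(f"{API}/orgs/{org}/public_members"):
        login = member["login"]
        for repo in fetch(f"{API}/users/{login}/repos"):
            targets.append(repo["clone_url"])
        for gist in fetch(f"{API}/users/{login}/gists"):
            # Gists are git repositories too, so their history can be cloned and scanned
            targets.append(gist["git_pull_url"])
    return targets
```

The returned URLs could then be cloned and passed to whatever history scanner is already in use. Passing a custom `fetch` makes it possible to add authentication or pagination without changing the traversal.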
## Recommendations
- Implement continuous, deep source code scanning tools capable of analyzing commit history, deleted forks, and resources adjacent to the main organization account, such as gists and members' public repositories.
- Train development teams specifically on the lifecycle of secrets and the risks associated with pushing credentials to *any* public-facing code or log source, regardless of perceived status (e.g., a deleted fork).
- Establish robust pre-commit hooks or automated CI/CD gates that actively prevent the push of known secret patterns to public repositories.
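
As an illustration of the pre-commit gate recommended above, here is a minimal sketch that inspects the staged diff and blocks the commit when an added line contains a long, high-entropy string literal. The entropy heuristic is swapped in here as a complement to known-pattern rules, not a replacement for them. The threshold, the candidate regex, and all function names are assumptions chosen for the example.

```python
import math
import re
import subprocess
from collections import Counter

# Quoted literals of 20+ key-like characters are candidates (length is an assumption).
CANDIDATE = re.compile(r"['\"]([A-Za-z0-9+/_=\-]{20,})['\"]")
ENTROPY_THRESHOLD = 4.0  # bits per character; a tunable assumption


def shannon_entropy(s):
    """Shannon entropy of the character distribution of s, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_suspect_lines(diff_text):
    """Return (path, text) for each added line holding a high-entropy literal."""
    suspects = []
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header of a unified diff, e.g. "+++ b/app/config.py"
            path = line[6:] if line.startswith("+++ b/") else line[4:]
        elif line.startswith("+"):
            for literal in CANDIDATE.findall(line):
                if shannon_entropy(literal) >= ENTROPY_THRESHOLD:
                    suspects.append((path, line[1:].strip()))
                    break
    return suspects


def run_hook():
    """Return 1 (block the commit) if staged changes contain a suspect literal."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0", "--no-color"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    suspects = find_suspect_lines(diff)
    for path, _ in suspects:
        # Print only the path so the secret itself is not echoed into logs
        print(f"possible secret staged in {path}")
    return 1 if suspects else 0
```

To use it as a hook, an executable `.git/hooks/pre-commit` script would end with `raise SystemExit(run_hook())`, since Git aborts the commit when the hook exits non-zero. Local hooks can be skipped with `git commit --no-verify`, so a matching server-side or CI check is still needed.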