Full Report
pyspider is a web crawling framework that ships with a standalone, locally hosted web UI. A flag controls whether the UI requires authentication. With authentication turned on, it uses Basic Auth (the username and password prompt from the browser).

Cookies used to be attached automatically to cross-domain requests, at least before the SameSite cookie attribute existed, and in many cases a pre-flight request also blocks CSRF. Basic HTTP authentication has no equivalent: browsers do not apply any CSRF mitigations to it. Once you log in, the browser includes the credentials in every request to that site, including requests triggered by another site. This works well for functionality but is scary for cross-site request forgery (CSRF), also known as session riding, because a malicious page can make authenticated calls on the victim's behalf. I'm unsure how much the pre-flight request matters here; a plain HTML form POST is sent without one, so it would not stop a form-based attack.

According to the author, it's trivial to pop a shell. The web UI has a feature that executes arbitrary code, so combining it with CSRF leads to RCE. Sadly, the project is no longer maintained: reporting the security issue led to the project being archived. Regardless, this behavior of Basic HTTP Authentication was completely new information to me.
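To make the "credentials on every request" point concrete, here is a minimal Python sketch of how a Basic Auth header value is formed under RFC 7617. The function name `basic_auth_header` is illustrative and not part of pyspider; the credentials are the example pair from the RFC.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a browser sends for Basic Auth."""
    # The credentials are joined with a colon and base64-encoded, not encrypted.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# The value is identical for every request: there is no per-request secret
# that a cross-site page would need to know or guess.
header_value = basic_auth_header("Aladdin", "open sesame")
print(header_value)
```

Because the header carries no nonce or origin information, the server cannot tell from it alone whether a request was initiated by its own pages or by a third-party site.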
Analysis Summary
# Vulnerability: Remote Code Execution via CSRF and Reflected XSS in pyspider
## CVE Details
- **CVE ID:** CVE-2024-39163 (CSRF), CVE-2024-39162 (Reflected XSS)
- **CVSS Score:** Not listed in the article. Because the chain leads to RCE, a score of roughly **8.8 High** to **9.8 Critical** would be typical depending on environment; this is an estimate, not an assigned score.
- **CWE:** CWE-352 (Cross-Site Request Forgery), CWE-79 (Cross-Site Scripting)
## Affected Systems
- **Products:** pyspider (Web crawling framework)
- **Versions:** All versions up to the point of project archival (September 2024)
- **Configurations:** Systems running the WebUI component, especially those using the `--need-auth` flag which utilizes Basic HTTP Authentication.
## Vulnerability Description
The vulnerability stems from a combination of three factors:
1. **Lack of CSRF Protection:** The Flask-based WebUI does not implement CSRF tokens. Modern browsers mitigate cookie-based CSRF with the `SameSite` cookie attribute, but Basic HTTP Authentication (the browser's username/password prompt) has no built-in CSRF mitigation. Once a user authenticates, the browser automatically attaches the Basic Auth header to subsequent requests to that site, including requests initiated cross-origin.
2. **Reflected XSS:** The `/update` route is vulnerable to Reflected XSS via the `name` parameter. Because this is a `POST` endpoint, a plain link cannot trigger it, but a cross-site form submission (CSRF) can.
3. **Insecure by Design (RCE):** The pyspider code editor allows the execution of arbitrary Python code by design.
An attacker can chain these by using CSRF to trigger the XSS, and then using the XSS to interact with the code editor to execute arbitrary Python commands on the host server.
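The first two factors correspond to two missing defenses: a request token that a cross-site page cannot know, and escaping of reflected input. The following is a minimal sketch of both using only the Python standard library. All names here (`SERVER_SECRET`, `issue_csrf_token`, `verify_csrf_token`, `render_name`) are hypothetical and do not come from pyspider; binding the token to the authenticated username is an assumption made for the Basic Auth case, where there is no session cookie.

```python
import hashlib
import hmac
import html
import secrets

# Hypothetical server-side secret; a real deployment would load it from config.
SERVER_SECRET = secrets.token_bytes(32)


def issue_csrf_token(user: str, secret: bytes = SERVER_SECRET) -> str:
    """Derive a token bound to the authenticated user.

    The server embeds it in its own forms; a cross-site page cannot read it.
    """
    return hmac.new(secret, user.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_csrf_token(user: str, token: str, secret: bytes = SERVER_SECRET) -> bool:
    """Reject state-changing requests whose token does not match."""
    expected = issue_csrf_token(user, secret)
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(expected, token)


def render_name(name: str) -> str:
    """Escape a user-supplied value before reflecting it into HTML."""
    return html.escape(name, quote=True)
```

With checks like these in place, a forged cross-site `POST` would fail token verification, and a payload in `name` would be rendered as inert text. Neither change addresses the third factor, since code execution is an intended feature of the editor.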
## Exploitation
- **Status:** PoC described; high likelihood of exploitation due to project abandonment.
- **Complexity:** Low (trivial to "pop a shell" once an authenticated victim visits an attacker-controlled page).
- **Attack Vector:** Network (Web-based/Social Engineering).
## Impact
- **Confidentiality:** High (Full access to server data and scraper scripts).
- **Integrity:** High (Ability to modify scripts and execute arbitrary code).
- **Availability:** High (Ability to shut down or delete the service).
## Remediation
### Patches
- **None:** There are no official patches available. The maintainer has archived the repository on GitHub and officially ceased maintenance of the project in response to these findings.
### Workarounds
- **Disable the WebUI:** If the framework must be used, disable the WebUI component entirely.
- **Network Isolation:** Ensure the WebUI is only accessible via a trusted VPN or localhost, and not exposed to the open internet.
- **Reverse Proxy:** Use a reverse proxy (like Nginx or Apache) to implement more robust authentication (such as OIDC or SAML) and add security headers (CSP, etc.) that the native application lacks.
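One check a fronting proxy or middleware could apply is to reject state-changing requests whose `Origin` (or, failing that, `Referer`) header does not match a trusted origin. The sketch below illustrates that logic in plain Python; the function names and the example origin `http://localhost:5000` are assumptions for illustration, not pyspider configuration.

```python
from urllib.parse import urlsplit

# Methods treated as read-only in this sketch; everything else is checked.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def origin_of(url: str):
    """Return 'scheme://host[:port]' for a URL, or None if it has no origin."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}".lower()


def allow_request(method: str, headers: dict, trusted_origins: set) -> bool:
    """Allow state-changing requests only when they come from a trusted origin."""
    if method.upper() in SAFE_METHODS:
        return True
    lowered = {key.lower(): value for key, value in headers.items()}
    source = lowered.get("origin") or lowered.get("referer")
    if not source:
        # No provenance information at all: fail closed.
        return False
    return origin_of(source) in trusted_origins


trusted = {"http://localhost:5000"}  # example value, adjust to the deployment
print(allow_request("POST", {"Origin": "https://evil.example"}, trusted))
```

This is a defense-in-depth measure rather than a replacement for CSRF tokens, and it assumes that no `GET` route changes state.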
## Detection
- **Indicators of Compromise:** Unusual Python processes originating from the pyspider service; unauthorized modifications to web crawling scripts; logs showing unexpected `POST` requests to the `/update` route.
- **Detection Methods:** Static analysis of the codebase using tools like SonarQube Cloud can identify the lack of CSRF tokens and the reflected XSS sink.
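For the log-based indicator above, a small script can flag `POST` requests to `/update` whose `Referer` is missing or points at an untrusted host. This sketch assumes access logs in the common "combined" format, for example from a reverse proxy; the function name, the regular expression, and the sample log lines are illustrative and not taken from pyspider.

```python
import re
from urllib.parse import urlsplit

# Matches the request line, status, size, and referer of a combined-format entry.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def suspicious_update_posts(lines, trusted_hosts):
    """Return log lines for POST /update with a missing or untrusted Referer."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match or match["method"] != "POST":
            continue
        if urlsplit(match["path"]).path != "/update":
            continue
        # A referer of "-" yields an empty host, which is never trusted.
        referer_host = urlsplit(match["referer"]).netloc.lower()
        if referer_host not in trusted_hosts:
            hits.append(line)
    return hits


sample = [
    '127.0.0.1 - alice [10/Oct/2024:13:55:36 +0000] "POST /update HTTP/1.1" 200 512 "https://evil.example/page" "Mozilla/5.0"',
    '127.0.0.1 - alice [10/Oct/2024:13:56:00 +0000] "POST /update HTTP/1.1" 200 512 "http://localhost:5000/index" "Mozilla/5.0"',
    '127.0.0.1 - alice [10/Oct/2024:13:57:00 +0000] "GET /update HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
for hit in suspicious_update_posts(sample, {"localhost:5000"}):
    print(hit)
```

A hit is only a lead for investigation, since the `Referer` header can be suppressed by the attacker's page or by browser policy.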
## References
- **Sonar Blog:** hxxps[://]www[.]sonarsource[.]com/blog/basic-http-authentication-risk-uncovering-pyspider-vulnerabilities/
- **Pyspider Documentation:** hxxps[://]docs[.]pyspider[.]org/en/latest/
- **SonarQube Analysis:** hxxps[://]sonarcloud[.]io/project/issues?id=SonarSourceResearch_pyspider-blogpost