Full Report
AI coding is used everywhere. One particular flavor of it, "vibecoding", means letting the AI do the programming from a prompt alone and seeing how it does. The author of this post asked an LLM to create a 2FA login application to find out whether it can write secure code for one. They tried both Sonnet 4.5 and Anthropic.

The first attempt works: a wrong 2FA token fails and the correct one succeeds. The UI even looks very similar to a CTF challenge that I wrote recently. It has a terrible flaw, though: the OTP is only 6 digits and there are no brute-force protections, so an attacker can simply try the whole OTP space. After discovering this issue, the author asked the AI whether any security features were missing from the 2FA verify step, and only then did it identify the missing rate limiting. So unless you tell the LLM to think about security, it won't magically do it for you. This is a really good lesson.

They asked the LLM to fix the issue. The fix set a rate limit of 5 invalid codes with a 15-minute lockout. It uses the flask-limiter library, which has 1.2K stars and is fairly well maintained, and simply adds a decorator to the function. Looking at the settings for Limiter, though, the application appears to limit by IP, so the rate limit can be bypassed just by changing the source IP.

With this security issue in hand, they asked the LLM, "Is there anything faulty in the rate limitation that allows for a bypass?" Upon being asked, the LLM described this second vulnerability and fixed it. The fix had some weird cases for specific IPs but seemed okay. On a deeper look, however, the rate limiting was now keyed on the IP and the username together. The same issue still exists: every new IP gets a fresh counter for the same username, so rotating IPs still bypasses the limit. After being asked for more security issues, the LLM produced a bunch of non-existent ones.

Vibecoding will not lead to secure code. I think my job just got a lot harder. It's a great article by someone who actually tried to write a security-sensitive application with LLMs and showed how badly it goes.
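
To put the brute-force flaw in numbers, here is a rough sketch of the odds. It assumes each guess is an independent 1-in-1,000,000 draw (a reasonable approximation for a rotating 6-digit code) and a hypothetical attacker speed of 100 requests per second; neither figure comes from the article.

```python
def brute_force_success_probability(guesses: int, code_space: int = 10**6) -> float:
    """Chance that at least one of `guesses` independent attempts hits the valid code."""
    p_single = 1 / code_space
    return 1.0 - (1.0 - p_single) ** guesses

# Hypothetical attacker with no rate limiting: 100 requests/second for one hour.
unlimited = brute_force_success_probability(100 * 3600)

# With 5 attempts per 15 minutes on the account: 5 * 4 * 24 = 480 guesses per day.
limited = brute_force_success_probability(5 * 4 * 24)

print(f"no rate limit, one hour: {unlimited:.1%}")
print(f"5 per 15 minutes, one day: {limited:.3%}")
```

Under these assumptions the unprotected endpoint gives the attacker roughly a 30% chance of success within an hour, while an effective limit of 5 attempts per 15 minutes keeps a full day of guessing at about 0.05%.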
Analysis Summary
# Best Practices: AI-Assisted Security Development
## Overview
These practices address the "mirage of security" created by Large Language Models (LLMs) during "vibecoding" (prompt-only programming). LLMs prioritize functional requirements and "looking" secure over actual robust security engineering, often omitting critical protections like rate limiting or implementing them with bypassable logic.
## Key Recommendations
### Immediate Actions
1. **Stop "Vibecoding" for Auth:** Cease using LLMs to generate standalone authentication or authorization logic without expert human-in-the-loop review.
2. **Explicit Security Prompting:** Never assume an LLM will include security features by default. Explicitly demand "Defense in Depth," "Rate Limiting," and "Input Validation" in initial prompts.
3. **Validate Rate Limiting Keys:** Ensure rate limiting is bound to the **Target Identity (Username/Email)** and not just the Source IP, to prevent bypass via IP rotation.
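
Why the choice of key matters can be shown with a small simulation. This is a minimal sketch, not the article's code: the `accepted_guesses` helper, the address scheme, and the `victim` username are hypothetical, and window expiry is left out for brevity. The attacker uses a fresh source IP for every guess against one account.

```python
from collections import Counter
from typing import Callable, Tuple

LIMIT = 5  # failed attempts allowed per key (window expiry omitted for brevity)

def accepted_guesses(key_func: Callable[[str, str], Tuple[str, ...]], attempts: int) -> int:
    """Count guesses that reach the verifier when every request comes from a new IP."""
    counters: Counter = Counter()
    accepted = 0
    for i in range(attempts):
        ip = f"10.0.{i // 256}.{i % 256}"  # hypothetical rotating source address
        key = key_func(ip, "victim")
        if counters[key] >= LIMIT:
            continue  # the limiter rejects this request
        counters[key] += 1
        accepted += 1
    return accepted

by_ip = accepted_guesses(lambda ip, user: (ip,), 1000)
by_ip_and_user = accepted_guesses(lambda ip, user: (ip, user), 1000)
by_user = accepted_guesses(lambda ip, user: (user,), 1000)

print(by_ip, by_ip_and_user, by_user)  # prints: 1000 1000 5
```

Both the IP-only key and the IP-plus-username key let all 1000 guesses through, because each new address creates a new counter. Only the key bound to the target account stops the attack after 5 attempts.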
### Short-term Improvements (1-3 months)
1. **Implement Standard Libraries:** Instead of letting the LLM write custom logic, force it to use well-vetted, industry-standard libraries (e.g., `Flask-Limiter`, OWASP-recommended frameworks).
2. **Red-Teaming AI Code:** Treat all AI-generated code as "untrusted." Run manual penetration tests specifically focusing on brute-force resilience and logic bypasses.
3. **Verify Rate Limit Triggers:** Confirm that 2FA failures trigger account-level lockouts (or delays) rather than just IP-based session blocks.
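
An account-level lockout of the kind described in item 3 might look like the sketch below. It is an illustration under stated assumptions, not a production design: the `AccountLockout` class is hypothetical, state lives in process memory (a real deployment would need shared storage), and the 5-failure, 15-minute values mirror the limits mentioned in the article.

```python
import time
from typing import Callable, Dict, List

class AccountLockout:
    """In-memory failed-attempt tracker keyed by username, independent of source IP."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._failures: Dict[str, List[float]] = {}

    def _recent(self, username: str) -> List[float]:
        # Drop failures older than the window and keep the pruned list.
        cutoff = self.clock() - self.window_seconds
        recent = [t for t in self._failures.get(username, []) if t > cutoff]
        self._failures[username] = recent
        return recent

    def is_locked(self, username: str) -> bool:
        return len(self._recent(username)) >= self.max_failures

    def record_failure(self, username: str) -> None:
        self._recent(username).append(self.clock())

    def reset(self, username: str) -> None:
        # Call after a successful 2FA verification.
        self._failures.pop(username, None)

# Usage with a fake clock: five failures lock the account until the window passes.
now = [0.0]
lockout = AccountLockout(clock=lambda: now[0])
for _ in range(5):
    lockout.record_failure("alice")
locked_now = lockout.is_locked("alice")
other_user_locked = lockout.is_locked("bob")
now[0] += 15 * 60 + 1
locked_later = lockout.is_locked("alice")
```

The verify handler would check `is_locked` before comparing the code and call `record_failure` on every wrong guess. One trade-off to weigh: a lockout keyed purely on the account lets an attacker deliberately lock a legitimate user out, so delays or additional signals may be preferable to a hard block.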
### Long-term Strategy (3+ months)
1. **Establish Secure Coding Baselines:** Distribute internal "Secure AI Prompting" templates that include mandatory security headers, CSRF protections, and session management requirements.
2. **Continuous Automated Scanning:** Integrate Static Application Security Testing (SAST) and Dynamic Application Security Testing (DAST) into the CI/CD pipeline to catch AI logic errors before deployment.
## Implementation Guidance
### For Small Organizations
- **Favor managed services:** Instead of building 2FA/Login logic with LLMs, use established providers like Auth0, AWS Cognito, or Firebase.
- **Peer Review:** Ensure at least two developers review any AI-generated security code.
### For Medium Organizations
- **Library Whitelisting:** Provide developers with a list of approved security libraries the LLM must use.
- **Focus on Logic:** Use LLMs for boilerplate, but manually write the verification logic and the rate-limit key functions so that limits are bound to the targeted account and user session rather than to the source IP.
### For Large Enterprises
- **AI Policy Governance:** Mandate that AI-generated security code undergoes a formal Security Architecture Review.
- **Custom LLM System Prompts:** Configure organizational LLM instances with system instructions that mandate OWASP Top 10 compliance for all code outputs.
## Configuration Examples
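To prevent the "IP Rotation Bypass" identified in the article, key the rate limit on the account being attacked rather than on the source IP. A composite IP-plus-username key is not enough: each new IP produces a new key, so rotation still resets the counter.
**Vulnerable AI Configuration (IP Only):**
```python
# Keyed on client IP only: bypassed easily with VPNs/proxies
@limiter.limit("5 per minute", key_func=get_remote_address)
```
**Secure Configuration (Account Key):**
```python
from flask import request

# Limits based on the specific username being attacked, regardless of source IP
def target_user_key():
    # Assumes the verify form posts a 'username' field; a server-side session value is safer
    return f"user:{request.form.get('username', '')}"

@limiter.limit("5 per 15 minutes", key_func=target_user_key)
```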
## Compliance Alignment
- **NIST SP 800-63B:** Digital Identity Guidelines (Authentication and Lifecycle Management).
- **OWASP ASVS:** Application Security Verification Standard (specifically, in ASVS 4.0, V2: Authentication and V3: Session Management).
- **ISO/IEC 27002:** Controls for Information Security.
## Common Pitfalls to Avoid
- **The "Expert" Trap:** Do not trust the LLM's self-proclamation as an "Expert Developer." It predicts likely text; it does not "understand" security architecture.
- **IP-Only Limitation:** Relying on IP addresses for protection is ineffective against modern, distributed botnets or simple IP rotation.
- **Hallucinated Vulnerabilities:** Be aware that after being corrected, LLMs may start "hallucinating" fake vulnerabilities or "gibberish" logic to satisfy the user's demand for more issues.
## Resources
- **OWASP Top 10:** [owasp[.]org/www-project-top-ten/]
- **Flask-Limiter Documentation:** [flask-limiter[.]readthedocs[.]io]
- **PyOTP (Standard for 2FA):** [github[.]com/pyauth/pyotp]