Full Report
ZKSync was launching the Aave V3 pool on its chain. While activating it, the team noticed a major bug that only appeared after a complex sequence of supplying and borrowing assets. Since things looked wrong, they paused the protocol to investigate. The same contracts functioned perfectly on other chains, so what was going wrong here?

Their initial hypothesis was an issue with the bitmap math. Aave stores each user's configuration in a 256-bit bitmap made of pairs of boolean flags, one pair per reserve, recording whether that asset is borrowed and whether it is used as collateral. Since this bitmap is crucial to the protocol, corrupting it is really bad. Through dynamic testing, they noticed that when multiple assets were enabled as collateral and borrowed, clearing an isBorrowing flag also zeroed out unrelated isCollateral flags: every flag to the right of the target bit was being cleared.

They give a great example of the bug. If a user has three assets that are all used as collateral and borrowed, the bitmap is 000...111111. If one of the debts is paid back, it should become 000...101111. In reality, it was becoming 000...100000, so the protocol believed the user had no positions in the other assets, even though this simply wasn't true.

Reading the compiler output made it clear what was going on: the compiler had performed an incorrect optimization. Both parties investigating, Certora and Matter Labs, reached the same conclusion at the same time. ZKSync's compiler takes the output of the Solidity compiler and translates it to ZKSync bytecode through LLVM's intermediate representation, and the LLVM backend had a bug in an optimization on 256-bit values. The expression `xor (shl 1, x), -1` (that is, `~(1 << x)`) was rewritten as `rotl ~1, x`. That rewrite is only valid if ~1 is the full 256-bit constant 2^256 - 2. Instead, ~1 was materialized as the 64-bit value 2^64 - 2 and zero-extended rather than sign-extended, so the code became `rotl 2^64 - 2, x` instead of `rotl 2^256 - 2, x`.

I appreciate the detailed write-up from the zkSync dev team. Additionally, their phased rollout of Aave to check for bugs really paid off in this case.
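The rewrite can be checked numerically. The following is a minimal Python sketch that models 256-bit EVM words as Python integers; all helper names are hypothetical. Because ~1 equals 2^256 - 2 in 256-bit two's complement, the sketch takes the truncated constant to be 2^64 - 2 (the 64-bit encoding of ~1), which is the value that reproduces the observed 000...100000 result. It also assumes Aave V3's layout, where the borrowing flag of reserve i sits at bit 2i.

```python
# Minimal model of the miscompiled rewrite, using Python ints as 256-bit words.
MASK_256 = (1 << 256) - 1  # all 256 bits set, i.e. -1 as a uint256


def rotl256(value: int, shift: int) -> int:
    """Rotate a 256-bit word left by `shift` bits."""
    shift %= 256
    value &= MASK_256
    return ((value << shift) | (value >> (256 - shift))) & MASK_256


def not_shl_reference(x: int) -> int:
    """EVM semantics of xor(shl(1, x), -1), i.e. ~(1 << x) on 256 bits."""
    return (1 << x) ^ MASK_256


def not_shl_correct_rotl(x: int) -> int:
    """Valid rewrite: rotl(~1, x) with ~1 sign-extended to 2^256 - 2."""
    return rotl256(MASK_256 - 1, x)


def not_shl_buggy_rotl(x: int) -> int:
    """Miscompiled rewrite: ~1 kept as the 64-bit 2^64 - 2, then zero-extended."""
    return rotl256((1 << 64) - 2, x)


# Three reserves, each used as collateral and borrowed.
data = 0b111111
x = 4  # borrowing flag of reserve index 2 (assumed layout: bit 2 * reserveIndex)

print(bin(data & not_shl_reference(x)))     # 0b101111 (expected)
print(bin(data & not_shl_correct_rotl(x)))  # 0b101111
print(bin(data & not_shl_buggy_rotl(x)))    # 0b100000 (observed bug)
```

Since the valid rewrite agrees with `~(1 << x)` for every x in 0..255 and the truncated one does not, an exhaustive comparison over the 256 possible shift amounts is enough to expose this particular miscompilation.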
Even though the bug wasn't the team's fault, it was still their customers' funds at risk. Great write-up and incident response by the zkSync Era team!
Analysis Summary
# Incident Report: ZKSync Aave V3 Compiler Optimization Failure
## Executive Summary
During the phased rollout of Aave V3 on ZKSync Era, a critical logic error was discovered that corrupted users' account configuration bitmaps. The issue was traced to a bug in the LLVM-based compiler (zksolc) that incorrectly optimized a 256-bit operation, which could cause the protocol to lose track of users' collateral positions. The protocol was proactively paused, preventing any loss of user funds.
## Incident Details
- **Discovery Date:** Undisclosed (during the Aave V3 activation phase)
- **Incident Date:** Undisclosed
- **Affected Organization:** ZKSync / Aave
- **Sector:** Decentralized Finance (DeFi)
- **Geography:** Global / Blockchain
## Timeline of Events
### Initial Access
- **Date/Time:** N/A (Non-adversarial incident; deployment-related)
- **Vector:** Smart Contract Deployment
- **Details:** The incident was not an external attack but a protocol malfunction triggered during the activation of the Aave V3 pool on the ZKSync chain.
### Lateral Movement
- **N/A:** No lateral movement occurred; this was a compiler-introduced logic error in the deployed ZK-EVM bytecode, not an intrusion.
### Data Exfiltration/Impact
- **Details:** No data was exfiltrated. However, the integrity of the protocol's user configuration bitmap (the per-user collateral and borrowing flags) was compromised. When a user repaid a debt, the bug cleared not only the intended borrowing flag but every flag to its right, effectively "forgetting" that the user had provided collateral.
### Detection & Response
- **Detection:** ZKSync team noticed "weird" behavior in account states after complex supply/borrow sequences during testing/early activation.
- **Immediate Action:** The ZKSync team utilized the protocol’s emergency pause functionality to halt operations and investigate.
- **Root Cause Analysis:** Matter Labs and Certora both investigated and reached the same conclusion at the same time, pinpointing the issue in an incorrect LLVM optimization that the ZKSync compiler applied to 256-bit values.
## Attack Methodology
*Note: This was a technical failure, not a malicious exploit. The "methodology" describes the technical trigger.*
- **Initial Access:** Valid protocol interactions (Supplying/Borrowing).
- **Persistence:** N/A.
- **Privilege Escalation:** N/A.
- **Defense Evasion:** The bug was difficult to detect because the contract code was identical to versions running perfectly on other EVM chains.
- **Discovery:** Dynamic testing revealed that the `isBorrowing` flag being disabled caused a cascade of unintended bit-clearing to the right of the target bit.
- **Lateral Movement:** N/A.
- **Collection:** N/A.
- **Exfiltration:** N/A.
- **Impact:** Repaying one of three debts transitioned the bitmap from `000...111111` to `000...100000` instead of the expected `000...101111`.
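To see what the protocol "forgets," the expected and corrupted bitmaps can be decoded per reserve. This is a small Python sketch assuming Aave V3's layout (borrowing flag at bit 2i, collateral flag at bit 2i + 1); `decode_user_config` is a hypothetical helper, not part of Aave.

```python
def decode_user_config(data: int, reserves_count: int) -> list[tuple[bool, bool]]:
    """Return (is_borrowing, is_using_as_collateral) for each reserve index.

    Assumes the borrowing flag of reserve i sits at bit 2*i and the
    collateral flag at bit 2*i + 1.
    """
    return [
        (bool((data >> (2 * i)) & 1), bool((data >> (2 * i + 1)) & 1))
        for i in range(reserves_count)
    ]


expected = decode_user_config(0b101111, 3)
actual = decode_user_config(0b100000, 3)
for i, (exp, act) in enumerate(zip(expected, actual)):
    print(f"reserve {i}: expected {exp}, actual {act}")
# reserve 0: expected (True, True), actual (False, False)
# reserve 1: expected (True, True), actual (False, False)
# reserve 2: expected (False, True), actual (False, True)
```

Under this layout, the corrupted state drops both the collateral and the borrowing flags of reserves 0 and 1, even though only reserve 2's debt was repaid.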
## Impact Assessment
- **Financial:** Zero (due to successful pause and phased rollout). Potential for total loss of collateral visibility if left unchecked.
- **Data Breach:** None.
- **Operational:** Temporary suspension of Aave V3 services on ZKSync.
- **Reputational:** Minimal/Positive; the team's transparency and phased rollout strategy were praised by the community.
## Indicators of Compromise
- **Behavioral indicators:** Unexpected zeroing of `isCollateral` flags in the user account bitmap despite existing deposits.
- **Technical indicators:** Discrepancy between EVM-standard output and ZKSync bytecode output for the operation `xor (shl 1, x), -1`.
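The behavioral indicator can be turned into a per-transaction check: a repay should never change any flag other than the repaid reserve's borrowing bit. A minimal sketch, assuming the same Aave V3 bit layout; the function name is hypothetical, and the before/after bitmaps would have to come from whatever monitoring pipeline reads user configurations.

```python
def repay_touched_only_borrow_bit(before: int, after: int, reserve_index: int) -> bool:
    """True if a repay changed no flag other than reserve_index's borrowing bit.

    Assumes the borrowing flag of reserve i is bit 2*i of the user bitmap.
    """
    borrow_bit = 1 << (2 * reserve_index)
    changed = before ^ after  # bits that flipped during the transaction
    return (changed & ~borrow_bit) == 0


# Full repay of reserve 2's debt on the three-reserve example.
print(repay_touched_only_borrow_bit(0b111111, 0b101111, 2))  # True (correct)
print(repay_touched_only_borrow_bit(0b111111, 0b100000, 2))  # False (bug)
```

A partial repay that leaves the bitmap unchanged also passes, so the check only flags transitions that touch unrelated reserves' flags.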
## Response Actions
- **Containment:** Emergency pause of the Aave V3 ZKSync contracts.
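- **Eradication:** Identification of the compiler bug: LLVM rewrote `xor (shl 1, x), -1` as a `rotl` (rotate left) of the constant `~1`, but materialized that constant as the 64-bit value `2^64 - 2` and zero-extended it instead of sign-extending it to the 256-bit `2^256 - 2`. The resulting mask cleared the target bit and every bit below it.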
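The effect of the truncated constant can be worked out directly. In 256-bit two's complement, $\lnot 1 = 2^{256} - 2$, while its 64-bit encoding zero-extended to 256 bits is $2^{64} - 2 = \sum_{k=1}^{63} 2^k$. For shift amounts $0 \le x \le 192$ the set bits of the truncated constant do not wrap around, which gives:

```latex
\begin{aligned}
\lnot(1 \ll x) &= \operatorname{rotl}_{256}\!\left(2^{256} - 2,\; x\right) = \left(2^{256} - 1\right) - 2^{x} \\
\operatorname{rotl}_{256}\!\left(2^{64} - 2,\; x\right) &= \sum_{k=x+1}^{x+63} 2^{k} \qquad (0 \le x \le 192) \\
\texttt{0b111111} \mathbin{\&} \sum_{k=5}^{67} 2^{k} &= \texttt{0b100000} \qquad (x = 4)
\end{aligned}
```

So the buggy mask keeps only bits $x+1$ through $x+63$: it clears bit $x$ and every bit below it, matching the "all flags to the right" symptom, and under this model it would also clear any flags at bit $x + 64$ or above.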
- **Recovery:** Correction of the compiler optimization logic in the ZK-EVM translation layer and redeployment/patching of the affected infrastructure.
## Lessons Learned
- **Compiler Risk:** Even if smart contract code is audited and proven on other chains, the underlying compiler/execution environment (ZK-VM vs. EVM) can introduce unique vulnerabilities.
- **Phased Rollouts:** The decision to launch Aave V3 in phases was the primary reason funds remained safe.
- **Collaborative IR:** Simultaneous verification by internal (Matter Labs) and external (Certora) teams accelerated the resolution.
## Recommendations
- **Compiler Audits:** Perform deep-dive audits specifically on the translation layer/intermediate representation (IR) of non-standard EVM chains.
- **Invariant Monitoring:** Implement real-time monitoring for state invariants (e.g., ensuring that repaying a debt never clears a collateral flag, and that a user's collateral flag is not dropped while their balance is > 0 unless they explicitly disabled it).
- **Differential Testing:** Use differential fuzzing between standard EVM (Geth/Foundry) and ZK-EVM to find execution discrepancies.
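A differential check needs a reference semantics and a candidate to compare against it. The sketch below is illustrative only: in a real harness the reference would come from a standard EVM and the candidate from executing the compiled contract on a ZK-EVM, whereas here pure-Python stand-ins model the correct and the miscompiled lowering of `data & ~(1 << (2 * reserveIndex))`. All names are hypothetical, and the 128-reserve bound assumes two flags per reserve in a 256-bit word.

```python
import random
from typing import Callable, Optional, Tuple

MASK_256 = (1 << 256) - 1
MAX_RESERVES = 128  # 256 bits / 2 flags per reserve


def rotl256(value: int, shift: int) -> int:
    """Rotate a 256-bit word left by `shift` bits."""
    shift %= 256
    value &= MASK_256
    return ((value << shift) | (value >> (256 - shift))) & MASK_256


def clear_borrowing_reference(data: int, reserve_index: int) -> int:
    """EVM semantics: data & ~(1 << (2 * reserveIndex)) on 256 bits."""
    return data & (MASK_256 ^ (1 << (2 * reserve_index)))


def clear_borrowing_miscompiled(data: int, reserve_index: int) -> int:
    """Stand-in for the buggy lowering: rotl of the zero-extended 2^64 - 2."""
    return data & rotl256((1 << 64) - 2, 2 * reserve_index)


def clear_borrowing_fixed(data: int, reserve_index: int) -> int:
    """Stand-in for a corrected lowering: rotl of 2^256 - 2."""
    return data & rotl256(MASK_256 - 1, 2 * reserve_index)


Impl = Callable[[int, int], int]


def differential_fuzz(
    reference: Impl, candidate: Impl, trials: int = 10_000, seed: int = 0
) -> Optional[Tuple[int, int, int, int]]:
    """Return the first (data, reserve_index, expected, actual) mismatch, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = rng.getrandbits(256)  # random user bitmap
        reserve_index = rng.randrange(MAX_RESERVES)
        expected = reference(data, reserve_index)
        actual = candidate(data, reserve_index)
        if expected != actual:
            return data, reserve_index, expected, actual
    return None


print(differential_fuzz(clear_borrowing_reference, clear_borrowing_fixed))  # None
mismatch = differential_fuzz(clear_borrowing_reference, clear_borrowing_miscompiled)
print(mismatch is not None)  # True
```

Random full-width bitmaps are a deliberate choice here: the miscompiled mask differs from the correct one in 192 bit positions, so almost any random input exposes it, whereas hand-picked states with only a few reserves may not.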