Full Report
Ethereum storage is very simple at the lowest level: each contract has 32-byte slots holding 32-byte values. Mapping these slots back to meaningful variable names and use cases is the difficult part, and this post is about going from raw storage back to how the data is used. The EVM itself has no concept of variable names. State variables are assigned slots sequentially, starting from slot 0. Mappings and other dynamic types compute their slot numbers with a hash, which cannot be reversed.

To recover this information, we need the execution details of a given transaction. `debug_traceTransaction` can be used to replay the transaction and return trace data. Notably, with `prestateTracer`, we can get a summary of the before and after values of slots. Sadly, this only shows the final state, not the intermediate writes. `structLogs` is a trace of every single EVM step, including opcodes, stack, memory and everything else. From this, the author extracts `SSTORE` operations for the individual writes and `SHA3` operations for the preimages of mapping slots. This is much more powerful than the previous tracer but far bulkier, so a mixture of the two is used to keep things fast.

`DELEGATECALL` lets contract B's code write to contract A's storage, as long as the delegate call was made from contract A. `structLogs` doesn't include an address field on each step, so the call stack must be tracked manually to know which contract's storage is being written to.

The strategy of capturing `SHA3` calls to get the preimage of a hash works well. In some cases the compiler optimizes the `SHA3` away and just uses a precomputed constant. In that case, they parse the source code to get its value.

Their code needed to handle the decoding of all types with care, along with nested writes and proxy detection for Solidity. Vyper had its own differences in how it writes data. The constructor also had some weird quirks. They created SlotScan to make all of this easier to see. Pretty sick stuff!
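To make the `SSTORE` and `SHA3` extraction concrete, here is a minimal Python sketch, not the author's code. It assumes a Geth-style `structLogs` list in which the top of the stack is the last element of `stack`, memory is a list of 32-byte hex words, and the hash opcode is reported as either `SHA3` or `KECCAK256` depending on client version. The helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: the opcode name differs between client versions.
HASH_OPS = {"SHA3", "KECCAK256"}


def to_int(word: str) -> int:
    """Parse a hex word, with or without a 0x prefix."""
    return int(word, 16)


def read_memory(memory_words: List[str], offset: int, size: int) -> bytes:
    """Read `size` bytes at `offset` from a list of hex memory words."""
    flat = bytes.fromhex(
        "".join(w[2:] if w.startswith("0x") else w for w in memory_words)
    )
    # Reads past the end of recorded memory are treated as zero bytes.
    return flat[offset:offset + size].ljust(size, b"\x00")


def extract_writes_and_preimages(struct_logs: List[dict]):
    """Collect SSTORE writes in order and map each hash result to its preimage."""
    writes: List[dict] = []
    preimages: Dict[int, bytes] = {}
    for i, step in enumerate(struct_logs):
        stack = step.get("stack") or []
        if step["op"] == "SSTORE" and len(stack) >= 2:
            # SSTORE pops the slot first (top of stack), then the value.
            writes.append({"depth": step["depth"],
                           "slot": to_int(stack[-1]),
                           "value": to_int(stack[-2])})
        elif step["op"] in HASH_OPS and len(stack) >= 2:
            offset, size = to_int(stack[-1]), to_int(stack[-2])
            # The hash result is the top of the stack at the following step.
            if i + 1 < len(struct_logs):
                next_stack = struct_logs[i + 1].get("stack") or []
                if next_stack:
                    preimages[to_int(next_stack[-1])] = read_memory(
                        step.get("memory") or [], offset, size)
    return writes, preimages


def split_mapping_preimage(preimage: bytes) -> Optional[Tuple[int, int]]:
    """Split a 64-byte preimage into (key, base_slot), Solidity value-type keys only."""
    if len(preimage) != 64:
        return None
    return (int.from_bytes(preimage[:32], "big"),
            int.from_bytes(preimage[32:], "big"))
```

Looking up each written slot in `preimages` then tells you which key and base slot produced it. Nested mappings would need the lookup repeated on the recovered base slot, and string or bytes keys produce preimages of other lengths, which this sketch does not handle.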
Analysis Summary
# Research: Reverse Engineering EVM Storage
## Metadata
- Authors: wavey (Implied from the blog post structure)
- Institution: Independent/Self-Directed Research (Implied)
- Publication: wavey.info Blog Post
- Date: December 10, 2025 (As stated in the article header)
## Abstract
This research details the technical challenges and methodologies involved in reverse engineering the storage layout of Ethereum Virtual Machine (EVM) contracts back into meaningful variable names, working primarily from transaction execution traces. Since the EVM retains no variable metadata, recovering this information requires analyzing runtime execution artifacts like `structLogs` to capture key events, specifically `SSTORE` operations and the `SHA3` computations used to derive mapping slots. The work resulted in the development of SlotScan, a tool that synthesizes data from multiple tracing sources to provide a human-readable view of contract state changes.
## Research Objective
The primary objective is to develop a rigorous methodology to map abstract, 32-byte EVM storage slots back to the original high-level variable names, types, and ownership context (especially across `DELEGATECALL`s), falling back on source code only where runtime traces are insufficient. The core question is: How can runtime execution traces be leveraged to reconstruct the compiler-generated storage map?
## Methodology
### Approach
The methodology is a multi-pass tracing approach combining different levels of trace granularity to capture the necessary attributes:
1. **Ground Truth Identification:** Using `debug_traceTransaction` with `prestateTracer` to determine the final set of changed slots and their before/after values. This loses intermediate writes.
2. **Execution Fidelity and Mapping Key Recovery:** Using the verbose `structLogs` trace to capture every EVM step. This is utilized to concurrently extract:
   * The exact **order and intermediate values** of all `SSTORE` operations.
   * The **preimages** (mapping key and base slot) fed into the `SHA3` operation that computes the slot hash (for Solidity mappings, `keccak256(key || baseSlot)`).
3. **Context Tracking:** Manually tracking the call stack via opcodes to correctly attribute `SSTORE` operations to the originating contract, particularly vital during `DELEGATECALL`s where code context changes but storage address remains with the caller.
4. **Compiler Optimization Handling:** Parsing available source code metadata or inferring constants when the compiler optimizes away runtime `SHA3` calls by substituting a precomputed constant hash.
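The context tracking in step 3 can be sketched as a frame stack driven by the `depth` field. This is an illustrative sketch under assumptions, not the author's implementation: it assumes Geth-style `structLogs` (top of stack last, depth starting at 1), takes the transaction's `to` address as a parameter, and does not discard writes made in frames that later revert. The function and field names are hypothetical.

```python
from typing import List, Optional

CALL_OPS = {"CALL", "STATICCALL", "DELEGATECALL", "CALLCODE"}
CREATE_OPS = {"CREATE", "CREATE2"}


def _addr(word: str) -> str:
    """Normalize a stack word to a 20-byte lowercase hex address."""
    return "0x" + format(int(word, 16) & ((1 << 160) - 1), "040x")


def attribute_sstores(struct_logs: List[dict], to_address: str) -> List[dict]:
    """Attribute each SSTORE to the contract whose storage it modifies."""
    # Each frame is (storage context, opcode that opened it).
    frames = [({"address": to_address.lower()}, "TX")]
    writes = []
    for i, step in enumerate(struct_logs):
        depth, op = step["depth"], step["op"]
        stack = step.get("stack") or []
        # Unwind frames that have returned to a shallower depth.
        while len(frames) > depth:
            ctx, via = frames.pop()
            if via in CREATE_OPS and stack:
                # After CREATE returns, the new address is on top of the
                # parent's stack (zero if the creation failed).
                ctx["address"] = _addr(stack[-1])
        if op == "SSTORE" and len(stack) >= 2:
            writes.append({"ctx": frames[-1][0],
                           "slot": int(stack[-1], 16),
                           "value": int(stack[-2], 16)})
        nxt: Optional[dict] = struct_logs[i + 1] if i + 1 < len(struct_logs) else None
        entered = nxt is not None and nxt["depth"] == depth + 1
        if entered and op in CALL_OPS and len(stack) >= 2:
            if op in ("DELEGATECALL", "CALLCODE"):
                # Code changes, storage stays with the current context.
                frames.append((frames[-1][0], op))
            else:
                # The call target is the second item from the top.
                frames.append(({"address": _addr(stack[-2])}, op))
        elif entered and op in CREATE_OPS:
            # Address is unknown until the constructor frame returns.
            frames.append(({"address": None}, op))
    return [{"storage": w["ctx"]["address"], "slot": w["slot"],
             "value": w["value"]} for w in writes]
```

Sharing one context object between a caller and its `DELEGATECALL` child is a design choice that lets a constructor's address be backfilled for every write made on its behalf, including writes from delegate-called library code.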
### Dataset/Environment
The analysis is generally applicable to any contract deployed on the EVM (Ethereum mainnet, L2s), focusing on transaction traces produced under different smart contract languages (Solidity and Vyper).
### Tools & Technologies
* **RPC Method:** `debug_traceTransaction`
* **Tracer Formats:**
  * `prestateTracer` (for final state diffs)
  * `structLogs` (for opcode-level execution traces)
* **Fundamental Opcode Operations:** `SSTORE`, `SHA3`, `DELEGATECALL`, `CREATE`/`CREATE2`.
* **Developed Tool:** SlotScan.info
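For reference, the two kinds of trace request might be built as follows. This is a sketch assuming a Geth-compatible node with the `debug` namespace enabled; the option names (`diffMode`, `enableMemory`, `disableStorage`) are Geth's, and other clients may differ. The helper names and the RPC URL passed to `send` are placeholders.

```python
import json
import urllib.request

# Final before/after state per account (Geth prestateTracer in diff mode).
PRESTATE_DIFF = {"tracer": "prestateTracer", "tracerConfig": {"diffMode": True}}

# Default opcode logger; memory capture is needed to read SHA3 preimages.
STRUCT_LOGS = {"enableMemory": True, "disableStorage": True}


def trace_request(tx_hash: str, config: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC payload for debug_traceTransaction."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "debug_traceTransaction",
        "params": [tx_hash, config],
    }


def send(rpc_url: str, payload: dict):
    """POST a JSON-RPC payload and return its result field."""
    request = urllib.request.Request(
        rpc_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]
```

A caller would issue `send(rpc_url, trace_request(tx_hash, PRESTATE_DIFF))` first and only request the heavier `STRUCT_LOGS` trace when preimages or write ordering are needed.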
## Key Findings
### Primary Results
1. **Runtime Key Recovery is Essential:** Because mapping slots are derived with a one-way hash, the keys cannot be recovered from the slot number alone. They must be captured at runtime by reading the inputs to the `SHA3` opcode at the step where the hash is computed.
2. **Layered Tracing is Necessary:** No single tracing mechanism captures all required information. `prestateTracer` provides *what* changed definitively, while `structLogs` provides *when* and *how* (preimages/order). Both must be synthesized.
3. **Call Context Management is Complex:** `DELEGATECALL` necessitates manual tracking of the call stack derived from instruction sequences, as the native `structLogs` format omits the address context for every step.
4. **Compiler Choice Impacts Decoding:** Solidity and Vyper utilize different data packing rules, string encoding schemes (e.g., Solidity's length encoding vs. Vyper's raw byte count), and layout documentation availability, requiring language-specific parsing logic.
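As an illustration of the language-specific decoding in finding 4, the sketch below covers two documented Solidity conventions only: packed fields are aligned from the low-order end of a slot, and a `string` or `bytes` slot stores `length * 2` in its lowest byte when the data fits in 31 bytes, or `length * 2 + 1` when the data lives elsewhere. Vyper's layout is not covered, and the function names are hypothetical.

```python
def unpack_field(slot_value: int, offset_bytes: int, size_bytes: int) -> int:
    """Extract a packed field; offset is counted from the low-order end."""
    mask = (1 << (8 * size_bytes)) - 1
    return (slot_value >> (8 * offset_bytes)) & mask


def decode_solidity_string_slot(slot_value: int) -> dict:
    """Interpret the main slot of a Solidity string or bytes variable."""
    if slot_value & 1 == 0:
        # Short form: data is left-aligned, lowest byte holds length * 2.
        length = (slot_value & 0xFF) // 2
        data = slot_value.to_bytes(32, "big")[:length]
        return {"kind": "short", "length": length, "data": data}
    # Long form: slot holds length * 2 + 1; the data starts at keccak256(slot).
    return {"kind": "long", "length": (slot_value - 1) // 2, "data": None}
```

For example, a slot packing an `address` at offset 0 and a `bool` at offset 20 would be read with `unpack_field(value, 0, 20)` and `unpack_field(value, 20, 1)`. The offsets themselves have to come from a storage layout, which this sketch takes as given.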
### Supporting Evidence
* The validation loop often involves cross-referencing the slot changes identified by `prestateTracer` against the attributed writes derived from the manual `structLogs` parsing.
* Handling compiler optimizations required falling back to static analysis (parsing source code) when runtime analysis failed to find a `SHA3` operation.
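The cross-referencing described above could look roughly like this. It is a sketch under assumptions: the diff follows the shape of Geth's `prestateTracer` in diff mode (`pre` and `post` keyed by address, each with a `storage` map), a slot absent from `post` is treated as zero, and the attributed writes are dictionaries with hypothetical `storage`, `slot`, and `value` fields listed in execution order.

```python
from typing import Dict, List, Set, Tuple

SlotKey = Tuple[str, int]  # (contract address, slot number)


def final_traced_values(attributed: List[dict]) -> Dict[SlotKey, int]:
    """Keep the last value written to each (address, slot) pair."""
    final: Dict[SlotKey, int] = {}
    for write in attributed:
        final[(write["storage"].lower(), write["slot"])] = write["value"]
    return final


def diff_values(prestate_diff: dict) -> Dict[SlotKey, int]:
    """Flatten a pre/post diff into the expected final value per changed slot."""
    pre = prestate_diff.get("pre", {})
    post = prestate_diff.get("post", {})
    expected: Dict[SlotKey, int] = {}
    for address in set(pre) | set(post):
        pre_storage = pre.get(address, {}).get("storage", {})
        post_storage = post.get(address, {}).get("storage", {})
        for slot in set(pre_storage) | set(post_storage):
            # Assumption: a slot missing from `post` was cleared to zero.
            value = int(post_storage.get(slot, "0x0"), 16)
            expected[(address.lower(), int(slot, 16))] = value
    return expected


def cross_check(attributed: List[dict], prestate_diff: dict):
    """Return changed slots the trace missed and slots whose values disagree."""
    traced = final_traced_values(attributed)
    expected = diff_values(prestate_diff)
    missing: Set[SlotKey] = {k for k in expected if k not in traced}
    mismatched: Set[SlotKey] = {
        k for k in expected if k in traced and traced[k] != expected[k]
    }
    return missing, mismatched
```

Traced writes that do not appear in the diff are not flagged here, since a slot written back to its original value or written in a reverted frame would legitimately be absent from the final diff.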
### Novel Contributions
1. **Blended Tracing Strategy:** The efficient combination of high-level `prestateTracer` snapshots with low-level `SSTORE`/`SHA3` instruction extraction to manage the massive overhead of full `structLogs`.
2. **`DELEGATECALL` Attribution Logic:** A robust method for tracking contract ownership during cross-context storage writes without relying solely on potentially incomplete RPC trace fields.
3. **SlotScan Implementation:** The creation of a practical, interactive tool that applies these complex reverse engineering techniques to make EVM storage readable for general users and security engineers.
## Practical Implications
### For Security Practitioners
* Tools for state analysis must account for intermediate writes, as viewing only the final state can obscure attack vectors or complex protocol interactions.
* Understanding how mappings are hashed is crucial for fuzzing, vulnerability scanning, and forensics on chains where source code may be obscured.
### For Defenders
* When analyzing a suspected breach, defenders must use full execution traces (`structLogs`) rather than simpler state diffs to identify the *sequence* of storage modifications and the exact keys used.
* Upgradeability patterns (proxies) require constant monitoring, as storage layouts can change dramatically, defeating static layout assumptions.
### For Researchers
* There is a continuous need to track compiler optimizations that prematurely resolve hashes, forcing analysis tools to rely more heavily on static source code parsing (when available) over pure runtime tracing.
* Further research could focus on optimizing the custom JS tracer technique to reduce the overhead of retrieving preimages, so that a transaction can be analyzed without producing gigabyte-sized trace files.
## Limitations
* **Reliance on Source Code for Optimizations:** When a compiler optimizes away a runtime `SHA3` call by embedding the constant hash, the tracer must parse external source code metadata, introducing a dependency on verified contracts or compiler artifacts.
* **Trace Bulk:** Utilizing `structLogs` for complete analysis is computationally expensive and impractical for high-volume analysis due to file sizes (even custom filtering results in large data exchanges).
* **Constructor Quirks:** Tracing storage writes within contract constructors requires specific detection logic (e.g., tracking `CREATE` calls) that adds complexity.
## Comparison to Prior Work
Prior methods likely relied on either static analysis of compiler output layouts (which fails when source is unavailable or optimizations occur) or simple `prestateTracer` analysis (which misses intermediate writes). This work goes further by bringing opcode-level execution traces (`structLogs`) directly into forensic analysis, capturing hash preimages at runtime because they cannot be recovered from slot numbers afterwards.
## Real-world Applications
* **Forensics:** Reconstructing the state preceding a failed transaction or confirming the exact storage position manipulated during an exploit.
* **Auditing:** Providing contract deployers or auditors a view of the final, actual state layout, verifying against expected compiler output.
* **Data Recovery:** Tools like SlotScan enable users to view the decoded state of any contract instantly via transaction hash or address.
## Future Work
* Developing more efficient, specialized tracing pipelines that automatically switch between high-level and low-level traces based on transaction complexity to manage data size.
* Formalizing the decoding logic for Vyper compilation outputs, given its lower reliance on standardized layout metadata compared to Solidity.
* Extending call stack tracking to cover nested calls beyond simple `DELEGATECALL`s to handle more complex interaction patterns.