Reverse Engineering EVM Storage

Full Report

Ethereum storage is very simple: a key-value store that maps 32-byte slots to 32-byte values. Mapping these slots back to meaningful variable names and use cases is difficult, though. This post is about going from raw storage back to how the data is actually used.

The EVM itself has no concept of variable names. In Solidity, state variables are simply assigned slots in declaration order, starting from slot 0. Mappings and other dynamic types compute the slot number using a hash, and the inputs to that hash cannot be recovered from the slot number alone. To figure this out, we need the execution information of a given transaction.

debug_traceTransaction can be used to replay a transaction and return trace data. Notably, with prestateTracer, we can get a summary of the before and after values of the slots that were touched. Sadly, this only shows the final state, not the intermediate writes. structLogs is a trace format that records every single EVM step, including the opcode, stack, memory and everything else. From this, the author extracts SSTORE operations for the individual writes and SHA3 operations for the preimages of mapping slots. This is much more powerful than the previous tracer but is too bulky to use everywhere, so a mixture of the two is used to make the analysis faster.

DELEGATECALL lets contract B's code write to contract A's storage when contract A makes the delegate call to B. structLogs doesn't include an address field on each step, so the call stack must be tracked manually to know which contract's storage is being written to.

The strategy of matching SHA3 calls to get the preimage of a hashed slot works well. In some cases, the compiler will optimize the SHA3 away and just use a constant. In this case, they parse the source code to recover the value. Their code needed to handle the decoding of all types with care, along with nested writes and proxy detection for Solidity. Vyper had its own differences in how it writes data. The constructor also had some weird quirks in it. They created SlotScan to make all of this easier to see. Pretty sick stuff!
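To make the SSTORE, SHA3 and DELEGATECALL tracking described above more concrete, here is a minimal Python sketch. It is not the author's implementation or SlotScan's code. It assumes each trace step is a dict with depth, op, stack (hex words, top of stack last) and memory (32-byte hex words), roughly the shape of Geth-style structLogs, and that memory capture was enabled for the trace. The names scan_struct_logs, read_memory and to_int are hypothetical helpers made up for illustration, and frames opened by CREATE or CREATE2 are only marked as unknown.

```python
CALL_OPS = {"CALL", "STATICCALL"}            # callee code runs against its own storage
DELEGATE_OPS = {"DELEGATECALL", "CALLCODE"}  # callee code runs against the caller's storage
HASH_OPS = {"SHA3", "KECCAK256"}             # opcode name differs between client versions


def to_int(word):
    """Parse a hex stack word, with or without a 0x prefix."""
    return int(word, 16)


def read_memory(memory_words, offset, size):
    """Return `size` bytes at `offset` from a list of 32-byte hex memory words."""
    flat = bytes.fromhex(
        "".join(w[2:] if w.startswith("0x") else w for w in memory_words)
    )
    return flat[offset:offset + size].ljust(size, b"\x00")


def scan_struct_logs(struct_logs, entry_address):
    """Collect storage writes and hash preimages from a list of trace steps.

    Returns (writes, preimages). writes is a list of
    (storage_address, slot, value); preimages maps a hash result (int)
    to the bytes that were hashed.
    """
    contexts = [entry_address]  # contexts[d - 1] = storage address at depth d
    pending_call = None         # (depth, storage address) of a call not yet entered
    pending_hash = None         # (depth, preimage bytes) waiting for the hash result
    writes, preimages = [], {}

    for step in struct_logs:
        depth, op, stack = step["depth"], step["op"], step["stack"]

        # The hash result is the top of the stack on the step after the hash op.
        if pending_hash is not None:
            if depth == pending_hash[0] and stack:
                preimages[to_int(stack[-1])] = pending_hash[1]
            pending_hash = None

        # A call was actually entered only if the depth grew by exactly one.
        if pending_call is not None:
            if depth == pending_call[0] + 1:
                contexts.append(pending_call[1])
            pending_call = None
        # Frames opened some other way (e.g. CREATE) get an unknown address.
        while len(contexts) < depth:
            contexts.append(None)
        # Returned frames: drop every context deeper than the current depth.
        del contexts[depth:]

        if op in HASH_OPS:
            offset, size = to_int(stack[-1]), to_int(stack[-2])
            pending_hash = (depth, read_memory(step.get("memory") or [], offset, size))
        elif op == "SSTORE":
            slot, value = to_int(stack[-1]), to_int(stack[-2])
            writes.append((contexts[-1], slot, value))
        elif op in CALL_OPS:
            target = to_int(stack[-2]) & ((1 << 160) - 1)
            pending_call = (depth, "0x%040x" % target)
        elif op in DELEGATE_OPS:
            pending_call = (depth, contexts[-1])

    return writes, preimages
```

If a written slot shows up as a key in preimages, the hashed bytes can be decoded further. For a Solidity mapping with a value-type key, the 64-byte preimage is the padded key followed by the mapping's base slot, so int.from_bytes(preimage[:32], "big") gives the key and int.from_bytes(preimage[32:], "big") gives the declared slot. Slots with no recorded preimage are either plain variables or cases where the compiler replaced the hash with a constant.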

Analysis Summary