Story Network Postmortem

Full Report

Story Protocol received two denial-of-service reports describing bugs that could take down the chain by triggering panics. Both slipped through the cracks of the audit competitions.

The first vulnerability was caused by a faulty patch for a previously known issue. During the Cantina competition, a vulnerability was reported in Omni Network, the upstream codebase Story forked from. Execution payloads passed to the execution client would be processed by Story but not by Geth, because of inconsistencies in how the JSON was unmarshalled. For instance, repeating the same field multiple times in the JSON could bloat the payload while it was still considered valid. The goal was to refactor the JSON into a stricter format such as protobuf, but it was too late to make such a big change before launch.

To fix this vulnerability, Story put a hard limit of 4MB on block size: if a block was bigger than this, the code would panic. However, by sending 128KB per message over and over again, an attacker could produce a block that was otherwise valid but larger than 4MB, leading to a node crash.

There is a quick patch and a long-term patch. For the quick patch, they edited CometBFT to limit the block size to 20MB and changed the PrepareProposal code so that it does not propose blocks larger than a threshold. Through rigorous dynamic testing, they determined that blocks larger than 20MB could not be created. In the long term, they are moving to protobuf and restricting extra fields.

The second vulnerability was a logic bug in handling multiple delegation withdrawals; fully understanding it probably requires more digging into the codebase. When a delegator unstakes from a validator or is rewarded, the tokens are burnt from the consensus layer's balances to prevent double accounting. If there are unclaimed rewards, they are automatically sent to the delegator. The function ProcessUnstakeWithdrawals iterates over a list of unbonded entries, but the loop fails to handle multiple withdrawal requests coming from the same delegator.
Through some convoluted state handling, the withdrawal bug led to a panic when the code attempted to burn more coins than were available. The team decided the loop was too complicated: it handled too many cases at once in order to be computationally cheaper. They rewrote it as two loops to simplify the code.

The development team's takeaways are interesting:

- Leave more time between audits and launch. This is pretty obvious but hard to do in practice.
- Increase test coverage. Another classic.
- Handle more panics within the codebase so the chain does not fail. Sometimes you want things to fail, but not all the time.
- Reduce code complexity and maximize readability. A small performance gain is probably not worth a big hack.

My personal takeaway as a bug bounty hunter is that DoS bugs are much easier to find than most other bug classes. If they are paid out as criticals, they seem to be the best bang for your buck.

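The size mismatch in the first bug and the shape of the quick fix can also be sketched in Go. Everything here is hypothetical: hardCapBytes and msgBytes mirror the 4MB cap and 128KB messages from the write-up, selectTxs is a made-up stand-in for proposal-side filtering (the write-up does not say what threshold Story chose, so the 4MB cap is reused purely for illustration), and checkBlockSize returns an error instead of panicking, in the spirit of the takeaway about handling panics.

```go
package main

import "fmt"

const (
	hardCapBytes = 4 << 20   // 4MB cap from the write-up; exceeding it panicked the node
	msgBytes     = 128 << 10 // 128KB per message, as in the report
)

// selectTxs is a hypothetical proposal-side filter: it stops adding
// transactions once the next one would push the block past maxBytes.
func selectTxs(txs [][]byte, maxBytes int) [][]byte {
	var selected [][]byte
	total := 0
	for _, tx := range txs {
		if total+len(tx) > maxBytes {
			break
		}
		selected = append(selected, tx)
		total += len(tx)
	}
	return selected
}

// checkBlockSize reports an oversized block as an error instead of panicking.
func checkBlockSize(txs [][]byte, maxBytes int) error {
	total := 0
	for _, tx := range txs {
		total += len(tx)
	}
	if total > maxBytes {
		return fmt.Errorf("block is %d bytes, over the %d-byte cap", total, maxBytes)
	}
	return nil
}

func main() {
	// 40 messages of 128KB add up to 5MB: still under a larger
	// consensus-level block limit, but over the 4MB cap.
	txs := make([][]byte, 40)
	for i := range txs {
		txs[i] = make([]byte, msgBytes)
	}
	fmt.Println("unfiltered:", checkBlockSize(txs, hardCapBytes))

	picked := selectTxs(txs, hardCapBytes)
	fmt.Println("filtered:", len(picked), "messages,", checkBlockSize(picked, hardCapBytes))
}
```

A proposal-side filter only constrains honest proposers, which is why the sketch keeps a separate size check on the processing side as well.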
Analysis Summary