"Invariant inversion" in memory-unsafe languages

Full Report

The author opens the post with a C bug that is invisible at first glance. Even after staring at the code for a while, I couldn't find it. The bug is simply that a boolean can hold a value other than 0 or 1. Why does this matter? In memory-unsafe languages like C, the invariants that uphold memory safety are created by the programmer. By breaking an assumption the program relies on, safe-looking code can be broken through subtle memory-unsafety issues. This is the main concern of the post.

Why does a boolean that is neither 0 nor 1 matter? Because the variable is typed as a boolean, the compiler assumes the byte will only ever be 0 or 1 and optimizes around that assumption. When the byte holds any other value, the logic of the optimization breaks, which leads to memory corruption. In a typical C codebase, you would look for memory-unsafe accesses at the lines that actually perform the access, such as keep[index].

The author compares this bug to reviewing JIT compilers. A JIT tries to enforce invariants early in the program, and the rest of the code then assumes those invariants hold. If one is ever violated, you have a memory corruption bug. According to the author, the similarity is that the memory safety violation does not come from the exact line of code, as it does with a bad access. Instead, it comes from the violation of an invariant that another part of the code relies on further down the line. This is the invariant inversion.

Languages can create chains of invariants leaning on other invariants leaning on other invariants... until the result is a tangled web of invariants. Because of this, breaking a single upper-level invariant in the chain can have much larger consequences than you realize. Unfortunately, keeping this web of invariants in your head is impossible because it quickly becomes a huge graph.
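The bool case described above can be sketched roughly as follows. This is a minimal illustration and not the code from the post: the names LABELS, label_for, flag_from_byte_unchecked and flag_from_byte_checked are hypothetical, and the sketch assumes a C11 compiler on a platform where bool is one byte.

```c
#include <stdbool.h>
#include <string.h>

/* Assumption for this sketch: bool occupies exactly one byte. */
_Static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

/* Hypothetical two-entry table; only indexes 0 and 1 are valid. */
static const char *const LABELS[2] = { "drop", "keep" };

/* Looks safe: a bool "can only be" 0 or 1, so the compiler is free to
   use the stored byte directly as the index, with no bounds check. */
const char *label_for(bool keep) {
    return LABELS[keep];
}

/* Where the invariant can actually be broken: the raw byte is copied
   into the bool's storage. For raw values other than 0 or 1 the bool
   is left holding an invalid value, and any later use of it (such as
   the index in label_for) is undefined behavior. */
bool flag_from_byte_unchecked(unsigned char raw) {
    bool b = false;
    memcpy(&b, &raw, 1);
    return b;
}

/* Re-establishes the invariant at the boundary: the comparison always
   produces exactly 0 or 1, whatever the input byte was. */
bool flag_from_byte_checked(unsigned char raw) {
    return raw != 0;
}
```

Note that label_for contains the memory access yet has nothing wrong with it; the line a reviewer needs to catch is the memcpy in flag_from_byte_unchecked, which may sit far away from the table lookup.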
In the case of bool-typed variables only holding 0 or 1, the author considers this an inverted invariant because it is "higher level" than memory safety, yet it is relied upon for "lower level" safety properties later. Why does this all matter? It's a new way of finding bugs! Currently, we ask ourselves "where is the memory unsafety occurring?", a question that is only relevant in languages like C. Instead, we should be asking "where was the first violation of an invariant that is relied upon?" This view of the world seems more reliable, since it finds the safety bug first rather than tracing backward from where the bad access could occur. Great post!
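One way to picture the "first violation" question in code is a check placed where untrusted bytes enter the program, before anything treats them as booleans. The helper below is a hypothetical sketch of that idea, not something taken from the post; the name first_invalid_flag and the convention of returning n when every byte is valid are assumptions made for illustration.

```c
#include <stddef.h>

/* Scans n raw flag bytes and returns the index of the first one that
   is neither 0 nor 1, i.e. the first place the "bool is 0 or 1"
   invariant would be violated. Returns n if every byte is valid. */
size_t first_invalid_flag(const unsigned char *flags, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (flags[i] > 1) {
            return i;
        }
    }
    return n;
}
```

A caller would reject or normalize the input whenever the result is less than n, so that code further down the line can keep relying on the invariant.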

Analysis Summary