Exploiting null-dereferences in the Linux kernel

Full Report

Null dereferences are commonly regarded as unexploitable bugs. Sure, one is a denial of service, but not much else. In some situations, though, we can make it more. In Linux, when the kernel hits a fault such as a bad memory access, it attempts to recover; this is called an oops. The oops recovery path will NOT clean up the state of the interrupted code. Most of the time, it simply leaves everything as it is and kills the entire task.

In C, the main way to track the number of open references to an object is a counter known as a reference counter (refcount). When a new reference is made, the counter is incremented. When one is removed, it's decremented. If the refcount hits 0, the object is freed, since nothing references it anymore. Leaving the state alone sounds like it would be fine. In fact, triggering these handlers is very scary, because the interrupted code never runs to completion. In the case of a refcount, each oops can leave behind a reference that is never released. With enough oopses, a 32-bit counter overflows. Once it wraps around to 0, the object gets freed while it is still in use, creating a use after free.

The author had written a bug report about a null dereference in the kernel. Simply put: when a task has no VMAs mapped at all, the mmap field of its mm_struct is null. Accessing it leads to a null dereference in the kernel, triggered simply by reading process mapping files. Once the kernel oops occurs, a few things are left in weird states: struct file, mm_users and the task struct each have a refcount leak, and two locks are never released, so certain operations hang forever.

Given those constraints, only the refcount leak on mm_users has the potential for exploitation. It is stored in an atomic_t, which, unlike the kernel's saturating refcount_t, has no overflow protection, so it can wrap. After working around the deadlocks and other mm_users-specific problems, overflowing it is possible.
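
The refcount leak described above can be modelled in a few lines. This is only a toy sketch in Python, not kernel code: the class `RefCounted`, the helper `leak_refs`, and the 32-bit mask are all made up for illustration, and it assumes each oops leaks exactly one reference.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit counter with no overflow protection


class RefCounted:
    """Hypothetical object guarded by a wrapping 32-bit refcount."""

    def __init__(self):
        self.refcount = 1   # the legitimate owner's reference
        self.freed = False

    def get(self):
        # Take a reference; wraps silently at 2**32.
        self.refcount = (self.refcount + 1) & MASK32

    def put(self):
        # Drop a reference; the object is freed when the count hits 0.
        self.refcount = (self.refcount - 1) & MASK32
        if self.refcount == 0:
            self.freed = True


def leak_refs(obj, n):
    """Equivalent to n calls of get() whose matching put() never runs,
    which is what an oops in the middle of a code path leaves behind."""
    obj.refcount = (obj.refcount + n) & MASK32


obj = RefCounted()
leak_refs(obj, 2**32 - 1)   # counter wraps to 0, owner still holds a pointer
obj.get()                   # an unrelated, well-behaved user takes a reference
obj.put()                   # ...and drops it: count hits 0, object is freed
print(obj.freed)            # the owner's pointer now dangles (use after free)
```

The point of the model is that the free is performed by perfectly correct code; the bug is only that the counter no longer reflects how many references exist.
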
The author estimates that on a server that prints serial logs to the console, this would take over 2 years to exploit. On a Kali Linux box with a GUI, it took 8 days. To turn this into an actual exploit, the author uses the UAF to cause further havoc within the AIO subsystem. They took this as far as a double-free crash but didn't exploit it any further.

The solution to this new attack idea? Introduce an oops limit that makes the kernel panic after a set number of oopses. However, I doubt this will be picked up by most OSes, since little bugs happen all the time. It would be a bummer if your server crashed because it had an oops once a month. This is a very clever exploit strategy that I hope to see more of in the future!
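
To put the timing figures in perspective, here is a rough back-of-the-envelope sketch. It assumes that wrapping the counter takes about 2**32 oopses (one leaked reference per oops), which is my simplification rather than a figure from the report; the function names and the example oops limit of 10000 are likewise just illustrative.

```python
LEAKS_TO_WRAP = 2**32            # assumed: one leaked reference per oops
SECONDS_PER_DAY = 24 * 60 * 60


def required_oops_rate(days):
    """Oopses per second needed to wrap the counter in the given time."""
    return LEAKS_TO_WRAP / (days * SECONDS_PER_DAY)


def days_to_wrap(oopses_per_second):
    """Days needed to wrap the counter at a given oops rate."""
    return LEAKS_TO_WRAP / (oopses_per_second * SECONDS_PER_DAY)


def wrap_reachable(oops_limit):
    """With a panic after `oops_limit` oopses, can the counter still wrap?"""
    return oops_limit >= LEAKS_TO_WRAP


print(f"8 days   -> {required_oops_rate(8):.0f} oopses/s")
print(f"2 years  -> {required_oops_rate(730):.0f} oopses/s")
print(f"limit of 10000 still exploitable: {wrap_reachable(10000)}")
```

Under these assumptions, the difference between the two machines comes down to how many oopses per second each can sustain, and any limit far below 2**32 stops the counter from ever wrapping.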

Analysis Summary