Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code

Charles Killian , James Anderson , Ranjit Jhala and Amin Vahdat .

Modern software model checkers find safety violations: breaches where the system enters some bad state. However, we argue that checking liveness properties offers both a richer and more natural way to search for errors, particularly in complex concurrent and distributed systems. Liveness properties specify desirable system conditions, which need not always hold, perhaps as a result of failure or during system initialization, but which must eventually be satisfied. Existing software model checkers cannot find liveness bugs because doing so requires finding an infinite execution that does not satisfy a liveness property. We present probabilistic techniques to find a large class of executions violating liveness as well the critical transition of the execution that moves the system from not satisfying some liveness property---with hope of future recovery---to a dead state through which no live executions pass, regardless of any subsequent sequence of actions. Our software model checker, MaceMC, isolates complex liveness errors in implementations of Pastry, Chord, a reliable transport protocol, an aggregation service, Overcast, and an overlay tree implementing a failure-resilient random tree.

Proceedings of the 4th USENIX Symposium on Networked Systems Design and Implementation, 2007. (NSDI 2007), (Best Paper Award).


PostScript PDF © 2007.