LiquidHaskell

A large part of the allure of Haskell are its elegant, high-level ADTs that ensure¹ that programs won’t be plagued by problems like the infamous SSL heartbleed bug.

However, another part of Haskell’s charm is that when you really really need to, you can drop down to low-level pointer twiddling to squeeze the most performance out of your machine. But of course, that opens the door to the #heartbleeds.

Can we have have our cake and eat it too?

Can we twiddle pointers and still get the nice safety assurances of high-level types?

For our last post on text, we return to the topic of building a new Text value, i.e. proving the safety of write operations.

Last time, we showed how to reason about Unicode and a variable-width encoding of Chars when consuming a Text value, today we’ll look at the same issue from the perspective of building a Text.

Welcome back! Last time we left off on a bit of a cliffhanger with the unstream example. Remember, the issue we found was that some Chars can’t fit into a single Word16, so the safety of a write depends not only on the index, but also on the value being written! Before we can resolve this issue with unstream we’ll have to learn about UTF-16, so let’s take a short detour and look at how one consumes a Text.

So far we have mostly discussed LiquidHaskell in the context of recursive data structures like lists, but there comes a time in many programs when you have to put down the list and pick up an array for the sake of performance. In this series we’re going to examine the text library, which does exactly this in addition to having extensive Unicode support.

text is a popular library for efficient text processing. It provides the high-level API Haskellers have come to expect while using stream fusion and byte arrays under the hood to guarantee high performance.

Suppose we wanted to get the ith Char of a Text, we could write a function¹

42: charAt (Text a o l) i = word2char $ unsafeIndex a i
43:   where word2char = chr . fromIntegral

which extracts the underlying array, indexes into it, and casts the Word16 to a Char, using functions exported by text.

Let’s try this out in GHCi.

51: ghci> let t = T.pack "dog"
52: ghci> charAt t 0
53: 'd'
54: ghci> charAt t 2
55: 'g'

Looks good so far, what happens if we keep going?

59: ghci> charAt t 3
60: '\NUL'
61: ghci> charAt t 100
62: '\8745'

Oh dear, not only did we not get any sort of exception from Haskell, we weren’t even stopped by the OS with a segfault. This is quite dangerous since we have no idea what sort of data (private keys?) we just read! To be fair to Bryan O’Sullivan, we did use a function that was clearly marked unsafe, but I’d much rather have the compiler throw an error on these last two calls.

In this post we’ll see exactly how prevent invalid memory accesses like this with LiquidHaskell.

Previously we saw how refinements can be used to prove termination and we promised to extend our termination checker to handle “real-word” programs.

Keeping our promise, today we shall see a trick that allows liquidHaskell to prove termination on more recursive functions, namely lexicographic termination.

We’ve seen how, in the presence of lazy evaluation, refinements require termination. Next, we saw how LiquidHaskell can be used to prove termination.

Today, lets see how termination requires refinements.

That is, a crucial feature of LiquidHaskell’s termination prover is that it is not syntactically driven, i.e. is not limited to say, structural recursion. Instead, it uses the wealth of information captured by refinements that are at our disposal, in order to prove termination.

This turns out to be crucial in practice. As a quick toy example – motivated by a question by Elias – lets see how, unlike purely syntax-directed (structural) approaches, LiquidHaskell proves that recursive functions, such as Euclid’s GCD algorithm, terminates.

As explained in the last two posts, we need a termination checker to ensure that LiquidHaskell is not tricked by divergent, lazy computations into telling lies. Happily, it turns out that with very little retrofitting, and a bit of jiu jitsu, we can use refinements themselves to prove termination!

Previously, we caught LiquidHaskell telling a lie. Today, lets try to get to the bottom of this mendacity, in order to understand how we can ensure that it always tells the truth.

[Previously][ref-lies], we caught LiquidHaskell telling a lie. Today, lets try to get to the bottom of this mendacity, in order to understand how we can ensure that it always tells the truth.

LiquidHaskell

Pointers Gone Wild

Text Bug

Text Write

Text Read

Text Internals

Lexicographic Termination

Termination Requires Refinements

Checking Termination

Getting to the Bottom

Getting to the Bottom of the Lies