Parallel Haskell Digest 9

Thursday, 19 April 2012, by Eric Kow.
Filed under ph-digest, parallel.

The Google Summer of Code is upon us and students have already submitted their proposals. There are a couple potential projects on concurrent data structures, which we'll have a look at below.

We will also be continuing our tour of Haskell concurrency abstractions with our word month, transaction. This digest is brought to you by the Parallel GHC project, an MSR-sponsored effort to push parallel Haskell technologies out into the real world. Check our project news below to see how we're doing in that front.

Finally, you may heard Functional Propaganda from a Simon or two. But how would the same message affect you if it came from a hardcore C++ hacker? If you haven't seen it making the rounds, have a quick look at Bartosz Milewski's The Downfall of Imperative Programming, and maybe tell your imperative friends? The FP monster is hatching from its academic egg; best be prepared!


Let's have a quick look at some of those GSoC proposals, particularly those with a parallel Haskell theme. It's all about performance this year. Two of the proposals involve using or improving parallellism in Haskell, and four are related to high-performance concurrency.

Parallel GHC project update

We have been continuing our work to make ThreadScope more helpful and informative in tracking down your parallel and concurrent Haskell performance problems. We now have the ability to collect heap statistics from the GHC runtime system and present them in ThreadScope. These features will be available for users of a recent development GHC (7.5.x) or the eventual 7.6 release. In addition to heap statistics, we have been working on collecting information from hardware performance counters, more specifically adding support for Linux Perf Events. This could be useful for studying IO-heavy programs, the idea being to visualise system calls as being distinct from actual execution of Haskell code.

Speaking of performance, we are also continuing work on the new Cloud Haskell implementation (see Duncan Coutts' Fun in the Afternoon Talk), and have lately been focused on reducing message latency. This consists of work in three areas: improving binary serialisation, investigating the implications of using Chan and MVar to pass messages between threads, and perhaps improving the Haskell network library implementation to compete better with a direct C implementation.

Word of the month

Lately, we've been investigating the various ways Haskell helps us to get to grips with concurrency. We talked about how the MVar, the Haskell variant on locks, allows us to share mutable variables between threads, with some safeguards to help ensure consistency. MVar's may provide a nice high-level packaging around locks, but as we mentioned in the last digest, they can still go horrifically wrong, just like locks and synchronized methods in other languages.

We could go through the usual litany of reasons why locks are bad news, but maybe a healthier approach would be for us to focus on the positive. What do we want as programmers? One possibility is what Simon PJ (Beautiful Concurrency) calls “modular programming”, the ability to “[build] large programs by gluing together smaller programs”. Locks fall short of helping us to meet this desire. First, because the mere act of combining two locky programs may be inherently incorrect; withdraw acct1 amt >> deposit acct2 amt is bad because of the gap between the two actions where the money is in neither account. Second, because they seal off programs that we may otherwise like to moosh together; if process p1 waits for input on a pipe, process p2 waits for input on another pipe, how do wait for either of p1 or p2? So how do we wrestle back this modularity from our locky masters? And how do we make programming fun again?

Our word of the month today is “transaction”. Software transactional memory (STM) takes this idea of a transaction (a sequence of operations that can be treated as a single atomic block) from database design. The Haskell implementation of STM was introduced in the 2005 paper Composable Memory Transactions by Harris et. al. If programming fun is what you're after, this is a paper that comes with its own war-cry: “compositionality: a programmer can control atomicity and blocking behaviour in a modular way that respects abstraction barriers.”

Here are some quick highlights of the stm library. You may notice a couple of things, first that this library introduces its own notion of variable, the TVar (MVar,IVar; what, no pirate joke?) and second that STM involves a new monad of its own. Unlike the MVar that we saw in the last digest, TVar's do not have the same notion of being full or empty; they just hold values plain and simple. As for the STM monad, we will see why it matters when we first try to do some IO.

 -- Control.Concurrent.STM
 data STM a
 instance Monad STM
 atomically :: STM a -> IO a
 data TVar a
 newTVar   :: a -> STM (TVar a)
 readTVar  :: TVar a -> STM a
 writeTVar :: TVar a -> a -> STM ()
 retry  :: STM a
 orElse :: STM a -> STM a -> STM a

To get a rough idea how some of this is used, let's look at the transactional hello world, safely wiring money from one bank account to another. For the purposes of our example, a bank account is just a balance. To get some money from an account, we read the balance, subtract the amount, and write the new balance. Making a deposit is just withdrawing negative-money.

 type Account = TVar Int
 withdraw :: Account -> Int -> STM ()        
 withdraw acc amount = do
     bal <- readTVar acc
     writeTVar acc (bal - amount)
 deposit :: Account -> Int -> STM ()
 deposit acc amount = withdraw acc (- amount)

These primitive operations (withdraw and deposit) bring us to the question of modularity. How do we know that it's safe to combine these mini-programs into a bigger one? In other words, if we write something like withdraw from 42 >> deposit to 42, how do we avoid the possibility of running into some twilight zone state where the money is neither here nor there? If people do strange things like simultaneously transfering money in the other direction, will our program still work?

The answer lies in the distinction between STM (transactions) and IO (actions). So long as we remain in STM, we are simply assembling transactions, piecing smaller ones (“withdraw from a”) into larger ones (“withdraw from a and deposit it to b”), but not actually performing them! Having composed our transactions, we can use the function atomically to turn them into IO actions.

 -- still just a transaction
 transfer :: Account -> Account -> Int -> STM ()
 transfer from to amount = do
     deposit to amount
     withdraw from amount
 -- now we have an action!
 doTransfer :: Account -> Account -> Int -> IO ()
 doTransfer from to amount =
     atomically $ transfer from to amount

And atomically does what it says on the tin: it runs the transaction in a way that renders it indivisible, no twlight zones. Lest there is any confusion, even though the transaction is indivisible, we can still have concurrency during the course of the transaction, even simultaneously read the affected TVars if we want to. The indivisibility simply means that we never catch our transactions with their pants down. We neither read nonsense mid-transactional values (simultaneous reads would either get the before or after value), nor injecting values into a transaction mid-stream.

To get a feel for how these guarantees are possible, it could be useful to take a peek under the hood. For each transaction that is run, GHC maintains a thread-local log with an entry for each TVar accessed in that transaction. Each entry contains both the old value and the new value that would be committed if the transaction is succesful. This may be easier to see with a silly example:

main = do
    v1 <- atomically $ newTVar "Joe"
    v2 <- atomically $ newTVar "Bob"
    done <- atomically $ newTVar 0
    -- thread A (you can just pretend forkDelayIO == forkIO)
    forkDelayIO . atomically $ do
                              -- transaction log if A runs first
        x <- readTVar v1      -- v1: Joe -> Joe
        y <- readTVar v2      -- v1: Joe -> Joe, v2: Sue -> Sue 
        writeTVar v1 "Sue"    -- v1: Joe -> Sue
        writeTVar v2 x        -- v1: Joe -> Sue, v2: Bob -> Joe 
        writeTVar v1 y        -- v1: Joe -> Bob, v2: Bob -> Joe
        modifyTVar done (+1)  -- (stm 2.3 but easy to define)
    -- thread B 
    forkDelayIO . atomically $ do
                              -- (if A runs first)
        writeTVar v1 "Jean"   -- v1: Bob -> Jean
        writeTVar v2 "Paul"   -- v1: Bob -> Jean, v2: Joe -> Paul
        modifyTVar done (+1)
    waitThreads 2 done
    people <- atomically $ do -- (if A runs first)
        p1 <- readTVar v1     -- v1: Jean -> Jean
        p2 <- readTVar v2     -- v1: Jean -> Jean, v2: Paul -> Paul
        return (p1, p2)
    print people -- if A runs first, (Jean, Paul)
                 -- if B runs first, (Paul, Jean).

-- boring details just for this example
forkDelayIO job = forkIO $
    randomRIO (1, 1000000) >>= threadDelay >> job
waitThreads n v = atomically $
    do { d <- readTVar v;  when (d < n) retry }

In the above, we fork off two threads, A which swaps a pair of names and, B which overwrites them with other names. Atomicity here means that other threads never see any intermediary states and state changes from other threads don't affect the current thread. For example, thread B should never see v1 being set to "Sue". Likewise, if thread A should still read "Joe" from v1 even if B simultaneously writes "Jean".

This is made possible by validation of the transaction logs. Validation normally occurs at the end of a transaction (we won't cover the two other cases here: exceptions, and thread wake-ups). It consists of checking that all the expected “before” values for TVars still match reality. If the logs are good, we commit the new values; if not, we simply discard them and try the transaction again, taking the new reality into account. This validate-and-commit model allows us to run transactions simultaneously, safely, but with the occasional rollback and retry to ensure atomicity.

The notion of a transaction log brings us to the notion of cost. Good things don't always come cheap, and using a good thing like STM may require a little familiarity with the cost model behind it. Basically, it's important to keep in mind that the values we write to TVar's may come from some arbitrary expression, and that arbitrary expressions may be arbitrarily expensive. So being forced to retry transactions may involve redoing something expensive. If the transactions affect many variables, the chances of hitting a retry go up. Likewise, if the transaction takes a long time to run, the chance goes up of some other thread making a change that triggers a retry. In the pathological worst case, you can have some transactional behemoth that never manages to commit; because some smaller faster transaction keeps stealing its thunder. So keep an eye out for starvation and the more general problem for retries being expensive.

Cost may be a bit of a bummer, but there's also a Haskell-related silver lining behind all this. Because we have a purely functional language and the enforced separation between pure functions and side-effecting actions, STM is actually quite practical in Haskell. The number of things we need to track in a transaction log is limited to handful of explicit TVars rather that just about everything. If you are coming from other languages, you may have a memory of STM as being nice, but wildly impractical. Not so in Haskell. Eminently viable.

Aside from making STM practical, this sort of separation is also good for general peace of mind. Suppose for example that we coded up a feature in our banking software to send our users a text message alert whenever their balances fall below a threshold. If we were in the middle of a complicated transaction, we might be tempted to just slap that logic right in the middle of the transaction; however, the Haskell implementation makes this deliberately impossible. This can be a bit frustrating at first (and new Haskellers are sometimes faced with the “how do I get this out of the monad” puzzle), but saves us the greater danger of bombarding our users with spurious retry-induced text messages.

The guarantees that STM offer make it a great place to get started with Haskell concurrency. After all, why make software any buggier than it needs to be? If you do want to get started, have a look at Simon Peyton Jones' Beautiful Concurrency. It's a particularly good idea to do so, because there's some really interesting ground that we've not covered here (briefly, blocking, the retry function aborts the current transaction, and causes it to be retried when appropriate; and choice: a orElse b tries a, and if that should retry, then b, and if that should also retry, the whole expression again). Other great STM resources are Simon Marlow's tutorial on parallelism and concurrency and the Real World Haskell chapter on STM. With the four resources combined, you'll see a nice range of examples from the usual bank-account one to concurrently shuffling windows between desktops.


Talks, tutorials, and packages

Mailing lists



StackOverflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at Please feel free to leave any comments and feedback!