Communication Patterns in Cloud Haskell (Part 1)

Friday, 05 October 2012, by Edsko de Vries.
Filed under parallel, cloud-haskell.

Master-Slave, Work-Stealing and Work-Pushing

In this series of blog posts (parts 2, 3 and 4) we will describe a number of basic communication patterns in Cloud Haskell. We don't assume much familiarity with Cloud Haskell, but it will probably be useful to be familiar with the basics; Towards Haskell in the Cloud (Epstein, Black and Peyton Jones, Haskell Symposium 2011) is a good starting point. We will start simple but we will finish with some more advanced techniques.

Source The source code for these examples is available:

cabal unpack

(The code is also available on github)

Disclaimer The examples in this blog post and in the distributed-process-demos package are written for educational purposes; they are not engineered for optimal performance.


Master-Slave is one of the simplest communication patterns possible. A single master process spawns a bunch of slave processes to do computations on other nodes, and then combines the results.


A single master node (red) and a bunch of slave nodes (blue). A single master process spawns a bunch of slave processes, one for each subcomputation. The slave processes post the partial results on the message queue of the master node.

For example, consider summing the number of prime factors of the natural numbers 1 to 100 (why you would want to do that is anyone's guess :) we're just keeping CPUs busy). A master process can spawn a child process on a remote node for each of the numbers in the sequence, then collect the results and return their sum. The implementation of the slave is very simple:

slave :: (ProcessId, Integer) -> Process ()
slave (pid, n) = send pid (numPrimeFactors n)
remotable ['slave]
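The post uses numPrimeFactors without showing its definition (it lives in the distributed-process-demos package). A naive standalone version, written here purely for illustration and not necessarily matching the package's own definition, could be:

```haskell
-- Hypothetical stand-in for the demos package's numPrimeFactors:
-- counts prime factors with multiplicity, by simple trial division.
numPrimeFactors :: Integer -> Integer
numPrimeFactors = go 0 2
  where
    go :: Integer -> Integer -> Integer -> Integer
    go acc d n
      | n == 1         = acc           -- fully factorised
      | d * d > n      = acc + 1       -- remaining n is itself prime
      | n `mod` d == 0 = go (acc + 1) d (n `div` d)
      | otherwise      = go acc (d + 1) n
```

For example, numPrimeFactors 12 is 3 (the factors being 2, 2 and 3).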

Recall from Towards Haskell in the Cloud that in order to spawn a process on a node we need something of type Closure (Process ()). In distributed-process, if f :: T1 -> T2 then

$(mkClosure 'f) :: T1 -> Closure T2

That is, the first argument of the function we pass to mkClosure will act as the closure environment for that process; if you want multiple values in the closure environment, you must tuple them up. In this case, the closure environment will contain the process ID of the master and a natural number that the slave process must factorize.

The master process is a bit longer but not much more complicated:

master :: Integer -> [NodeId] -> Process Integer
master n slaves = do
  us <- getSelfPid

  -- Spawn slave processes to compute numPrimeFactors 1 .. numPrimeFactors n
  spawnLocal $ 
    forM_ (zip [1 .. n] (cycle slaves)) $ \(m, them) -> 
      spawn them ($(mkClosure 'slave) (us, m))
  -- Wait for the result
  sumIntegers (fromIntegral n)

sumIntegers :: Int -> Process Integer
sumIntegers = go 0
  where
    go :: Integer -> Int -> Process Integer
    go !acc 0 = return acc
    go !acc n = do
      m <- expect
      go (acc + m) (n - 1)

We have n bits of work to distribute amongst the slaves, which we do by zipping [1 .. n] with cycle slaves to get something like

[(1, slave1), (2, slave2), (3, slave3), (4, slave1), (5, slave2), ...
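The pairing itself has nothing to do with Cloud Haskell; it can be reproduced (and checked) with plain list functions. A small standalone sketch:

```haskell
-- Round-robin assignment of work items to slaves.
-- zip stops at the end of the (finite) work list, while cycle turns
-- the slave list into an infinite repeating stream.
roundRobin :: [a] -> [b] -> [(a, b)]
roundRobin work slaves = zip work (cycle slaves)
```

In GHCi, roundRobin [1 .. 5] ["slave1", "slave2", "slave3"] produces exactly the pairing shown above.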

For each of these bits of work we spawn a separate process on the slave node, all of which will run concurrently. This may be too resource intensive (for instance, if each computation would be memory hungry). We will consider a solution to that problem in the next section.

The partial results arrive back at the master node in arbitrary order; this does not matter because the result of addition does not depend on the order of the arguments. We spawn the slaves in a separate process (using spawnLocal) so that the master process can start collecting partial results while it is still spawning more slaves.


If we spawn a separate process for each computation then all these computations run concurrently, which may be too resource intensive. We can instead spawn a single child process on each of the slave nodes, and ask each of those slave processes to factorize a bunch of numbers:


As before, we have a number of slave nodes (blue) and a single master node (red), but now we only have a single slave process on each slave node. The master process pushes the computations to be done to the message queues of the slave processes, which will process them one by one and push the partial results back on the message queue of the master process.

The slave processes wait repeatedly for an integer n and compute numPrimeFactors n. The closure environment for the slave (the first argument) now only contains the process ID of the master, because the slave receives the natural number to factorize by message:

slave :: ProcessId -> Process ()
slave them = forever $ do
  n <- expect
  send them (numPrimeFactors n)

remotable ['slave]

The master process starts one of these slave processes on each of the slave nodes, distributes the integers [1 .. 100] among them and waits for the results.

master :: Integer -> [NodeId] -> Process Integer
master n slaves = do
  us <- getSelfPid

  -- Start slave processes 
  slaveProcesses <- forM slaves $ \nid -> spawn nid ($(mkClosure 'slave) us)

  -- Distribute 1 .. n amongst the slave processes 
  forM_ (zip [1 .. n] (cycle slaveProcesses)) $ \(m, them) -> send them m 

  -- Wait for the result
  sumIntegers (fromIntegral n)

Exercise 1: The slave processes keep running even when the master process finishes. How would you modify this example so that they are terminated when they are no longer necessary?

The master pushes all bits of work to be done to the slaves up front. These messages will sit in the slaves' message queues until they are processed. If the messages are big then it might be more efficient to incrementally send messages to slaves.

Exercise 2: (More difficult) Modify the master so that the slaves will only have a limited number of messages waiting in their queue (this means that the master will need to know the sender slave of each reply). (An alternative solution is to switch from work-pushing to work-stealing, which we discuss in the next section).

Note on reliability In the Master-Slave example, if one slave process dies we can restart it to redo that single computation. Restarting is more tricky in the Work-Pushing setup, because a single process is responsible for a large amount of work.


A disadvantage of both the master-slave setup and the work-pushing setup is that the master node must decide, a priori, what each slave is going to do. Unless the master node can predict accurately how long each computation will take, it is likely that this leaves some slaves twiddling their thumbs while others are buried in work.

One way to avoid this problem is to have the slaves ask the master for work whenever they're ready. This is a simple but effective way of achieving load balancing.


A single master node (red) and a bunch of slave nodes (blue). Each of the slave nodes runs a single slave process. The master node does not push work to the slaves, but rather the slaves query the master for work. To simplify the design, the master process spawns an auxiliary "work queue" process that the slave processes query for the next bit of work to do. This auxiliary process replies to the slave process which then does the work, posts the partial result to the master process message queue, and queries the "work queue" process for more work.

slave :: (ProcessId, ProcessId) -> Process ()
slave (master, workQueue) = do
    us <- getSelfPid
    go us
  where
    go us = do
      -- Ask the queue for work
      send workQueue us
      -- If there is work, do it, otherwise terminate
      receiveWait
        [ match $ \n  -> send master (numPrimeFactors n) >> go us
        , match $ \() -> return ()
        ]

remotable ['slave]

The slave is passed the process ID of the process that it can query for more work, as well as the process ID of the master. When it receives an integer it factorizes it, sends the number of prime factors to the master process, and then asks the work queue process for the next bit of work; when it receives a unit value () it terminates.

master :: Integer -> [NodeId] -> Process Integer
master n slaves = do
  us <- getSelfPid

  workQueue <- spawnLocal $ do
    -- Return the next bit of work to be done 
    forM_ [1 .. n] $ \m -> do
      them <- expect 
      send them m 

    -- Once all the work is done tell the slaves to terminate 
    forever $ do
      pid <- expect
      send pid ()

  -- Start slave processes
  forM_ slaves $ \nid -> spawn nid ($(mkClosure 'slave) (us, workQueue))

  -- Wait for the result
  sumIntegers (fromIntegral n)

The master process needs to do two things concurrently: it needs to make sure that slave nodes can ask for more work to do, and it needs to collect the partial results from the slaves. We could do this in a single process, but the design above is much simpler: the master spawns an auxiliary process whose sole purpose is to provide the slaves with more work when they request it; the master process itself meanwhile waits for the partial results from the slaves, as before.

The master spawns a local process which the slaves can query for work; it just sends out the integers [1 .. n] in order to whoever asks next. It then starts the slaves, waits for results, and returns the sum.

Exercise 3: Does the above implementation of the master process guarantee that all slave nodes will be properly terminated?

Exercise 4: A downside of this approach compared to the work-pushing approach above is that the latency between computations by each slave is higher: when one computation completes, the slaves must wait for a reply from the work queue process before the next can start. How might you improve this?

To be continued

In the next blog post we will analyze the performance and memory usage of these communication patterns in more detail.

The New Cloud Haskell

Thursday, 04 October 2012, by Duncan Coutts.
Filed under parallel, cloud-haskell.

The new implementation

For about the last year we have been working on a new implementation of Cloud Haskell. This is the same idea for concurrent distributed programming in Haskell that Simon Peyton Jones has been telling everyone about, but it's a new implementation designed to be robust and flexible.

The summary about the new implementation is that it exists, it works, it's on hackage, and we think it is now ready for serious experiments.

Compared to the previous prototype:

The key packages on hackage are:

We will also release a backend for the Windows Azure cloud platform later this month.

I gave a talk at the Haskell Implementors Workshop last month with lots more details about the new implementation. The slides and video are available:

A tutorial on Communication Patterns in Cloud Haskell

Starting tomorrow we're going to do a series of blog posts about using Cloud Haskell. It's a mini-tutorial, with examples that you can try and exercises to extend the code.

We'll be focusing on patterns for distributing work amongst a number of machines in a network. We'll start with very simple distributed patterns and work up to map-reduce and a slight generalisation of map-reduce. We'll also look closely at performance and resources like memory and network latency.

A gentle introduction

If you want a little background reading to help you follow what we're going to be talking about, here are some recommendations:

More technical details

For even more technical details, see the developer documentation in the source repo and wiki

There is also the parallel-haskell google group for discussion and release announcements.

Haskell Courses

Wednesday, 19 September 2012, by Andres Löh.
Filed under well-typed.

Well-Typed is partnering with Skills Matter to offer two Haskell courses in London, targeting professional developers who want to learn Haskell.

Fast Track to Haskell

8-9 October 2012

A two-day introduction to Haskell assuming no previous Haskell or functional programming experience. Covers Haskell syntax, how to define functions and datatypes, dealing with IO, and monads.

Advanced Haskell

11-12 October 2012

A two-day course for people with basic Haskell experience who want to take their Haskell skills to the next level. Covers functional data structures, profiling Haskell programs, concurrency and parallelism, programming patterns and type-level programming.

The courses are designed such that they can be taken both individually and in sequence.

On the day in between, October 10, Skills Matter is organizing the Haskell eXchange, a one-day conference featuring talks by Simon Peyton Jones, Simon Marlow, Lennart Augustsson, Blake Rain, Duncan Coutts and Rob Harrop.

Registration for all these events is open. I hope to see many of you there.

A Cloud Haskell Appetiser (Parallel Haskell Digest 11)

Saturday, 21 July 2012, by Eric Kow.
Filed under ph-digest, parallel.

Hello Haskellers! We mentioned in the last digest that we'd have just a tiny bit more to say about Parallel Haskell. As promised, here is the completed word of the month on actors and their use in Cloud Haskell. It so happens — what a coincidence! — that Well-Typed's Edsko de Vries has recently published a beta version of the new distributed-process implementation on Hackage. We'd love it if you could give it a try and let us know any trouble you run into or ways we could improve things. To help push things along a bit, this word of the month will be using the new distributed-process implementation.

Also, have you had a chance to fill out the Parallel Haskell Digest Survey? It's collecting data for another couple of weeks. Anything you can tell us in the survey will inform future efforts in building the Haskell community, so if you've got a couple of minutes before Cloud Haskell Time, head over to

Parallel Haskell Digest Survey

Many thanks!

Word of the month

The word of the month series has given us a chance to survey the arsenal of Haskell parallelism and concurrency constructs:

The Haskell approach has been to explicitly recognise the vastness of the parallelism/concurrency space, in other words, to provide a multitude of right tools for a multitude of right jobs. Better still, the tools we have are largely interoperable, should we find ourselves with jobs that don't neatly fit into a single category.

The Haskell of 2012 may be in a great place for parallelism and concurrency, but don't think this is the end of the story! What we've seen so far is only a snapshot of the technology as it hurtles through the twenty-tens (How quaint are we, Future Haskeller?). While we can't say what exactly the future will bring, we can look at one of the directions that Haskell might branch into in the coming decade. The series so far has focused on things you might do with a single computer, using parallelism to speed up your software, or using concurrency abstractions to preserve your sanity in the face of non-determinism. But now what if you have more than one computer?


Our final word of the month is actor. Actors are not specific to distributed programming; they are really more of a low level concurrency abstraction on a par with threads. And they certainly aren't new either. The actor model has been around since the 70s at least, and has been seriously used for distributed programming since the late 80s with Erlang. So what makes an actor an actor? Let's compare with threads

Actor                                                            Thread
can create more actors                                           can create more threads
can have private local state                                     can have private local state
has NO shared state (isolated from other actors!)                has limited shared state
communicates with other actors via asynchronous message passing  communicates with other threads via shared variables

The essential difference between actors and threads is the isolation and message passing. There aren't any holes punched into lids here, but you can always shine a message from one jam jar to another, perhaps hoping they send you one of their own. The appeal of actors is thus a kind of simplicity, where avoiding shared state eliminates a class of concurrency bugs by definition, and where each actor can be reasoned about in isolation of its brethren.

This sort of thing may perhaps strike a chord with us functional programmers, and actually, there is quite a bit of actor-related work in Haskell: a handful of packages offering the actor as concurrency primitive, Martin Sulzmann's multi-headed twist on the model; Communicating Haskell Processes exploring an actor-ish cousin known as CSP. Finally, there's Cloud Haskell, which in explicit homage to Erlang, applies the actor model to distributed programming.

Glimpse of Cloud Haskell

We'll be taking a quick look at Cloud Haskell in this word of the month, unfortunately with only the most fleeting of glimpses. If squirting money between bank accounts is the transactional hello world, playing ping pong must surely be its distributed counterpart. Before working up to that, we first start with half a hello. The following example creates three processes — “process” is the Erlang-inspired word for the actor here — one which receives Ping messages and just prints them to screen, one which sends a single Ping message, and finally one which fires up the first two processes:

{-# LANGUAGE DeriveDataTypeable #-}
module Main where
import Control.Concurrent ( threadDelay )
import Data.Binary
import Data.Typeable
import Control.Distributed.Process
import Control.Distributed.Process.Node
import Network.Transport.TCP
-- Serializable (= Binary + Typeable)
data Ping = Ping deriving (Typeable)
instance Binary Ping where
    put Ping = putWord8 0
    get      = do { getWord8; return Ping }
server :: ReceivePort Ping -> Process ()
server rPing = do
    Ping <- receiveChan rPing
    liftIO $ putStrLn "Got a ping!"
client :: SendPort Ping -> Process ()
client sPing =
    sendChan sPing Ping
ignition :: Process ()
ignition = do
    -- start the server
    sPing <- spawnChannelLocal server
    -- start the client
    spawnLocal $ client sPing
    liftIO $ threadDelay 100000 -- wait a while
main :: IO ()
main = do
    Right transport <- createTransport "" "8080"
    node <- newLocalNode transport initRemoteTable
    runProcess node ignition

This little package gives us a chance to look at three big pieces of Cloud Haskell, the Serializable typeclass, the Process monad, and channels.


Actors send messages to each other. As programmers, we see the messages in nice high-level form (eg. Ping), but somewhere along the way, these messages are going to have to be encoded to something we can ship around on a network. Cloud Haskell makes this encoding explicit, but reasonably convenient at the same time. Things can be messages if they implement the Serializable typeclass, which is done indirectly by implementing Binary and deriving Typeable. You won't be starting from scratch, as implementations are already provided for primitives and some commonly used data structures.
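For example, a hand-written instance for a richer message type (a hypothetical Job type, not taken from the post) follows the same pattern as the Ping instance in the listing above:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Binary
import Data.Typeable

-- A hypothetical richer message: Serializable via a hand-written
-- Binary instance plus a derived Typeable instance.
data Job = Job { jobId :: Int, payload :: String }
  deriving (Show, Eq, Typeable)

instance Binary Job where
  put (Job i p) = put i >> put p
  get           = do { i <- get; p <- get; return (Job i p) }
```

Since Int and String already come with Binary instances, the Job instance just composes them; decode (encode job) gives back the original job.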

Things which don't make sense as messages are deliberately left unserializable, for example MVar and TVar, which are only meaningful in the context of threads with a shared memory. Our Cloud Haskell program is perfectly free to use these constructs within processes (or between processes on the same machine; a bit more on that below), just not to ship them around.


We use “process” to mean “actor” in a similar fashion as Erlang, in other words nothing nearly so heavy as an operating system process. One difference from Erlang, however, is that Cloud Haskell allows for both actor-style concurrency and the thread-based approach. The infrastructure gears you towards using the actor model when talking across machines, but on the same machine, you could also conveniently do things the old way. Want to use STM to pass notes between processes? Fine, just spawn them locally via spawnLocal and give them a common TVar.

As for the Process monad, we see again the idea of a special monad for a special kind of sequencing. Here the idea is that things like sending/receiving messages or spawning other processes only make sense for processes, and so you can only do these things in a “process context”. Process implements MonadIO, though, so any input/output you'd like to do within a process is merely a liftIO away. Going the other way, running a process from IO, you would do with the runProcess function.


Cloud Haskell provides a notion of channels (somewhat similar to those we introduced in the last word of the month), typed unidirectional pipelines that go from one process to another. Using them is optional (there are simpler ways to bop messages back and forth), but worth trying out for the promise of sending messages only to processes that will understand them. Below is a quick glance at channels in action:

data SendPort a     -- Serializable
data ReceivePort a  -- NOT Serializable
newChan     :: Serializable a => Process (SendPort a, ReceivePort a)
sendChan    :: Serializable a => SendPort a -> a -> Process ()
receiveChan :: Serializable a => ReceivePort a -> Process a

A channel comes with a send and a receive port, both of which are parameterised on the same type variable. Creating a Ping channel thus gives a ReceivePort Ping out of which only Ping's will ever emerge, and a SendPort Ping into which we can only put Ping's. This looks a lot more attractive when you work with multiple channels. Replying to pings with pongs, for example, would require us to create a second channel with a send and a receive port of its own, which means we now have 4 ports to juggle! Having the type distinctions makes things a bit clearer: SendPort Ping vs ReceivePort Ping vs SendPort Pong vs ReceivePort Pong.

Finally, it's worth noticing that SendPort's are themselves Serializable, meaning that they can be copied and shipped around to other processes, possibly on other computers. This allows a channel to accept data from more than one place, and also makes for idioms like including a reply-to SendPort in your messages. ReceivePort's, on the other hand, are (deliberately) left unserializable, which leaves them tied to a single computer.

Ping? What happened to Pong?

Our little example was more “hello wo” than “hello world”; we'd only managed to send a Ping without even thinking about sending Pong's back. Want to try your hand at Cloud Haskell? Here's a great opportunity!

1. [Easy] Start with a cabal install distributed-process and make sure you can run this example. Note that you'll need GHC 7.4.1 and up for this

2. [Less easy] Next, add a new Pong message (as a separate data type), extending the server to send this message back, and the client to receive that reply. There are some puzzle pieces to work through here. How does the server know where to send its replies? Moreover, how do we keep the server nice and decoupled from the client? We want it to receive pings from any client, and send a reply back to the ping'er (and not just some hard-coded client). Hint: you can solve this without touching ignition or main. Remember that SendPort is Serializable!

3. [Easy] You now have a single ping/pong interaction. Can you make the game go back and forth indefinitely (or until the threadDelay ends)? Hint: have a look at Control.Monad; it's not essential, but it's a bit nicer.


Stepping back from the technology a bit, we have introduced the notion of actors as a concurrency abstraction on a par with threads. While there's nothing that makes them specific to distributed programming, they do seem to fit nicely to the problem and have been used to great effect before. Cloud Haskell is one attempt to apply this actor model, taking some of the ideas from Erlang, and combining them with Haskell's purity and type system.

You might notice that in a word of the month about distributed programming, we've kept things on a single machine, alas! Indeed, we have not been able to do Cloud Haskell justice in this article, but we have hopefully laid some foundations by introducing some of the basic layers, Serializable messages, processes, and channels. To escape from one-machine-island, we would need to get to grips with two more concepts, nodes and closures.

Nodes can basically be thought of as separate machines (you could run multiple nodes on the same machine if you wanted to, say for development purposes). This makes for three layers: nodes (machines), which contain processes (actors), which can run any number of threads they wanted. We saw how processes can communicate by sending each other messages across channels; what we've left out is the crucial detail of what happens when the processes live on different nodes. The good news here is “nothing special”, still messages across channels. The bad news is a bit of infrastructural fiddliness setting up the nodes in the first place, assigning them to roles, and spawning remote processes… for which we need to know about closures.

The basic story with closures is that we need to be able to send functions back and forth in order to do anything really useful with Cloud Haskell, and to send functions we need to say how they are Serializable. This would be easy enough — assume for now that all nodes are running the same code and just send “run function foo” style instructions — were it not for the fact that Haskellers do all sorts of crazy things with functions all the time (partially applying them, returning them from other functions…), crazy things that introduce free variables. Expressing the serializability of a function-and-its-free-variables was a source of furious head-scratching for a while, until somebody hit on the old Henry T. Ford idea: you can have any free variables you want so long as they are a ByteString.
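As a toy model of that idea (hypothetical names throughout, nothing like the real implementation's machinery): a closure is a name that all nodes agree on, paired with the encoded free variables, and a "remote table" maps the name back to a function that knows how to decode them:

```haskell
import Data.Binary (decode, encode)
import Data.ByteString.Lazy (ByteString)

-- Toy closures: a static name plus the encoded environment.
data Closure a = Closure String ByteString

-- Toy remote table: every node agrees on what each name means,
-- and each function decodes its own free variables.
resolve :: String -> (ByteString -> Integer)
resolve "addOne" = \env -> decode env + 1
resolve name     = error ("unknown static name: " ++ name)

runClosure :: Closure Integer -> Integer
runClosure (Closure name env) = resolve name env
```

Here runClosure (Closure "addOne" (encode (41 :: Integer))) evaluates to 42. The real Static machinery in distributed-process is considerably richer, but the free variables really do travel as a ByteString.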

Where to from here? If you're looking for more introductory stuff and have not already seen, try Simon Peyton Jones's presentation of Cloud Haskell to the Scala community (1h video). Edsko has been hard at work at the distributed-process Haddock, so it's worth checking out when you're ready to roll up your sleeves and get hacking. It'd be a very good idea to have a look at the simplelocalnet backend, which will help you get started with the nitty gritty node management issues when you start yearning to go distributed. That's the practical stuff, but don't forget to read the Cloud Haskell paper either! The API has some slight differences (for example, ProcessM has since been renamed to Process), but it should be fairly straightforwardly transferable to the new package. It's likely we'll need a wider spectrum of documentation to bring more Cloud Haskellers into the fold (early days, eh?). Hopefully this word of the month will help you get started, and maybe in turn write a blog post of your own? Happy Distributed Haskell'ing!

Parallel Haskell Digest 11

Thursday, 05 July 2012, by Eric Kow.
Filed under ph-digest, parallel.

It's time for another Parallel Haskell Digest! Unfortunately, this may just be our last one, at least within the context of the Parallel GHC project. That said, we may as a community be at the very beginnings of Haskell as the language of choice for your parallel and concurrent needs. Maybe we need to keep something like the Digest going to help our little FP monster through its infancy? Any volunteers in the community? If you're interested in picking up the torch, please give us a shout!

Otherwise, if you can't take on a (perhaps rotating) digest commitment, but still want to help, would you be kind enough to fill out a small survey on the digest? There are just five questions on it, plus a feedback form. Anything you can say will help those of us in the Secret Haskell Propaganda Committee to fine tune our efforts:

Parallel Haskell Digest Survey

It's been a fantastic year for me, working on the Parallel GHC project, learning about all sorts of neat ideas and technologies (as a basic parallel-naive Haskeller), and trying to reflect them back in a way that hopefully helps the broader community. Thanks to all of you in the parallel Haskell world first for cranking out all this great stuff for us to use, and second for your patience and support. Thanks especially to my fellow Well-Typed-ers for all the fun chats, the feedback on drafts, and help getting up to speed.

One last thing before signing off as your Parallel Haskell Digester. While the digest may be coming to an end, there will at least be one encore! It turns out we had so much to say in our last word of the month that we'll have to put it in a follow-up posting. In the meantime, we'll just leave you with a little teaser…


Word of the month (teaser!)

The word of the month series has given us a chance to survey the arsenal of Haskell parallelism and concurrency constructs:

The Haskell approach has been to explicitly recognise the vastness of the parallelism/concurrency space, in other words, to provide a multitude of right tools for a multitude of right jobs. Better still, the tools we have are largely interoperable, should we find ourselves with jobs that don't neatly fit into a single category.

The Haskell of 2012 may be in a great place for parallelism and concurrency, but don't think this is the end of the story! What we've seen so far is only a snapshot of the technology as it hurtles through the twenty-tens (How quaint are we, Future Haskeller?). While we can't say what exactly the future will bring, we can look at one of the directions that Haskell might branch into in the coming decade. The series so far has focused on things you might do with a single computer, using parallelism to speed up your software, or using concurrency abstractions to preserve your sanity in the face of non-determinism. But now what if you have more than one computer?

Our final word of the month is actor. Actors are not specific to distributed programming; they are really more of a low level concurrency abstraction on a par with threads. And they certainly aren't new either. The actor model has been around since the early 70s at least, and has been seriously used for distributed programming since the late 80s with Erlang. Can you guess where this word of the month is going? We have a bit more to say about it shortly, so while this is the last Parallel Haskell Digest, watch this space for the final word of the month :-)

Parallel GHC project update

Our work on the distributed-process implementation of Cloud Haskell continues apace. We're almost there, having implemented most of the API described in the original Epstein et al paper, except for node configuration and initialisation. We are very excited to be getting this out of the door soon and into your hands. In fact, we've even submitted a proposal to present this work at the upcoming Haskell Implementors Workshop; so hopefully you'll be able to join Duncan and Edsko in Copenhagen and catch up on the Cloud Haskell news.

As for ThreadScope, we last mentioned that we were working to make use of information from hardware performance counters (specifically, Linux Perf Events). This took a bit more work and trickier GHC patches than we had anticipated, but it does seem to be in order now and we are now in the testing phase for the next release. The next ThreadScope release will also include the use of heap statistics from the (eventual) GHC 7.6 RTS, and some user interface enhancements suggested by our users.


Blogs and packages

Mailing lists

StackOverflow and Reddit

Help and Feedback

Well, this is the end of the Haskell Parallel Digest, but feedback would still be much appreciated! Get in touch with me, Eric Kow. Bye for now!

Parallel Haskell Digest 10

Friday, 18 May 2012, by Eric Kow.
Filed under ph-digest, parallel.

Hello Haskellers!

Did you see Ambassador Peyton Jones in Scala land? Simon was recently at ScalaDays 2012 (a large gathering for professional Scala users) giving a keynote talk on Cloud Haskell (one hour video). Cloud Haskell is a pretty exciting new development in the Haskell space, providing the beginnings of a story for distributed programming in Haskell. It's also one of the areas we've been focusing on in the Parallel GHC project, building a new implementation to replace the current prototype. We're looking forward to talking a bit more about Cloud Haskell in the next (and final) edition of the digest.

Wait, did I say just final? Indeed, by the next digest, we'll be wrapping up the Parallel GHC project. In addition to a bit more Cloud Haskell material, we'll give a little recap of the things we and our partners worked on over the two years. It's been fun!

Meanwhile, in this penultimate edition, we'll be taking a look at concurrent channels for our word of the month. We also have a new parallel Haskell book to look forward to, an update to Accelerate, the new meta-par family of packages to look at, and also a lot of recent activity on StackOverflow.



Word of the month

This month, we'll be taking a short breather in our exploration of the Haskell concurrency space, and fleshing out some of the uses for the tools we already have. In the past two digests, we saw how Haskell provides locks for low-level concurrency, and the vastly safer transactions for concurrency at a higher level. Both approaches give us the notion of typed mutable variables, the idea being that an MVar Int would hold a locked integer, whereas a TVar Int would instead hold a transactional reference to an integer. These variables can hold arbitrarily complex things of arbitrary type; you could have anything from a TVar Char to a TVar Customer (where Customer would be some record you've defined in your application).

Now that we have mutable variables, it's worth thinking a bit harder about what we might actually put into them. Suppose you find yourself in a typical producer/consumer scenario, for example, with a web service that automatically marks student essays, broken into a piece that accepts submissions (the producer) and a piece that passes them on to the core essay-marking engine (the consumer). So the producer generates essays and the consumer eats them up and does some work on them; how do we get them talking to each other? It's not enough to just use a single TVar because we want the producer to be able to continue cranking out essays whilst the consumer is working, rather than waiting for it to finish. We assume here that essay-marking is a fairly clever and computationally expensive process, and for this reason, we would want some kind of backlog that the producer can tack things on to, and the consumer can pull things off of.

As such, our word of the month is channel. The unbounded channel abstraction is something that you can fairly easily implement out of either the locky MVar's or transactional TVar's, but we'll focus on the latter as transactions are just so much more civilised (though the same concepts would mostly apply). In the STM world, channels look a little like the following:

-- Control.Concurrent.STM.TChan
data TChan a

newTChan   :: STM (TChan a)
writeTChan :: TChan a -> a -> STM ()
readTChan  :: TChan a -> STM a

In the same fashion as the TVar's that we introduced last time, TChan's are parameterised with a type variable, meaning that you could have a channel of characters with TChan Char, or a channel of customers with TChan Customer, and so forth. Creating, reading, and writing to a channel are all transactions (i.e., in the STM monad). Revisiting our essay marking service, we can sketch out how these channels might be used:

import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Concurrent.STM.TChan
import Control.Monad (forever)

main :: IO ()
main = do
    chan <- atomically newTChan
    forkIO (producer chan)
    forkIO (consumer chan)
    forever $ return ()

producer :: TChan Essay -> IO ()
producer chan = forever $ do
    essay <- magicalWebFrameworkStuff
    atomically $ writeTChan chan essay

consumer :: TChan Essay -> IO ()
consumer chan = forever $ do
    essay <- atomically $ readTChan chan
    mark essay

mark :: Essay -> IO ()
mark essay = do
    putStrLn "Let me think..."
    -- State-of-the-art marking technology,
    -- just $25000 per site license
    randomRIO (1, 10000000) >>= threadDelay
    pass <- randomIO
    if pass
       then putStrLn "Pass, good job!"
       else putStrLn "Fail!"

And that's it! Using concurrent channels does not get more complicated or deeper than this. You may have noticed that in this particular example, we have not really gained (or for that matter lost) that much from sticking to the transactional version of channels. Using the locky MVar version would basically consist of dropping the atomically's, importing from Control.Concurrent.Chan, and using Chan instead of TChan.
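To make the comparison concrete, here is a minimal, finite sketch of the same wiring over the lock-based Chan: no atomically, and Chan instead of TChan. (The three-essay workload is made up purely for illustration.)

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Concurrent.MVar
import Control.Monad (replicateM)

main :: IO ()
main = do
    chan <- newChan                   -- plain IO, no atomically
    done <- newEmptyMVar
    -- producer: write three essays onto the channel
    _ <- forkIO $ mapM_ (writeChan chan) ["essay1", "essay2", "essay3"]
    -- consumer: read them back in FIFO order and hand them to main
    _ <- forkIO $ do
        essays <- replicateM 3 (readChan chan)
        putMVar done essays
    essays <- takeMVar done
    mapM_ putStrLn essays
```

With a single reader and single writer the FIFO order is guaranteed, so this prints the essays in the order they were produced.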

Now that we have a bit of an idea what channels are about, it could be worthwhile to consider what they really offer over simpler alternatives. For example, in the introduction we rejected the idea of just using a single TVar because this would force our producer and consumers to wait on each other for each and every essay, rather than going about their asynchronously merry ways.

So we know we want something like channels, but how exactly do we go about building them? For starters, wouldn't we get a channel structure by just wrapping Data.Sequence.Seq with a single TVar? It could be made to work as we are using STM (it simply wouldn't work if we were using MVar's instead; consider the empty channel), but it would leave us with the unfortunate inability to simultaneously read from and write to the channel. These operations would have to grab hold of the whole queue, leaving the other to retry until later. It would be a little sad not to enable this bit of concurrency, considering that reading and writing take place at opposite ends of the queue, the reader walking along trying to keep up with the writer.

Instead of naively wrapping a queue, the current implementation uses a sort of linked list with TVar'ed cons cells and TVar's pointing to both the beginning (the read end) and the end of the list (the write end). Here are the data structures that make up a channel:

type TVarList a = TVar (TList a)
data TList a    = TNil | TCons a (TVarList a)

data TChan a = TChan (TVar (TVarList a)) -- read end
                     (TVar (TVarList a)) -- write end

It can be a little bit tricky to think about because we've got TVar's wrapping around things that eventually wrap around TVar's themselves. It's a whole chain of TVar's, and if you can have a TVar a, there's no reason not to have a TVar (TVar a). If that feels a bit shaky, try implementing channels yourself as a quick little exercise. We'll speed things along with a handful of pictures to illustrate how it might work. First, our visual language for talking about TVar'ed cons cells:

TChan legend

A new channel has three TVar's, one for the linked list (it points to TNil), and a pair of read/write ones pointing to this pointer:

new TChan

Writing the channel involves adding to the list and moving the write pointer to the new tail of the list:

write TChan

And finally reading those items off the channel involves moving the read pointer forward:

read TChan

The implementation should be fairly straightforward from the pictures, although one place you might get stuck is when trying to read from an empty channel. After all, how do you return a value from a channel that doesn't have any, especially since you're expected to return a plain old a instead of Maybe a? Well, sometimes you just gotta wait. We briefly glossed over this in our taste of STM in the last word of the month, but STM offers a retry function that simply causes a transaction to be aborted and tried again. Using this notion of blocking, you should be able to get a readTChan that waits until there is something to be read.
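For reference, here is one possible sketch solution to the exercise, built out of the data structures above (primed names like TChan' are used to avoid clashing with the library's own TChan):

```haskell
import Control.Concurrent.STM
import Control.Monad (replicateM)

type TVarList a = TVar (TList a)
data TList a    = TNil | TCons a (TVarList a)

data TChan' a = TChan' (TVar (TVarList a)) -- read end
                       (TVar (TVarList a)) -- write end

newTChan' :: STM (TChan' a)
newTChan' = do
    hole <- newTVar TNil       -- the empty list
    rend <- newTVar hole       -- read end points at it
    wend <- newTVar hole       -- so does the write end
    return (TChan' rend wend)

writeTChan' :: TChan' a -> a -> STM ()
writeTChan' (TChan' _ wend) x = do
    newHole <- newTVar TNil
    oldHole <- readTVar wend
    writeTVar oldHole (TCons x newHole)  -- extend the list
    writeTVar wend newHole               -- advance the write end

readTChan' :: TChan' a -> STM a
readTChan' (TChan' rend _) = do
    hd   <- readTVar rend
    cell <- readTVar hd
    case cell of
      TNil       -> retry                -- block until a writer commits
      TCons x tl -> do writeTVar rend tl -- advance the read end
                       return x

main :: IO ()
main = do
    c <- atomically newTChan'
    atomically $ mapM_ (writeTChan' c) [1 .. 3 :: Int]
    xs <- atomically $ replicateM 3 (readTChan' c)
    print xs
```

Notice that a reader only touches the read-end TVar and the head cell, while a writer only touches the write-end TVar and the tail hole, so the two can commit simultaneously on a non-empty channel.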

Hopefully, the exercise of implementing channels will be a useful reminder to think of the concurrency abstractions that Haskell provides (threads, transactional mutable variables) as primitives on top of which you can build more interesting and useful abstractions. For a little more fun, head over to Simon Marlow's tutorial on parallelism and concurrency. In this tutorial, Simon illustrates building channels over MVar's (also worth doing) and mentions an easy generalisation to multicast channels (one write end, two read ends!) and also a small extension to “unread” a value (pushing it back on to the read end). Both extensions are easy on the surface, but hard to nail down to the exact desired semantics (there's no known correct implementation of the unread extension), at least when you're dealing with locks and MVar's. But if you stick to transactions and TVar's, both wind up being straightforward. Check out his tutorial!


Blogs and packages

Mailing lists

StackOverflow and Reddit

This month saw quite a lot of activity on StackOverflow, largely from user Clinton trying to puzzle through STM and other concurrency issues. The STM and atomicModifyIORef series of questions could be interesting, at the very least, to see what sorts of things people wonder about when they first run into Haskell concurrency.

General questions




Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at Please feel free to leave any comments and feedback!

Parallel Haskell Digest 9

Thursday, 19 April 2012, by Eric Kow.
Filed under ph-digest, parallel.

The Google Summer of Code is upon us and students have already submitted their proposals. There are a couple of potential projects on concurrent data structures, which we'll have a look at below.

We will also be continuing our tour of Haskell concurrency abstractions with our word of the month, transaction. This digest is brought to you by the Parallel GHC project, an MSR-sponsored effort to push parallel Haskell technologies out into the real world. Check our project news below to see how we're doing on that front.

Finally, you may have heard Functional Propaganda from a Simon or two. But how would the same message affect you if it came from a hardcore C++ hacker? If you haven't seen it making the rounds, have a quick look at Bartosz Milewski's The Downfall of Imperative Programming, and maybe tell your imperative friends? The FP monster is hatching from its academic egg; best be prepared!


Let's have a quick look at some of those GSoC proposals, particularly those with a parallel Haskell theme. It's all about performance this year. Two of the proposals involve using or improving parallelism in Haskell, and four are related to high-performance concurrency.

Parallel GHC project update

We have been continuing our work to make ThreadScope more helpful and informative in tracking down your parallel and concurrent Haskell performance problems. We now have the ability to collect heap statistics from the GHC runtime system and present them in ThreadScope. These features will be available for users of a recent development GHC (7.5.x) or the eventual 7.6 release. In addition to heap statistics, we have been working on collecting information from hardware performance counters, more specifically adding support for Linux Perf Events. This could be useful for studying IO-heavy programs, the idea being to visualise system calls as being distinct from actual execution of Haskell code.

Speaking of performance, we are also continuing work on the new Cloud Haskell implementation (see Duncan Coutts' Fun in the Afternoon Talk), and have lately been focused on reducing message latency. This consists of work in three areas: improving binary serialisation, investigating the implications of using Chan and MVar to pass messages between threads, and perhaps improving the Haskell network library implementation to compete better with a direct C implementation.

Word of the month

Lately, we've been investigating the various ways Haskell helps us to get to grips with concurrency. We talked about how the MVar, the Haskell variant on locks, allows us to share mutable variables between threads, with some safeguards to help ensure consistency. MVar's may provide a nice high-level packaging around locks, but as we mentioned in the last digest, they can still go horrifically wrong, just like locks and synchronized methods in other languages.

We could go through the usual litany of reasons why locks are bad news, but maybe a healthier approach would be for us to focus on the positive. What do we want as programmers? One possibility is what Simon PJ (Beautiful Concurrency) calls “modular programming”, the ability to “[build] large programs by gluing together smaller programs”. Locks fall short of helping us to meet this desire. First, because the mere act of combining two locky programs may be inherently incorrect; withdraw acct1 amt >> deposit acct2 amt is bad because of the gap between the two actions where the money is in neither account. Second, because they seal off programs that we may otherwise like to moosh together; if process p1 waits for input on a pipe, and process p2 waits for input on another pipe, how do we wait for either of p1 or p2? So how do we wrestle back this modularity from our locky masters? And how do we make programming fun again?

Our word of the month today is “transaction”. Software transactional memory (STM) takes this idea of a transaction (a sequence of operations that can be treated as a single atomic block) from database design. The Haskell implementation of STM was introduced in the 2005 paper Composable Memory Transactions by Harris et al. If programming fun is what you're after, this is a paper that comes with its own war-cry: “compositionality: a programmer can control atomicity and blocking behaviour in a modular way that respects abstraction barriers.”

Here are some quick highlights of the stm library. You may notice a couple of things, first that this library introduces its own notion of variable, the TVar (MVar, IVar; what, no pirate joke?) and second that STM involves a new monad of its own. Unlike the MVar that we saw in the last digest, TVar's do not have the same notion of being full or empty; they just hold values plain and simple. As for the STM monad, we will see why it matters when we first try to do some IO.

 -- Control.Concurrent.STM
 data STM a
 instance Monad STM
 atomically :: STM a -> IO a
 data TVar a
 newTVar   :: a -> STM (TVar a)
 readTVar  :: TVar a -> STM a
 writeTVar :: TVar a -> a -> STM ()
 retry  :: STM a
 orElse :: STM a -> STM a -> STM a

To get a rough idea how some of this is used, let's look at the transactional hello world, safely wiring money from one bank account to another. For the purposes of our example, a bank account is just a balance. To get some money from an account, we read the balance, subtract the amount, and write the new balance. Making a deposit is just withdrawing negative-money.

 type Account = TVar Int
 withdraw :: Account -> Int -> STM ()        
 withdraw acc amount = do
     bal <- readTVar acc
     writeTVar acc (bal - amount)
 deposit :: Account -> Int -> STM ()
 deposit acc amount = withdraw acc (- amount)

These primitive operations (withdraw and deposit) bring us to the question of modularity. How do we know that it's safe to combine these mini-programs into a bigger one? In other words, if we write something like withdraw from 42 >> deposit to 42, how do we avoid the possibility of running into some twilight zone state where the money is neither here nor there? If people do strange things like simultaneously transferring money in the other direction, will our program still work?

The answer lies in the distinction between STM (transactions) and IO (actions). So long as we remain in STM, we are simply assembling transactions, piecing smaller ones (“withdraw from a”) into larger ones (“withdraw from a and deposit it to b”), but not actually performing them! Having composed our transactions, we can use the function atomically to turn them into IO actions.

 -- still just a transaction
 transfer :: Account -> Account -> Int -> STM ()
 transfer from to amount = do
     deposit to amount
     withdraw from amount
 -- now we have an action!
 doTransfer :: Account -> Account -> Int -> IO ()
 doTransfer from to amount =
     atomically $ transfer from to amount

And atomically does what it says on the tin: it runs the transaction in a way that renders it indivisible, no twilight zones. Lest there be any confusion, even though the transaction is indivisible, we can still have concurrency during the course of the transaction, even simultaneous reads of the affected TVars if we want. The indivisibility simply means that we never catch our transactions with their pants down. We neither read nonsense mid-transactional values (simultaneous reads would get either the before or the after value), nor inject values into a transaction mid-stream.
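To see this guarantee in action, here is a small self-contained demo (repeating the account definitions for completeness) in which two threads shuffle money in opposite directions. Because each transfer is atomic, the total balance is preserved no matter how the threads interleave:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Concurrent.STM
import Control.Monad (replicateM_)

type Account = TVar Int

withdraw :: Account -> Int -> STM ()
withdraw acc amount = do
    bal <- readTVar acc
    writeTVar acc (bal - amount)

deposit :: Account -> Int -> STM ()
deposit acc amount = withdraw acc (- amount)

transfer :: Account -> Account -> Int -> STM ()
transfer from to amount = deposit to amount >> withdraw from amount

main :: IO ()
main = do
    a <- atomically $ newTVar 100
    b <- atomically $ newTVar 100
    done1 <- newEmptyMVar
    done2 <- newEmptyMVar
    -- two threads wiring money in opposite directions, concurrently
    _ <- forkIO $ replicateM_ 1000 (atomically $ transfer a b 1) >> putMVar done1 ()
    _ <- forkIO $ replicateM_ 1000 (atomically $ transfer b a 1) >> putMVar done2 ()
    takeMVar done1
    takeMVar done2
    total <- atomically $ (+) <$> readTVar a <*> readTVar b
    print total  -- no money is ever created or destroyed
```

Run it as often as you like: the individual interleavings vary, but the printed total is always 200.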

To get a feel for how these guarantees are possible, it could be useful to take a peek under the hood. For each transaction that is run, GHC maintains a thread-local log with an entry for each TVar accessed in that transaction. Each entry contains both the old value and the new value that would be committed if the transaction is successful. This may be easier to see with a silly example:

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (when)
import System.Random (randomRIO)

main = do
    v1 <- atomically $ newTVar "Joe"
    v2 <- atomically $ newTVar "Bob"
    done <- atomically $ newTVar 0
    -- thread A (you can just pretend forkDelayIO == forkIO)
    forkDelayIO . atomically $ do
                              -- transaction log if A runs first
        x <- readTVar v1      -- v1: Joe -> Joe
        y <- readTVar v2      -- v1: Joe -> Joe, v2: Bob -> Bob
        writeTVar v1 "Sue"    -- v1: Joe -> Sue
        writeTVar v2 x        -- v1: Joe -> Sue, v2: Bob -> Joe 
        writeTVar v1 y        -- v1: Joe -> Bob, v2: Bob -> Joe
        modifyTVar done (+1)  -- (stm 2.3 but easy to define)
    -- thread B 
    forkDelayIO . atomically $ do
                              -- (if A runs first)
        writeTVar v1 "Jean"   -- v1: Bob -> Jean
        writeTVar v2 "Paul"   -- v1: Bob -> Jean, v2: Joe -> Paul
        modifyTVar done (+1)
    waitThreads 2 done
    people <- atomically $ do -- (if A runs first)
        p1 <- readTVar v1     -- v1: Jean -> Jean
        p2 <- readTVar v2     -- v1: Jean -> Jean, v2: Paul -> Paul
        return (p1, p2)
    print people -- if A runs first, (Jean, Paul)
                 -- if B runs first, (Paul, Jean).

-- boring details just for this example
forkDelayIO job = forkIO $
    randomRIO (1, 1000000) >>= threadDelay >> job
waitThreads n v = atomically $
    do { d <- readTVar v;  when (d < n) retry }

In the above, we fork off two threads: A, which swaps a pair of names, and B, which overwrites them with other names. Atomicity here means that other threads never see any intermediary states and state changes from other threads don't affect the current thread. For example, thread B should never see v1 being set to "Sue". Likewise, thread A should still read "Joe" from v1 even if B simultaneously writes "Jean".

This is made possible by validation of the transaction logs. Validation normally occurs at the end of a transaction (we won't cover the two other cases here: exceptions, and thread wake-ups). It consists of checking that all the expected “before” values for TVars still match reality. If the logs are good, we commit the new values; if not, we simply discard them and try the transaction again, taking the new reality into account. This validate-and-commit model allows us to run transactions simultaneously, safely, but with the occasional rollback and retry to ensure atomicity.

The notion of a transaction log brings us to the notion of cost. Good things don't always come cheap, and using a good thing like STM may require a little familiarity with the cost model behind it. Basically, it's important to keep in mind that the values we write to TVar's may come from some arbitrary expression, and that arbitrary expressions may be arbitrarily expensive. So being forced to retry transactions may involve redoing something expensive. If the transactions affect many variables, the chances of hitting a retry go up. Likewise, if the transaction takes a long time to run, the chance goes up of some other thread making a change that triggers a retry. In the pathological worst case, you can have some transactional behemoth that never manages to commit, because some smaller faster transaction keeps stealing its thunder. So keep an eye out for starvation and, more generally, for retries being expensive.

Cost may be a bit of a bummer, but there's also a Haskell-related silver lining behind all this. Because we have a purely functional language and the enforced separation between pure functions and side-effecting actions, STM is actually quite practical in Haskell. The number of things we need to track in a transaction log is limited to a handful of explicit TVars rather than just about everything. If you are coming from other languages, you may have a memory of STM as being nice, but wildly impractical. Not so in Haskell. Eminently viable.

Aside from making STM practical, this sort of separation is also good for general peace of mind. Suppose for example that we coded up a feature in our banking software to send our users a text message alert whenever their balances fall below a threshold. If we were in the middle of a complicated transaction, we might be tempted to just slap that logic right in the middle of the transaction; however, the Haskell implementation makes this deliberately impossible. This can be a bit frustrating at first (and new Haskellers are sometimes faced with the “how do I get this out of the monad” puzzle), but saves us the greater danger of bombarding our users with spurious retry-induced text messages.

The guarantees that STM offers make it a great place to get started with Haskell concurrency. After all, why make software any buggier than it needs to be? If you do want to get started, have a look at Simon Peyton Jones' Beautiful Concurrency. It's a particularly good idea to do so, because there's some really interesting ground that we've not covered here (briefly: blocking, where the retry function aborts the current transaction and causes it to be retried when appropriate; and choice, where a orElse b tries a, and if that should retry, then b, and if that should also retry, retries the whole expression). Other great STM resources are Simon Marlow's tutorial on parallelism and concurrency and the Real World Haskell chapter on STM. With the four resources combined, you'll see a nice range of examples from the usual bank-account one to concurrently shuffling windows between desktops.
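As a small taste of that choice combinator, here is a sketch in the style of the Beautiful Concurrency paper (limitedWithdraw and withdrawFromEither are illustrative names, not library functions):

```haskell
import Control.Concurrent.STM

type Account = TVar Int

-- withdraw, but block (retry) rather than let the balance go negative
limitedWithdraw :: Account -> Int -> STM ()
limitedWithdraw acc amount = do
    bal <- readTVar acc
    if amount > bal
       then retry
       else writeTVar acc (bal - amount)

-- try the first account; if that would block, try the second instead
withdrawFromEither :: Account -> Account -> Int -> STM ()
withdrawFromEither a b amount =
    limitedWithdraw a amount `orElse` limitedWithdraw b amount

main :: IO ()
main = do
    a <- atomically $ newTVar 5
    b <- atomically $ newTVar 50
    atomically $ withdrawFromEither a b 10  -- a is too poor, so b pays
    balances <- atomically $ (,) <$> readTVar a <*> readTVar b
    print balances
```

The composition is entirely modular: limitedWithdraw knows nothing about orElse, yet the two snap together without any extra plumbing.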


Talks, tutorials, and packages

Mailing lists



StackOverflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at Please feel free to leave any comments and feedback!

Parallel Haskell Digest 8

Friday, 02 March 2012, by Eric Kow.
Filed under ph-digest, parallel.

It's time for our next catch-up with the Parallel Haskell community. Did you have a chance to see Simon Peyton Jones's talk The Future is Parallel, and the Future of Parallel is Declarative? It's a good survey of some of the directions that parallel Haskell has been taking, and if you're somewhat new to this stuff, a great feel for the breadth of the community. You'll get a better idea why you see people in this digest muttering about arrays, others trying to make channels and transactions work together, and yet others talking up the 0MQ protocol. So check it out!

We at Well-Typed are writing the digest as part of our mission to push Haskell parallelism and concurrency out into the real world. We are very excited about these technologies and we want to do whatever it takes to make them more accessible to everybody. More news below on how we're doing in this Parallel GHC project.



Parallel GHC Project Update

ThreadScope 0.2.1 has been released! This version contains the features we had demonstrated at the Haskell Implementor's Workshop in September 2011. Since our workshop prototype, we have greatly refined the spark histogram feature, tuning the visualisations so that they are easier to understand. We've also written a small tutorial to go with the new release. The ThreadScope Tour works through concrete examples on using ThreadScope to debug the performance of parallel programs. We'd love any feedback you have about the tutorial, especially things you feel like you need a little more help with.

Along with the new ThreadScope and tutorial, we also have a new version of the ghc-events package which now provides an encoding of the meanings of events in state machines. This makes it possible to validate eventlogs, and doubles as an always up-to-date source of code as documentation.

We've made some progress on our work in developing a swappable transport layer for Cloud Haskell. We now have a prototype implementation “distributed-process” (intended to be the successor to “remote”, the current Cloud Haskell implementation). For more details, see the distributed-process GitHub page, particularly the examples and the design document, which incorporates feedback on our initial proposal.

Finally a bit of partner news to wrap things up:

Word of the month

Over the next few digests, we'll be switching our focus from parallelism to concurrency. We tend to stress the distinction because Haskell offers ways to write parallel programs without making explicit use of concurrency. Parallelism done right gets us faster programs. Concurrency on the other hand buys us… concurrency. It's not going away. If every multicore computer in existence were to vanish, we would want to solve concurrent problems. Whether the simultaneity is real or simulated, we would still want to do more than one thing at the same time – accept user input, display progress messages, serve multiple clients.

So let's dig in! We first got a taste of concurrency in the second Parallel Haskell digest, where we introduced the notion of threads. As an abstraction, threads give us a way to express the separation of concerns between different jobs. But this isn't enough. Sometimes we need to undo the separation just enough to pass information from one thread to another.

This brings us to our word of the month: MVar. The humble MVar (pronounced “em-var”) is one of many solutions for this communication problem, a fairly low-level one by Haskell standards, but one that is still useful enough that you'll see it used very frequently. An MVar is like a burri… wait, wrong tutorial. Actually, it is helpful to think of an MVar as a box in the sense that it holds values and can either be full or empty. The MVar type takes a type variable, so an MVar Int might hold an integer, an MVar String a String, an MVar [String] a list of strings and so on.

   -- Control.Concurrent.MVar
   data MVar a
   newEmptyMVar :: IO (MVar a)
   takeMVar :: MVar a -> IO a
   putMVar  :: MVar a -> a -> IO ()

To give an idea how this might be used, below is a small program that fetches some URL in one thread while doing something else in the other. We fork off a Haskell thread that does the fetching and writes to the MVar to indicate what we've retrieved. In the main thread, we do our other work and then just wait until the page has been fetched.

 main = do
    m <- newEmptyMVar
    forkIO $ do
      r <- getURL ""
      putMVar m r
    r <- takeMVar m
    putStr r

These MVar's may look a little familiar if you've used IORefs in Haskell. Here is a mini API for comparison:

   -- Data.IORef
   data IORef a
   newIORef   :: IO (IORef a)
   readIORef  :: IORef a -> IO a
   writeIORef :: IORef a -> a -> IO ()

So what exactly do MVar's buy us? Why not just use IORefs to share mutable variables across threads? The reason is that coordination between threads can get messy: we want to make sure that any value we pass from one thread to another is accounted for (and not accidentally overwritten before being consumed), and that we don't try to consume values that are somehow out of date with respect to other threads (that updated values are received instead of an old value being read twice). Suppose we wanted to fetch a URL while doing something else at the same time. How do we know when we have successfully retrieved it?

 -- don't write this at home!
 inadvisableMain = do
    m <- newIORef "" -- default value? :-(
    forkIO $ do
      r <- getURL ""
      writeIORef m r -- are we overwriting something? :-(
    r <- readIORef m -- is there something to read? :-(
    putStr r

In the example above, we have no idea if the page at URL would have been fetched by the time we try to display its contents. What we are looking for is a synchronisation mechanism. We need a way to indicate that our shared values are ready to be used. For example, we might hit upon the idea of combining IORef with Maybe. Now we have some extra bureaucracy to handle. If we read a value and get Nothing we would then know that there isn't yet a value ready to be read. One way or another we would have to account for this case, for example busy waiting until we get a Just. On the flip side, we want to make sure that when somebody has written a value, the intended recipient has read it before we write over it. This sort of bureaucracy would be best packaged up into helper functions, functions that look awfully like takeMVar and putMVar might. Notice the change in name even. Now we're not just reading, but taking, emptying the box to signal that it's been read; and we're not just writing, but putting, only writing when the box is empty. Throw in a little help from the runtime system so that we're doing something smarter than busy waiting and we'll have worked our way back to the MVar.
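As a sketch of that bureaucracy, here is roughly what those helper functions might look like over an IORef (Maybe a) with naive busy waiting (takeIsh and putIsh are made-up names; note also that the check-then-act steps are not atomic, which is part of why you want the real MVar and its runtime support):

```haskell
import Control.Concurrent (yield)
import Data.IORef

-- take: spin until the box is full, then empty it and return the value
takeIsh :: IORef (Maybe a) -> IO a
takeIsh ref = do
    mx <- readIORef ref
    case mx of
      Nothing -> yield >> takeIsh ref              -- busy wait until full
      Just x  -> writeIORef ref Nothing >> return x

-- put: spin until the box is empty, then fill it
putIsh :: IORef (Maybe a) -> a -> IO ()
putIsh ref x = do
    mx <- readIORef ref
    case mx of
      Just _  -> yield >> putIsh ref x             -- busy wait until empty
      Nothing -> writeIORef ref (Just x)

-- a single-threaded demo of the fill/empty protocol
main :: IO ()
main = do
    box <- newIORef Nothing
    putIsh box "hello"
    r <- takeIsh box
    putStrLn r
```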

So the MVar combines references with locking to provide for synchronisation between threads. If you're coming from other languages, this should sound rather familiar. C programmers may have used mutexes (MVar ()) or semaphores (MVar Int) to protect shared data. Java programmers may have used synchronized methods and statements to prevent thread interference and memory inconsistency problems. The MVar is just a slightly nicer Haskell packaging to the same sort of idea. This means it suffers the same problems as its locky sisters. Sure, having the locks be implicit and putting them where they count (the data being shared) makes life a bit simpler, but at the end of the day locks are still locks. Take them in the wrong order, and you can still get deadlocks, races and all those subtle hard-to-reproduce bugs that keep programmers up at night.
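For instance, an MVar () can serve as a mutex guarding some shared state; here is a minimal sketch (withLock is an illustrative helper; production code would reach for the library's withMVar, or bracket, for exception safety):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (replicateM_)
import Data.IORef

-- acquire the lock, run the action, release the lock
withLock :: MVar () -> IO a -> IO a
withLock lock act = do
    takeMVar lock        -- acquire (blocks if another thread holds it)
    r <- act
    putMVar lock ()      -- release
    return r

main :: IO ()
main = do
    lock    <- newMVar ()
    counter <- newIORef (0 :: Int)
    dones   <- mapM (const newEmptyMVar) [1 .. 4 :: Int]
    -- four threads each bump the shared counter 1000 times under the lock
    mapM_ (\d -> forkIO $ do
              replicateM_ 1000 . withLock lock $
                  modifyIORef' counter (+ 1)
              putMVar d ()) dones
    mapM_ takeMVar dones
    readIORef counter >>= print  -- no lost updates
```

Because every read-modify-write of the counter happens under the lock, the final count is always 4000.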

What is the hapless foot-shooting programmer to do? The good news is that MVar's are only one of several mechanisms for dealing with concurrency. Safer mechanisms exist, albeit at a cost. MVar's present a compromise between performance and safety. If you are extra extra careful, you can on the one hand squeeze some serious performance out of atomicModifyIORef for concurrent data structures. If, on the other hand, you're willing to take a potential performance penalty in exchange for never worrying about a deadlock again, stay tuned for our next word of the month, “transaction”. For more about MVar's in the meantime, have a look at Edward Z. Yang's blog for an MVar overview as well as the updated API documentation, and finally the concurrency chapter from Real World Haskell.


Blogs and packages

Mailing lists



StackOverflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at Please feel free to leave any comments and feedback!

Well-Typed are hiring: Haskell consultant

Thursday, 12 January 2012, by Andres Löh.
Filed under well-typed.

In order to keep up with customer demand, we are looking to hire a Haskell expert to work with us at Well-Typed as a Haskell consultant.

This is an exciting opportunity for someone who is passionate about Haskell and who is keen to improve and promote Haskell in a professional context.

The role is quite general and could cover any of the projects and activities that we are involved in as a company. The tasks may involve:

Well-Typed has a variety of clients. For some we do proprietary Haskell development and consulting. For others, much of the work involves open-source development and cooperating with the rest of the Haskell community: the commercial, open-source and academic users.

At the moment, we are running the Parallel GHC Project. It is likely that initial tasks will have some connection with parallel and/or concurrent programming in Haskell. We are also doing quite a bit of GHC maintenance, and some knowledge or interest in compiler internals, operating systems, the foreign language interface, and/or deployment issues would be welcome.

Our ideal candidate has excellent knowledge of Haskell, whether from industry, academia, or personal interest. Familiarity with other languages, low-level programming, and good software engineering practices is also useful. Good organisation and the ability to manage your own time and reliably meet deadlines are important. You are likely to have a bachelor's degree or higher in computer science or a related field, although this isn't a requirement. Experience of consulting, or of running a business, is also a bonus.

The position is initially as a contractor for one year with a salary of 150 GBP per day. We offer flexible hours and work from home. Living in England is not required.

In the longer term there is the opportunity to become a member of the partnership with a full stake in the business: being involved in business decisions, and fully sharing the risks and rewards.

If you are interested, please apply via Tell us why you are interested and why you would be a good fit for the job, and attach your CV. Please also indicate when you might be able to start. We are more than happy to answer informal enquiries. Contact Duncan Coutts, Ian Lynagh or Andres Löh for further information, either by email or IRC.

The deadline for applications is Friday 27th January 2012.

About Well-Typed

Well-Typed LLP is a Haskell services company, providing consultancy services, writing bespoke applications, and offering commercial training in Haskell and related topics.

Parallel Haskell Digest 7

Saturday, 24 December 2011, by Eric Kow.
Filed under parallel, ph-digest.

GHC 7.4 is coming! There is loads to look forward to, but sometimes it's the little things that count. For example, do you hate the fact that you can't just flip on +RTS -N without first recompiling your program, this time remembering to throw -rtsopts on it? Duncan Coutts has relaxed the requirement so that commonly used RTS options can be used without -rtsopts. The flag was originally introduced to counter security problems for CGI or setuid programs; however, it was also a hassle for regular users because it got in the way of common options like -eventlog, -N, or -prof. The GHC 7.4 RTS will make a better trade-off between security and convenience, allowing a common set of benign flags without needing -rtsopts.

That's the sort of thing that the Parallel GHC Project is about. We want to push parallel Haskell out into the real world: first by helping real users (our guinea-pig industrial partners) to apply it to their work, second by making it easier to use (tools, libraries), and finally by communicating more about it (this digest).

In this month's digest, we'll be catching up on news from the community. After the holidays, we'll be back with some new words of the month exploring a bit of concurrent Haskell. In the meantime, happy hacking and Merry Christmas!


Job Opportunity at Parallel Scientific

Peter Braam wants you, parallel Haskeller!

Parallel Scientific, LLC is a Boulder, CO based early stage, but funded startup company working in the area of scalable parallelization for scientific and large data computing. We are implementing radically new software tools for the creation and optimization of parallel programs benefiting applications and leveraging modern systems architecture. We build on our mathematical knowledge, cutting edge programming languages and our understanding of systems software and hardware. We are currently working with the Haskell development team and major HPC laboratories world wide on libraries and compiler extensions for parallel programming.

Note the mandatory Haskell experience and the desirability of “in depth knowledge of core Haskell libraries for parallel programming (NDP, REPA etc)”.

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project, run by Well-Typed, with the aim of demonstrating that parallel Haskell can be employed successfully in real world projects.

Our most recent work has been in polishing the upcoming ThreadScope release that we previewed this September at the Haskell Implementors' Workshop. The new release comes with goodies for users of Strategies or the basic par/pseq parallelism: spark creation/conversion graphs, visualisations showing your spark pools filling and emptying, and histograms displaying the distribution of spark sizes. All this with the aim of helping you gain deeper insight into not just what your program is doing but why.
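To make the spark terminology concrete, here is the classic toy par/pseq example (it needs the parallel package; it is a standard illustration, not code from the ThreadScope project itself). Every `par` creates a spark, which the RTS may or may not convert into real parallel work, and those creations and conversions are exactly what the new ThreadScope graphs visualise.

```haskell
import Control.Parallel (par, pseq)  -- from the "parallel" package

-- Naive parallel Fibonacci: `x \`par\` ...` sparks the evaluation
-- of x, while `y \`pseq\` x + y` evaluates y in the current thread
-- before combining the results.
nfib :: Int -> Integer
nfib n
  | n < 2     = 1
  | otherwise = x `par` (y `pseq` x + y)
  where
    x = nfib (n - 1)
    y = nfib (n - 2)

main :: IO ()
main = print (nfib 25)
```

Compile with -threaded -eventlog, run with +RTS -N -l, and the resulting eventlog is what ThreadScope's spark pool and histogram views are drawn from.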

We've also done backend work to make ThreadScope even more useful further down the road. First, we have improved the ghc-events package by encoding the meanings of events in state machines. This makes it possible to validate eventlogs, and doubles as an always up-to-date source of code-as-documentation. Second, we have extended the GHC RTS to emit the startup wall-clock time and Haskell thread labels to the eventlog. The wall-clock time event allows us to synchronise logs for simultaneous processes, bringing us a step closer to using ThreadScope on distributed programs. Named Haskell threads make it easier to distinguish threads from each other.

Finally, we have been exploring the use of Cloud Haskell for high performance computing on clusters. To do this we need to abstract Cloud Haskell over different transport mechanisms, that is, to develop a robust Cloud Haskell implementation sitting on top of a swappable transport layer. We have posted an initial design for this layer on the parallel-haskell list. We have taken the substantial feedback into consideration and will be sending out a revised design and recording it on a page of the GHC wiki. Meanwhile, we are working to further validate the design on simple models of both the transport layer and a Cloud Haskell layer on top. Longer term, we aim to implement some transports: an IP transport in particular, and perhaps a single-node multi-process transport using forks and pipes.

Tutorials and Papers

Blogs and Packages

Actors, actors everywhere

More concurrency


Mailing list discussions

Help wanted

Cloud Haskell

Multicore performance

Data structures and concurrency

Threads, blocking

Stack Overflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at Please feel free to leave any comments and feedback!

Bikeshed image by banlon1964 available under a CC-NC-ND-2.0 license.
