Parallel Haskell Digest 6

Thursday, 06 October 2011, by Eric Kow.
Filed under parallel, ph-digest.

This edition of the Parallel Haskell Digest kicks off with some warm feelings from BioHaskeller Ketil Malde

Blessing of the day goes to the authors of threadscope. In fact, let me look it up: Donnie Jones, Simon Marlow, and Satnam Singh. I had an old STM experiment lying around, but couldn't quite make it go faster than the single threaded version.

Today I installed threadscope, and a glance at the output sufficed to identify the problem, and now at least it's faster. Great tool! (But you all knew that, of course)

Thanks indeed to the original authors of ThreadScope! Ketil made good use of the original release; since then, the folks at Well-Typed (Andres, Duncan, and Mikolaj) have taken over development of ThreadScope and are working to extend it so that it becomes even easier to squash those pesky performance bugs. These extensions, much like this digest, are made possible by the Parallel GHC Project. We hope you'll love the results.


Data Parallelism in Haskell slides and video: Manuel Chakravarty came all the way from Sydney (that's 931 km!) to tell the Brisbane FP Group about work going on at UNSW on data parallel programming in Haskell. The talk covers a lot of ground. It's an hour and a half long and

[It] motivates the use of functional programming for parallel, and in particular, data parallel programming and explains the difference between regular and irregular (or nested) data parallelism. It also discusses the Repa library (regular data parallelism for multicore CPUs), the embedded language Accelerate (regular data parallelism for GPUs), and Data Parallel Haskell (nested data parallelism for multicore CPUs).
Both video and slides are available, so check it out!

Word of the Month

The word of the month is dataflow as in dataflow parallelism. (And if you've just seen Manuel's talk, not to be confused with data parallelism). With dataflow, we will be wrapping up our now four-part series on parallel programming in Haskell. In recent words of the month, we've been looking at various approaches for parallelising your way to faster code. We started with the parallel arrays library Repa, then moved on to the Haskell par and pseq primitives, and built from there to get Strategies. Now we'll be looking at monad-par, a library for dataflow parallelism.

First, the bigger picture: why do we have another way of doing parallel programming in Haskell when we already have parallel arrays and Strategies? Part of the answer is that parallelism is still a research topic, with new ideas coming out from time to time and some old ideas slowly withering away. For another part of the answer, it may help to think in terms of a trade-off between implicitness and performance, in other words, between Easy and Fast.

Degrees of implicitness

Not all problems are subject to the trade-off in the same way, and it's not always clear to what extent they are. As a rough sketch, problems that require applying the same operation to a large number of simple objects can get fast performance from a parallel arrays library. This is a form of data parallelism. If the problem does not lend itself to data parallelism, the next port of call is likely Strategies. If Strategies are not adequate for the desired speedups, the problem may require an even more explicit approach.

Until recently, this meant “rolling your own” parallelism by using Concurrent Haskell: forking off threads, assigning tasks to them, and communicating finished work between them. Using concurrency for DIY parallelism is risky. It means venturing into IO, giving up determinism, exposing ourselves to all manner of side-effects, not to mention subtle concurrency bugs like deadlock. As Haskellers, we should demand better: safer, more predictable and easier to reason about. We want to have explicit fine-grained control without all the nasty side-effects. And now we can have it!

The latest addition to the parallel Haskell quiver is the monad-par library. Programming in monad-par looks a lot like programming in Concurrent Haskell:

data Par a
instance Monad Par
runPar :: Par a -> a     
fork :: Par () -> Par ()

data IVar a
new :: Par (IVar a)
put :: NFData a => IVar a -> a -> Par ()
get :: IVar a -> Par a

Instead of using forkIO, we use fork to create threads (not sparks!), and instead of using MVars to communicate, we use IVars. MVars and IVars behave in somewhat similar ways, in that any attempt to read from an IVar will wait until it has been filled. But this is also where the differences emerge. Unlike their concurrent counterparts, IVars may only be written to once; subsequent writes result in an error. This aligns with our desire for determinism: we never have to worry about an IVar changing value over time. Moreover, the Par monad does not allow any IO (a feature, not a bug!), and a computation in the Par monad can always be extracted with a simple runPar.
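As a small illustration (a sketch using the Control.Monad.Par API quoted above; the fib function here is just a stand-in for real work), two forked tasks each fill an IVar, and runPar hands back a pure result:

```haskell
import Control.Monad.Par (Par, IVar, runPar, fork, new, put, get)

-- Two independent tasks run in parallel; 'get' blocks until the
-- corresponding IVar is filled, and 'runPar' extracts a pure value.
twoTasks :: Int
twoTasks = runPar $ do
  a <- new
  b <- new
  fork (put a (fib 20))  -- first task fills 'a'
  fork (put b (fib 21))  -- second task fills 'b'
  x <- get a
  y <- get b
  return (x + y)
  where
    fib :: Int -> Int            -- deliberately naive work
    fib n | n < 2     = n
          | otherwise = fib (n-1) + fib (n-2)
```

Reading either IVar twice would be fine; writing one twice would raise an error, which is exactly the single-assignment discipline that keeps the result deterministic.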

But this doesn't really tell us all that much about how to use the library. For that, we come back to our word, dataflow. The dataflow model treats a program as a little black box: an input goes in, an output comes out, and what happens in between is immaterial:


The program is represented as a directed graph (a dataflow network), with each node representing an individual operation (a black box in its own right) and the edges between nodes expressing dependencies. In the network shown, g and h depend on the output of f (the same output; there is only one), and in turn, j depends on the outputs of both g and h.

Dataflow network

Dataflow networks are handy thinking tools because one of the things that makes it hard to get fast parallel code is the inherent sequentiality that dependencies introduce. A dataflow network nicely captures which parts of the program are parallel (g and h are independent of each other) and which parts are sequential (g depends on f). This is a sort of upper bound on parallelism: we can't go any faster than the network allows, but at the very least we should take its shape into account so that we don't make matters worse and find ourselves waiting far more on dependencies than is strictly necessary.

Using the Par monad essentially consists of translating dataflow networks into code.

Suppose we have functions

f :: In -> A
g :: A  -> B
h :: A  -> C
j :: B  -> C -> Out

For the graph above, we might say:

network :: IVar In -> Par Out
network inp = do
 [vf,vg,vh] <- sequence [new,new,new]
 fork $ do x <- get inp
           put vf (f x)
 fork $ do x <- get vf
           put vg (g x)
 fork $ do x <- get vf
           put vh (h x)
 x <- get vg
 y <- get vh
 return (j x y)
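To run the network we still need to create and fill the input IVar. A minimal driver might look like this (a sketch; it assumes an NFData instance for In, so that put can fully evaluate the input):

```haskell
-- Feed the network an input and extract the pure result.
runNetwork :: In -> Out
runNetwork input = runPar $ do
  inp <- new          -- the input node of the graph
  put inp input       -- fill it, releasing the task that computes f
  network inp         -- returns j applied to g's and h's outputs
```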

It's useful to note that dataflow graphs need not be static; they may be generated dynamically to fit the shape of the input. To see this, and for a more realistic taste of monad-par, check out Simon Marlow's Parallel and Concurrent Programming in Haskell tutorial. It shows how a parallel type inferencer might be implemented, its dataflow graph arising from the set of bindings it receives as input.

Finally, what about Strategies? How do we choose between the two? Both are IO-free and deterministic, and both require some, but not a whole lot of, comfort with monads. Par may be easier to use correctly than Strategies because it is strict and forces you to explicitly delineate the dependencies in your parallel computations. The price is that it is more verbose, and a little bit “messier” than Strategies in that you don't get the same separation between algorithm and parallelism. And since we're ultimately concerned with performance, it's worth knowing that the Par monad currently involves more overhead than the Eval monad from Strategies; the latter may be more appropriate at finer granularities.
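To make the comparison concrete, here is the same two-way split expressed both ways (a sketch; `expensive` is a stand-in for real work, and `spawnP` from monad-par forks a pure computation and returns the IVar that will eventually hold its result):

```haskell
import Control.Parallel.Strategies (runEval, rpar, rseq)
import Control.Monad.Par (runPar, spawnP, get)

expensive :: Int -> Int
expensive n = sum [1..n]   -- stand-in for real work

-- Strategies: lazy sparks via the Eval monad; the parallelism
-- annotations sit lightly on top of the algorithm.
withEval :: Int -> Int -> Int
withEval a b = runEval $ do
  x <- rpar (expensive a)  -- spark the first branch
  y <- rseq (expensive b)  -- evaluate the second branch here
  return (x + y)

-- Par: strict, explicit dataflow; the dependencies are spelled
-- out through IVars, at the cost of a little more ceremony.
withPar :: Int -> Int -> Int
withPar a b = runPar $ do
  vx <- spawnP (expensive a)
  vy <- spawnP (expensive b)
  x  <- get vx
  y  <- get vy
  return (x + y)
```

Both functions compute the same value; the difference is in how much of the coordination is left implicit.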

Perhaps one way to go about this is to start off with Strategies as they have been around longer and may require less invasive modifications to your code. But if you find Strategies to be difficult, or you just can't get the performance you want, don't reach for those forkIO's yet. Give Par a try.

For more about monad-par, do visit Simon's tutorial and the API documentation on Hackage. To dig deeper, you may also find it instructive to read A Monad of Deterministic Parallelism (Marlow et al.), presented at this year's Haskell Symposium. Also related are his monad-par tutorial slides from CUFP.

This concludes our tour of current approaches to Haskell parallelism. There is some new technology on the horizon which we may cover in a later digest, particularly Data Parallel Haskell and nested data parallelism (again, not to be confused with dataflow parallelism), but for next month, let's take a little break from parallelism and look at something a little different.

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project, run by Well-Typed, with the aim of demonstrating that parallel Haskell can be employed successfully in real world projects.

This has been a month for releasing software. We've recently put updates to ThreadScope and some helper libraries on Hackage.

Duncan Coutts presented our recent work on ThreadScope at the Haskell Implementors' Workshop in Tokyo (23 Sep). He talked about the new spark visualisation feature, which shows a graphical representation of spark creation and conversion statistics, with a demo inspired by a program we are working on for Parallel GHC.

Meanwhile, we have also completed a pure Haskell implementation of the “Modified Additive Lagged Fibonacci” random number generator. This generator is attractive for use in Monte Carlo simulations because it is splittable and has good statistical quality, while providing high performance. The LFG implementation will be released on Hackage when it has undergone more extensive quality testing.
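For background on the algorithm family (a generic additive lagged Fibonacci sketch with illustrative lags, not Well-Typed's implementation), each new value is the sum, modulo the word size, of the values j and k places back in the stream:

```haskell
import Data.Word (Word32)

-- Toy additive lagged Fibonacci generator with lags 5 and 17
-- (parameters chosen for illustration only). Given a 17-word seed,
-- element n of the stream is s(n-17) + s(n-5), with Word32
-- arithmetic wrapping modulo 2^32.
lfg :: [Word32] -> [Word32]
lfg seed = out
  where out = seed ++ zipWith (+) out (drop (17 - 5) out)
```

Taking, say, `take 100 (lfg seed)` yields the first 100 values; a splittable generator like the one described additionally needs a principled way to partition such streams between threads.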

Blogs, Papers, and Packages

Mailing list discussions

Stack Overflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, get in touch with me, Eric Kow. Please feel free to leave any comments and feedback!