Tutorial on Parallel Programming in Haskell

Friday, 07 October 2011, by Andres Löh.
Filed under parallel.

Today, I presented a tutorial at the Haskell in Leipzig meeting in Germany. The topic was parallel programming in Haskell, mainly using strategies and the parallel package.

The slides and the code that I have used are available for download. Feedback is welcome!


Parallel Haskell Digest 6

Thursday, 06 October 2011, by Eric Kow.
Filed under parallel, ph-digest.

This edition of the Parallel Haskell Digest kicks off with some warm feelings from BioHaskeller Ketil Malde:

Blessing of the day goes to the authors of threadscope. In fact, let me look it up: Donnie Jones, Simon Marlow, and Satnam Singh. I had an old STM experiment lying around, but couldn't quite make it go faster than the single threaded version.

Today I installed threadscope, and a glance at the output sufficed to identify the problem, and now at least it's faster. Great tool! (But you all knew that, of course)

Thanks indeed to the original authors of ThreadScope! Ketil made good use of the original ThreadScope. Since the original release, the folks at Well-Typed have taken over development of ThreadScope (Andres, Duncan, and Mikolaj) and are working to extend it so that it becomes even easier to squash those pesky performance bugs. These extensions, much like this digest, are made possible by the Parallel GHC Project. We hope you'll love the results.

News

Data Parallelism in Haskell slides and video: Manuel Chakravarty came all the way from Sydney (that's 931 km!) to tell the Brisbane FP Group about work going on at UNSW on data parallel programming in Haskell. The talk covers a lot of ground. It's an hour and a half long and

[It] motivates the use of functional programming for parallel, and in particular, data parallel programming and explains the difference between regular and irregular (or nested) data parallelism. It also discusses the Repa library (regular data parallelism for multicore CPUs), the embedded language Accelerate (regular data parallelism for GPUs), and Data Parallel Haskell (nested data parallelism for multicore CPUs).
Both video and slides are available, so check it out!

Word of the Month

The word of the month is dataflow as in dataflow parallelism. (And if you've just seen Manuel's talk, not to be confused with data parallelism). With dataflow, we will be wrapping up our now four-part series on parallel programming in Haskell. In recent words of the month, we've been looking at various approaches for parallelising your way to faster code. We started with the parallel arrays library Repa, then moved on to the Haskell par and pseq primitives, and built from there to get Strategies. Now we'll be looking at monad-par, a library for dataflow parallelism.

First, the bigger picture: why do we have another way of doing parallel programming in Haskell when we already have parallel arrays and Strategies? Part of the answer is that parallelism is still a research topic, with new ideas coming out from time to time and some old ideas slowly withering away. For another part of the answer, it may help to think in terms of a trade-off between implicitness and performance, in other words, between Easy and Fast.

Degrees of implicitness

Not all problems are subject to the trade-off in the same way, and it's not always clear to what extent they are. As a rough sketch, problems that require applying the same operation to a large number of simple objects can get fast performance from a parallel arrays library. This is a form of data parallelism. If the problem does not lend itself to data parallelism, the next port of call is likely Strategies. If Strategies are not adequate for the desired speedups, the problem may require an even more explicit approach.

Until recently, this meant “rolling your own” parallelism by using Concurrent Haskell: forking off threads, assigning tasks to them, and communicating finished work between them. Using concurrency for DIY parallelism is risky. It means venturing into IO, giving up determinism, exposing ourselves to all manner of side-effects, not to mention subtle concurrency bugs like deadlock. As Haskellers, we should demand better: safer, more predictable and easier to reason about. We want to have explicit fine-grained control without all the nasty side-effects. And now we can have it!

The latest addition to the parallel Haskell quiver is the monad-par library. Programming in monad-par looks a lot like programming in Concurrent Haskell:

data Par a
instance Monad Par
 
runPar :: Par a -> a     
fork :: Par () -> Par ()

data IVar a
 
new :: Par (IVar a)
put :: NFData a => IVar a -> a -> Par ()
get :: IVar a -> Par a

Instead of using forkIO, we use fork to create threads (not sparks!), and instead of using MVars to communicate, we use IVars. MVar and IVar behave in somewhat similar ways, in that any attempt to read from an IVar will wait until it has been filled. But this is also where differences emerge. Unlike their concurrent counterparts, IVars may only be written to once; subsequent writes result in an error. This aligns with our desire for determinism: we never have to worry about an IVar changing values over time. Moreover, the Par monad does not allow for any IO (feature! not a bug) and computations in the Par monad can always be extracted with a simple runPar.

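To get a feel for the API, here is a minimal sketch (assuming the Control.Monad.Par module from the monad-par package) that forks two independent computations and combines their results:

import Control.Monad.Par

-- Fork two independent computations, then collect both results.
-- get blocks until the corresponding IVar has been filled by put.
twoInParallel :: Int -> Int
twoInParallel n = runPar $ do
  v1 <- new
  v2 <- new
  fork $ put v1 (sum     [1 .. n])   -- evaluated in parallel...
  fork $ put v2 (product [1 .. 20])  -- ...with this one
  x <- get v1
  y <- get v2
  return (x + y)
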
But this doesn't really tell us all that much about how to use this library. For now we come back to our word dataflow. The dataflow model treats a program as a little black box: an input goes in, an output comes out, and what happens in between is immaterial:

A program as a black box

The program is represented as a directed graph (a dataflow network), with each node representing an individual operation (black boxes in their own right), and connections between the nodes expressing dependencies. In the graph above, g and h depend on the output of f (the same output - there is only one) and in turn, j depends on the output of both g and h.

Dataflow network

Dataflow networks are handy thinking tools because one of the things that makes it hard to get fast parallel code is the inherent sequentiality that dependencies introduce. A dataflow network nicely captures which parts of the program are parallel (g and h are independent of each other) and which parts are sequential (g depends on f). This is a sort of parallelism topline; we can't go any faster than what this network allows, but at the very least we should take the shape of the network into account so we don't make matters worse and find ourselves waiting a lot more on dependencies than is strictly necessary.

Using the Par monad essentially consists in translating dataflow networks into code.

Suppose we have functions

f :: In -> A
g :: A  -> B
h :: A  -> C
j :: B  -> C -> Out

For the graph above, we might say:

network :: IVar In -> Par Out
network inp = do
 [vf,vg,vh] <- sequence [new,new,new]
 
 fork $ do x <- get inp
           put vf (f x)
  
 fork $ do x <- get vf
           put vg (g x)
 
 fork $ do x <- get vf
           put vh (h x)
 
 x <- get vg
 y <- get vh
 return (j x y)

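To actually run the network, we would fill the input IVar and extract the result with runPar. A minimal sketch (assuming In has an NFData instance, which put requires):

runNetwork :: In -> Out
runNetwork x = runPar $ do
  inp <- new
  put inp x      -- feed the input into the network
  network inp
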
It's useful to note that here dataflow graphs need not be static; they may be dynamically generated to fit the shape of the input. To see this, and also to get a more realistic taste of monad-par, check out Simon Marlow's Parallel and Concurrent Programming in Haskell tutorial. It shows how a parallel type inferencer might be implemented, its dataflow graph arising from the set of bindings it receives as input.

Finally, what about Strategies? How do we choose between the two? Both are IO-free and deterministic, and both require some, but not a whole lot of, comfort with monads. Par may be easier to use correctly than Strategies because it is strict and forces you to explicitly delineate the dependencies in parallel computations. On the other hand, this also makes it more verbose. Aside from being more verbose, Par can be a little bit “messier” than Strategies because you don't get the same separation between algorithm and parallelism. And since we're ultimately concerned with performance, it's worth knowing that the Par monad currently involves more overhead than the Eval monad from Strategies. The latter may be more appropriate at finer granularities.

Perhaps one way to go about this is to start off with Strategies as they have been around longer and may require less invasive modifications to your code. But if you find Strategies to be difficult, or you just can't get the performance you want, don't reach for those forkIO's yet. Give Par a try.

For more about monad-par, do visit Simon's tutorial and the API on Hackage. To dig deeper, you may also find it instructive to read A Monad of Deterministic Parallelism (Marlow et al), presented at this year's Haskell Symposium. Related to this are his CUFP monad-par tutorial slides.

This concludes our tour of current approaches to Haskell parallelism. There is some new technology on the horizon which we may cover in a later digest, particularly Data Parallel Haskell and nested data parallelism (data parallelism not to be confused with dataflow parallelism), but for next month, let's take a little break from parallelism and look at something a little different.

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project, run by Well-Typed, with the aim of demonstrating that parallel Haskell can be employed successfully in real world projects.

This has been a month for releasing software. We've recently put on Hackage updates to ThreadScope and some helper libraries.

Duncan Coutts presented our recent work on ThreadScope at the Haskell Implementors Workshop in Tokyo (23 Sep). He talked about the new spark visualisation feature, which shows a graphical representation of spark creation and conversion statistics, with a demo inspired by a program we are working on for Parallel GHC.

Meanwhile, we have also completed a pure Haskell implementation of the “Modified Additive Lagged Fibonacci” random number generator. This generator is attractive for use in Monte Carlo simulations because it is splittable and has good statistical quality, while providing high performance. The LFG implementation will be released on Hackage when it has undergone more extensive quality testing.

Blogs, Papers, and Packages

Mailing list discussions

Stack Overflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at parallel@well-typed.com. Please feel free to leave any comments and feedback!


Upcoming Events

Saturday, 10 September 2011, by Andres Löh.
Filed under well-typed.

Here are two upcoming events that Well-Typed will be involved in:

Hal6 Haskell-Workshop, 7 October 2011, Leipzig, Germany

This is a one-day workshop with various talks and tutorials. I will be giving a tutorial (90 minutes) on parallel programming in Haskell. In addition, Kevin Hammond will be talking about parallel refactorings.

Most of the talks during the day will be in German. So if you speak German, you might want to consider coming to Leipzig!

(More info)

FP Day, 14 October 2011, Cambridge, UK

This is a one-day training event for Haskell, F#, and Clojure, aimed mainly at participants from industry. Simon Peyton Jones and Don Syme are keynote speakers. Well-Typed is running a hands-on introduction to Haskell (180 minutes). There are a limited number of tickets available, and registration is open. It would be great to see you there!

(More info)


Parallel Haskell Digest 5

Wednesday, 31 August 2011, by Eric Kow.
Filed under parallel, ph-digest.

Hello Haskellers!

Eric here, reprising my role as Parallel Haskell Digester. Many thanks to Nick for good stewardship of the digest and nice new directions for the future. This month we have Erlang PhDs (and a postdoc), new partners, two Monad Reader articles in progress and some strategies. As usual, this digest is made possible by the Parallel GHC Project.

News

Scalable Erlang PostDoc and 2 PhD Studentships (8 Aug)

What, Erlang?! Yes, you are reading the right digest. If you're generally into parallelism and concurrency in functional programming languages, you may be especially interested to know about this announcement from Phil Trinder.

RELEASE project - A High-level Paradigm for Reliable Large-scale Server Software - is funded by the EU Framework 7 programme for 36 months from October 2011. Its aim is to scale the radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines (100 000 cores).

Word of the Month

Last month, we had par and pseq as our twin words of the month. Let's pursue this little train of parallel thought; our next word of the month is strategy. Strategies have been around since the 1993 paper Algorithm + Strategy = Parallelism; that's even before we started using monads in Haskell! They've recently been revamped in Simon Marlow's Seq no More paper, and it's this version of strategies that we'll be exploring here.

Strategies are built on top of the par and pseq primitives we saw in the last digest. They provide a nice way to express the often complicated logic we need to make the best use of parallelism in our code. Strategies can also make parallel code easier to read and maintain, because they allow us to more cleanly separate the core logic from the code that pertains to our use of parallelism.

Before delving into strategies, let's take a small notational detour by introducing the Eval monad. Suppose we wanted a parallel version of the map function, something that would apply a function to each item of a list. Using the par and pseq from the last digest, we might express this function as:

parMap :: (a -> b) -> [a] -> [b]
parMap f [] = []
parMap f (a:as) = b `par` bs `pseq` (b:bs)
 where
  b  = f a
  bs = parMap f as

If we look carefully at the code, we can observe that there is something inherently sequential in the way we have expressed this parallel computation: first spark off f a, then recurse to the tail of the list, and finally cons.

The Eval monad builds off the insight that expressing parallelism is fundamentally (perhaps counterintuitively) about ordering things. Monads are well-suited for expressing ordering relationships, and so they have been pressed to work for expressing parallel computation as well.

data Eval a
instance Monad Eval

runEval :: Eval a -> a
rpar :: a -> Eval a
rseq :: a -> Eval a

Eval is just a strict identity monad, with rpar and rseq as counterparts to par and pseq. We use Eval to compose sequences of parallel computation, and we extract the results using the runEval function. If you're not familiar with monads, you can get away with just treating Eval as new notation, or rather, borrowed notation, the same that we use for IO, parser combinator libraries, QuickCheck and a plethora of other useful monads. It's worth noting also that despite appearances, we are still in purely functional territory — no IO here! — with the notion of sequencing being limited to controlling parallelism and evaluation depth.

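Before rewriting parMap, here is the smallest useful Eval sequence, a sketch of the rpar/rseq pattern: evaluate f x in parallel while this thread evaluates f y:

import Control.Parallel.Strategies (runEval, rpar, rseq)

-- Spark f x, evaluate f y on the current thread, then return both.
-- We return once f y reaches weak head normal form; f x may still
-- be evaluating in parallel at that point.
pairEval :: (a -> b) -> a -> a -> (b, b)
pairEval f x y = runEval $ do
  a <- rpar (f x)
  b <- rseq (f y)
  return (a, b)
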
To make use of Eval for our parMap function, we could write a version like the one below. It introduces a change of type, from returning [b] to Eval [b]. In the general case, we could just use the runEval function to get our result back, but we are not baking it into parMap because we would typically want to use the function within a greater Eval context anyway.

parMap :: (a -> b) -> [a] -> Eval [b]
parMap f [] = return []
parMap f (a:as) = do
  b  <- rpar  (f a)
  bs <- parMap f as
  return (b:bs)

As before, this function captures the basic idea of its sequential counterpart map: apply function, recurse to tail, cons new head to new tail. This is a passable parallel map, but there are still two things which are unfortunate about it. First, we have repeated the implementation of map, not a big deal for such a small function but a potential maintenance problem for more complex code. Second, we have only captured one sort of parallel evaluation, firing off sparks for all the cons cells, whereas in practice getting parallelism right requires some often careful tuning to get the right level of granularity and account for dependencies. For instance, what if instead of doing each cell in parallel, we wanted to bunch them up into chunks so as to minimise the coordination overhead? What we need is a way to express ideas about running code in parallel (such as “break into chunks and run each chunk simultaneously”), preferably without duplicating existing code or otherwise making it more complicated than it needs to be.

This is where strategies come in. A strategy is just a function that turns some data into an Eval sequence. We've already encountered two strategies above: rpar, which sparks off a parallel evaluation, and rseq, which evaluates its argument to weak head normal form. Many more strategies are possible, particularly when they are custom designed for specific data types. Also, if we have a strategy, we can imagine wanting to use it. This we capture with using, which applies the strategy function and runs the resulting sequence. Parallel evaluation aside, you can pretty much think of x `using` s as being equivalent to id x, although you'll want to watch out for a couple of caveats described in the tutorial Parallel and Concurrent Programming in Haskell.

type Strategy a = a -> Eval a

using :: a -> Strategy a -> a
x `using` s = runEval (s x)

Now that we've put a name to the idea of building Eval sequences, we can think about trying to generalise our parMap. One way would be to write a higher-order strategy. This parList function is almost identical to parMap, except that instead of applying an arbitrary function to a list we limit ourselves to applying some strategy to it.

parList :: Strategy a -> Strategy [a]
parList strat []     = return []
parList strat (x:xs) = do
  x'  <- rpar (x `using` strat)
  xs' <- parList strat xs
  return (x':xs')

We can then define parMap by parameterising parList with rseq to get a parallel list strategy which we then apply to map f xs.

parMap :: (a -> b) -> [a] -> Eval [b]
parMap f xs = map f xs `using` parList rseq

We have now achieved two things. First, we've improved the modularity of our code by separating algorithm (map f xs) from coordination (parList rseq). Notice how we are now reusing map instead of reimplementing it, and notice as well how we now have a reusable parList that we can apply to any situation that involves a list. Second, by isolating the algorithm from the coordination code we have given ourselves a lot more flexibility to switch strategies. To bunch our list up into coarser-grained chunks, for example, we could just swap out parList with parListChunk

parMap :: (a -> b) -> [a] -> Eval [b]
parMap f xs = map f xs `using` parListChunk 100 rseq

In this word of the month, we have very lightly touched on the idea of strategies as a compositional way of expressing parallel coordination and improving the modularity of our code. To make use of strategies, we basically need to be aware of two or three things:

1. How to apply a strategy: foo `using` s evaluates the value foo with strategy s

2. Useful strategies from the library: the basics are r0 which does nothing, rseq which evaluates its argument to weak head normal form, rdeepseq which evaluates it all the way down to normal form, and rpar which sparks the value for parallel evaluation. Be sure to check out Control.Parallel.Strategies for more sophisticated strategies such as parListChunk, which divides the list into chunks and applies a strategy to each one of the chunks in parallel.

3. (Optionally) How to build strategies, the simplest way being to use the dot function to compose two strategies sequentially (see the sketch after this list). If you implement your own strategies instead of combining those from the library, you may want to have a look at the Seq no More paper for a little safety note.

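As a small sketch of composing strategies with dot: the strategy below sparks every element of a list, and inside each spark evaluates the element all the way to normal form (evalList and dot both come from Control.Parallel.Strategies):

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies

-- rpar `dot` rdeepseq sparks each element; the spark then forces
-- the element to normal form rather than just weak head normal form.
deepParList :: NFData a => Strategy [a]
deepParList = evalList (rpar `dot` rdeepseq)

-- usage:  map f xs `using` deepParList
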
If you want to know more about strategies, the first place to look is probably the tutorial Parallel and Concurrent Programming in Haskell (there is a nice example in there, implementing a parallel K-means algorithm), followed by the Control.Parallel.Strategies API. You could also just dive in and give it a try on some of your own code. It's sometimes just a matter of grabbing a strategy and using it.

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project to push forward the use of parallel Haskell in the real world. The aim is to demonstrate that parallel Haskell can be employed successfully in industrial projects.

Speaking of industrial projects, our recent search for a new project partner has been successful! In fact, we will be welcoming two new partners to the project: the research and development group of Spanish telecoms company Telefónica, and VETT, a UK-based payment processing company. We are excited to be working with the teams at Telefónica I+D and VETT. There may be some Cloud Haskell in our future; stay tuned for more details about the respective partner projects.

Also coming up are a couple of Monad Reader articles featuring work from the Parallel GHC project.

Kazu Yamamoto has been writing an article for the upcoming Monad Reader special edition on parallelism and concurrency. He'll be revisiting Mighttpd (pronounced "Mighty"), a high-performance web server written in Haskell in late 2009. Since Mighttpd came out, the Haskell web programming landscape has evolved considerably, with the release of GHC 7 and the development of several web application frameworks. The new Mighttpd takes advantage of GHC 7's new IO manager as well as the Yesod Web Application Interface (WAI) and the HTTP engine Warp. Do check out his article when it comes out for proof that "we can implement a high performance web server in Haskell, which is comparable to highly tuned web servers written in C", or if you can't wait, try his most recent slides on the topic.

Bernie Pope and Dmitry Astapov have been working on an article about the Haskell MPI binding developed in the context of this project. You can find out more about the binding on its Hackage and GitHub pages.

Blogs, Papers, and Packages

Mailing list discussions

All you need to know about asynchronous exceptions is: don't use them! They're too difficult to reason about, in terms of failure modes and such. Use TVars or MVars instead.

Stack Overflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at parallel@well-typed.com. Please feel free to leave any comments and feedback!


Hacking on the hackage-server at the Haskell Hackathon

Monday, 25 July 2011, by Duncan Coutts.
Filed under community.

Several people have been wondering recently what is up with the new hackage-server implementation. At the weekend at HacPDX-II we made quite a bit of progress in fixing things and getting more people involved.

The HacPDX-II hackathon took place this last weekend in Portland, Oregon. Thomas DuBuisson contacted me a week or two before to see if we could make the new hackage-server a focus for the hackathon and get several people involved. This worked out quite well: we got several people who attended HacPDX-II helping out. Also, just the fact that Thomas declared that this project would be the focus energised some of the rest of us who are interested in the hackage-server but could not attend HacPDX-II in person, including Matthew Gruen and Jeremy Shaw. So before getting into the technical details I'd really like to thank Thomas for organising things, and Matthew and Jeremy for giving up much of their weekends to help us out.

Jeremy Shaw, of Happstack fame amongst other things, sent in two excellent patches to shift hackage-server to using newer infrastructure. One patch updated everything to using the latest Happstack 6.x version, and the other converted from using happstack-data to the newer acid-state package.

David Lazar and Thomas DuBuisson got stuck in, tried things out and found and fixed various bugs. In particular, David fixed some things to do with the interface that package maintainers can use to mark packages as deprecated, and Thomas fixed up some stuff to do with package tags.

Matthew Gruen, who did his GSoC project with me on the hackage-server last summer, spent much of the weekend hacking, debugging and answering questions from the rest of us. In particular he and I spent quite a while thinking about HTTP authentication issues.

Originally, last summer, we had the crazy idea that we could offer either HTTP Basic or Digest authentication based on the user account. The reason we wanted to do this is because the current ~800 user accounts on the central hackage.haskell.org server all use HTTP basic auth (stored in Apache's standard htpasswd database format) but really we'd like to migrate to using HTTP digest auth because it's more secure than basic auth. Unfortunately, due to the way HTTP authentication works, having some user accounts using basic and some using digest authentication just will not work. That means we cannot do a smooth transition from basic to digest for the existing user accounts. So we had a think about what system to go for and how to clean up the mess in the current code that tries to offer different auth systems to different users.

For the moment what we've decided is to go for digest authentication exclusively and to think about how to do migration separately. This is partly so that we can get something working now, because several people want to use the hackage-server now; it's not just the central hackage.haskell.org that we have to think about.

One option for migration is to import all the old accounts and have a special re-registration page for existing users. That would authenticate them using their old passwords and then enable their new accounts with the new digest authentication system. The point is it would not be transparent, but it should not be too much of a bother.

Going for digest authentication means that we had to fix a few bugs in the client side HTTP implementation. Matthew tracked down the problem and I've sent the patches off to the maintainer of the HTTP package that cabal-install uses.

I spent a few days before the hackathon working on the hackage-mirror client. This is a program included in the hackage-server package that can synchronise packages between servers. In particular it can copy packages from the old hackage.haskell.org to your own hackage-server instance. Along with the HTTP digest work, on the final day of the hackathon I managed to get it to sync packages from hackage.haskell.org to my hackage-server running on localhost and to do the uploads authenticated using digest auth. So that was quite a good conclusion to things.

One thing that worked quite well for the hackathon was that we used a common darcs repository and gave everyone write access. We all just worked away and darcs pushed and pulled using the common repository. This is a model where darcs works really well. Now that the hackathon is over, I'll move the changes into the main repository.

So there's still plenty to do, but I think we've shown that the new hackage-server implementation is nearly ready to be used, for new server instances at least. I'm going to be doing more work on the mirroring, because a client of Well-Typed wants to use it, so I expect to post an update on that sometime soon.


Parallel Haskell Digest 4

Friday, 22 July 2011, by Nicolas Wu.
Filed under parallel, ph-digest.

Hello Haskellers!

It's time for the fourth edition of the Parallel Haskell Digest, bringing you a summary of the news and discussions of parallelism and concurrency in Haskell. The digest is made possible by the Parallel GHC Project.

News

Word of the Month

In this issue we have two words of the month: par and pseq.

Haskell provides two annotations, par and pseq, that allow the programmer to give hints to the compiler about where there are opportunities to exploit parallelism. While these annotations are not typically used directly by programmers, it is useful to understand them because they are the underlying mechanism for higher level tools such as "parallel strategies" and the Eval monad.

The two combinators have the following signatures:

par  :: a -> b -> b
pseq :: a -> b -> b

While their signatures are the same, they are used to annotate different things. The par combinator hints to the Haskell implementation that it might be beneficial to evaluate the first argument in parallel. However, since Haskell does not impose an evaluation order, we also need pseq, which instructs the compiler to ensure that its first argument is evaluated before the second.

Let's take a look at an example inspired by Parallel Performance Tuning for Haskell by Jones, Marlow, and Singh, which illustrates this nicely. Suppose you're interested in the sum of two expensive computations. To keep things simple, we'll use a naive implementation of fib (the point here isn't to have an efficient computation, I'm trying to show an expensive one):

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

For a second expensive computation, we'll calculate the negafibonacci number, which works on negative numbers:

negafib :: Int -> Int
negafib 0 = 0
negafib (-1) = 1
negafib n = negafib (n+2) - negafib (n+1)

The sum of these two can be calculated by the following sequential function:

sumfib :: Int -> Int
sumfib n = x + y
 where
  x = fib n
  y = negafib (-n)

There's obvious room for improvement here when we have two cores: we simply calculate the expensive computations on separate cores. Annotating the code above is a fairly simple process. We first use par to annotate the fact that x must be calculated in parallel with the rest of the computation. Second, we ensure that y gets computed before x + y by annotating with pseq. The result is as follows:

import Control.Parallel (par, pseq)

psumfib :: Int -> Int
psumfib n = x `par` (y `pseq` x + y)
 where
  x = fib n
  y = negafib (-n)

We can write a simple program that outputs the result of running this computation with the following main function:

main :: IO ()
main = putStrLn . show . sumfib $ 37

We should hope for the parallel version to work twice as fast, since the two expensive functions should take about the same time to compute. Here's the output of compiling and running the sequential version of the program:

$ ghc -rtsopts Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
$ time ./Main

real        0m6.113s
user        0m6.090s
sys         0m0.010s

Now replacing sumfib with psumfib produces the following results:

$ ghc -rtsopts -threaded Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
$ time ./Main +RTS -N2

real        0m3.402s
user        0m6.660s
sys         0m0.040s

This is obviously a very trivial example, but the point is that annotations provide a powerful way of expressing parallel algorithms. It's interesting to note that for this simple program, the parallel version run on a single core performs as well as the version compiled without threading.

While annotations are a simple mechanism for expressing where parallelism might be exploited in a program, beware that there are a number of pitfalls to using this technique: all that glitters is not gold! The main difficulty in using par and pseq directly is that you really need to have a clear understanding of evaluation order. In particular, unless you understand what laziness is doing to the evaluation order, then you might find that the computations you're sparking off with par might not occur when you expected they should.

Then there are all the general difficulties that you face with parallel programming, like getting the right granularity of work to do in parallel. Using par is quite lightweight, so it can be used to exploit reasonably fine-grained parallelism, but it is certainly not free. Finally, parallel performance suffers when there are too many garbage collections, so keeping these to a minimum, by either using more efficient data structures or increasing available memory, becomes an important factor.

Nevertheless, it's well worth having a play with par and pseq. The next step after that is to look at parallel strategies. Strategies is a layer of abstraction on top of par and pseq. You might like to read Seq no more: Better Strategies for Parallel Haskell, by Simon Marlow et al., which describes Strategies and the Eval monad. It's all available in the parallel library on Hackage. More recently, the Par monad has also been introduced as yet another way of describing parallel evaluations. These key topics will no doubt feature in a future word of the month, so stay tuned!

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project to push the real-world use of parallel Haskell. The aim is to demonstrate that parallel Haskell can be employed successfully in industrial projects. This month we're having a guest column from the team over at Los Alamos National Laboratory, one of the partners involved in the project (you can see the full details in report LA-UR 11-0341). They have been working on writing Monte Carlo physics simulations in Haskell, which has given them high levels of parallelism, along with useful tools for abstraction. So, without further ado, over to Michael Buksas from LANL:

Our goal is to build highly efficient Monte Carlo physics simulations using parallel Haskell. We're focusing on SMP performance through some combination of explicit threading and pure parallel annotations.

The Monte Carlo approach involves randomly sampling the space of solutions to generate data which contributes to the solution. For these physical problems, our samples are the tracks of particles as they move through space, interacting with a physical material as they go. Data collected from each particle trajectory is then combined into information needed to compute the solution. For example, the detailed information about the particle's interaction with the material is collected into a collective effect on the material properties.

To date, we have a code base which includes two approaches to the problem. One is a specific and parallel-tuned application code targeting relativistic neutrino transport in stellar atmospheres. The other is building a more general environment for creating specific applications, such as this one.

We recently presented to our colleagues in LANL some preliminary results on the parallel performance of the targeted application code.

To give a sense of the approach to parallelization in this code, consider these high-level functions from an earlier serial version:

main :: IO ()
main = do
  (n, rest) <- parseCL
  let tally = runMany infMesh simpleMat n
  writeTally "tally" tally

runMany :: Mesh -> Material -> Word32 -> RNG -> Tally
runMany msh mat ntot rng = let
  ps = genParticles ntot msh rng
  tallies = map (runParticle msh mat) $ ps
  in foldl' merge emptyTally tallies

And consider the following changes for the parallel version:

main :: IO ()
main = do
  (n,sz) <- parseCL
  let tally = feed infMesh simpleMat n sz prand
  writeTally "tally" tally

feed :: Mesh -> Material -> Word32 -> Word32 -> RNG -> Tally
feed msh mat ntot chunkSz rng
    | ntot <= chunkSz = runMany msh mat ntot rng
    | otherwise       = t `par` (ts `pseq` (merge t ts))
    where t  = runMany msh mat chunkSz g1
          ts = feed msh mat (ntot - chunkSz) chunkSz g2
          (g1,g2) = split rng

We've wrapped function runMany in feed, which partitions the collection of generated particles into groups of size chunkSz, and issues these particles to runMany in parallel.

With this simple change, we are seeing upwards of 80% utilization of up to 8 cores, for a performance improvement greater than a factor of 6. We believe that performance can be further improved with different strategies for breaking down the work, and by looking for additional parallelization opportunities in the collection of results.

Our other branch of development is focused on finding useful abstractions and high-level functions to support programming a variety of Monte Carlo problems of this kind. We have identified a few such useful abstractions, and implemented them as type classes and type families.

For example, Space is a general term for the physical space and imposed symmetries in which we can perform a simulation. We express this as follows:

class Space s where
  type Position s  :: *
  type Direction s :: *
  stream    :: s -> Distance -> s
  position  :: s -> Position s
  direction :: s -> Direction s
  make      :: Position s -> Direction s -> s

and implement specific spaces, such as one with the symmetry of the unit sphere:

instance Space Spherical1D where
    type Position  Spherical1D = Radius
    type Direction Spherical1D = Normalized Vector2
    stream (Vector2 x y) (Distance d) = Vector2 (x+d) y
    position s  = Radius $ vmag s
    direction s = normalize s
    make (Radius pos) dir = pos *| (normalized_value dir)

This allows the specific space data types to be used in a variety of contexts. Using ordinary parametric polymorphism is also effective:

-- | Stream a single particle:
stream :: (p -> (e,p))   -- ^ Function to produce each step. Comes from a model.
          -> (e -> Bool) -- ^ Check for terminal events to stop streaming
          -> p           -- ^ Initial particle
          -> [(e, p)]    -- ^ Resulting list of events and particle states.
stream stepper continue p = next p
  where next p =
          let (e, p') = stepper p
          in  (e, p') : if continue e then next p' else []

The above is our high-level routine function for generating a history from a single particle, recorded as a list of (event, particle) pairs, where the event and particle data types are provided for each problem.

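As a small, hypothetical illustration of the stream function above (the names below are invented for the example), here is a stepper that counts upward, emitting the new counter value as its event, until the event reaches a bound:

-- countUp 3 == [(1,1),(2,2),(3,3)]
countUp :: Int -> [(Int, Int)]
countUp bound = stream step (< bound) 0
  where step p = (p + 1, p + 1)
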
Blogs, Papers, and Packages

Mailing list discussions

Stack Overflow

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Nicolas Wu, at parallel@well-typed.com. Please feel free to leave any comments and feedback!


Parallel Haskell Digest 3

Thursday, 16 June 2011, by Nicolas Wu.
Filed under parallel, ph-digest.

Hello Haskellers!

Welcome to the third edition of the Parallel Haskell Digest, bringing you news, discussions and tasters of parallelism and concurrency in Haskell. The digest is made possible by the Parallel GHC project. More news about how we're doing below.

This digest is brought to you by Eric Kow and Nicolas Wu. Nick will be taking over as maintainer of the Parallel Digest for the next few months.

It's tutorial season in the Parallel Haskell world, with Simon Marlow's Parallel and Concurrent Programming in Haskell, and Don Stewart's Numeric Haskell: A Repa Tutorial. The former gives a broad tour of parallelism and concurrency techniques in Haskell and the latter focuses on the Repa parallel array library. Both are very concrete and focus on real working code. Give them a try!

News

Word of the Month

The word of the month (well, phrase really!) for this digest is parallel arrays. Parallel array manipulation fits nicely into Haskell's arsenal of parallel processing techniques. In fact, you might have seen the Repa library, as mentioned above, in the news a while back. And now there's a new tutorial on the Haskell Wiki.
So what's all the fuss?

Parallel array manipulation is a particularly nice way of writing parallel programs: it's pure and it has the potential to scale very well to large numbers of CPUs. The main limitation is that not all programs can be written in this style. Parallel arrays are a way of doing data parallelism (as opposed to control parallelism).

Repa (REgular Parallel Arrays) is a Haskell library for high performance, regular, multi-dimensional parallel arrays. All numeric data is stored unboxed and functions written with the Repa combinators are automatically parallel. This means that you can write simple array code and get parallelism for free.

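As a minimal sketch of the style (using the computeP interface from more recent Repa versions), here is a function that doubles every element of a vector in parallel:

import Data.Array.Repa as R

-- R.map builds a delayed array; computeP evaluates it in parallel
-- across the available cores, yielding an unboxed result array.
doubleAll :: Array U DIM1 Double -> IO (Array U DIM1 Double)
doubleAll arr = computeP (R.map (* 2) arr)
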
As an example, Repa has been used for real time edge detection using a Canny edge algorithm, which has resulted in performance comparable to the standard OpenCV library, an industry standard library of computer vision algorithms written in C and C++ (a single threaded Haskell implementation is about 4 times slower than the OpenCV implementation, but is on par when using 8 threads on large images).

Repa performing Canny edge detection

While the Canny algorithm written with Repa doesn't quite match the speed of its procedural counterparts, it benefits from all of Haskell's built in support for purity and type safety, and painlessly scales to multiple cores. You can find more details on Ben Lippmeier's blog.

Parallel GHC project news

The Parallel GHC Project is an MSR-funded project to promote the real-world use of parallel Haskell. Part of this project involves effort by Well-Typed to provide tools for use by the general community.

Last week, Well-Typed were excited to announce that we have capacity to support another partner for the parallel project for at least another 12 months. So, if you have a project that involves scaling up to multiple cores, or that deals with some heavy duty concurrency, then we'd love to hear from you. In return for your time and commitment to the parallel project, we'll unleash our team of expert programmers to help build tools and provide support that will help you and the community towards making parallel Haskell even more accessible.

For more information on the Parallel GHC project, see the Parallel GHC Project wiki page.

Blogs, papers and packages

Mailing list discussions

Stack Overflow

Help and feedback

Got something for the digest? Send us an email at parallel@well-typed.com. We're particularly interested in short parallelism/concurrency puzzles, and cool projects for featured code. Other comments and feedback (criticism and corrections especially!) would be welcome too.


Parallel GHC project: new opportunity for an organisation to participate

Wednesday, 08 June 2011, by Duncan Coutts.
Filed under parallel, well-typed.

GHC HQ and Well-Typed are pleased to announce a new opportunity for an organisation to take part in the Parallel GHC Project.

The project started in November 2010 with four industrial partners, and consulting and engineering support from Well-Typed. Each organisation is working on its own particular project making use of parallel Haskell. The overall goal is to demonstrate successful use of parallel Haskell and along the way to apply engineering effort to any problems with the tools that the partner organisations might run into.

We have capacity to support another partner organisation for the remaining duration of the project (at least another 12 months). Organisations do not need to contribute financially but should be prepared to make a significant commitment of their own time. Familiarity with Haskell would be helpful, but Haskell expertise is not needed. Partner organisations' choice of projects is similarly open-ended and could be based on anything from pre-existing code bases to green field endeavours.

We would welcome organisations interested in pure parallelism, concurrency and/or distributed Haskell. Presently, two of our partner organisations are using mainly pure parallelism and two are using concurrency. What would be especially interesting for us is to diversify this mix further by working with an organisation interested in making use of distributed Haskell, in particular the work highlighted in the recent paper "Haskell for the Cloud" (pdf).

To help give an idea of what participating in the Parallel GHC Project is like, here is some of what our current partner organisations have to say:

The Parallel GHC Project has enabled us to make steady progress towards our goals. Well-Typed has provided support in the form of best practice recommendations, general engagement with the project, and directly coding up key components.

I have been getting lots of help from Well-Typed, and enjoy our weekly meetings.

― Finlay Thompson, Dragonfly

My organization is now trying to implement highly concurrent Web servers. After GHC 7 was released we faced several critical bugs in the new IO manager and one guy at Well-Typed kindly fixed all the bugs. This has been a big benefit for our organization.

Another benefit is feedback/suggestions from Well-Typed. Well-Typed and our organization have meetings every other week and we report progress to each other. During the discussions, we can set the right direction to go in.

― Kazu Yamamoto, IIJ Innovation Institute Inc.

Well-Typed is coordinating the project, working directly with the participating organisations and the Simons at GHC HQ. If you think your organisation may be interested then get in touch with me, Duncan Coutts, via info@well-typed.com.


Parallel Haskell Digest 2

Wednesday, 11 May 2011, by Eric Kow.
Filed under parallel, ph-digest.

Welcome to the second edition of the Parallel Haskell Digest, bringing you news, discussions and tasters of parallelism and concurrency in Haskell.

The digest is made possible by the Parallel GHC project. More news about how we're doing below.

News

Word of the Month

This edition of the digest is brought to you by threads, threads and more threads. In the last digest, we took a look at Haskell sparks and particularly at how they differ from threads. Now let's delve a little bit deeper into threads.

A "thread of execution" is a programming abstraction that helps the programmer separate concerns within a program. Consider a web server serving many documents to many clients simultaneously. The programmer may wish to use threads, using one thread per client making it easier to manage concurrent action.

In the Haskell world, this programming abstraction is provided by Haskell threads. To implement Haskell threads, the GHC runtime system uses what is known as an M:N threading model, where many Haskell threads are scheduled across a much smaller number of operating system threads. This model has two key benefits: Haskell threads can be very lightweight, because they do not each need an OS thread of their own, and the runtime system can automatically balance the many Haskell threads across the available OS threads.

Don Stewart illustrated this on StackOverflow with the following diagram, citing 2011 figures of about a handful of CPUs, a dozen or so OS threads and tens of thousands of Haskell threads (plus for those of us interested in pure parallelism, millions of sparks).

Thread illustration by Don Stewart

If you want to try generating some figures for yourself, have a look at this nofib benchmark utility. The benchmark tool creates a large "pipeline" of simultaneous Haskell threads, and tries to send a message through the pipeline, passing it from one thread to the next. On my laptop, I can create 10000 Haskell threads in 0.074 seconds and pump "Hello World" through them in 0.07 seconds. That works out to 7.4 microseconds per thread (7 microseconds for pumping). How about giving it a shot on your computer? Haskell threads are very lightweight.

For most concurrency needs, you can generally forget that operating system threads exist. Indeed, when the documentation for modules like Control.Concurrent refers to "threads", or when Haskell hackers discuss threads in Haskell code it's a good bet that they're referring to Haskell threads rather than OS threads.

That said, there are a few occasions where you may want to be aware of OS threads, in order of importance: if you want to exploit parallelism across multiple cores, if you need to make foreign calls that may block, or if you use foreign libraries that rely on OS-thread-local state.

Doing any of these things requires that you use GHC's multi-threaded runtime system. GHC currently uses a single-threaded runtime system by default, but until this changes, you will have to explicitly enable the multi-threaded one by linking your program with the flag -threaded. With the multi-threaded runtime system all Haskell threads are scheduled across multiple OS threads as opposed to being interleaved on a single one. This allows for parallelism in the first case, for concurrent foreign code in the second case.

For the first case, where you are specifically interested in parallelism, as well as enabling the multi-threaded RTS you also need to tell the runtime system how much parallelism to try to use. Specifically, you have to say how many OS threads will be used for scheduling your program's Haskell threads. You can do this by passing +RTS -N when you run your program (with GHC 6.12.1 and higher, the bare -N flag will cause the RTS to use a sensible default based on the number of CPU cores on your machine).

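For example, here is a sketch of compiling and running a (hypothetical) Main.hs across four cores:

$ ghc -threaded -rtsopts Main.hs
$ ./Main +RTS -N4
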
If you are only concerned about making foreign calls, just enabling the multi-threaded RTS is enough. The issue with the single-threaded runtime system is that Haskell threads that make foreign calls to the operating system or C libraries will block all other Haskell threads. This means that if a foreign call should take a long time, or worse, if it should block in its own right, all other Haskell threads will be stuck. With the multi-threaded runtime system, Haskell threads that make foreign calls do not block other Haskell threads, because they can be handed over to another OS thread while the one making the foreign call is churning away.

But be careful! Concurrency with foreign calls can be a tricky business. If you are only using one OS thread at a time, you are effectively shielded from having to worry about concurrency issues in foreign code (by not being able to run foreign code concurrently). Enabling multi-threaded mode comes with the extra responsibility of making sure any foreign libraries you use are thread-safe, or that you have adequate mechanisms to deal with the ones that are not. Thread-safety isn't an issue with Haskell-only code because shared state is always managed with locks or atomicity. But when concurrency and foreign calls mix, you will need to take care.

There's another issue to watch out for when mixing foreign calls with multiple OS threads. The nice thing about the M:N threading model in Haskell is that the GHC RTS scheduler will automatically move Haskell threads between OS threads to achieve a good balance of work. But this introduces a new problem for foreign libraries that use thread-local state: a Haskell thread may be calling the library from any number of OS threads, so if the foreign library uses OS-thread-local state then this state could be different from one call to the next... what a mess! To solve this problem we have to use a feature called "bound threads". Bound threads are Haskell threads that are bound to a single OS thread; they are not moved around by the scheduler. Bound threads are more expensive than unbound ones because they tie up a whole OS thread, but they are necessary for working with certain foreign libraries like OpenGL that make essential use of thread-local state.

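A minimal sketch of creating a bound thread with forkOS from Control.Concurrent (this requires linking with -threaded):

import Control.Concurrent (forkOS, newEmptyMVar, putMVar, takeMVar)

-- The forked computation is bound to its own OS thread, so foreign
-- code relying on OS-thread-local state behaves predictably there.
main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkOS $ do
         -- calls into an OS-thread-local library (e.g. OpenGL) would go here
         putMVar done ()
  takeMVar done
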
Summing up threads in Haskell: Haskell threads are cheap and plentiful, and for most concurrency needs you can forget that OS threads exist. OS threads only come into the picture when you want parallelism across cores, need to make blocking foreign calls, or use foreign libraries with thread-local state, the last of which calls for bound threads.

Thanks to Paul Bone for the paragraph presenting threads as a programming abstraction and also to Andres Löh and Duncan Coutts from Well-Typed for extensive help revising this word of the month!

Parallel GHC project news

The Parallel GHC Project is an MSR-funded project to push the real-world use of parallel Haskell. Part of this project involves effort by Well-Typed to provide tools for use by the general community:

We have begun work on making the "Modified Additive Lagged Fibonacci" and perhaps other random number generators from the SPRNG library available in Haskell. As a first step to the Haskell SPRNG reimplementation, we have developed a binding to the existing C++ library. The binding will serve as a reference implementation to test against, but it is also ready to be used now.

To complement our work on extending GHC eventlog and Threadscope to support multi-process or distributed Haskell systems, we have begun work on developing new visualisations for ThreadScope, including the rate of parallel spark creation, and the distribution of spark evaluation times.

Meanwhile, work continues on our project partners' side. We hope to say more about it in the next edition of the digest :-)

For more information on the Parallel GHC project, see the Haskell wiki page

Talks

Blogs, papers and packages

Parallel-Haskell and Haskell Cafe

Stack Overflow

Help and feedback

Got something for the digest? Send me an email at eric@well-typed.com. I'm particularly interested in short parallelism/concurrency puzzles, cool projects for featured code. Other comments and feedback (criticism and corrections especially!) would be welcome too.


Parallel Haskell Digest 1

Thursday, 31 March 2011, by Eric Kow.
Filed under parallel, ph-digest.

I'd like to introduce the first Parallel Haskell Digest, a newsletter aiming to show off all the work that's going on using parallelism and concurrency in the Haskell community.

The digest is a bit of an experiment. For the community in general, I hope to help you catch up with a monthly recap of news, interesting blog posts and discussions about parallelism in Haskell. For people (like me) who are new to parallelism and concurrency in Haskell, or maybe just have a passing interest, I hope to give you little nibbles, with regular features like the Word of the Month, Featured Code and Parallel Puzzlers.

Finally, as a bit of disclosure, I'm writing this digest as part of my work to promote the Parallel GHC Project. So don't be surprised if I give it a little special emphasis :-)

Word of the Month

This very first edition of the PH digest is brought to you by the word spark. You may have heard of sparks and threads in the same sentence. What's the difference?

A Haskell thread is a thread of execution for IO code. Multiple Haskell threads can execute IO code concurrently and they can communicate using shared mutable variables and channels.

Sparks are specific to parallel Haskell. Abstractly, a spark is a pure computation which may be evaluated in parallel. Sparks are introduced with the par combinator; the expression x `par` y "sparks off" x, telling the runtime that it may evaluate the value of x in parallel with other work. Whether or not a spark is evaluated in parallel with other computations, or other Haskell IO threads, depends on what your hardware supports and on how your program is written. Sparks are put in a work queue and when a CPU core is idle, it can execute a spark by taking one from the work queue and evaluating it.

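A tiny sketch of sparking with par from Control.Parallel:

import Control.Parallel (par)

-- Spark the evaluation of x so it can proceed in parallel
-- while this thread goes on to compute y.
parAdd :: Int -> Int
parAdd n = x `par` (y + x)
  where
    x = sum     [1 .. n]
    y = product [1 .. 20 :: Int]
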
On a multi-core machine, both threads and sparks can be used to achieve parallelism. Threads give you concurrent, non-deterministic parallelism, while sparks give you pure deterministic parallelism.

Haskell threads are ideal for applications like network servers where you need to do lots of I/O and using concurrency fits the nature of the problem. Sparks are ideal for speeding up pure calculations where adding non-deterministic concurrency would just make things more complicated.

Parallel GHC project news

The Parallel GHC Project is an MSR-funded project to push the real-world use of parallel Haskell. Part of this project involves effort by Well-Typed to provide tools for use by the general community:

Work shall soon begin on making the "Modified Additive Lagged Fibonacci" and perhaps other random number generators from the SPRNG library available in Haskell. Current plans are to implement the algorithms directly in Haskell, and to expose them as instances of System.Random.RandomGen. These generators are attractive for use in Monte Carlo simulations because they are splittable and have good statistical quality, while providing high performance.

Work is underway on extending the GHC EventLog and associated tools (ghc-events, ThreadScope). The aim is to support profiling of multi-process or distributed Haskell systems such as client/server or MPI programs. This involves incorporating some changes made in related projects (Eden, EdenTV). This work may have some benefits even for profiling single-process programs. It should also allow comparative profiling, where multiple runs of the same program (e.g. different inputs or slightly different code) are viewed on the same timeline.

For more information on the Parallel GHC project, see the Haskell wiki page.

Featured Code

The featured code for this month is hulk, an IRC server in 1250 lines of code. Hulk was coded up by Chris Done in one evening, and it was used the next morning. (Chris has done a bit of cleaning up since.)

Here's what Chris had to say about his experience writing Hulk:

Haskell's lightweight threads make it natural (and guilt-free!) to choose the design of one-thread-per-client. With a single MVar containing a pure value for the whole server-state, and the help of a monad stack of ReaderT(connection information), WriterT (replies) and StateT (user details), it was trivial to make the bulk of the code completely pure. LineBuffering on the (Handle-wrapped) sockets tripped me up; as opposed to NoBuffering, this did not behave as expected: many threads would block when only one should have. Overall, it was textbook Haskell; keep the main code pure, and only the truly impure in the outer shell.

It's up on hackage so you can install it with a quick

cabal install hulk

You can also fork his code on GitHub.

Got a cool use of Parallel Haskell or an interesting problem? Nominate it for the PH digest featured code/parallel puzzler!

Blog Posts

Parallel-Haskell and Haskell Cafe

Stack Overflow

Help and feedback

Got something for the digest? Send me an email at eric@well-typed.com. I'm particularly interested in short parallelism or concurrency puzzles, cool projects for featured code. Other comments and feedback (criticism and corrections especially!) would be welcome too.

