Parallel Haskell Digest 5

Wed, 31 Aug 2011 11:04:42 GMT, by eric.
Filed under parallel, ph-digest.

Hello Haskellers!

Eric here, reprising my role as Parallel Haskell Digester. Many thanks to Nick for good stewardship of the digest and nice new directions for the future. This month we have Erlang PhDs (and a postdoc), new partners, two Monad Reader articles in progress and some strategies. As usual, this digest is made possible by the Parallel GHC Project.

News

Scalable Erlang PostDoc and 2 PhD Studentships (8 Aug)

What, Erlang?! Yes, you are reading the right digest. If you're generally into parallelism and concurrency in functional programming languages, you may be especially interested to know about this announcement from Phil Trinder.

The RELEASE project - A High-level Paradigm for Reliable Large-scale Server Software - is funded by the EU Framework 7 programme for 36 months from October 2011. Its aim is to scale the radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines (100 000 cores).

Word of the Month

Last month, we had par and pseq as our twin words of the month. Let's pursue this little train of parallel thought; our next word of the month is strategy. Strategies have been around since the 1993 paper Algorithm + Strategy = Parallelism; that's even before we started using monads in Haskell! They've recently been revamped in Simon Marlow's Seq no More paper, and it's this version of strategies that we'll be exploring here.

Strategies are built on top of the par and pseq primitives we saw in the last digest. They provide a nice way to express the often complicated logic we need to make the best use of parallelism in our code. Use of strategies can also help to make parallel code easier to read and maintain, because they allow us to more cleanly separate the core logic from the code that pertains to our use of parallelism.

Before delving into strategies, let's take a small notational detour by introducing the Eval monad. Suppose we wanted a parallel version of the map function, something that would apply a function to each item of a list. Using par and pseq from the last digest, we might express this function as follows:

parMap :: (a -> b) -> [a] -> [b]
parMap f [] = []
parMap f (a:as) = b `par` bs `pseq` (b:bs)
 where
  b  = f a
  bs = parMap f as

If we look carefully at the code, we can observe that there is something inherently sequential in the way we have expressed this parallel computation: first spark off f a, then recurse on the tail of the list, and finally cons the two together.

The Eval monad builds off the insight that expressing parallelism is fundamentally (perhaps counterintuitively) about ordering things. Monads are well-suited for expressing ordering relationships, and so they have been pressed into service for expressing parallel computation as well.

data Eval a
instance Monad Eval

runEval :: Eval a -> a
rpar :: a -> Eval a
rseq :: a -> Eval a

Eval is just a strict identity monad, with rpar and rseq as counterparts to par and pseq. We use Eval to compose sequences of parallel computation, and we extract their results using the runEval function. If you're not familiar with monads, you can get away with just treating Eval as new notation, or rather, borrowed notation, the same notation that we use for IO, parser combinator libraries, QuickCheck, and a plethora of other useful monads. It's worth noting also that despite appearances, we are still in purely functional territory — no IO here! — with the notion of sequencing being limited to controlling parallelism and evaluation depth.
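To get a feel for the notation before we return to parMap, here is a small sketch of Eval in action (the names fib and pairSum are illustrative, not from the digest; it assumes GHC with the parallel package, which exports runEval, rpar and rseq from Control.Parallel.Strategies):

```haskell
import Control.Parallel.Strategies (runEval, rpar, rseq)

-- A deliberately expensive function, to give the spark some real work.
fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

-- Evaluate two independent computations, sparking the first so that
-- it can proceed in parallel with the second.
pairSum :: Int -> Int -> Integer
pairSum x y = runEval $ do
  a <- rpar (fib x)   -- spark fib x; don't wait for it
  b <- rseq (fib y)   -- evaluate fib y here, on the current thread
  _ <- rseq a         -- now wait for the spark before combining
  return (a + b)
```

Compiled with -threaded and run with +RTS -N, the two calls to fib can run on separate cores; the result is the same either way, which is the point — sequencing here only controls evaluation, not meaning.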

To make use of Eval for our parMap function, we could write a version like the one below. It introduces a change of type, from returning [b] to Eval [b]. We could, of course, just use the runEval function to get our result back, but we are not baking that into parMap because we would typically want to use the function within a larger Eval context anyway.

parMap :: (a -> b) -> [a] -> Eval [b]
parMap f [] = return []
parMap f (a:as) = do
  b  <- rpar  (f a)
  bs <- parMap f as
  return (b:bs)

As before, this function captures the basic idea of its sequential counterpart map: apply the function, recurse on the tail, cons the new head to the new tail. This is a passable parallel map, but there are still two things about it that are unfortunate. First, we have repeated the implementation of map; not a big deal for such a small function, but a potential maintenance problem for more complex code. Second, we have only captured one sort of parallel evaluation, firing off sparks for all the cons cells, whereas in practice getting parallelism right often requires some careful tuning to get the right level of granularity and to account for dependencies. For instance, what if, instead of doing each cell in parallel, we wanted to bunch them up into chunks so as to minimise the coordination overhead? What we need is a way to express ideas about running code in parallel (such as "break into chunks and run each chunk simultaneously"), preferably avoiding duplicating existing code or otherwise making it more complicated than it needs to be.

This is where strategies come in. A strategy is just a function that turns some data into an Eval sequence. We've already encountered two strategies above: rpar, which sparks off a parallel evaluation, and rseq, which evaluates its argument to weak head normal form. Many more strategies are possible, particularly when they are custom designed for specific data types. Also, if we have a strategy, we can imagine wanting to use it. This we capture with using, which applies the strategy function and runs the resulting sequence. Parallel evaluation aside, you can pretty much think of x `using` s as being equivalent to id x, although you'll want to watch out for a couple of caveats described in the tutorial Parallel and Concurrent Programming in Haskell.

type Strategy a = a -> Eval a

using :: a -> Strategy a -> a
x `using` s = runEval (s x)
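As a quick illustration of these definitions, here is a hypothetical hand-rolled strategy for pairs (the name parFirst is made up for this sketch; the library already offers similar combinators, such as evalTuple2 and parTuple2):

```haskell
import Control.Parallel.Strategies

-- A strategy for pairs: spark the first component for parallel
-- evaluation, and evaluate the second on the current thread.
parFirst :: Strategy (a, b)
parFirst (a, b) = do
  a' <- rpar a
  b' <- rseq b
  return (a', b')

-- `using` bolts the coordination onto an ordinary pure expression:
result :: (Integer, Integer)
result = (product [1..10000], sum [1..10000]) `using` parFirst
```

Note how the pure expression on the left of `using` says nothing about parallelism; swapping parFirst for another Strategy (a, b) changes only the coordination.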

Now that we've put a name to the idea of building Eval sequences, we can think about trying to generalise our parMap. One way would be to write a higher-order strategy. This parList function is almost identical to parMap, except that instead of applying any function to a list we limit ourselves to just applying some strategy on it.

parList :: Strategy a -> Strategy [a]
parList strat []     = return []
parList strat (x:xs) = do
  x'  <- rpar (x `using` strat)
  xs' <- parList strat xs
  return (x':xs')

We can then define parMap by parameterising parList with rseq to get a parallel list strategy which we then apply to map f xs.

parMap :: (a -> b) -> [a] -> [b]
parMap f xs = map f xs `using` parList rseq

We have now achieved two things. First, we've improved the modularity of our code by separating the algorithm (map f xs) from the coordination (parList rseq). Notice how we are now reusing map instead of reimplementing it, and notice as well how we now have a reusable parList that we can apply to any situation that involves a list. Second, by isolating the algorithm from the coordination code, we have given ourselves a lot more flexibility to switch strategies. To bunch our list up into coarser-grained chunks, for example, we could just swap out parList with parListChunk:

parMap :: (a -> b) -> [a] -> [b]
parMap f xs = map f xs `using` parListChunk 100 rseq
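Putting it together, a minimal (hypothetical) program exercising the chunked strategy might look like the following; compile with -threaded and run with +RTS -N to actually get parallelism:

```haskell
import Control.Parallel.Strategies

-- An expensive function, so the sparks have something to chew on.
fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  let xs = [28 .. 35] :: [Int]
  -- The list is split into chunks of 2; each chunk is sparked,
  -- and rseq forces each element within its chunk.
  print (map fib xs `using` parListChunk 2 rseq)
```

The chunk size (2 here, 100 above) is exactly the granularity knob the prose mentions: larger chunks mean fewer sparks and less coordination overhead, smaller chunks mean more opportunities for the runtime to balance work across cores.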

In this word of the month, we have very lightly touched on the idea of strategies as a compositional way of expressing parallel coordination and improving the modularity of our code. To make use of strategies, we basically need to be aware of two or three things:

1. How to apply a strategy: foo `using` s evaluates the value foo with strategy s

2. Useful strategies from the library: the basics are r0 which does nothing, rseq which evaluates its argument to weak head normal form, rdeepseq which evaluates it all the way down to normal form, and rpar which sparks the value for parallel evaluation. Be sure to check out Control.Parallel.Strategies for more sophisticated strategies such as parListChunk, which divides the list into chunks and applies a strategy to each one of the chunks in parallel.

3. (Optionally) How to build strategies, the simplest way being to use the dot function to compose two strategies sequentially. If you implement your own strategies instead of combining those from the library, you may want to have a look at the Seq no More paper for a little safety note.
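To make point 3 concrete, here is a small composition sketch (assuming the parallel and deepseq packages; the name parNormalForm is illustrative):

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies

-- Spark each list element and, inside the spark, evaluate it all
-- the way to normal form. `dot` composes the two element-level
-- strategies; `evalList` lifts the composite over lists.
parNormalForm :: NFData a => Strategy [a]
parNormalForm = evalList (rpar `dot` rdeepseq)
```

This is essentially how you would spell out parList rdeepseq by hand, built entirely from library pieces rather than from scratch.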

If you want to know more about strategies, the first place to look is probably the tutorial Parallel and Concurrent Programming in Haskell (there is a nice example in there, implementing a parallel K-means algorithm), followed by the Control.Parallel.Strategies API. You could also just dive in and give it a try on some of your own code. It's sometimes just a matter of grabbing a strategy and using it.

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project to push forward the use of parallel Haskell in the real world. The aim is to demonstrate that parallel Haskell can be employed successfully in industrial projects.

Speaking of industrial projects, our recent search for a new project partner has been successful! In fact, we will be welcoming two new partners to the project: the research and development group of Spanish telecoms company Telefónica, and VETT, a UK-based payment processing company. We are excited to be working with the teams at Telefónica I+D and VETT. There may be some Cloud Haskell in our future; stay tuned for more details about the respective partner projects.

Also coming up are a couple of Monad Reader articles featuring work from the Parallel GHC project.

Kazu Yamamoto has been writing an article for the upcoming Monad Reader special edition on parallelism and concurrency. He'll be revisiting Mighttpd (pronounced "Mighty"), a high-performance web server written in Haskell in late 2009. Since Mighttpd came out, the Haskell web programming landscape has evolved considerably, with the release of GHC 7 and the development of several web application frameworks. The new Mighttpd takes advantage of GHC 7's new IO manager as well as the Yesod Web Application Interface (WAI) and the HTTP engine Warp. Do check out his article when it comes out for proof that "we can implement a high performance web server in Haskell, which is comparable to highly tuned web servers written in C", or if you can't wait, try his most recent slides on the topic.

Bernie Pope and Dmitry Astapov have been working on an article about the Haskell MPI binding developed in the context of this project. You can find out more about the binding on its Hackage and GitHub pages.

Blogs, Papers, and Packages

Mailing list discussions

All you need to know about asynchronous exceptions is: don't use them! They're too difficult to reason about, in terms of failure modes and such. Use TVars or MVars instead.

Stack Overflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at parallel@well-typed.com. Please feel free to leave any comments and feedback!