Parallel Haskell Digest 10

Fri, 18 May 2012 14:11:15 GMT, by eric.
Filed under ph-digest, parallel.

Hello Haskellers!

Did you see Ambassador Peyton Jones in Scala land? Simon was recently at ScalaDays 2012 (a large gathering for professional Scala users) giving a keynote talk on Cloud Haskell (one hour video). Cloud Haskell is a pretty exciting new development in the Haskell space, providing the beginnings of a story for distributed programming in Haskell. It's also one of the areas we're focused on over the Parallel GHC project, building a new implementation to replace the current prototype. We're looking forward to talking a bit more about Cloud Haskell in the next (and final) edition of the digest.

Wait, did I say just final? Indeed, by the next digest, we'll be wrapping up the Parallel GHC project. In addition to a bit more Cloud Haskell material, we'll give a little recap of the things we and our partners worked on over the two years. It's been fun!

Meanwhile, in this penultimate edition, we'll be taking a look at concurrent channels for our word of month. We also have new parallel Haskell book to look forward to, an update to Accelerate, the new meta-par family of packages to look at, and also a lot of recent activity on StackOverflow.

News

Conferences

Word of the month

This month, we'll be taking a short breather in our exploration of the Haskell concurrency space, and fleshing out some of the uses for the tools we already have. In the past two digests, we saw how Haskell provides locks for low-level concurrency, and the vastly safer transactions for concurrency at a higher level. Both approaches give us the notion of a typed mutable variables, the idea being that an MVar Int would hold a locked integer, whereas a TVar Int would hold instead hold transactional reference to an integer. These variables can hold arbitrarily complex things of arbitrary type; you could have anything from a TVar Char to a TVar Customer (where Customer would be some record you've defined in your application).

Now that we have mutable variables, it's worth thinking a bit harder about what we might actually put into them. Suppose you find yourself in a typical producer/consumer scenario, for example, with a web service that automatically marks student essays, and which is broken into a piece that accepts submissions (producer) and which passes them on to the core essay-marking engine (consumer). So the producer generates essays and the consumer eats them up and does some work on them; how do we get them talking to each other? It's not enough to just use a single TVar because we want the producer to be able to continue cranking out essays whilst the consumer is working, rather than waiting for it to finish. We assume here that essay-marking is a fairly clever and computationally expensive process, and for this reason, we would want some kind of backlog that the producer can tack things on to, and the consumer can pull things off of.

As such, our word of the month is channel. The unbounded channel abstraction is something that you can fairly easily implement out of either the locky MVar's or transactional TVar's, but we'll focus on the latter as transactions are just so much more civilised (though the same concepts would mostly apply). In the STM world, channels look a little like the following:

-- Control.Concurrent.STM.TChan
data TChan a

newTChan   :: STM (TChan a)
writeTChan :: TChan a -> a -> STM ()
readTChan  :: TChan -> STM a

In the same fashion as the TVar's that we introduced last time, TChan's are parameterised with a type variable, meaning that you could have a channel of characters with TChan Char, or a channel of customers with TChan Customer, and so forth. Creating, reading, and writing to a channel are all transactions (i.e., in the the STM monad). Revisiting our essay marking service, we can sketch out how these channels might be used:

import Control.Concurrent.STM.TChan

main :: IO ()
main = do
    chan <- newTChan
    forkIO (producer chan)
    forkIO (consumer chan)
    forever $ return ()

producer :: TChan Essay -> IO ()
producer chan = forever $ do
    essay <- magicalWebFrameworkStuff
    atomically $ writeTChan chan essay

consumer :: TChan Essay -> IO ()
consumer chan = forever $ do
    essay <- atomically $ readTChan chan
    mark essay

mark :: Essay -> IO ()
mark essay = 
    putStrLn "Let me think..."
    -- State-of-the-art marking technology,
    -- just $25000 per site license
    randomRIO (1, 10000000) >>= threadDelay
    pass <- randomIO
    if pass
       then putStrLn "Pass, good job!"
       eles putStrLn "Fail!"

And that's it! Using concurrent channels does not get more complicated or deeper than this. You may have noticed that in this particular example, we have not really gained (or for that matter lost) that much from sticking to the transactional version of channels. Using the locky MVar version would basically consist of dropping the atomically's, importing from Control.Concurrent.Chan, and using Chan instead of TChan.

Now that we have a bit of an idea what channels are about, it could be worthwhile to consider what it really offers over simpler alternatives. For example, in the introduction we rejected the idea of just using a single TVar because this would force our producer and consumers to wait on each other for each and every essay, rather than going about their asynchronously merry ways.

So we know we want something like channels, but how exactly do we go about building them? For starters, wouldn't we get a channel structure by just wrapping Data.Sequence.Seq with a single TVar? It could be made to work as we are using STM (it simply wouldn't work if we were using MVar's instead; consider the empty channel), but it would leave us with the unfortunately inability to simultaneously read from and write to the channel. These operations would have to grab a hold of the whole queue, leaving the other to retry until later. It would a little sad not to enable this bit of concurrency, considering that that reading and writing take place at opposite ends of the queue, the reader walking along trying to keep up with the writer.

Instead of naively wrapping a queue, the current implementation uses a sort of linked list with TVar'ed cons cells and TVar's pointing to both the beginning (the read end) and the end of the list (the write end). Here are the data structures that make up a channel:

type TVarList a = TVar (TList a)
data TList a    = TNil | TCons a (TVarList a)

data TChan a = TChan (TVar (TVarList a)) -- read end
                     (TVar (TVarList a)) -- write end

It can be a little bit tricky to think about because we've got TVar's wrapping around things that eventually wrap around TVar's themselves. It's a whole chain of TVar's, and if you can have a TVar a, there's no reason not to have a TVar (TVar a). If that feels a bit shaky, try implementing channels yourself as a quick little exercise. We'll speed things along with a handful of pictures to illustrate how it might work. First, our visual language for talking about TVar'ed cons cells:

TChan legend

A new channel has three TVar's, one for the linked list (it points to TNil), and a pair of read/write ones pointing to this pointer:

new TChan

Writing the channel involves adding to the list and moving the write pointer to the new tail of the list:

write TChan

And finally reading those items off the channel involves moving the read pointer forward:

read TChan

The implementation should be fairly straightforward from the pictures, although one place you might get stuck when trying to read from an empty channel. After all, how do you return a value from a channel that doesn't have any, especially since you're expected to return plain old a instead of Maybe a? Well, sometimes you just gotta wait. We briefly glossed over this in our taste of STM in the last word of the month, but STM offers a retry function simply causes a transaction to be aborted and tried again. Using this notion of blocking, you should be able to get a readTChan that waits until there is something to be read.

Hopefully, the exercise of implementing channels will be a useful reminder to think of the concurrency abstractions that Haskell provides (threads, transactional mutable variables) as primitives on top of which you can build more interesting and useful abstractions. For a little more fun, head over to Simon Marlow's tutorial on parallelism and concurrency. In this tutorial, Simon illustrates the building channels over MVar's (also worth doing) and mentions an easy generalisation to multicast channels (one write end, two read ends!) and also a small extension to “unread” a value (pushing it back on to the read end). Both extensions are easy on the surface, but hard to nail down to the exact desired semantics (there's no known correct implementation of the unread extension), at least when you're dealing with locks and MVar's. But if you stick to transactions and TVar's, both wind up being straightforward. Check out his tutorial!

Videos

Blogs and packages

Mailing lists

StackOverflow and Reddit

This month saw quite a lot of activity on StackOverflow, largely from user Clinton trying to puzzle through STM and other concurrency issues. The STM and atomicModifyIORef series of questions could be interesting, at the very least, to see what sorts of things people wonder about when they first run into Haskell concurrency.

General questions

STM

atomicModifyIORef

Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at parallel@well-typed.com. Please feel free to leave any comments and feedback!