Parallel Haskell Digest 8

Fri, 02 Mar 2012 13:41:58 GMT, by eric.
Filed under ph-digest, parallel.

It's time for our next catch-up with the Parallel Haskell community. Did you have a chance to see Simon Peyton Jones's talk The Future is Parallel, and the Future of Parallel is Declarative? It's a good survey of some of the directions that parallel Haskell has been taking, and if you're somewhat new to this stuff, it gives a great feel for the breadth of the community. You'll get a better idea why you see people in this digest muttering about arrays, others trying to make channels and transactions work together, and yet others talking up the 0MQ protocol. So check it out!

We at Well-Typed are writing the digest as part of our mission to push Haskell parallelism and concurrency out into the real world. We are very excited about these technologies and we want to do whatever it takes to make them more accessible to everybody. More news below on how we're doing in this Parallel GHC project.

News

Jobs

Parallel GHC Project Update

ThreadScope 0.2.1 has been released! This version contains the features we demonstrated at the Haskell Implementors' Workshop in September 2011. Since our workshop prototype, we have greatly refined the spark histogram feature, tuning the visualisations so that they are easier to understand. We've also written a small tutorial to go with the new release. The ThreadScope Tour works through concrete examples on using ThreadScope to debug the performance of parallel programs. We'd love any feedback you have about the tutorial, especially about things you feel you need a little more help with.

Along with the new ThreadScope and tutorial, we also have a new version of the ghc-events package, which now encodes the meanings of events in state machines. This makes it possible to validate eventlogs, and doubles as an always up-to-date source of documentation in code form.

We've made some progress on our work in developing a swappable transport layer for Cloud Haskell. We now have a prototype implementation, “distributed-process” (intended to be the successor to “remote”, the current Cloud Haskell implementation). For more details, see the distributed-process GitHub page, particularly the examples and the design document, which incorporates feedback on our initial proposal.

Finally a bit of partner news to wrap things up:

Word of the month

Over the next few digests, we'll be switching our focus from parallelism to concurrency. We tend to stress the distinction because Haskell offers ways to write parallel programs without making explicit use of concurrency. Parallelism done right gets us faster programs. Concurrency on the other hand buys us… concurrency. It's not going away. If every multicore computer in existence were to vanish, we would want to solve concurrent problems. Whether the simultaneity is real or simulated, we would still want to do more than one thing at the same time – accept user input, display progress messages, serve multiple clients.

So let's dig in! We first got a taste of concurrency in the second Parallel Haskell Digest, where we introduced the notion of threads. As an abstraction, threads give us a way to express the separation of concerns between different jobs. But this isn't enough. Sometimes we need to undo the separation just enough to pass information from one thread to another.

This brings us to our word of the month: MVar. The humble MVar (pronounced “em-var”) is one of many solutions for this communication problem, a fairly low-level one by Haskell standards, but one that is still useful enough that you'll see it used very frequently. An MVar is like a burri… wait, wrong tutorial. Actually, it is helpful to think of an MVar as a box in the sense that it holds values and can either be full or empty. The MVar type takes a type variable, so an MVar Int might hold an integer, an MVar String a string, an MVar [String] a list of strings, and so on.

   -- Control.Concurrent.MVar
   data MVar a
   newEmptyMVar :: IO (MVar a)
   takeMVar :: MVar a -> IO a
   putMVar  :: MVar a -> a -> IO ()
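To see the full/empty behaviour in action, here is a tiny sketch of our own using tryTakeMVar and tryPutMVar, the non-blocking variants from the same module (the blocking takeMVar and putMVar would simply wait instead of reporting failure):

```haskell
import Control.Concurrent.MVar

-- Poke at an MVar's full/empty states using the non-blocking variants.
demo :: IO (Maybe Int, Bool, Int)
demo = do
  m  <- newEmptyMVar
  e  <- tryTakeMVar m   -- Nothing: the box starts out empty
  putMVar m 1           -- fill the box
  ok <- tryPutMVar m 2  -- False: the box is already full, the put is refused
  x  <- takeMVar m      -- empty the box again, yielding the 1 we put in
  return (e, ok, x)
```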

To give an idea of how this might be used, below is a small program that fetches some URL in one thread while doing something else in the other. We fork off a Haskell thread that does the fetching and writes to the MVar to indicate what we've retrieved. In the main thread, we do our other work and then just wait until the page has been fetched.

 import Control.Concurrent      (forkIO)
 import Control.Concurrent.MVar

 -- getURL and doSomethingElse are assumed to be defined elsewhere
 main = do
    m <- newEmptyMVar
    
    -- fetch the page in a separate thread, putting the
    -- result into the MVar once we have it
    forkIO $ do
      r <- getURL "http://en.wikipedia.org/wiki/Shovel"
      putMVar m r
    
    doSomethingElse
    
    -- blocks until the forked thread has put a value into the MVar
    r <- takeMVar m
    putStr r

These MVars may look a little familiar if you've used IORefs in Haskell. Here is a mini API for comparison:

   -- Data.IORef
   data IORef a
   newIORef   :: IO (IORef a)
   readIORef  :: IORef a -> IO a
   writeIORef :: IORef a -> a -> IO ()

So what exactly do MVars buy us? Why not just use IORefs to share mutable variables across threads? The reason is that coordination between threads can get messy. We want to make sure that any value we pass from one thread to another is accounted for (and not accidentally overwritten before being consumed), and that we don't consume values that are out of date with respect to other threads (that updated values are received instead of an old value being read twice). Suppose we wanted to fetch a URL while doing something else at the same time. How would we know when we have successfully retrieved it?

 import Control.Concurrent (forkIO)
 import Data.IORef

 -- don't write this at home!
 inadvisableMain = do
    m <- newIORef "" -- default value? :-(
    
    forkIO $ do
      r <- getURL "http://en.wikipedia.org/wiki/Shovel"
      writeIORef m r -- are we overwriting something? :-(
    
    doSomethingElse
    
    r <- readIORef m -- is there something to read? :-(
    putStr r

In the example above, we have no idea whether the page will have been fetched by the time we try to display its contents. What we are looking for is a synchronisation mechanism: a way to indicate that our shared values are ready to be used. For example, we might hit upon the idea of combining IORef with Maybe. Now we have some extra bureaucracy to handle. If we read a value and get Nothing, we know that there isn't yet a value ready to be read. One way or another we would have to account for this case, for example by busy waiting until we get a Just. On the flip side, we want to make sure that when somebody has written a value, the intended recipient has read it before we write over it.

This sort of bureaucracy would be best packaged up into helper functions, functions that would look awfully like takeMVar and putMVar. Notice the change in name, even. Now we're not just reading, but taking, emptying the box to signal that it's been read; and we're not just writing, but putting, only writing when the box is empty. Throw in a little help from the runtime system so that we're doing something smarter than busy waiting, and we'll have worked our way back to the MVar.
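To make the bureaucracy concrete, here is a rough sketch of such helpers built from an IORef (Maybe a) and busy waiting. The names Slot, takeSlot and putSlot are our own inventions, and note that even this version is not actually safe: the read and the write are two separate steps, so two threads could still race between them. Getting that atomicity right (and not spinning) is exactly what MVar does for us.

```haskell
import Control.Concurrent (yield)
import Data.IORef

-- A poor man's MVar: a mutable box that is either full or empty.
type Slot a = IORef (Maybe a)

newEmptySlot :: IO (Slot a)
newEmptySlot = newIORef Nothing

-- Busy-wait until the slot is full, then empty it and return the value.
takeSlot :: Slot a -> IO a
takeSlot s = do
  contents <- readIORef s
  case contents of
    Nothing -> yield >> takeSlot s   -- nothing there yet; try again
    Just x  -> writeIORef s Nothing >> return x

-- Busy-wait until the slot is empty, then fill it with the new value.
putSlot :: Slot a -> a -> IO ()
putSlot s x = do
  contents <- readIORef s
  case contents of
    Just _  -> yield >> putSlot s x  -- last value not yet consumed; wait
    Nothing -> writeIORef s (Just x)
```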

So the MVar combines references with locking to provide synchronisation between threads. If you're coming from other languages, this should sound rather familiar. C programmers may have used mutexes (MVar ()) or semaphores (MVar Int) to protect shared data. Java programmers may have used synchronized methods and statements to prevent thread interference and memory inconsistency problems. The MVar is just a slightly nicer Haskell packaging of the same sort of idea. This means it suffers the same problems as its locky sisters. Sure, having the locks be implicit and putting them where they count (on the data being shared) makes life a bit simpler, but at the end of the day locks are still locks. Take them in the wrong order, and you can still get deadlocks, races, and all those subtle hard-to-reproduce bugs that keep programmers up at night.
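As a sketch of the mutex comparison: an MVar () can serve as a lock, where taking the () acquires the lock and putting it back releases it. The Lock type and withLock helper below are our own names for illustration; the standard library's withMVar captures much the same pattern.

```haskell
import Control.Concurrent.MVar
import Control.Exception (bracket_)

-- A lock is just an MVar holding (): full means free, empty means held.
type Lock = MVar ()

newLock :: IO Lock
newLock = newMVar ()

-- Acquire the lock, run the action, and release the lock afterwards,
-- even if the action throws an exception.
withLock :: Lock -> IO a -> IO a
withLock lock act = bracket_ (takeMVar lock) (putMVar lock ()) act
```

While one thread is inside withLock, any other thread calling withLock on the same Lock blocks in takeMVar until the () is put back, so the protected actions never overlap.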

What is the hapless foot-shooting programmer to do? The good news is that MVars are only one of several mechanisms for dealing with concurrency. Safer mechanisms exist, albeit at a cost. MVars present a compromise between performance and safety. If you are extra extra careful, you can on the one hand squeeze some serious performance out of atomicModifyIORef for concurrent data structures. If, on the other hand, you're willing to take a potential performance penalty in exchange for never worrying about a deadlock again, stay tuned for our next word of the month, “transaction”. For more about MVars in the meantime, have a look at Edward Z. Yang's blog for an MVar overview, as well as the updated API documentation, and finally the concurrency chapter from Real World Haskell.
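To give a flavour of that first option (the counter example below is our own, not from the digest): atomicModifyIORef applies an update function atomically, which is enough to keep a shared counter consistent across threads without any lock. We use one MVar per worker just to wait for them all to finish.

```haskell
import Control.Concurrent
import Control.Monad (replicateM, replicateM_)
import Data.IORef

-- Have n threads each atomically bump a shared counter k times;
-- atomicModifyIORef loses no increments, so the result is n * k.
countAtomically :: Int -> Int -> IO Int
countAtomically n k = do
  counter <- newIORef 0
  dones   <- replicateM n newEmptyMVar
  mapM_ (\done -> forkIO $ do
            replicateM_ k (atomicModifyIORef counter (\c -> (c + 1, ())))
            putMVar done ()) dones
  mapM_ takeMVar dones  -- wait for every worker to signal completion
  readIORef counter
```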

Videos

Blogs and packages

Mailing lists

Parallelism

Concurrency

StackOverflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at parallel@well-typed.com. Please feel free to leave any comments and feedback!