The Haskell Consultants
info@well-typed.com

Well-Typed are hiring: Haskell consultant

Thu, 12 Jan 2012 12:44:17 GMT, by andres.
Filed under well-typed.

In order to keep up with customer demand, we are looking to hire a Haskell expert to work with us at Well-Typed as a Haskell consultant.

This is an exciting opportunity for someone who is passionate about Haskell and who is keen to improve and promote Haskell in a professional context.

The role is quite general and could cover any of the projects and activities that we are involved in as a company. The tasks may involve:

  • working on the Haskell compilers, libraries and tools;
  • Haskell application development;
  • working directly with clients to solve their problems.

Well-Typed has a variety of clients. For some we do proprietary Haskell development and consulting. For others, much of the work involves open-source development and cooperating with the rest of the Haskell community: the commercial, open-source and academic users.

At the moment, we are running the Parallel GHC Project. It is likely that initial tasks will have some connection with parallel and/or concurrent programming in Haskell. We are also doing quite a bit of GHC maintenance, and some knowledge or interest in compiler internals, operating systems, the foreign language interface, and/or deployment issues would be welcome.

Our ideal candidate has excellent knowledge of Haskell, whether from industry, academia, or personal interest. Familiarity with other languages, low-level programming, and good software engineering practices are also useful. Good organisation and ablity to manage your own time, and reliably meet deadlines, is important. You are likely to have a bachelor's degree or higher in computer science or a related field, although this isn't a requirement. Experience of consulting, or running a business, is also a bonus.

The position is initially as a contractor for one year with a salary of 150 GBP per day. We offer flexible hours and work from home. Living in England is not required.

In the longer term there is the opportunity to become a member of the partnership with a full stake in the business: being involved in business decisions, and fully sharing the risks and rewards.

If you are interested, please apply via info@well-typed.com. Tell us why you are interested and why you would be a good fit for the job, and attach your CV. Please also indicate when you might be able to start. We are more than happy to answer informal enquiries. Contact Duncan Coutts, Ian Lynagh or Andres Löh for further information, either by email or IRC.

The deadline for applications is Friday 27th January 2012.

About Well-Typed

Well-Typed LLP is a Haskell services company, providing consultancy services, writing bespoke applications, and offering commercial training in Haskell and related topics.

http://www.well-typed.com/


Parallel Haskell Digest 7

Sat, 24 Dec 2011 20:32:44 GMT, by eric.
Filed under parallel, ph-digest.

GHC 7.4 is coming! There is loads to look forward to, but sometimes, it's the little things that count. For example, do you hate the fact that you can't just flip on an +RTS -N without having to first recompile your program, this time remembering to throw an -rtsopts on it? Duncan Coutts has relaxed the requirement so that commonly used RTS options can be used without it. This flag was originally implemented to counter security problems for CGI or setuid programs; however, it was also a hassle for regular users because it got in the way of common options like -eventlog, -N, or -prof. The GHC 7.4 RTS will make a better tradeoff between security and convenience, allowing a common set of benign flags without needing -rtsopts.

That's the sort of thing that the Parallel GHC Project is about. We want to push parallel Haskell out into the real world, first by helping real users (our guinea pigs industrial partners) to apply it to their work, second by making it easier to use (tools, libraries), and finally communicating more about it (this digest).

In this month's digest, we'll be catching up on news from the community. After the holidays, we'll be back with some new words of the month exploring a bit of concurrent Haskell. In the meantime, happy hacking and Merry Christmas!

News

Job Opportunity at Parallel Scientific

Peter Braam wants you, parallel Haskeller!

Parallel Scientific, LLC is a Boulder, CO based early stage, but funded startup company working in the area of scalable parallelization for scientific and large data computing. We are implementing radically new software tools for the creation and optimization of parallel programs benefiting applications and leveraging modern systems architecture. We build on our mathematical knowledge, cutting edge programming languages and our understanding of systems software and hardware. We are currently working with the Haskell development team and major HPC laboratories world wide on libraries and compiler extensions for parallel programming.

Note the mandatory Haskell experience and the desirability of “in depth knowledge of core Haskell libraries for parallel programming (NDP, REPA etc)”.

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project, run by Well-Typed, with the aim of demonstrating that parallel Haskell can be employed successfully in real world projects.

Our most recent work has been in polishing the upcoming ThreadScope release that we previewed this September at the Haskell Implementor's Workshop. This new release comes with goodies for users of Strategies or the basic par/pseq parallelism: spark creation/conversion graphs, visualisations showing your spark pools filling and emptying, and histograms displaying the distribution of spark sizes. All this with the aim of helping you gain deeper insight, not just what your program is doing but why.

We've also done backend work to make ThreadScope even more useful further down the road. First, we have improved the ghc-events package by encoding the meanings of events in state machines. This makes it possible to validate eventlogs, and doubles as an always up-to-date source of code as documentation. Second, we have extended the GHC RTS to emit the startup wall-clock time and Haskell threads labels to the eventlog. The wall-clock time event allows us to synchronise logs for simultaneous processes, brining us a step closer to using ThreadScope on distributed programs. Named Haskell thread make it easier to distinguish threads from each other.

Finally, we have been exploring the use of Cloud Haskell for high performance computing on clusters. To do this we would need to abstract Cloud Haskell over different transport mechanisms, that is to develop a robust Cloud Haskell implementation sitting on top of a swappable transport layer. We have posted an initial design for this layer on the parallel-haskell list. We have taken the substantial feedback into consideration and will be sending a revised design and recording it in a page on the GHC wiki. Meanwhile, we are working to further validate our design on simple models of both the transport layer and a cloud Haskell layer on top. Longer term, we aim to implement some transports, an IP transport in particular and perhaps a single-node multi-process transport using forks and pipes.

Tutorials and Papers

  • Tutorial: Deterministic Parallel Programming in Haskell (7 Oct)

    Well-Typed's Andres Löh presented a parallel programming tutorial at the recent Haskell in Leipzig meeting. The tutorial comes with slides, exercises, sample code. It paints a picture of the parallel Haskell landscape, and then focuses on one of the many possible approaches (namely, strategies). One nice feature of the tutorial is an emphasis on practicalities, for example, on using ThreadScope to figure out where performance goes wrong in a program. So if you're looking for a way to get started using on parallelism to speed up your Haskell code, give Andres' tutorial a try!

  • Parallel Genome Assembly with Software Transactional Memory (27 Oct)

    Ketil Malde wrote up some of his experiences using STM to parallelise an inherently complicated program best solved with multiple interacting threads. His article demonstrates that a program using STM is able to successfully parallelize the genome scaffolding process with a near linear speedup. Ketil would be interested in any feedback the community may have.

Blogs and Packages

Actors, actors everywhere

  • remote: Cloud Haskell is here! (27 Oct)

    You may have been hearing a lot about Cloud Haskell lately, the new Erlang-ish distributed programming library for Haskell. Now's your chance to see what all the fuss is about! Jeff Epstein has uploaded the remote package to Hackage, so take it for a spin by doing

    cabal update
    cabal install remote
    

    Library documentation is on the Hackage page, and more details are available in the paper Towards Haskell in the Cloud

  • Distributed storage in Haskell (30 Oct)

    So what are people doing with Cloud Haskell? Julian Porter for one has been working on a distributed monadic MapReduce implementation. Along the way he's produced a general proof of concept for distributed storage. Have a look at Julian's page for a short paper and GitHub page.

  • simple-actors 0.1.0 released (11 Oct)

    Brandon Simmons accounced simple-actors, an EDSL-style library for writing more structured concurrent programs, based on the Actor Model. It was designed for local concurrency, as an alternative to ad-hoc use of Chans, but could be extended to a distributed system by defining appropriate SplitChan instances for some network "channel".

  • Haskell Actors (28 Oct)

    Martin Sulzmann wishes he'd named his actor package “multi-headed-actor”. With the recent interest in actor style concurrency in Haskell, there may be some confusion about the various packages that are out there. The point in Martin's library is being to pattern match over multiple events in the message queue, which makes it easier elegantly express ideas like a marketplace actor which matchmakes buyer/seller messages. While Martin's library is built on concurrent channels, it could be adapted to use distributed channels provided by haskell-mpi or Cloud Haskell. See the paper for more information Actors with Multi-Headed Message Receive Patterns.

More concurrency

  • stm-stats: Retry statistics for STM transaction (9 Oct)

    Joachim Breitner blogged about the stm-stats package, which provides wrappers around atomically to track how often a transaction was initiated and how often it was retried. The stm-stats library is used interally by Factis research, but recently released to the wider Haskell community. In fact, Factis have recently hired Joachim to help them contribute back to the Free Software community where possible. So, thanks, Factis and congratulations, Joachim!

  • How to deal with concurrent external events? (11 Oct)

    Apfelmus has been scratching his head over a design problem for event-based frameworks such as GUI libraries: how do you deal with events that occur while you are currently handling another event? Apfelmus gave a simple wxHaskell demonstrator illustrating the problem, (A) reacting to an event while handling another one may expose internal invariants but (B) reacting to an event after finishing another one may render it “impossible”, i.e. it should not have happened in the first place. Any thoughts on the dilema?

  • Concurrency And Foreign Functions In The Glasgow Haskell Compiler (24 Oct)

    Leon P. Smith posted an overview of the interaction between Haskell concurrency and FFI calls in GHC. Leon's post walks us through some the basic concepts: capabilities, Haskell threads, OS threads, and bound threads. This could be good place to start before delving into papers or library documentation.

  • iteratee-stm (4 Nov)

    John Lato announced the new iteratee-stm library recently uploaded to Hackage. Iteratee-stm provides an iteratee interface that uses bounded TChans for communication. This makes it simple to run IO in a separate thread from processing.

Parallelism

  • Automatic deparallelization (17 Nov)

    Ken Takusagawa explored a different perspective on parallelism. Instead of adding parallelism to programs, what if we started with too much parallelism and stripped it away to fit reality?

    Consider always writing code in a style using egregious fine grained parallelism: assume lots of cores with no communication latency and no overhead. It is the compiler's job to deparallelize (unparallelize, serialize) the program to run on the actual number of cores available, taking into account communication latency and the overhead of parallelization

    Oh, and [qkhsskbg]

  • Introducing Speculation (22 Jul 2010)

    Recently, I got a chance to catch up with Edward Kmett, getting my mind twisted into delightful funny shapes in the process. Edward mentioned his speculation library, yet more parallelism in Haskell! The library is based on the paper Safe Programmable Speculative Parallelism by Prakash Prabhu et al. It provides a way to parallelise inherently sequential algorithms (eg. lexing, Huffman decoding) by guessing the value of intermediate results. You start working in parallel to build work off the guess, only discarding it if the guess turns out to be wrong later on. Check out Edward's blog and slides for more details.

  • Quasicrystals as sums of waves in the plane (24 Oct)

    Keegan McAllister posted an somewhat hypnotic animation of quasicrystals. His post comes with complete source code for his program using the Repa parallel arrays library. Repa was useful to Keegan because it provides

    • Immutable arrays, supporting clean, expressive code
    • A fast implementation, including automatic parallelization
    • Easy output to image files, via repa-devil
  • Simple library for CAS posted (7 Dec)

    Ryan Newton released IORefCAS, which provides a drop-in replacement for atomicModifyIORef that takes advantage of the new casMutVar# primop from GHC 7.2. Ryan says that “[b]ecause it's an easy change it might be worth trying that for hot IOrefs in your parallel app.”

  • OpenCL 10.2.2 (23 Nov)

    Luis Cabellos has updated the Haskell OpenCL package with better documentation and improved error handling using Control.Exception instead of Either error.

Mailing list discussions

Help wanted

  • Parallel Matrix Multiplication (10 Dec)

    Mukesh Tiwari is trying to teach himself parallel Haskell (welcome!). He's gone through Real World Haskell and the tutorial by Simon Peyton-Jones and Satnam Singh, but now trying to implement a parallel matrix multiplication function, he finds himself with no sparks converted. Can anybody give Mukesh a hand?

    Mukesh also asked about resources for Parallel Haskell, which would be where I come in. Mukesh, have a look at the parallel Haskell portal: http://www.haskell.org/haskellwiki/Parallel

Cloud Haskell

  • Cloud Haskell now on Hackage (27 Oct)

    Jeff Epstein's announcement that he had uploaded “remote” to Hackage was greeted with joy and a somewhat lengthy discussion on package/module naming. It looks like the modules will be moved from 'Remote' to 'Control.Distributed.Actor' or 'Control.Distributed.Process' to match the approach used for the concurrency packages. The final package name seems to be distributed-process.

    Anybody got a paintbrush?
  • Haskell Cloud and Closures (1 Oct)

    Fred Smith gave Cloud Haskell a try, using it to remotely compute the plus function. Now he wants to be able to send a function to a remote host, no matter if the function is locally declared or at the top level. Erik de Castro Lopo replied that this was a known limitation with the only known workaround being to move the required function to the top-level. Chris Smith pointed out that while the current restrictions may be too tight, there is good reason to have them. As for alternatives approaches to serialising functions, David Barbour suggested maybe looking at the tangible values work by Conal Elliot.

  • Feedback on Cloud Haskell transport layer interface (2 Nov)

    As I mentioned in the Parallel GHC Project update, we've been looking quite a bit into Cloud Haskell lately. Duncan Coutts posted a request for feedback on the design for a Cloud Haskell transport layer interface. We're hoping one day to make use of Cloud Haskell on for high performance computing on clusters. To do this, we hope to develop a robust Cloud Haskell implementation sitting on top of a swappable transport layer, for example, an IP transport, or a single-node multi-process transport using forks and pipes.

    One issues that emerged from the discussion is how to deal with potentially a plethora of paramaters (eg. buffered vs eager? ordered? reliable?) associated with connection/endpoint creation. It doesn't help that each connection type may have its own set of parameters. Is it enough to be able to set and forget them during transport session initialisation, or is it essential for Cloud Haskell be able to set these parameters differently for different connections in the same session?

  • Parallel Haskell in industry (7 Nov)

    Sébastien Lannez also got a chance to try out Cloud Haskell. The remote package uploaded by Jeff seems to work well and — dabblers take note — the examples shipped with the code are very easy to adapt. Before digging deeper, Sébastien wanted to know more about

    1. performance limitations
    2. communication requirements/overheads
    3. stability
    4. already developed applications

    Jeff cautioned that while he thinks Cloud Haskell could be a good platform to develop distributed applications, it's still very much research software and a work in progress. Don't stake your company on Cloud Haskell just yet.

    That said, Duncan Coutts added, we are pretty happy with the design and optimistic about developing a robust implementation, because we can build it as an ordinary Haskell library without requiring tricky extensions to the runtime system. As for Sébastien's fourth question, a couple of Parallel GHC Project partners are rather keen on Cloud Haskell. We are working on the implementation and will hopefully have more to report on performance, overheads and other issues we encounter.

Multicore performance

  • SMP parallelism increasing GC time dramatically (5 Oct)

    It takes a village to tune a program. Tom Thorne has a program with a function does some fairly intensive calculations on with hmatrix. When Tom tries to get some simple parallelism on his 12 core machine, replacing a map with a parMap rdeepseq, he finds GC time going through the roof, from 1s (1.7%) to 248s (40.5%). Is the big scary number just an artefact of how GC time is reported, or is something really wrong?

    ThreadScope is a good first response here and Tom was duly nagged by the community. Tom promises to give it a go, although the last time he tried, the event log output produced about 1.8GB, and then crashed. The ThreadScope team would love to get hold of any hints about reproducing the crash.

    Ryan Newton observed that GC aside, the program does not appear to be scaling; the mutator time itself isn't going down with parallelism. Tom improved the parallelism a bit, breaking the work into chunks and spreading it around more evenly, and provided he disables the parallel GC, it turns much faster and outperforms the sequential version. Having loads of RAM to play and code that doesn't use much memory, Tom then tried telling the RTS to perform GC less often. This worked. Increasing the minimum allocation area size from its default 512K with +RTS -A32M allows Tom to get performance with the parallel GC comparable to that without. Hooray! But there's still this little problem… now Tom's program intermittently segfaults. Getting a bug report out of this may take a while though as Tom attempts to boil it down.

    Meanwhile, Oliver Batchelor offered his experience that enabling more threads than he has cores makes his program get drastically slower. Alexander Kjeldaas and Austin Seipp commented that this is due to GC needing to co-ordinate with blocked threads, and that the problem of oversaturating is well known. There's also the "dreaded last core slowdown" bug which once affected Linux users but seems to have gone away in recent Linux/GHC.

  • AMD Bulldozer modules and Haskell parallelism (13 Oct)

    Herbert Valerio Riedel has been eyeing the AMD FX-8120 Bulldozer processor. Bulldozer cores are not independent from each other, but grouped into pairs. So Herbert wanted to know how this might affect Haskell parallelism; would 8 cores really mean 8 or just 4 with slightly better SMT capability? Simon Marlow does not know (benchmarks). Duncan Coutts believes that it should be all fine as the pairing is not at all like hyperthreading.

  • Estimating contention on an IORef hammered with atomicModifyIORef (27 Oct)

    Ryan Newton starts us off with a hypothetical scaling bottleneck: all threads frequently accessing a single IORef using atomicModifyIORef (Data.IORef). This is commonly understood to be likely a bad idea, but how do we go about measuring just how bad it is? This sort of design appears in monad-par, as pointed out by Johan Tibbell, in the GHC IO manager, so it would be good to know how much it really hurts. (See also Ryan's IORefCAS package which seems to be partly a result of this discussion)

    One approach is to use GHC events to count operations on particular IORefs, then put that through a model that reports whether if the IORef is being used acceptably, or is "hot". Duncan Coutts suggests a simple way to get partway there: stick something like a traceEvent "IORef #3" on each use of atomicModifyIORef and do something like a ghc-events show | grep IORef to at least get an idea which IORefs are hotter than others and some orders of magnitude. We'll hear back from Ryan when he's had a chance to try it.

    Also for the interested, it's worth mentioning that GHC 7.4 will be sporting a new and improved traceEvent, this time exported through Debug.Trace and offering versions for use in pure code and IO both.

  • Way to expose BLACKHOLES through an API? (7 Nov)

    A BLACKHOLE in GHC acts as a placeholder for a thunk that is currently being evaluated. When the thunk is forced, GHC replaces it with a BLACKHOLE object, which it later replaces when it has the evaluation result. In a parallel/concurrent setting, it may happen that two threads are trying to evaluate the same thunk at the same time. In that case, the first thread creates the blackhole, which the second thread notices and blocks on until the evaluation result is available.

    Ryan Newton observes that this blocking is implicit, whereas “[w]hen implementing certain concurrent systems-level software in Haskell it is good to be aware of all potentially blocking operations”. He proposes a mechanism to expose blackholes, for example with a evaluateNonblocking :: a -> IO (Maybe a) that returns Nothing if the value is blackholed. Simon Marlow points out that this may be slightly problematic as thunks depend on each other and “you might be a long way into evaluating the argument and have accumulated a deep stack before you encounter the BLACKHOLE” See the discussion for a counter-proposal.

Data structures and concurrency

  • Efficient mutable arrays in STM (25 Oct)

    Ben Franksen has large arrays (millions of elements) with mostly small elements (Int or Double) and largely chunk-wise access patterns. The current implementation of Control.Concurrent.STM.TArray as Array ix (TVar e) is not nearly efficient enough for his use case. A more efficient implementation would be most welcome, but for now Ben is eyeing Data.Vector.Unboxed from the vector package instead. The idea is to use unsafeIOToSTM to provide shared transactional access to his arrays. Ben thinks he can live with the consequences: IO code being rerun, aborting, and inconsistent views.

    But does the STM transaction actually "see" that he changed part of the underlying array so that the transaction is retried? If not, how does he go about manually implementing this behaviour? Antoine Latter reports that no unsafeIOToSTM is not transactional - IO actions will be performed immediately and are not rolled back, and are then re-performed on retry. David Barbour and Ketil Malde suggested possible implementations, either keeping an extra TVar Int for every chunk in the array, or (B) cleaner and safer: create a “chunked” TArray that works with fixed-width immutable chunks in a spine.

    Another issue that came up is that transactions scale quadratically with the number of TVars touched. Bryan O'Sullivan and Ryan Ingram explained that this is due to choice of data structure (a list) for the STM transaction log, and should be easy to fix.

  • High performance threadsafe mutable data structures in Haskell? (27 Oct)

    Ryan Newton wanted to know if anybody else was working on threadsafe mutable data structures in Haskell. He and the monad-par team were planning to replace their work stealing deques with something more efficient. If anybody else is working in the same general area, teaming up would be great!

    Ryan will be exploring both a pure Haskell approach and one based on wrapping foreign data structures with the FFI. Ultimately, Ryan is aiming for an "abstract-deque" parameterizable interface that abstracts over many variants (bounded/unbounded, concurrent/non-concurrent, single/1.5/double-ended, etc). His current prototype makes use of phantom types and the type families extension to handle all this abstraction, with the intended end result being that someone can create a new queue by setting all the switches on the type (eg. q :: Deque NT T SingleEnd SingleEnd Bound Safe Int <- newQ), but this brings up a set of Haskell language and type system questions. More details in the thread!

  • Persistent Concurrent Data Structures (1 Nov)

    Like Ryan, Dmitri Kondratiev is interested in concurrent mutable data structures, but this time with persistence to boot. His goal is to program at a higher level of abstraction, avoiding the detail bloat that would result from directly using some data storage API (eg. SimpleDB). Dmitri's idea: a module tree of data structures mirroring Data.List, Data.Map, etc but with concurrency and persistence. One would be able to configure through the type interfaces:

    1. media to persist data (file? DBMS?)
    2. caching policy
    3. concurrency configuration (optimistic/pessimistic locking?).

    Dmitri's post prompted some suggestions for packages to look into:

    • safecopy: addresses both the issues of serializing the data and migrating it when the datastructure changes
    • acid-state: builds on top of safecopy to add a notion of transactions to any Haskell data structure
    • TCache: a transactional cache with configurable persistence
    • Haskell web server frameworks (eg. Yesod, Happstack [acid-state was formerly happstack-state]), as some come with persistence support

    Jeremy Shaw and David Barbour had reservations about what Dmitri had in mind when he said "concurrent". How would he deal with transaction boundaries, and would a concurrently modified Data.List variant still be a list? Evan Laforge also expressed skepticism about the viability of abstracting over data stores with potentially very different needs.

Threads, blocking

  • Waiting on input with hWaitForInput' orthreadWaitRead' (17 Oct)

    Jason Dusek would like to use evented I/O for a proxying application, in particular, to fork a thread for each new connection and then to wait for data on either socket in this thread, writing to one or the other socket as needed. He's found two functions which could help, System.IO.hWaitForInput and Control.Concurrent.threadWaitRead but each comes with some difficulties. Is there something like select() that works with handles rather than file descriptors?

    Ertugrul Soeylemez suggested an alternative approach, just plain Concurrent Haskell because “[a] hundred Haskell threads reading from Handles are translated to one or more OS threads using whatever polling mechanism (select(), poll(), epoll) your operating system supports”. He pasted a small echo server to demonstrate the idea. It wasn't entirely clear for Jason how to apply this to a proxy server. Jason has a lazyBridge :: Handle -> Handle -> IO () which writes everything it reads from one handle into the other and vice-versa, but it blocks and does not allow packets to go back and forth. Gregory Collins sketched out a possible solution: how about forkIOing two threads (one for the read end, one for the write end), with a loop over lazy I/O? This works, but is still somewhat surprising.

  • System calls and Haskell threads (3 Nov)

    Andreas Voellmy noticed this in Kazu Yamamoto's Monad Reader article on a high performance web server.

    When a user thread issues a system call, a context switch occurs. This means that all Haskell user threads stop, and instead the kernel is given the CPU time.

    Can that be right? Andreas thought, and Johan Tibell confirms, that when a Haskell thread is blocking a particular OS threads, other Haskell threads can continue run concurrently on other OS threads on other CPUs (see Extending the Haskell Foreign Function Interface with Concurrency).

    Further clarification comes from David Barbour, who points out why Kazu's original statement was correct in the context of the article. While Mighttpd uses Haskell threads for concurrency; it does not go the traditional route of using the RTS -Nx argument to generate OS threads. Instead it gets its parallelism from a "prefork" model that creates separate processes to balance user invocations (each process may itself be running multiple Haskell threads). This unusual approach is chosen to avoid issues with garbage collection.

  • Where threadSleep is defined? (6 Dec)

    Dmitri Kondratiev was looking for a function to make the current process (executing thread) go to sleep for a given time. Felipe Almeida Lessa pointed to the threadDelay function in Control.Concurrent.

Stack Overflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at parallel@well-typed.com. Please feel free to leave any comments and feedback!

Bikeshed image by banlon1964 available under a CC-NC-ND-2.0 license.


Tutorial on Parallel Programming in Haskell

Fri, 07 Oct 2011 12:42:38 GMT, by andres.
Filed under parallel.

Today, I presented a tutorial at the Haskell in Leipzig meeting in Germany. The topic was parallel programming in Haskell, mainly using strategies and the parallel package.

The slides and the code that I have used are available for download. Feedback is welcome!


Parallel Haskell Digest 6

Thu, 06 Oct 2011 15:18:54 GMT, by eric .
Filed under parallel, ph-digest.

This edition of the Parallel Haskell Digest kicks off with some warm feelings from BioHaskeller Ketil Malde

Blessing of the day goes to the authors of threadscope. In fact, let me look it up: Donnie Jones, Simon Marlow, and Satnam Singh. I had an old STM experiment lying around, but couldn't quite make it go faster than the single threaded version.

Today I installed threadscope, and a glance at the output sufficed to identify the problem, and now at least it's faster. Great tool! (But you all knew that, of course)

Thanks indeed to the original authors of ThreadScope! Ketil made good use of the original ThreadScope. Since the original release, the folks at Well-Typed have taken over development of ThreadScope (Andres, Duncan, and Mikolaj) and are working to extend it so that it becomes even easier to squash those pesky performance bugs. These extensions, much like this digest, are made possible by the Parallel GHC Project. We hope you'll love the results.

News

Data Parallelism in Haskell slides and video: Manuel Chakravarty came all the way from Sydney (that's 931 km!) to tell the Brisbane FP Group about work going on at UNSW on data parallel programming in Haskell. The talk covers a lot of ground. It's an hour and a half long and

[It] motivates the use of functional programming for parallel, and in particular, data parallel programming and explains the difference between regular and irregular (or nested) data parallelism. It also discusses the Repa library (regular data parallelism for multicore CPUs), the embedded language Accelerate (regular data parallelism for GPUs), and Data Parallel Haskell (nested data parallelism for multicore CPUs).
Both video and slides are available, so check it out!

Word of the Month

The word of the month is dataflow as in dataflow parallelism. (And if you've just seen Manuel's talk, not to be confused with data parallelism). With dataflow, we will be wrapping up our now four-part series on parallel programming in Haskell. In recent words of the month, we've been looking at various approaches for parallelising your way to faster code. We started with the parallel arrays library Repa, then moved on to the Haskell par and pseq primitives, and built from there to get Strategies. Now we'll be looking at monad-par, a library for dataflow parallelism.

First, the bigger picture: why do we have another way of doing parallel programming in Haskell when we already have parallel arrays and Strategies? Part of the answer is parallelism is still a research topic, with new ideas coming out from time to time and some old ideas slowly withering away. For another part of the answer, it may help to think in terms of a trade-off between implicitness and performance, in other words, between Easy and Fast.

Not all problems are subject to the trade-off in the same way, and it's not always clear to what extent they are. As a rough sketch, problems that require applying the same operation on a large number of simple object can get fast performance from a parallel arrays library. This is a form of data parallelism. If the problem does not lend itself to data parallelism, the next port of call is likely Strategies. If Strategies are not adequate for the desired speedups, the problem may require an even more explicit approach.

Until recently, this meant “rolling your own” parallelism by using Concurrent Haskell: forking off threads, assigning tasks to them, and communicating finished work between them. Using concurrency for DIY parallelism is risky. It means venturing into IO, giving up determinism, exposing ourselves to all manner of side-effects, not to mention subtle concurrency bugs like deadlock. As Haskellers, we should demand better: safer, more predictable and easier to reason about. We want to have explicit fine-grained control without all the nasty side-effects. And now we can have it!

The latest addition to the parallel Haskell quiver is the monad-par library. Programming in monad-par looks a lot like programming in Concurrent Haskell:

data Par a
instance Monad Par
 
runPar :: Par a -> a     
fork :: Par () -> Par ()

data IVar a
 
new :: Par (IVar a)
put :: NFData a => IVar a -> a -> Par ()
get :: IVar a -> Par a

Instead of using forkIO, we use fork to create threads (not sparks!) and instead of using MVar's to communicate, we use IVar's. MVar and IVar behave in somewhat similar ways, in that any attempt to read from an IVar will wait until it has been filled. But this is also where differences emerge. Unlike their concurrent counterparts IVar may only be written to once; subsequent writes result in an error. This aligns with our desire for determinism. We never have to worry about the IVar changing values over time. Moreover, the Par monad does not allow for any IO (feature! not a bug) and computations in the Par monad can always be extracted with a simple runPar.

But this doesn't really tell us all that much about how to use this library. For now we come back to our word dataflow. The dataflow model treats program as a little black box, an input goes in, an output comes out, and what happens in between is immaterial:

The program is represented as a directed graph (a dataflow network), with each node representing an individual operation (black boxes in their own right), and connections between the nodes expressing dependencies. In the graph above, g and h depend on the output of f (the same output - there is only one) and in turn, i depends on the output of both g and h.

Dataflow networks are handy thinking tools because one of the things that makes it hard to get fast parallel code is the inherent sequentially that dependencies introduce. A dataflow network nicely captures which parts of the program are parallel (g and h are independent of each other) and which parts are sequential (g depends on f). This is a sort of parallelism topline; we can't go any faster than what this network allows, but at the very least we should take the shape of the network into account so we don't make matters worse and find ourselves waiting a lot more on dependencies than is strictly necessary.

Using the Par monad essentially consists in translating dataflow networks into code.

Suppose we have functions

f :: In -> A
g :: A  -> B
h :: A  -> C
j :: B  -> C -> Out

For the graph above, we might say:

network :: IVar In -> Par Out
network inp = do
 [vf,vg,vh] <- sequence [new,new,new]
 
 fork $ do x <- get inp
           put vf (f x)
  
 fork $ do x <- get vf
           put vg (g x)
 
 fork $ do x <- get vf
           put vh (h x)
 
 x <- get vg
 y <- get vh
 return (j x y)

It's useful to note that here dataflow graphs need not be static; they may be dynamically generated to fit the shape of the input. To see this and also a more realistic taste of monad-par, check out Simon Marlow's Parallel and Concurrent Programming Haskell tutorial. It shows how a parallel type inferencer might be implemented, its dataflow graph arising from the set of bindings it receives as input.

Finally what about Strategies? How do we choose between the two? Both are IO-free and deterministic, both require some but not a whole lot of comfort with monads. Par may be easier to use correctly than Strategies because it is strict and forces you to explicitly delineate the dependencies in parallel computations. This also makes it more verbose on the other hand. Aside from being more verbose, Par can be a little bit “messier” than Strategies because you don't get the same separation between algorithm and parallelism. And since we're ultimately concerned with performance, it's worth knowing that the Par monad currently involves more overhead than the Eval monad from Strategies. The latter may be more appropriate at finer granularities.

Perhaps one way to go about this is to start off with Strategies as they have been around longer and may require less invasive modifications to your code. But if you find Strategies to be difficult, or you just can't get the performance you want, don't reach for those forkIO's yet. Give Par a try.

For more about monad-par, do visit Simon's tutorial and the API on hackage. To dig deeper, you may also find it instructive to read A Monad of Deterministic Parallelism (Marlow et al) presented in this year's Haskell Symposium. Related to this are his CUFP monad-par slides tutorial slides.

This concludes our tour of current approaches to Haskell parallelism. There is some new technology on the horizon which we may cover in a later digest, particularly Data Parallel Haskell and nested data parallelism (data parallelism not to be confused with dataflow parallelism), but for next month, let's take a little break from parallelism and look at something a little different.

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project, run by Well-Typed, with the aim of demonstrating that parallel Haskell can be employed successfully in real world projects.

This has been a month for releasing software. We've recently put on Hackage updates to ThreadScope and some helper libraries

  • ThreadScope (0.2.0), with a new spark profiling visualisation, bookmarks, and other enhancements
  • gtk2hs (0.12.1), adding GHC 7.2 compatibility,
  • ghc-events (0.3.0.1), adding support for spark events (needs GHC 7.3),

Duncan Coutts presented our recent work on ThreadScope at the Haskell Implementors Workshop in Tokyo (23 Sep). He talked about the new spark visualisation feature which show a graphical representation of spark creation and conversion statistics with a demo inspired by a program we are working on for Parallel GHC.

Meanwhile, we have also completed a pure Haskell implementation of the “Modified Additive Lagged Fibonacci” random number generator. This generator is attractive for use in Monte Carlo simulations because it is splittable and has good statistical quality, while providing high performance. The LFG implementation will be released on Hackage when it has undergone more extensive quality testing.

Blogs, Papers, and Packages

  • First Class Concurrency in Haskell (PDF) (31 Aug)

    Keegan McAllister posted slides from a talk he gave last year at the Boston Area Haskell Users' Group. His slides give an overview of topics in concurrent imperative programming:

    • Using closures with concurrency primitives to improve your API
    • Using MVar to retrieve thread results
    • How System.Timeout works
    • Implementing IORef and Chan in terms of other primitives
    • Implementing the π-calculus, and compiling the λ-calculus to it

    You might find Keegan's slides to be a useful introduction to Haskell Concurrency.

  • Lambda to pi (9 Sep)

    “If the λ-calculus is a minimal functional language, then the π-calculus is a minimal concurrent language”. Following up on his recent slides, Keegan wrote a follow-up post elaborating on his pi-calculus interpreter and lambda-calculus to pi-calculus compiler. If you want to run his blog post just pass the lhs file to GHCi.

    (note that I had to use -hide-package=monads-tf)

  • Sirkle (17 Sep)

    Morten Olsen Lysgaard released Sirkle. It provides an implementation of the Chord Distributed Hash Table, and a simple storage layer with fault tolerance and replication similar to DHash. The Sirkle DHT implementation uses Cloud Haskell, which we saw some interest about in the last edition of this digest. Sirkle is available on hackage and GitHub. “If you think distributed computing is fun”, Morten invites us, “and would love a hobby project, join me in discovering what can be done with a DHT in Haskell!”

  • ThreadScope (5 Sep)

    Duncan Coutts released ThreadScope 0.2.0, the performance visualiser that we saw Ketil used to make his STM experiment faster. The updated ThreadScope has a new spark profiling visualisation, bookmarks, and other enhancements. Now if only we had some sort of user documentation…

Mailing list discussions

  • Performance of concurrent array access (23 Aug)

    As a first step to his foray into writing concurrent parallel Haskell programs Andreas Voellmy wrote a bit of code to read that reads and writes concurrently to an IOArray. Unfortunately, the performance with multiple cores was not very good and using ThreadScope did not reveal anything obvious. What did help on the other hand was to switch to IOUArray, which puzzled Andreas. Why would this improve scalability to multiple threads and cores? Andrew Coppin and Brandon Allbery suggested it may have something to do with strictness. Johan Tibbell pointed out some places where Andreas would likely want to add a bit of strictness to his original IOArray version. But this only seems to have helped slightly. Anybody else have an idea?

  • (Repa) hFlush: illegal operation (24 Aug)

    Michael Orlitzky is using Repa to process MRI data, but when writing the results to file, he gets the error message spline3: output.txt: hFlush: illegal operation (handle is closed) Trial and error is a bit hard to do here because it takes 45 minutes to get to the point of failure. Ben Lippmeier offered to have a look if Michael could provide some data to go with the code he posted. Ben also pointed out that the implementation of Repa's text IO functions is naive, and may have problem with massive files. If the data is in some standard binary form, it may be worthwhile to write a Repa loader for it.

  • how to read CPU time vs wall time report from GHC? (14 Aug)

    Wishnu Prasetya is just getting started with Parallel Haskell. He's having some trouble interpreting runtime statistics from the RTS (eg. 8.97s ( 2.36s elapsed)).

    SPARKS: 5 (5 converted, 0 pruned)
       
    INIT  time    0.02s  (  0.01s elapsed)
    MUT   time    3.46s  (  0.89s elapsed)
    GC    time    5.49s  (  1.46s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time    8.97s  (  2.36s elapsed)
    

    Why does the CPU time on the left exceed the wall clock time on the right? Iustin Pop pointed out that CPU time accumulates over the multiple CPUs. The roughly 4:1 ratio suggest Wish must be making 95% use of 4 CPUs (or partial use of more CPUs). This may sound efficient except that at the end of day what counts is the difference in wall clock times. Iustin points out that the MUT/GC times show Wish spending 60% of his time in garbage collection, perhaps a sign of a space leak or an otherwise inefficient algorithm.

  • Can't link OpenCL on Windows (12 Sep)

    OpenCL (Open Computing Language) is a framework for writing parallel programs that run on GPUs (more generally, on heterogeneous systems with perhaps a mix of CPUs and other processors). Jason Dagit is trying to get the OpenCLRaw bindings to work on Windows. On link time, however, he gets undefined symbol errors for everything he uses in the OpenCL API. Mystery solved: Jason later reported that the binding was set to use ccall whereas OpenCL uses stdcall.

  • Turn GC off (14 Sep) and Build local-gc branch of GHC (26 Sep)

    Andreas Voellmy (who we saw above with IOArray woes) is writing a server handling multiple long-lived TCP connections with a bit of shared state among connections. Ideally, it'd be a case of more cores = more clients, but the current stop-the-world garbage collection in GHC makes this expensive. His first thought was to completely disable garbage collection, and just run until he is out of memory. Perhaps there is an alternative.

    David Barbour commented that controlling the amount of live memory is important here, much more than avoiding allocations. Austin Seipp pointed us to recent work on the local-gc branch of GHC (see the paper Multicore Garbage Collection with Local Heaps). This work has not been merged into GHC though (as it's not clear if the improvements are worth the complexity increase).

    Andreas decided to give it a shot. Unfortunately, he can't get it to build. Help?

  • A Missing Issue on Second Generation Strategies (24 Sep)

    Burak Ekici is trying to parallelize RSA encryption and decryption with Strategies, but all his sparks keep getting pruned! Antoine Latter noticed that Burak's strategy wasn't actually doing anything to its input; it simply sparks off computations (themselves evaluated with rdeepseq) that nobody ever looks at. Daniel Fischer made a similar point, providing some improved code with a refactor and clean up of repeated code along the way.

  • Parallel Haskell Digest 5 followup

    Following up on the last word of the month (Strategies). Kevin Hammond observed that the first parMap example interweaves behavioural and functional code, defeating the point of strategies. The issue here is pedagogical: although I later follow up with the version using parList rseq, I run the risk of people simply copy and pasting while missing out on the benefits. Thanks for the feedback, Kevin! I'll see if I can avoid the try-the-wrong-way-first style in the future.

    As a more general note, feedback people give, particularly criticism and suggestions for improvement are very valuable, so keep it coming, folks!

Stack Overflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at parallel@well-typed.com. Please feel free to leave any comments and feedback!


Upcoming Events

Sat, 10 Sep 2011 19:51:15 GMT, by andres.
Filed under well-typed.

Here's two upcoming events that Well-Typed will be involved in:

Hal6 Haskell-Workshop, 7 October 2011, Leipzig, Germany

This is a one-day workshop with various talks and tutorials. I will be giving a tutorial (90 minutes) on parallel programming in Haskell. In addition, Kevin Hammond will be talking about parallel refactorings.

Most of the talks during the day will be in German. So if you speak German, you might want to consider coming to Leipzig!

(More info)

FP Day, 14 October 2011, Cambridge, UK

This is a one-day training event for Haskell, F#, and Clojure, aimed mainly at participants from industry. Simon Peyton Jones and Don Syme are keynote speakers. Well-Typed is running a hands-on introduction to Haskell (180 minutes). There are a limited number of tickets available, and registration is open. It would be great to see you there!

(More info)


Parallel Haskell Digest 5

Wed, 31 Aug 2011 11:04:42 GMT, by eric.
Filed under parallel, ph-digest.

Hello Haskellers!

Eric here, reprising my role as Parallel Haskell Digester. Many thanks to Nick for good stewardship of the digest and nice new directions for the future. This month we have Erlang PhDs (and a postdoc), new partners, two Monad Reader articles in progress and some strategies. As usual, this digest is made possible by the Parallel GHC Project.

News

Scalable Erlang PostDoc and 2 PhD Studentships (8 Aug)

What, Erlang?! Yes, you are reading the right digest. If you're generally into parallelism and concurrency in functional programming languages, you may be especially interested to know about this announcement from Phil Trinder.

RELEASE project - A High-level Paradigm for Reliable Large-scale Server Software - is funded by the EU Framework 7 programme for 36 months from October 2011. Its aim is to scale the radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines (100 000 cores).

Word of the Month

Last month, we had par and pseq as our twin words of the month. Let's pursue this little train of parallel thought; our next word the month is strategy. Strategies have been around since the 1993 paper Algorithm + Strategy = Parallelism; that's before even we started using monads in Haskell! They've recently been revamped in Simon Marlow's Seq no More paper, and it's this version of strategies that we'll be exploring here.

Strategies are built on top of the par and pseq primitives we saw in the last digest. They provide a nice way to express the often complicated logic we need to make the best use of parallelism in our code. Use of strategies can also help to make parallel code easier to read and maintain because they allow us to more cleanly separate the core logic from our code that which pertains to our use of parallelism.

Before delving into strategies, let's take a small notational detour by introducing the Eval monad. Suppose we wanted a parallel version of the map function, something that would apply a function to each item of a list. Using the par and pseq from the last digest, we might express this function

parMap :: (a -> b) -> [a] -> [b]
parMap f [] = []
parMap f (a:as) = b `par` bs `pseq` (b:bs)
 where
  b  = f a
  bs = parMap f as

If we look carefully at the code we can observe that there is something inherently sequential in the way we have expressed this parallel computation: first spark off f a then recurse to the tail of the list, and finally cons.

The Eval monad builds off the insight that expressing parallelism is fundamentally (perhaps counterintuitively) about ordering things. Monads are well-suited for expressing ordering relationships, and so they have been been pressed to work for expressing parallel computation as well.

data Eval a
instance Monad Eval

runEval :: Eval a -> a
rpar :: a -> Eval a
rseq :: a -> Eval a

Eval is just a strict identity monad, with rpar and rseq as counterparts to par and pseq. We use Eval to compose sequences of parallel computation, which we extract the results of by using the runEval function. If you're not familiar with monads, you can get away with just treating Eval as new notation, or rather, borrowed notation, the same that we use IO, parser combinator libraries, QuickCheck and a plethora of other useful monads. It's worth noting also that despite appearances, we are still in purely functional territory — no IO here! — with the notion of sequencing being limited to controlling parallelism and evaluation depth.

To make use of Eval for our parMap function, we could write a version like the below. It introduces a change of type, from returning [b] to Eval [b]. In the general case, we could just use the runEval function to get our result back, but we are not baking it into parMap because we would typically want to use then function within a greater Eval context anyway.

parMap :: (a -> b) -> [a] -> Eval [b]
parMap f [] = return []
parMap f (a:as) = do
  b  <- rpar  (f a)
  bs <- parMap f as
  return (b:bs)

As before, this function captures the basic idea of its sequential counterpart map: apply function, recurse to tail, cons new head to new tail. This is a passable parallel map, but there are still two things which are unfortunate about it. First, we have repeated the implementation of map, not a big deal for such a small function but a potential maintenance problem for more complex code. Second, we have only captured one sort of parallel evaluation, firing off sparks for all the cons cells, whereas in practice getting parallelism right requires some often careful tuning to get the right level of granularity and account for dependencies. For instance, what if instead of doing each cell in parallel, we wanted to bunch them up into chunks so as to minimise the coordination overhead? What we need is a way to express ideas about running code in parallel (such as “break into chunks and run each chunk simultaneously"), preferably avoid duplicating existing code or otherwise making it more complicated than it needs to be.

This is where strategies come in. A strategy is just a function that turns some data into an Eval sequence. We've already encountered two strategies above, rpar which sparks off a parallel evaluation, and rseq which evaluates its argument to weak head normal form. Many more strategies possible particularly when they are custom designed for specific data types. Also, if we have a strategy, we can imagine wanting to use it. This we capture with using, which applies the strategy function and runs the resulting sequence. Parallel evaluation aside, you can almost pretty much think of x `using` s as being equivalent to as id x, although you'll want to watch out for a couple of caveats described the tutorial Parallel and Concurrent Programming in Haskell.

Strategy a = a -> Eval a

using :: a -> Strategy a -> a
x `using` s = runEval (s x)

Now that we've put a name to the idea of building Eval sequences, we can think about trying to generalise our parMap. One way would be to write a higher-order strategy. This parList function is almost identical to parMap, except that instead of applying any function to a list we limit ourselves to just applying some strategy on it.

parList :: Strategy a -> Strategy [a]
parList strat []     = return []
parList strat (x:xs) = do
  x'  <- rpar (x `using` strat)
  xs' <- parList xs strat
  return (x':xs')

We can then define parMap by parameterising parList with rseq to get a parallel list strategy which we then apply to map f xs.

parMap :: (a -> b) -> [a] -> Eval [b]
parMap f xs = map f xs `using` parList rseq

We have now achieved two things. First we've improved the modularity of our code by separating algorithm (map f xs) from coordination (parList rseq). Notice how we are now reusing map instead of reimplementing it, and notice as well how we now have a reusable parList that we can use apply to any situation that involves a list. Second, by isolating the algorithm from the coordination code we have given ourselves a lot more flexibility to switch strategies. To bunch our list up into coarser-grained chunks, for example, we could just swap out parList with parListChunk

parMap :: (a -> b) -> [a] -> Eval [b]
parMap f xs = map f xs `using` parListChunk 100 rseq

In this word of the month, we have very lightly touched on the idea of strategies as a compositional way of expressing parallel coordination and improving the modularity of our code. To make use of strategies, we basically need to be aware of two or three things:

1. How to apply a strategy: foo `using` s evaluates the value foo with strategy s

2. Useful strategies from the library: the basics are r0 which does nothing, rseq which evaluates its argument to weak head normal form, rdeepseq which evaluates it all the way down to normal form, and rpar which sparks the value for parallel evaluation. Be sure to check out Control.Parallel.Strategies for more sophisticated strategies such as parListChunk, which divides the list into chunks and applies a strategy to each one of the chunks in parallel.

3. (Optionally) How to build strategies, the simplest way being to use the dot function to compose two strategies sequentially. If you implement your own strategies instead of combining those from the library, you may want to have a look at the Seq no More for a little safety note.

If you want to know more about strategies, the first place to look is probably the tutorial Parallel and Concurrent Programming in Haskell (there is a nice example in there, implementing a parallel K-means algorithm), followed by the Control.Parallel.Strategies API. You could also just dive in and give it a try on some of your own code. It's sometimes just a matter of grabbing a strategy and using it.

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project to push forward the use of parallel Haskell in the real world. The aim is to demonstrate that parallel Haskell can be employed successfully in industrial projects.

Speaking of industrial projects, our recent search for a new project partner has been successful! In fact, we will be welcoming two new partners to the project, the research and development group of Spanish telecoms company Telefónica, and VETT a UK-based payment processing company. We are excited to be working with the teams at Telefónica I+D and VETT. There may be some Cloud Haskell in our future; stay tuned for more details about the respective partner projects.

Also coming up are a couple of Monad Reader articles featuring work from the Parallel GHC project.

Kazu Yamamoto has been writing an article for the upcoming Monad Reader special edition on parallelism and concurrency. He'll be revisiting Mighttpd (pronounced "Mighty"), a high-performance web server written in Haskell in late 2009. Since Mighttpd came out the Haskell web programming landscape has evolved considerably, with the release of GHC 7 and development of several web application frameworks. The new mighttpd takes advantage of GHC 7's new IO manager as well as the Yesod Web Application Interface (WAI) and the HTTP engine Warp. Do check out his article when it comes out for proof that "we can implement a high performance web server in Haskell, which is comparable to highly tuned web servers written in C", or if you can't wait, try his most recent slides on the topic.

Bernie Pope and Dmitry Astapov have been working on an article about the Haskell MPI binding developed in the context of this project. You can find out more about the binding on its Hackage and GitHub pages.

Blogs, Papers, and Packages

  • Data Parallel Haskell and Repa for GHC 7.2.1 (11 Aug)

    Ben Lippmeier has updated the Data Parallel Haskell libraries on Hackage to go with the newly released GHC 7.2.1. While DPH is still a technology preview, this new version is significantly more robust than previous ones. If nested data parallelism isn't what you're after, Ben has also updated the Repa library for parallel arrays. See PH Digest 3 for more details.

  • thespian: Lightweight Erlang-style Actors for Haskell (18 Aug)

    Alex Constandache updated Hackage with this intriguing library. I wrote to Alex asking about about the relationship with Cloud Haskell. It turns out Alex had somewhat similar goals in mind and may be focusing on Cloud Haskell instead.

  • dbus-core 0.9 (23 Jul)

    John Millikin announced "the first significant release to dbus-core in a while" with significant improvements to both the API and implementation. In particular, dbus-client has been merged into dbus-core. See the announcement for more details then a quick code sample showing the new simple API in action. For the curious, D-Bus is an inter-process communication mechanism used in desktop environments like Gnome.

Mailing list discussions

  • How to ensure code executes in the context of a specific OS thread? (4 Jul).

    Jason Dagit has found the correct handling of GUI events on OS X requires checking for events and responding to them from the main thread. He wanted to know if such thing was even possible on the threaded RTS, particularly so that he could use the library from GHCi. David Pollak has a bit of code that provides a runOnMain function using an FFI call to a bit objective C code. Simon Marlow pointed out that for compiled code, the main thread is a bound thread, bound to the main thread of the process. Simon is specifically talking about the threaded RTS here as this is already trivially true of the non-thread one. The question still remains open for GHCi.

  • Cloud Haskell (22 Jul)

    Tom Murphy was curious to see anybody was using Cloud Haskell yet. It looks like we have a couple people who are thinking of using it: Tim Cowlishaw might be using it for his masters thesis project (A Haskell EDSL for agent-based simulation). Julian Porter, who we mentioned in the last digest for his Monad Reader article, is planning to make his MapReduce monad framework work on the distribute cluster. Now how about those of us who have actually given it try?

    As for the Parallel GHC project, it's worth mentioning that our new partners are planning to use Cloud Haskell in their projects. We'll be sure to report back to the community about the results.

  • A language that runs on the JVM or .NET... (31 Jul)

    KC was hoping for a version of Haskell "that runs on the JVM or .NET has the advantage of Oracle & Microsoft making those layers more parallelizable". Austin Seipp replied that it basically boils down to it being a lot of work. Chris Smith also cautions that while there may be many good reasons to easy to want a Haskell.Net or a Jaskell, Haskell parallelism and concurrency may not be one of them. Parallel and concurrent Haskell code relies generates a large volume of lightweight threads. Simply using their JVM or CLR counterparts as they are would not do.

    Are you interested in seeing a serious and long-term effort towards Haskell on the JVM? See the FAQ and get in touch!

All you need to know about asynchronous exceptions is: don't use them! They're too difficult to reason about, in terms of failure modes and such. Use TVars or MVars instead.

  • Haskell Actors, Linda, publish/subscribe models? (13 Aug)

    Dmitri O. Kondratiev needs to "build a framework to coordinate task producers / consumers distributed in the same and different address spaces. He needs to scale a data processing application somewhat Hadoop-like way yet in more flexible manner, without Hadoop-specific distributed FS constraints." At this point, Ryan Newton suggested "Cloud Haskell", which was greeted by with joy and great enthusiasm. "Finally Erlang actor model of communication comes to Haskell!"

  • More generic parfoldr than this... (23 Jul)

    Prabhat Totoo tried to implement a parallel foldr function by breaking the input listed chunks, mapping foldr over each chunk, and running foldr over the list of results. This solution only works if the input and output of the folded function have the same type. Prabhat wanted to know how he could go about generalising his parallel fold. Also while doing some timings, he noticed that the regular presumably sequential fold improved with multiple cores. What's up with that?

    Conal Elliott make the point that Prabhat's current parallelise-by-reassociating solution is only correct for an associative operation, which is forcibly of type (a -> a -> a). Have a look at the the rest of the thread for an exploration of parallel fold by Kevin Hammond, Conal Elliot and Sebastien Fischer.

Stack Overflow and Reddit

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Eric Kow, at parallel@well-typed.com. Please feel free to leave any comments and feedback!


Hacking on the hackage-server at the Haskell Hackathon

Mon, 25 Jul 2011 23:45:05 GMT, by duncan.
Filed under community.

Several people have been wondering recently what is up with the new hackage-server implementation. At the weekend at HacPDX-II we made quite a bit of progress in fixing things and getting more people involved.

The HacPDX-II hackathon took place this last weekend in Portland Oregon. Thomas DuBuisson contacted me a week or two before to see if we could make the new hackage-server a focus for the hackathon and get several people involved. This worked out quite well, we got several people who attended HacPDX-II helping out. Also, just the fact that Thomas declared that this project would be the focus meant that it energised some of the rest of us who are interested in the hackage-server but who could not attend HacPDX-II in person, including Matthew Gruen and Jeremy Shaw. So before getting into the technical details I'd really like to thank Thomas for organising things and to Matthew and Jeremy for giving up much of their weekends to help us out.

Jeremy Shaw, of Happstack fame amongst other things, sent in two excellent patches to shift hackage-server to using newer infrastructure. One patch updated everything to using the latest Happstack 6.x version, and the other converted from using happstack-data to the newer acid-state package.

David Lazar and Thomas DuBuisson got stuck in, tried things out and found and fixed various bugs. In particular, David fixed some things to do with the interface that package maintainers can use to mark packages as deprecated, and Thomas fixed up some stuff to do with package tags.

Matthew Gruen, who did his GSoC project with me on the hackage-server last summer, spent much of the weekend hacking, debugging and answering questions from the rest of us. In particular he and I spent quite a while thinking about HTTP authentication issues.

Originally, last summer, we had the crazy idea that we could offer either HTTP Basic or Digest authentication based on the user account. The reason we wanted to do this is because the current ~800 user accounts on the central hackage.haskell.org server all use HTTP basic auth (stored in Apache's standard htpasswd database format) but really we'd like to migrate to using HTTP digest auth because it's more secure than basic auth. Unfortunately, due to the way HTTP authentication works, having some user accounts using basic and some using digest authentication just will not work. That means we cannot do a smooth transition from basic to digest for the existing user accounts. So we had a think about what system to go for and how to clean up the mess in the current code that tries to offer different auth systems to different users.

For the moment what we've decided is to go for digest authentication exclusively and to think about how to do migration separately. This is partly so that we can get something working now, because several people want to use the hackage-server now, it's not just the central hackage.haskell.org that we have to think about.

One option for migration is to import all the old accounts and have a special re-registration page for existing users. That would authenticate them using their old passwords and then enable their new accounts with the new digest authentication system. The point is it would not be transparent, but it should not be too much of a bother.

Going for digest authentication means that we had to fix a few bugs in the client side HTTP implementation. Matthew tracked down the problem and I've sent the patches off to the maintainer of the HTTP package that cabal-install uses.

I spent a few days before the hackathon working on the hackage-mirror client. This is a program included in the hackage-server package that can synchronise packages between servers. In particular it can copy packages from the old hackage.haskell.org to your own hackage-server instance. Along with the HTTP digest work, on the final day of the hackathon I managed to get it to sync packages from hackage.haskell.org to my hackage-server running on localhost and to do the uploads authenticated using digest auth. So that was quite a good conclusion to things.

One thing that worked quite well for the hackathon was that we used a common darcs repository and gave everyone write access. We all just worked away and darcs pushed and pulled using the common repository. This is a model where darcs works really well. Now that the hackathon is over, I'll move the changes into the main repository.

So there's still plenty to do, but I think we've shown that the new hackage-server implementation is nearly ready to be used, for new server instances at least. I'm going to be doing more work on the mirroring, because a client of Well-Typed wants to use it, so I expect to post an update on that sometime soon.


Parallel Haskell Digest 4

Fri, 22 Jul 2011 15:46:17 GMT, by nick.
Filed under parallel, ph-digest.

Hello Haskellers!

It's time for the fourth edition of the Parallel Haskell Digest, bringing you a summary of the news and discussions of parallelism and concurrency in Haskell. The digest is made possible by the Parallel GHC Project.

News

  • The Monad.Reader: special issue on parallelism and concurrency

    In the words of The Monad Reader itself:

    Whether you're an established academic or have only just started learning Haskell, if you have something to say, please consider writing an article for The Monad.Reader! Issue 19 will be a special issue focusing on articles related to parallelism and concurrency, construed broadly. The submission deadline for Issue 19 will be: Tuesday, August 16.

  • Threadscope Implementor's Summit

    The threadscope implementor's summit was held this month at Microsoft Research, Cambridge. The summit brought together developers who are currently working with Threadscope, whether that be hacking on generating the events that are emitted by GHC for analysis in Threadscope, using the event trace that is produced for detailed profiling information, or working on improving Threadscope itself to provide better tools for parallel profile analysis.

    The meeting was full of ideas, and covered topics such as: adding extensions to the current eventlog format to enable additional information to be tagged to events; improving the visualisation of information in threadscope; formalising the transitions of thread states into a finite state machine; and matching up executed code with corresponding source locations. With all this food for thought, we should expect plenty of interesting work in this area.

Word of the Month

In this issue we have two words of the month: par and pseq.

Haskell provides two annotations, par and pseq, that allow the programmer to give hints to the compiler about where there are opportunities to exploit parallelism. While these annotations are not typically used directly by programmers, it is useful to understand them because they are the underlying mechanism for higher level tools such as "parallel strategies" and the Eval monad.

The two combinators have the following signatures:

par  :: a -> b -> b
pseq :: a -> b -> b

While their signatures are the same, they are used to annotate different things. The par combinator hints to the Haskell implementation that it might be beneficial to evaluate the first argument in parallel. However, since Haskell does not impose an evaluation order, we also need pseq, which instructs the compiler to ensure that its first argument is evaluated before the second.

Let's take a look at an example inspired by Parallel Performance Tuning for Haskell by Jones, Marlow, and Singh, which illustrates this nicely. Suppose you're interested in the sum of two expensive computations. To keep things simple, we'll use a naive implementation of fib (the point here isn't to have an efficient computation, I'm trying to show an expensive one):

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

For a second expensive computation, we'll calculate the negafibonacci number, which works on negative numbers:

negafib :: Int -> Int
negafib 0 = 0
negafib (-1) = 1
negafib n = nfib (n+2) - nfib (n+1)

The sum of these two can be calculated by the following sequential function:

sumfib :: Int -> Int
sumfib n = x + y
 where
  x = fib n
  y = negafib (-n)

There's obvious room for improvement here when we have two cores: we simply calculate the expensive computations on separate cores. Annotating the code above is a fairly simple process. We first use par to annotate the fact that x must be calculated in parallel with the rest of the computation. Second, we ensure that y gets computed before x + y by annotating with pseq. The result is as follows:

import Control.Parallel (par, pseq)

psumfib :: Int -> Int
psumfib n = x `par` (y `pseq` x + y)
 where
  x = fib n
  y = negafib (-n)

We can write a simple program that outputs the result of running this computation with the following main function:

main :: IO ()
main = putStrLn . show . sumfib $ 37

We should hope for the parallel version to work twice as fast, since the two expensive functions should take about the same time to compute. Here's the output of compiling and running the sequential version of the program:

$ ghc -rtsopts Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
$ time ./Main

real        0m6.113s
user        0m6.090s
sys         0m0.010s

Now replacing sumfib with psumfib produces the following results:

$ ghc -rtsopts -threaded Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
$ time ./Main +RTS -N2

real        0m3.402s
user        0m6.660s
sys         0m0.040s

This is obviously a very trivial example, but the point is that annotations provide a powerful way of expressing parallel algorithms. It's interesting to note that for this simple program, the timings for the parallel version on a single core performs as well as the single core version compiled without threading.

While annotations are a simple mechanism for expressing where parallelism might be exploited in a program, beware that there are a number of pitfalls to using this technique: all that glitters is not gold! The main difficulty in using par and pseq directly is that you really need to have a clear understanding of evaluation order. In particular, unless you understand what laziness is doing to the evaluation order, then you might find that the computations you're sparking off with par might not occur when you expected they should.

Then there are all the general difficulties that you face with parallel programming, like getting the right granularity of work to do in parallel. Using par is quite lightweight so can be used to exploit reasonably fine grained parallelism, but it is certainly not free. Finally, parallel performance suffers when there are too many garbage collections, so keeping this to a minimum by either using more efficient data structures or increasing available memory, becomes an important factor.

Nevertheless, it's well worth having a play with par and pseq. The next step after that is to look at parallel strategies. Strategies is a layer of abstraction on top of par and pseq. You might like to to read Seq no more: Better Strategies for Parallel Haskell, by Simon Marlow et al. which describes Strategies and the Eval monad. It's all available in the parallel library on Hackage. More recently, the Par monad has also been introduced as yet another way of describing parallel evaluations. These key topics will no doubt feature in a future word of the month, so stay tuned!

Parallel GHC Project Update

The Parallel GHC Project is an MSR-funded project to push the real-world use of parallel Haskell. The aim is to demonstrate that parallel Haskell can be employed successfully in industrial projects. This month we're having a guest column from the team over at Los Alamos National Laboratory, one of the partners involved in the project (you can see the full details in report LA-UR 11-0341). They have been working on writing Monte Carlo physics simulations in Haskell, which has given them high levels of parallelism, along with useful tools for abstraction. So, without further ado, over to Michael Buksas from LANL:

Our goal is to build highly efficient Monte Carlo physics simulations using parallel Haskell. We're focusing on SMP performance though some combination of explicit threading and pure parallel annotations.

The Monte Carlo approach involves randomly sampling the space of solutions to generate data which contributes to the solution. For these physical problems, our samples are the tracks of particles as they move through space, interacting with a physical material as they go. Data collected from each particle trajectory is then combined into information needed to compute the solution. For example, the detailed information about the particle's interaction with the material is collected into a collective effect on the material properties.

To date, we have a code base which includes two approaches to the problem. One is a specific and parallel-tuned application code targeting relativistic neutrino transport in stellar atmospheres. The other is building a more general environment for creating specific applications, such as this one.

We recently presented to our colleagues in LANL some preliminary results on the parallel performance of the targeted application code.

To give a sense of the approach to parallelization in this code, consider these high-level functions from an earlier serial version:

main :: IO ()
main = do
  (n, rest) <- parseCL
  let tally = runMany infMesh simpleMat n
  writeTally &quot;tally&quot; tally

runMany :: Mesh -> Material -> Word32 -> RNG -> Tally
runMany msh mat ntot rng = let
  ps = genParticles ntot msh rng
  tallies = map (runParticle msh mat) $ ps
  in foldl' merge emptyTally tallies

And consider the following changes for the parallel version:

main :: IO ()
main = do
  (n,sz) <- parseCL
  let tally = feed infMesh simpleMat n sz prand
  writeTally &quot;tally&quot; tally

feed :: Mesh -> Material -> Word32 -> Word32 -> RNG -> Tally
feed msh mat ntot chunkSz rng
    | ntot <= chunkSz = runMany msh mat ntot rng
    | otherwise       = t `par` (ts `pseq` (merge t ts))
    where t  = runMany msh mat chunkSz g1
          ts = feed msh mat (ntot - chunkSz) chunkSz g2
          (g1,g2) = split g

We've wrapped function runMany in feed, which partitions the collection of generated particles into groups of size chunkSz, and issues these particles to runMany in parallel.

With this simple change, we seeing upwards of 80% utilization of up to 8 cores, for a performance improvement greater than a factor of 6. We believe that performance can be further improved with different strategies for breaking down the work, and looking for additional parallelization opportunities in the collection of results.

Our other branch of development is focused on finding useful abstractions and high-level functions to support programming a variety of Monte Carlo problems of this kind. We have identified a few such useful abstractions, and implemented them as type classes and type families.

For example, Space is a general term for the physical space and imposed symmetries in which we can perform a simulation. We express this as follows:

class Space s where
  type Position s  :: *
  type Direction s :: *
  stream    :: s -> Distance -> s
  position  :: s -> Position s
  direction :: s -> Direction s
  make      :: Position s -> Direction s -> s

and implement specific spaces, such as one with the symmetry of the unit sphere:

instance Space Spherical1D where
    type Position  Spherical1D = Radius
    type Direction Spherical1D = Normalized Vector2
    stream (Vector2 x y) (Distance d) = Vector2 (x+d) y
    position s  = Radius $ vmag s
    direction s = normalize s
    make (Radius pos) dir = pos *| (normalized_value dir)

This allows the specific space data types to be used in a variety of contexts. Using ordinary parametric polymorphism is also effective:

-- | Stream a single particle:
stream :: (p -> (e,p))   -- ^ Function to produce each step. Comes from a model.
          -> (e -> Bool) -- ^ Check for terminal events to stop streaming
          -> p           -- ^ Initial particle
          -> [(e, p)]    -- ^ Resulting list of events and particle states.
stream stepper continue p = next p
  where next p =
          let (e, p') = stepper p
          in  (e, p') : if continue e then next p' else []

The above is our high-level routine function for generating a history from a single particle, recorded as a list of (event, particle) pairs, where the event and particle data types are provided for each problem.

Blogs, Papers, and Packages

  • Fun with parallel monad comprehensions (19 July 2011)

    This blog post by Thomas Petricek featured in the Monad Reader 18, and covers some of the interesting things that can be achieved with monad comprehensions when viewed from a parallel perspective. Along the way, he deals with examples such as the parallel composition of parsers.

  • Parallelizing a nonogram solver (05 July 2011)

    Jasper Van der Jeugt detailed his implementation of a parallel nonogram solver. Nonograms also go by the name of Paint Sudoku: the aim is to colour in a grid where a list of numbers is given for each row and column and these numbers indicate consecutive runs of filled-in squares in the corresponding row or column. For large puzzles, grids that are 20x20, Jasper reports that on a dual core machine his a parallel algorithm reduces execution by 37.9% compared to its sequential counterpart.

  • The IVar monad (29 June)

    Edward Z. Yang has written a series of posts discussing IVars, which are immutable variables which are a write-once, read-many (these are particularly handy for communicating results from a child process to its parent). Edward's post outlines the difficulties involved in defining a monad for IVars.

  • Map reduce as a monad (05 July 2011)

    Julian Porter wrote an article for The Monad Reader 18 about how MapReduce could be expressed as a monad. The MapReduce framework finds its roots in functional programming, and this is an interesting take on the problem.

Mailing list discussions

  • NVIDIA's CUDA and Haskell (5 July 2011)

    Vasili Galchin was wondering whether or not there had been any efforts to build bridges between NVIDIA's CUDA and Haskell. Don Stewart was quick to respond with a number of links to active work in the area:

    Trevor McDonell noted that the accelerate package was best accessed from the source repository on github, and that the CUDA bindings hadn't yet been tested or updated for the latest toolkit release.

  • Unbelievable parallel speedup (3 June 2011)

    While reading Simon Marlow's tutorial on parallel and concurrent programming, John Ramsdell reported some remarkable (slightly superlinear!) performance gains for one of his programs. Thomas Schilling guessed that this was due to the large variance in the figures reported, but went on to describe how it might be possible to obvserve such performance boosts due to reduced local cache misses when using several cores. Without more information about the program in question, it's difficult to do any kind of diagnosis, but nevertheless, it's great to hear about good results from a happy Haskeller!

  • Automatic Reference Counting (2 July 2011)

    After hearing about the new static analysis tools in Clang that does automatic reference counting (ARC), Thomas Davie was wondering if some compiler gurus might be able to comment on the applicability of this kind of analysis to Haskell, as an alternative to garbage collection. This led to an enlightening discussion about reference counting versus garbage collection.

  • Haskell on NUMA (16 June 2011)

    Michael Lesniak was wondering what the state of parallel performance of Haskell on Non-Uniform Memory Access (NUMA) machines was like, since he's having problems and can't find much useful information online. Nobody seems to have answered this one, are there any suggestions?

  • Parallel compilation and execution? (26 May 2011)

    Michael Rice was trying to figure out how to compile and run a simple program that outputs the result of a parallel fibonacci algorithm. After a quick reminder to use pseq rather than seq to force sequential evaluation, Daniel Fischer suggested that recompilation might be required, and that passing --fforce-recomp would be a good way to ensure that this occured.

    Michael was also keen to know whether Control.Parallel was comparable to OpenMP. Alex Mason gave a detailed reply and gave an example of parallel mergesort as a means of comparison.

  • parMap doesn't work fine (12 May 2011)

    After just starting out with parallel computations in Haskell, Grigory Sarnitskiy ran into troubles making parMap work with lazy structures. To resolve these issues, Brandon Moore pointed to using rdeepseq, and Maciej Piechotka suggested deepseq.

  • efficient parallel foldMap for lists/sequences (17 June 2011)

    Sebastian Fischer re-posted his question about efficient parallel foldMap for lists to the parallel mailing list. In essence he was seeking an efficient implementation of foldMap, where a list is folded into a single value before a map is applied to the result. Johannes Waldmann advised against using ordinary lists, and mentioned that he was using Data.Vector instead. Additionally, he recommended switching to a sequential fold once a parallel fold had been used to a certain depth. Christopher Brown further confirmed that it was a good idea to spark off computations when the granularity is high enough to make it worthwhile, and also mentioned that it was best to spark computations that were evaluated to normal form.

  • Wanted: parallel Haskell tutorial/talk/demonstration in Leipzig, Germany, October 7 (8 July 2011)

    Johannes Waldmann is looking for volunteers who might be able to present at their local Haskell Workshop, and welcomes submissions on parallel and distributed computing using Haskell. The submission deadline is 20 August.

Stack Overflow

Help and Feedback

If you'd like to make an announcement in the next Haskell Parallel Digest, then get in touch with me, Nicolas Wu, at parallel@well-typed.com. Please feel free to leave any comments and feedback!


Parallel Haskell Digest 3

Thu, 16 Jun 2011 08:39:15 GMT, by nick.
Filed under parallel, ph-digest.

Hello Haskellers!

Welcome to the third edition of the Parallel Haskell Digest, bringing you news, discussions and tasters of parallelism and concurrency in Haskell. The digest is made possible by the Parallel GHC project. More news about how we're doing below.

This digest is brought to you by Eric Kow and Nicolas Wu. Nick will be taking over as maintainer of the Parallel Digest for the next few months.

It's tutorial season in the Parallel Haskell world, with Simon Marlow's Parallel and Concurrent Programming in Haskell, and Don Stewart's Numeric Haskell: A Repa Tutorial. The former gives a broad tour of parallelism and concurrency techniques in Haskell and the latter focuses on the Repa parallel array library. Both are very concrete and focus on real working code. Give them a try!

News

  • Haskell in the Economist! The Economist article Parallel Bars discusses the rise of multicore computing, and the essential obstacle that programs have to be specially written with parallelism in mind. The article gives a tour of some problems (overhead and dependencies between subtasks, programmers being trained for sequential programming, debugging parallel programs) and possible solutions, among which is functional programming:

    Meanwhile, a group of obscure programming languages used in academia seems to be making slow but steady progress, crunching large amounts of data in industrial applications and behind the scenes at large websites. Two examples are Erlang and Haskell, both of which are “functional programming” languages.

    Are we doomed to succeed?

Word of the Month

The word of the month (well, phrase really!) for this digest is parallel arrays. Parallel array manipulation fits nicely into Haskell's arsenal of parallel processing techniques. In fact, you might have seen the Repa library, as mentioned above, in the news a while back. And now there's a new tutorial on the Haskell Wiki.
So what's all the fuss?

Parallel arrays manipulation is a particularly nice way of writing parallel programs: it's pure and it has the potential to scale very well to large numbers of CPUs. The main limitation is that not all programs can be written in this style. Parallel arrays are a way of doing data parallelism (as opposed to control parallelism).

Repa (REgular Parallel Arrays) is a Haskell library for high performance, regular, multi-dimensional parallel arrays. All numeric data is stored unboxed and functions written with the Repa combinators are automatically parallel. This means that you can write simple array code and get parallelism for free.

As an example, Repa has been used for real time edge detection using a Canny edge algorithm, which has resulted in performance comparable to the standard OpenCV library, an industry standard library of computer vision algorithms written in C and C++ (a single threaded Haskell implementation is about 4 times slower than the OpenCV implementation, but is on par when using 8 threads on large images).

Repa performing Canny edge detection

Repa performing Canny edge detection

While the Canny algorithm written with Repa doesn't quite match the speed of its procedural counterparts, it benefits from all of Haskell's built in support for purity and type safety, and painlessly scales to multiple cores. You can find more details on Ben Lippmeier's blog.

Parallel GHC project news

The Parallel GHC Project is an MSR-funded project to promote the real-world use of parallel Haskell. Part of this project involves effort by Well-Typed to provide tools for use by the general community.

Last week, Well-Typed were excited to announce that we have capacity to support another partner for the parallel project for at least another 12 months. So, if you have a project that involves scaling up to multiple cores, or that deals with some heavy duty concurrency, then we'd love to hear from you. In return for your time and commitment to the parallel project, we'll unleash our team of expert programmers to help build tools and provide support that will help you and the community towards making parallel Haskell even more accessible.

For more information on the Parallel GHC project, on the Parallel GHC Project wiki page

Blogs, papers and packages

  • Parallel and Concurrent Programming in Haskell (19 May)

    Simon Marlow released version 1.1 of his tutorial introducing the main programming models available for concurrent and parallel programming in Haskell. "This tutorial takes a deliberately practical approach: most of the examples are real Haskell programs that you can compile, run, measure, modify and experiment with". The tutorial covers a lot of ground, including use of the new Par monad, so if you're still not sure where to start, this might be the place.

  • Explicit speculative parallelism for Haskell's Par monad

    Tomas Petricek builds off the Par monad to provide a way to write speculative computations in Haskell. If there are two different ways to compute something and each of the ways is faster for different kinds of input (but we don't know exactly which), one useful trick is to run both ways in parallel and return the result of the one that finishes first. Tomas' speculative Par monad lets you do just that, starting computations in parallel and cancelling unwanted ones. You can check his code out on GitHub

  • Etage-Graph, data-flow based graph algorithms (3 Apr)

    Mitar announced the Etage-Graph package, which implements graph algorithms on top of his data-flow framework. He invites us to have a look and see how one might implement known control-flow algorithms in a data-flow manner, which has the appeal of being easy to parallelise.

  • stm-chans (4 Apr)

    The stm-chans library offers a collection of channel types, similar to Control.Concurrent.STM.TChan but with additional features. It offers bounded, closable, and bounded closable FIFO channels, along with a partial TChan compatibility layer.

  • timeplot 0.3.0, Eugene Kirpichov (30 Apr)

    Eugene Kirpichov announced the release of timeplot 0.3.0, a tool to help you visualize arbitrary time-dependent data, e.g. log files, into big picture visualisations. Eugene also linked to a presentation with plenty of graphical examples for the tool and its sister "splot", mostly on cluster computing use cases. The tools follow the basic philosophy of depending neither on the log format (the expected workflow is like cat log | awk foo | plot), nor on the type of data (Eugene wants to visualise arbitrary signals).

    You can also find more information in Eugene's presentation.

Mailing list discussions

  • Weird multi-threading runtime behaviour of single-threaded program with GHC-7.0.3 (1 Apr) Herbert Valerio Riedel has a program which parses a roughly 8 MiB JSON file (with the aeson library) without any use of parallelism or concurrency constructs: just a straightforward, demanding, single-threaded computation. He compiled the program with -threaded and ran with +RTS -N12 on a 12 core machine. By rights, only one core should be doing work but when he looks at the output of top it seems like all of them consume a substantial amount of CPU cycle. What's up with the remaining 11?

  • A parallelization problem (25 Apr)

    Wren Ng Thornton has some loopy code that he would like to parallelise. He was hoping to use pure parallelism (par and friends) instead of explicit concurrency because he's concerned that "lightweight as Haskell's threads are [...] the overhead would swamp the benefits." Wren's code has an outer and inner loop fleshing out a map of maps. Wren wants to speed up his inner loop, which he thinks should be possible because the keys in the inner map are independent of each other.

  • Multi agent system (2 Apr)

    Yves Parès wants to build a simple game with multiple agents that communicate with each other. He asked if it was reasonable to launch one Haskell thread per agent and have them communicate over Chans (he will have something like 200 agents at a time). Felipe Lessa responded that hundreds of agents should be no problem but hundreds of thousands of agents might not work.

  • Using DPH (12 Apr)

    When experimenting with Repa and DPH, Wilfried Kirschenmann encountered strange compiler errors when parallelising his code; in particular, a strange pattern match failure in a do expression. Wilfried supplied a small demonstrator computing the norm of a vector. Ben Lippmeier responded with help and also a reminder that Repa and DPH are separate projects (the latter still being in the "research prototype" stage). Ben's version with more tweaks from Wilfried was 15x faster than his original (non-parallel?) version but still 15x slower than the C version. Parallelising the Repa sum and fold functions should help.

  • Service Component Architecture and Distributed Haskell (13 Apr)

    James Keats noticed the recent talk about distributed Haskell and wished to see "interest from an experienced member of the Haskell community in implementing Haskell components with SCA/Tuscany", as this would help him to use Haskell in the same project with Java and Scala.

  • parallel-haskell mailing list (14 Apr)

    Eric Kow announced the parallel-haskell mailing list and provided instructions for subscribing to the list. Whereas the Haskell Cafe list and Stack Overflow are the best places to ask your parallelism and concurrency questions, the parallel-haskell list is a great place to follow and discuss the research.

  • Questioning seq (14 Apr)

    Among other questions about seq, Andrew Coppin asked how pseq was different from it. Austin Seipp responded that while pseq and seq are semantically equivalent, (A) pseq is only strict in its first argument whereas seq is strict in both and (B) pseq guarantees the order of evaluation so that pseq a b means a will be strictly evaluated before b whereas for the purpose of potential optimisations seq does not. Albert Y. C. Lai provided a short example showing that "there are non-unique evaluation orders to fulfill the mere strictness promise of seq".

  • select(2) or poll(2)-like function? (17 Apr)

    Matthias Kilian wanted to know if the Haskell standard libraries had anything like a select or poll like function. Answering the immediate question, Don Stewart pointed to Control.Concurrent.threadWaitRead. Meanwhile, the thread opened into a long discussion on using high-level Haskell concurrency constructs vs. low-level polling.

    For background, select (along with variants like poll, epoll, kqueue, etc) is a system-call which is often used for asynchronous IO, concurrently doing some processing work while some IO operation is underway (Matthias' use case is "networking stuff listening on v4 and v6 sockets at the same time"). A typical way of using "select" or its cousins would be to implement an event-driven programming model with a "select loop": sleeping until some condition occurs on a file descriptor and, when woken, executing the appropriate bit of code depending on the file descriptor and the condition.

    Some programmers opt for this approach because they believe using processes and OS threads would be too inefficient. Ertugrul Soeylemez and others argue that this is unnecessary in Haskell because Haskell threads are so lightweight (epoll under the hood); we can just stick with high-level concurrency constructs and trust the compiler and RTS to be smarter than us.

    On the other hand, Mike Meyer is sceptical about the scalability and robustness of this solution. The sort of code Mike wants to write is for systems "that run 7x24 for weeks or even months on end [where] [h]ardware hiccups, network failures, bogus input, hung clients, etc. are all just facts of life." Using select loops in his experience is not about efficiency, but avoiding the problems that arise from using high-level concurrency constructs in other languages. He favours to achieve concurrency with a "more robust and understandable" approach using separate Unix processes and an event-driven programming model. So far the only high-level approach Mike has found that provides for the sort of scalability is Eiffel's SCOOP

    How does Haskell stack up? And what about Haskell for the Cloud?

  • Killing threads in foreign calls. (17 Apr)

    Jason Dusek is building an application using Postgres for storage (with the help of the libpq binding on Hackage). He would like to kill off threads for queries that take too long to run, but trying Control.Concurrent.killThread seems not to work. This turns out to be a fairly non-trivial problem: most C code is not written to gracefully handle things like its underlying OS thread being killed off. The solution is to use whatever graceful cancellation mechanism is supplied by the foreign library; the PGCancel procedure in Jason's case.

  • Painless parallelization. (19 Apr)

    Grigory Sarnitskiy believes that parallel programming is actually easier than sequential programming because it allows one "to avoid sophisticated algorithms that were developed to gain performance on sequential architecture". So what are his options (preferably "release state" for writing pure functional parallel code with a Haskellish level of abstraction? Mats Rauhala pointed Grigory to the use of par and pseq, and the RWH chapter on parallelism and concurrency. Don Stewart followed up with a list of options:

    • the "parallel" package Repa (parallel arrays)
    • DPH (experimental)
    • using concurrency

    Grigory's question also prompted Eric Kow to prioritise the Parallel Haskell portal. Hopefully it will give us a standard resource to point new parallel Haskellers like Grigory to.

Stack Overflow

Help and feedback

Got something for the digest? Send us an email at . We're particularly interested in short parallelism/concurrency puzzles, and cool projects for featured code. Other comments and feedback (criticism and corrections especially!) would be welcome too.


Parallel GHC project: new opportunity for an organisation to participate

Wed, 08 Jun 2011 18:55:48 GMT, by duncan.
Filed under parallel, well-typed.

GHC HQ and Well-Typed are pleased to announce a new opportunity for an organisation to take part in the Parallel GHC Project.

The project started in November 2010 with four industrial partners, and consulting and engineering support from Well-Typed. Each organisation is working on its own particular project making use of parallel Haskell. The overall goal is to demonstrate successful use of parallel Haskell and along the way to apply engineering effort to any problems with the tools that the partner organisations might run into.

We have capacity to support another partner organisation for the remaining duration of the project (at least another 12 months). Organisations do not need to contribute financially but should be prepared to make a significant commitment of their own time. Familiarity with Haskell would be helpful, but Haskell expertise is not needed. Partner organisations' choice of projects is similarly open-ended and could be based on anything from pre-existing code bases to green field endeavours.

We would welcome organisations interested in pure parallelism, concurrency and/or distributed Haskell. Presently, two of our partner organisations are using mainly pure parallelism and two are using concurrency. What would be especially interesting for us is to diversify this mix further by working with an organisation interested in making use of of distributed Haskell, in particular the work highlighted in the recent paper "Haskell for the Cloud" (pdf).

To help give an idea what participating in the Parallel GHC Project is like, here is what some of what our current partner organisations have to say:

The Parallel GHC Project has enabled us to make steady progress towards our goals. Well-typed has provided support in the form of best practice recommendations, general engagement with the project, and directly coding up key components.

I have been getting lots of help from Well-Typed, and enjoy our weekly meetings.

― Finlay Thompson, Dragonfly

My organization is now trying to implement highly concurrent Web servers. After GHC 7 was released we faced several critical bugs in the new IO manager and one guy at Well-Typed kindly fixed all the bugs. This has been a big benefit for our organization.

Another benefit is feedback/suggestions from Well-Typed. Well-Typed and our organization have meetings every other week and we report progress to each other. During the discussions, we can set the right direction to go in.

― Kazu Yamamoto, IIJ Innovation Institute Inc.

Well-Typed is coordinating the project, working directly with the participating organisations and the Simons at GHC HQ. If you think your organisation may be interested then get in touch with me, Duncan Coutts, via info@well-typed.com.


Previous entries

Copyright © 2008-2011, Well-Typed LLP.