Fixing foldl

Tue, 01 Apr 2014 10:38:20 GMT, by duncan.
Filed under community.

The foldl function is broken. Everyone knows it’s broken. It’s been broken for nearly a quarter of a century. We should finally fix it!

Today I am proposing that Prelude.foldl be redefined using the implementation currently known as Data.List.foldl'.

foldl is broken!

I’m sure you knew that already, but just in case…

Have you ever noticed that Haskellers usually recommend using either foldr or foldl' but not foldl? For example Real World Haskell has this to say:

Due to the thunking behaviour of foldl, it is wise to avoid this function in real programs: even if it doesn’t fail outright, it will be unnecessarily inefficient. Instead, import Data.List and use foldl'.

In the online version of the book the first user comments on that paragraph are

Why isn’t Data.List foldl implementation placed in Prelude?

I second the question: Why isn’t foldl' the default?

Good question.

Ok, so obviously we’re talking about the difference between foldl and foldl':

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f a []     = a
foldl f a (x:xs) = foldl f (f a x) xs

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f a []     = a
foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs

The dry technical difference is that foldl' evaluates the call to f before making the next recursive call. The consequences of that are perhaps not immediately obvious, so we’ll take a step back and look at a slightly bigger picture.

Folding left and right

When we first learn Haskell we learn that there are two ways to fold a list, from the left or right.

foldl f z [x1, x2, ..., xn] = (...((z `f` x1) `f` x2) `f`...) `f` xn

foldr f z [x1, x2, ..., xn] = x1 `f` (x2 `f` ... (xn `f` z)...)

Saying “from the left” or “from the right” is a description of what foldl and foldr calculate, with the parenthesis nesting to the left or to the right. At runtime of course we always have to start from the left (front) of the list.

We later learn other ways of thinking about left and right folds, that a left fold can be used like a classic loop where we go through the whole list eagerly, while a right fold can be used like a demand-driven iterator. For the left fold that means using a function that is strict in its first argument (like (+)) while for the right fold that means using a function that is not strict in its second argument (like (:)).

Indeed when looking at whether we want foldl or foldr in any particular case our choice is usually governed by whether we want “all at once” behaviour (foldl) or if we want incremental or short-cut behaviour (foldr).

Accumulating thunks

Again, as we are learning Haskell, we get told that foldl has this crazy behaviour

foldl (+) 0 (1:2:3:[])
          =  foldl (+) (0 + 1)             (2:3:[])
          =  foldl (+) ((0 + 1) + 2)       (3:[])
          =  foldl (+) (((0 + 1) + 2) + 3) []
          =            (((0 + 1) + 2) + 3)

when what we had in mind when we thought of an accumulating loop was

foldl' (+) 0 (1:2:3:[])
          =  foldl' (+) 1 (2:3:[])
          =  foldl' (+) 3 (3:[])
          =  foldl' (+) 6 []
          =             6

Of course that’s just what foldl' does, it evaluates the call to + before making the next recursive call.

When is foldl (rather than foldl') useful?

The short answer is “almost never”.

As beginners we often assume that foldl must still make sense in some cases (or why have both?) but it turns out that’s not really the case.

When the f argument of foldl is a strict function then delaying the evaluation does not gain us anything as it all has to be evaluated at the end anyway. The only time when delaying the evaluation could save us anything is when the f function is not strict in its first argument – in which case you either don’t care or probably should have been using foldr in the first place.

In fact even if our f function is non-strict in its first argument, we probably do not gain anything from delaying evaluation and it usually does no harm to evaluate earlier. Remember that we still have to traverse the whole list, we don’t get any short-cutting behaviour like with foldr.

We can, if we think about it, construct examples where foldl' would be too strict. We could define last and last' like this:

last  = foldl  (\_ y -> y) (error "empty list")

last' = foldl' (\_ y -> y) (error "empty list")

Now if we try

> last [1,undefined,3]
3
> last' [1,undefined,3]
*** Exception: Prelude.undefined

This is because our accumulator is always the latest element that we’ve seen but we don’t actually want to evaluate the elements (except the last one).

So it’s true that foldl' fails in this case, but it’s also a silly definition, the usual definition is a lot clearer

last [x]    = x
last (_:xs) = last xs
last []     = error "empty list"

That goes for pretty much all the other examples you might be able to think of where foldl would work but foldl' would not: the examples are either artificial or are clearer written in other ways.

People sometimes point out that sum is defined using foldl and not foldl' and claim that this is something to do with Haskell’s designers wanting to allow Num instances where (+) might be lazy. This is pretty much nonsense. If that were the case then sum would have been defined using foldr rather than foldl to properly take advantage of short-cutting behaviour. A much simpler explanation is that foldl' was not available in early versions of Haskell when sum was defined.

In nearly 15 years as a Haskell programmer I think I’ve specifically needed foldl rather than foldl' about three times. I say “about” because I can only actually remember one. That one case was in a slightly cunning bit of code for doing cache updates in a web server. It would almost certainly have been clearer as a local recursion but I was amused to find a real use case for foldl and couldn’t help myself from using it just for fun. Of course it needed a comment to say that I was using it on purpose rather than by mistake!

So why do we have foldl and foldl'?

If foldl is almost always a mistake (or merely benign) then why do we have it in the first place?

I don’t know for sure, but here’s my guess…

When Haskell 1.0 was published on this day 24 years ago there was no seq function at all, so there was no choice but to define foldl in the “classic” way.

Eventually, six years later after much discussion, we got the seq function in Haskell 1.3. Though actually in Haskell 1.3 seq was part of an Eval class, so you couldn’t use it just anywhere, such as in foldl. In Haskell 1.3 you would have had to define foldl' with the type:

foldl' :: Eval b => (b -> a -> b) -> b -> [a] -> b

Haskell 1.4 and Haskell 98 got rid of the Eval class constraint for seq but foldl was not changed. Hugs and GHC and other implementations added the non-standard foldl'.

I suspect that people then considered it a compatibility and inertia issue. It was easy enough to add a non-standard foldl' but you can’t so easily change the standard.

I suspect that if we had had seq from the beginning then we would have defined foldl using it.

Miranda, one of Haskell’s predecessor languages, already had seq 5 years before Haskell 1.0.

A strict foldl in Orwell!

Orwell is an interesting case. Orwell was another Haskell predecessor, very similar to Miranda and early Haskell. An informed source told me that Orwell had defined its foldl in the way that we now define foldl', ie with strict evaluation. Information on Orwell is a little hard to get ahold of online these days so I asked Philip Wadler. Phil very kindly fished out the manuals and looked up the definitions for me.

In the original version:

An Introduction to Orwell
(DRAFT)
Philip Wadler
1 April 1985

In the standard prelude:

lred f a []  =  a
lred f a (x:xs) = lred f (f a x) xs

But five years later, by the time Haskell 1.0 is being published…

An Introduction to Orwell 6.00
by Philip Wadler
revised by Quentin Miller
Copyright 1990 Oxford University Computing Lab

In the standard prelude:

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f a []  =  a
foldl f a (x:xs)  =  strict (foldl f) (f a x) xs

Note the use of strict. Presumably Orwell’s strict function was defined as (or equivalent to)

strict :: (a -> b) -> a -> b
strict f x = x `seq` f x

(These days in Haskell we call this function ($!).)

So my source was right, Orwell did change foldl to be the strict version!

I contend that this was and is the right decision, and that it was just a consequence of the late arrival of seq in Haskell and inertia and fears about backwards compatibility that have kept us from fixing foldl.

Just do it!

It’d help all of us who are sometimes tempted to use foldl because we can’t be bothered to import Data.List. It’d help confused beginners. It’d save teachers from the embarrassment of having to first explain foldl and then why you should never use it.

Orwell fixed this mistake at least 24 years ago, probably before Haskell 1.0 was released. Just because it’s an old mistake doesn’t mean we shouldn’t fix it now!

A postscript: which foldl'?

I hate to complicate a simple story but I should admit that there are two plausible definitions of foldl' and I’ve never seen any serious discussion of why we use one rather than the other (I suspect it’s another historical accident).

So the version above is the “standard” version, perhaps more clearly written using bang patterns as

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f a []     = a
foldl' f a (x:xs) = let !a' = f a x
                     in foldl' f a' xs

But an equally plausible version is

foldl'' :: (b -> a -> b) -> b -> [a] -> b
foldl'' f !a []     = a
foldl'' f !a (x:xs) = foldl'' f (f a x) xs

The difference is this: in the first version we evaluate the new accumulating parameter before making the recursive call, while in the second version we ensure that the accumulating parameter is always evaluated (to WHNF) on entry into each call.

These two ways of forcing the evaluation have almost the same effect. It takes a moment or two to find a case where it makes a difference, but here’s one:

foldl'  (\_ y -> y) undefined [1] = 1
foldl'' (\_ y -> y) undefined [1] = undefined

The standard foldl' ensures that all the new accumulating parameter values are evaluated, but still allows the initial value to be unevaluated. The alternative version simply says that the accumulating parameter is always evaluated.

The second version is attractive from a code generation point of view. One of the clever things that GHC can do with foldl' (and strict function arguments generally) is to unbox the values, so for example an Int can be unboxed to a Int# and passed in registers rather than on the heap. With the standard foldl' it needs a special case for the first iteration of the loop where the initial value of the accumulating parameter which might not be evaluated yet. With foldl'', that’s not a problem, we can unbox things right from the start. In practice, GHC can often see that the initial value is evaluated anyway, but not always.

(Don Stewart and I noticed this a few years back when we were working on stream fusion for lists. We had defined foldl' on streams in a way that corresponded to the second form above and then got a test failure when doing strictness testing comparing against the standard list functions.)

So if we’re going to fix foldl to be the strict version, then perhaps it should be the fully strict version, not just the “strict after the first iteration” version.


Fun and Profit with Strongly-Typed Data Schemas

Tue, 01 Apr 2014 07:24:01 GMT, by adam.
Filed under community, training, well-typed.

Over the past few months, Duncan and I have been working with Chris Dornan and Alfredo Di Napoli on api-tools, a DSL for specifying data schemas for REST-like APIs in Haskell. If you’re interested in the real-world use of Haskell, static types and DSLs, why not come along to hear Chris talk about it?

Wednesday 9th April, 6:30pm, London

Find out more and register for free over at Skills Matter:

Typical business apps store structured data, transform it and send it hither and thither. They are typically made of multiple components that have to agree on the schema of the data they exchange. So a large part of what it means to be “flexible” is for it to be easy to modify and extend the schema of the data that the system handles.

Strong typing can help with this, ensuring that the code that accesses the data is consistent with the schema. One idea that has been tried with databases is to generate Haskell types from the database schema, or generate the database schema from the Haskell types, or both from some standalone schema.

In this talk we will describe how we have applied this same idea but for REST APIs. We have a DSL that we use to specify the schema of the data available via a REST API. With a machine description of the schema we can generate the code and documentation for many parts of the system, and of course those parts are then easily adaptable when the schema changes. We generate Haskell types, JSON conversion functions, REST API documentation, disk persistence and more. We also have a story for managing schema changes and migrating data. This system is in use at Iris Connect and is now available on Hackage.

This talk will also discuss further applications we have found for these tools and some of the experience from the development project at Iris Connect, including problems we have had to solve building the Haskell components of the system and the wider challenge of integrating it into the overall development project.

And if that isn’t enough for you, Well-Typed’s Haskell courses are back at the end of April, with an all-new course on advanced features of the Haskell type system. Stay tuned for more events coming soon…


(New) Haskell Courses in London, April/May 2014

Wed, 19 Mar 2014 13:51:39 GMT, by andres.
Filed under well-typed, training.

On May 2, I’m going to teach an all new course at Skills Matter in London:

Well-Typed’s Guide to the Haskell Type System

May 2, 2014 in London (1 day)
£595 + VAT

This course is going to cover advanced type system features of Haskell and GHC in detail. It is aimed at developers who are fascinated by Haskell and the power of its type system, and want to know what’s possible and learn how to get even more out of it.

Among other topics, we’re going to cover

The course includes exercises, and we’re going to look at actual code on Hackage and see how these advanced features of the type system are being used in practice.

You don’t need to be a Haskell expert in order to participate, but familiarity with the basics will be assumed.

The course is open for registration via the Skills Matter site.

Regular courses

In the same week, we’re also offering our regular two-day introductory and advanced Haskell courses again:

Well-Typed’s Fast Track to Haskell

April 28–29 in London (2 days)
£1195 + VAT

This highly practical course teaches you the basics of Haskell. Functions and datatypes, pattern matching, higher-order functions, input and output, and (of course) monads.

Well-Typed’s Advanced Haskell

April 30–May 1 in London (2 days)
£1195 + VAT

In this hands-on course, you’ll learn about data structures, how data is represented in memory, reasoning about lazy evaluation and performance, parallelism and concurrency, and design patterns such as monads, applicative functors, and monad transformers.

Questions?

If you have any questions about which course might be the right one for you, or if you’d be interested in a course that isn’t listed here, please check our training page for further information or email us.


GHC 7.8.1 RC2

Mon, 03 Mar 2014 18:24:40 GMT, by austin.
Filed under ghc, community.

Today, GHC 7.8.1 RC2 has been released.

We've closed approximately 45 issues people filed against RC1. Thanks to everyone who has helped out!

There are actually quite a few nice improvements since then, and some stuff we'd like to announce.

iOS cross compiler binaries

This release features iOS cross compiler binaries, courtesy of Luke Iannini. You can download either ghc-7.8.0.20140228-arm-apple-ios or ghc-7.8.0.20140228-i386-apple-ios for the iOS ARM compiler or iOS Simulator, respectively.

To properly use the binary distribution, be sure to carefully read the README! Hopefully this process can be made less tedious in the future.

LZMA support for binary distributions

In my experience, lots of people use the binary distributions of GHC for their provided platform (to set up test environments, or because the version for their OS is out of date), and a smaller download for users always helps.

We're now offering LZMA (.tar.xz) compressed binary distributions. On average, these are half the size of their bzip2 counterparts – in some cases even more. The iOS ARM build, for example, is 125M for LZMA vs 267M for bzip2 – a ratio of greater than 50%.

One question now is: with these savings, should we keep supporting .tar.bzip2 distributions? GHC bindists are rather large – xz offers a lot of savings, and is available for every major platform, although it's not typically standard like bzip. But perhaps it's not too much of a burden. Please let us know what you think!

(In the future, this will also save considerable space with nightly build infrastructure as well!)

New PPC64 machine

GHC HQ now has a shiny, modern 64-bit POWER7 machine available, courtesy of Gustavo Luiz Duarte. Gustavo is a Fedora PPC64 maintainer, and approached me earlier this year and said he could give us access to a build machine. And now we've got a really nice one! And 7.8.1 works well on it, including dynamic GHCi and shared libraries.

This box is running Fedora 20, has 8 hardware threads split across two cores (one per socket) in a NUMA configuration, and a nice amount of RAM.

Many thanks to Gustavo for letting us use this – and also to OSUOSL, who is hosting it in their data center for us! Hopefully we'll be able to keep supporting this platform (we get a surprising amount of PPC/PPC64 related tickets!)

i386/amd64 Ubuntu PPA

Earlier this year, Herbert Valerio Riedel took the time to set up Ubuntu PPAs for Travis-CI, making it easy to test your Haskell packages using multiple GHC versions – from GHC 6.12 to 7.8 and HEAD.

This PPA repository now contains 7.8.1 RC2 binaries, available for download for both i386 and amd64 on Ubuntu 12.04 Precise. You should immediately be able to take advantage of these builds in your .travis.yml to test your packages. Furthermore, the binaries are separated from other versions and can coexist with existing packages.

To get started, add the PPA and install the 7.8.1 build:

$ sudo add-apt-repository -y ppa:hvr/ghc
$ sudo apt-get update
$ sudo apt-get install ghc-7.8.1
$ export PATH=/opt/ghc/7.8.1/bin:$PATH
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.0.20140228

This PPA is also integrated into Herbert's multi-ghc-travis project that I mentioned earlier. Just follow the instructions in the README.md, and be sure to enable:

GHCVER=7.8.1

inside your .travis.yml. You'll then automatically get build reports with RC2, and any future updates we push to the PPA (including the final version of 7.8.1).

For example, lens now builds with GHC 7.8.1!

Hopefully we can provide builds for all supported Ubuntu configurations in the future – not just Precise.

deb.haskell.org for your Debian needs

Thanks to Joachim Breitner, our resident Debian Developer, we now have GHC HEAD builds available as Debian packages!

Currently there aren't 7.8.1 builds yet I'm afraid – but there will be a final version of 7.8.1 available upon release. In the mean time, feel free to use these repositories to test out the latest GHC if you're daring, and report issues to us!

And this leads me to...

Rackspace support

Earlier this year, I contacted Rackspace about one of their developer programs. Rackspace is a huge supporter of open source and has some great initiatives to help open source projects. I asked if they could help GHC – we wanted build machines.

Rackspace was extremely kind, and with the help of Jesse Noller, they've provided GHC with a few thousand USD per month in services! This was extremely nice of them, and really blew me away.

Rackspace is what powers http://deb.haskell.org – and not only that. GHC HQ now has machines available solely for testing/build purposes, including many new Windows and Linux instances.

In addition to that, we're using them for haskell.org too – including a FreeBSD backup server (to be revamped soon), and the creation of our new Nagios monitoring infrastructure at monitor.haskell.org.

Right now, we've still got budget available to run more services. This venture is a bit of a joint one between GHC HQ and the current haskell.org administrators, as we're using it for both of our needs at the moment.

What's next

We're hoping RC2 will be the final RC, and 7.8.1 proper will follow soon. In the mean time, please download the RC and file bugs if anything goes wrong!


Performance profiling with ghc-events-analyze

Wed, 12 Feb 2014 20:25:18 GMT, by edsko, duncan.
Filed under coding, well-typed.

ghc-events-analyze is a new simple Haskell profiling tool that uses GHC's eventlog system. It helps with some profiling use cases that are not covered by the existing GHC profiling modes or tools. It has two major features:

It is very useful for profiling code when ghc's normal profiling mode is not available, or when using profiling mode would perturb the code too much. It is also useful when you want time-profiling information with a breakdown over time rather than totals for the whole run.

We developed the tool at Well-Typed while working on client projects where we had profiling needs that were not covered by the existing tools. We are releasing the tool in the hope that it will be useful to others.

Motivating Example

Suppose we want to understand the runtime performance of the following simple multi-threaded application:

import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (async, wait)

-- Intentionally slow fib
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

printFib :: Integer -> IO ()
printFib n = print (fib n)

blips :: IO ()
blips = do
  putStrLn "BLIP"
  threadDelay 5000000
  putStrLn "BLIP"

main :: IO ()
main = do
  a1 <- async $ mapM_ printFib [30, 32 .. 38]
  a2 <- async $ mapM_ printFib [31, 33 .. 39]
  threadDelay 5000000
  a3 <- async $ blips
  mapM_ wait [a1, a2, a3]

We can compile this application and ask it to produce an eventlog:

ghc ex0 -eventlog
./ex0 +RTS -l

But when we open this eventlog in ThreadScope the result is not particularly enlightening:

Example 0 with ThreadScope

The program was compiled without the -threaded flag, forcing all work to run on a single HEC (Haskell Execution Context — roughly, a CPU core). This makes it impossible to see the distribution of workload across the various application threads. This will be true whenever multiple threads are executed by a single HEC (i.e., almost always). Of course ThreadScope is really designed for looking at parallel programs, and that's not what we've got here. Here we're trying to understand the behaviour of a simple concurrent program.

If we run the same eventlog through ghc-events-analyze instead we get

Example 0 with ghc-events-analyze, no instrumentation

Some points to note:

Instrumentation

If we instrument our code, we can improve this diagram in a number of ways. We can use labelThread from GHC.Conc to give our threads names, so that it becomes easier to see what's what.

Moreover, ghc-events-analyze lets us give labels to periods of time during execution which we can then visualise or get statistics on. To label a period of time we use the event tracing functions from Debug.Trace. We mark the start of a period with

traceEventIO "START <eventName>"

and the end with

traceEventIO "STOP <eventName>"

Use traceEventIO if you are in an IO context, while in a pure context you can use traceEvent.

Note that these labelled time periods are completely independent of threads; they can overlap each other, span multiple threads, etc. Here's our example application again, but with some instrumentation added:

import Control.Concurrent (myThreadId, threadDelay)
import Control.Concurrent.Async (Async, async, wait)
import Control.Exception (bracket_)
import Debug.Trace (traceEventIO)
import GHC.Conc (labelThread)

event :: String -> IO a -> IO a
event label =
  bracket_ (traceEventIO $ "START " ++ label)
           (traceEventIO $ "STOP "  ++ label)

async' :: String -> IO a -> IO (Async a)
async' label act = async $ do
  tid <- myThreadId
  labelThread tid label
  act

-- Intentionally slow fib
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

printFib :: Integer -> IO ()
printFib n = event ("fib" ++ show n) $ print (fib n)

blips :: IO ()
blips = do
  putStrLn "BLIP"
  threadDelay 5000000
  putStrLn "BLIP"

main :: IO ()
main = do
  a1 <- async' "evens" $ mapM_ printFib [30, 32 .. 38]
  a2 <- async' "odds"  $ mapM_ printFib [31, 33 .. 39]
  threadDelay 5000000
  a3 <- async' "blips"  $ blips
  mapM_ wait [a1, a2, a3]

Running ghc-events-analyze over the eventlog generated by this code yields

Example 1 with ghc-events-analyze with instrumentation, not threaded

If we run the same code using the threaded runtime (but still on a single core), we get

Example 1 with ghc-events-analyze with instrumentation, one core

and if we run it on two cores

Example 1 with ghc-events-analyze with instrumentation, two cores

We can see that the evens and odds threads are now in fact running in parallel, and that the computation of fib 38 is finished well before the computation of fib 39.

Totals

Bear in mind, however, that ghc-events-analyze divides the total time up into n buckets, so what you can not see from these last two diagrams is that the total time taken is less when running on two cores.

ghc-events-analyze also outputs some totals. For the single core case it tells us

GC               1343672000ns    1.344s

USER EVENTS (user events are corrected for GC)
fib39           24480557000ns   24.481s
fib38           21493145000ns   21.493s
fib37           12702151000ns   12.702s
fib36            7823058000ns    7.823s
fib35            4797324000ns    4.797s
fib34            2966990000ns    2.967s
fib33            1800136000ns    1.800s
fib32            1097888000ns    1.098s
fib31             663900000ns    0.664s
fib30             419270000ns    0.419s
TOTAL           78244419000ns   78.244s

THREAD EVENTS
1                    138000ns    0.000s
IOManager (2)        296000ns    0.000s
3                    106000ns    0.000s
evens (4)       16826523000ns   16.827s
odds (5)        27488818000ns   27.489s
blips (6)             63000ns    0.000s
7                     27000ns    0.000s
TOTAL           44315971000ns   44.316s

and for the two cores case

GC               1171012000ns    1.171s

USER EVENTS (user events are corrected for GC)
fib39           18769541000ns   18.770s
fib38           12009913000ns   12.010s
fib37            7515686000ns    7.516s
fib36            4692912000ns    4.693s
fib35            2852639000ns    2.853s
fib34            1774758000ns    1.775s
fib33            1095500000ns    1.096s
fib32             674125000ns    0.674s
fib31             395699000ns    0.396s
fib30             240785000ns    0.241s
TOTAL           50021558000ns   50.022s

THREAD EVENTS
1                    138000ns    0.000s
IOManager (2)        269000ns    0.000s
3                     88000ns    0.000s
evens (4)       19338615000ns   19.339s
odds (5)        30086294000ns   30.086s
blips (6)             73000ns    0.000s
7                      9000ns    0.000s
TOTAL           49425486000ns   49.425s

For the user-labelled time periods the tool is giving us the wall-clock time between the "START" and "STOP" events, excluding time spent doing GC. If there are multiple start/stop periods for the same label then it gives us the total time. We exclude GC time because GC happens at essentially arbitrary points and it would not be helpful to account the full cost of a GC to one user-labelled time period (which might otherwise be very short indeed).

Some notes for this example:

Real World Application 1

Well-Typed have been developing a server application for a client. The client reported that after certain kinds of requests the server had unexpected spikes in CPU usage. For technical reasons we could not compile the server application in profiling mode, and hence profiling information was not available. Moreover, GHC's normal time profiling would have given us totals across the whole program run (broken down by cost centre), but we needed a breakdown of CPU usage over time. We could however generate an eventlog. Visualizing the eventlog with threadscope yielded

server threadscope

We can make certain educated guesses from this picture: the spikes in activity are probably different requests coming in to the server, and the reported unexpected CPU usage reported by the client might be related to garbage collection (the orange blobs that threadscope shows). However, instrumenting the code (by labelling some threads and labelling time periods that correspond to the server handling different kinds of requests) and then running it through ghc-events-analyze yielded a more informative picture:

server with -I0.3

(ghc-events-analyze's reports are fully customizable through a simple scripting language; many of the diagrams in this blogpost are generated using custom scripts in order to improve readability.) The labelled time periods now clearly show when the server is handing requests of type A and B, and we see corresponding spikes in CPU activity in the server's main thread (with ID 6). Threads 4 and 5 handle communication between the client and server, and we see "blips" at the start and end of each request, as expected.

The garbage collection during the A requests is expected, both because of domain specific knowledge about what type A requests are, but also from the diagram: there are spikes in CPU usage of the server's Main thread. However, garbage collection during the B requests is not expected: again, both from domain specific knowledge about type B requests, but also from the diagram: there is barely any activity in the system at all, so why so much garbage collection?

This lead us to suspect "idle GC". The GHC garbage collector will run in two cases: (1) when required when we're out of memory, and (2) after the whole program has been idle for a bit. The latter is known as idle GC. The point of idle GC is the hope that we might be able to return some memory to the OS. The default is to do a GC 0.3 seconds after the program becomes idle. This means if your program does a tiny bit of work every 0.4 seconds but is otherwise idle then you're going to be paying for a major GC every 0.4 seconds. We can adjust the timeout for when idle GC happens, or even disable it entirely using the +RTS -Ix flag. In our case, running the server with a much longer timeout for idle GC cycles yielded this picture:

server with -I10

Note how we no longer see much garbage collection during B requests; we still get garbage collection during A requests, but that is expected. Moreover, we don't see any garbage collection after the second B request either. We found that this had been due to a new thread (199) that was spawned by the second B request. This thread was running occasionally but because it was the only active thread it determined whether the whole system was idle. It was letting the system go idle just long enough to trigger idle GC, then doing a tiny bit of work, more idle GC etc. These collections are not cheap because they are major collections that scan the whole heap.

The simple thing to do in this situation is to just disable idle GC entirely with +RTS -I0. We decided to keep it because it is still useful to return memory to the system in this case, we just use a much longer timeout.

Real World Application 2

Well-Typed was asked by a client to help improve the performance of an application. At one point during this work we needed to determine what proportion of overall execution time a certain family of functions were taking, and a breakdown between these functions. GHC's normal time profiling was not appropriate for a few reasons:

The overhead added by enabling the eventlog is negligible however. Moreover, we can easily use traceEvent to label the execution of our class-overloaded function for each of its various instances (like we did in the fib example, above). The totals reported by ghc-events-analyze enabled us to easily get the total time taken by this family of functions and the breakdown by instance, and to guide our subsequent improvements (like normal profiling would).

GC    25421789435ns   25.422s
A-X   53959674392ns   53.960s

DETAILED BREAKDOWN
T     10574202528ns   10.574s
F      5776369439ns    5.776s
D      4389066320ns    4.389s
A      3939896208ns    3.940s
K      3135897321ns    3.136s
Q      2706127000ns    2.706s
O      2586121945ns    2.586s
W      2295049375ns    2.295s
C      2198859200ns    2.199s
R      1791326834ns    1.791s
X      1734910406ns    1.735s
V      1727701880ns    1.728s
B      1709562291ns    1.710s
E      1385853161ns    1.386s
P      1383600793ns    1.384s
S      1165241932ns    1.165s
H      1128639979ns    1.129s
M       860537704ns    0.861s
L       810106269ns    0.810s
I       691345817ns    0.691s
U       595493497ns    0.595s
N       499041244ns    0.499s
G       462912372ns    0.463s
J       411810877ns    0.412s

We have replaced the real names from this program with labels AX. By comparing the overall execution time excluding GC (which we get from +RTS -s) we can see that the total of all these calls made up the vast majority of the program execution time. So this tells us that we don't need to worry about optimising the rest of the program and can concentrate on this family of functions. We can also see that the top few most expensive variants of the function account for the majority of that time. Thus we were able to focus our attention on the parts of the code where there was greatest opportunity to make improvements.

In this case the visualization of CPU usage over time does not tell us much extra:

except that our family of functions are indeed busy very consistently after an initial setup (at about 60% of CPU, with the garbage collector running at about 25% CPU).

It is worth noting that when we generated the eventlog for this application we selected only user events and GC events (+RTS -l-agu -RTS). Excluding the thread scheduler events dramatically reduces the size of the eventlog, which in this case would have been too big otherwise. ghc-events-analyze does still work without the thread scheduler events being available, but you do then miss out on the per-thread breakdown of CPU activity. Moreover, since the creation of this diagram is relatively time consuming for large eventlogs, you can ask ghc-events-analyze to omit it if you are interested only the totals and the breakdown.

Availability

ghc-events-analyze is available from Hackage; the source code is available from github.

Patches for bug fixes or feature enhancements are welcome! The code is written to be easily extensible with additional reports or analyses.


GHC 7.8 first release candidate

Mon, 03 Feb 2014 22:30:29 GMT, by austin.
Filed under ghc.

This is the first blog post by our colleague Austin. Austin is one of our GHC engineers, part of the GHC HQ team and carries the heavy burden of being GHC release manager.

Today, GHC 7.8.1 RC1 has been released after many months of work, with a huge number of new features.

There are some great highlights in the new release, including:

See the release notes for the full details.

You can download the sources or pre-built binaries for your platform.

How big has this release been? The other day I did an approximated diffstat between the 7.6 and 7.8 branches:

2784 non-merge changesets
964 files changed, 99952 insertions(+), 116187 deletions(-)

That's just in GHC itself, not including the extra repositories. We've had contributions from over 70 people this release, including 15 new committers in 2013 and 2014. As always, GHC HQ and our many other contributors have yet more stuff planned for the future (Overloaded Record Fields, Applicative Do, further integer-gmp optimizations, and more type system features).

This release includes over a year of hard work by many people, and hopefully it will turn out to be a great release (finally). But please try the RC — it's much easier for us to fix bugs before it's done!

And on that note: we're planning on doing an RC2 soon, as we know there are some current issues for people using platforms like OS X Mavericks, and packages like lens. As we've learned, there's eventually diminishing returns, so we're getting it out now in the hopes we can swat the last few things swiftly with your help.


Overloaded record fields for GHC

Fri, 29 Nov 2013 16:38:21 GMT, by adam.
Filed under ghc.

TL;DR: GHC HEAD (but not GHC 7.8) will soon support OverloadedRecordFields, an extension to permit datatypes to reuse field labels and even turn them into lenses.

Introduction

The Haskell records system is a frequent source of frustration to Haskell programmers working on large projects. On the face of it, it is simple: a datatype such as

data Person = Person { id :: Int, name :: String }

gives rise to top-level projection functions:

id   :: Person -> Int
name :: Person -> String

However, this means that using multiple records with the same field label leads to name clashes. Workarounds are possible, such as declaring one datatype per module (so the module prefix can be used to disambiguate) or adding a per-datatype prefix to all the field labels, but they are somewhat clumsy.

Over the summer before I started working for Well-Typed, I implemented a new GHC extension, OverloadedRecordFields, that allows the same name to be used for multiple fields. This is not a whole new records system for Haskell, and does not include everything one might want (such as anonymous records), but it is a small step forward in a notorious quagmire. Proposals for better systems are welcome, but while it is easy to propose a more powerful design in isolation, integrating it with other extensions and syntax in GHC is another matter!

Unfortunately, the extension will not be ready for GHC 7.8, to allow time for the design to settle and the codebase changes to mature. However, it should land in HEAD soon after the 7.8 release is cut, so the adventurous are encouraged to build GHC and try it out. Feedback from users will let us polish the design before it is finally released in 7.10.

Record projection

The essential point of the Haskell records system is unchanged by this extension: record projections are still functions, except now they are polymorphic in the choice of datatype. Instead of

name :: Person -> String

we have

name :: (r { name :: t }) => r -> t

where r { name :: t } is a constraint meaning that the type r has a field name of type t. The typechecker will automatically solve such constraints, based on the datatypes in scope. This allows module abstraction boundaries to be maintained just as at present. If a module does not export a field, clients of that module cannot use the OverloadedRecordFields machinery to access it anyway.

For example, the following code will be valid:

data Company { name :: String, employees :: [Person] }
companyNames :: Company -> [String]
companyNames c = name c : map name (employees c)

Notice that name can be used at two different record types, and the typechecker will figure out which type is meant, just like any other polymorphic function. Similarly, functions can be polymorphic in the record type:

nameDefined :: (r { name :: [a] }) => r -> Bool
nameDefined = not . null . name

Record update

The traditional Haskell record update syntax is very powerful: an expression like

e { id = 3, name = "Me" }

can update (and potentially change the types of) multiple fields. The OverloadedRecordFields extension does not attempt to generalise this syntax, so a record update expression will always refer to a single datatype, and if the field names do not determine the type uniquely, a type signature may be required. With the Person and Company types as defined above, the previous expression is accepted but the definition

f x = x { name = "Me" }

is ambiguous, so a type signature must be given to f (or a local signature given on x or the update expression).

On the other hand, type-changing update of single fields is possible in some circumstances using lenses, discussed below.

Technical details

The constraint r { name :: t } introduced above is in fact syntactic sugar for a pair of constraints

(Has r "name", t ~ FldTy r "name")

where the Has r n class constraint means that the type r has a field named n, and the FldTy r n type family gives the type of the field. They use type-level strings (with the DataKinds extension) to name fields. The type family is used so that the field type is a function of the data type and field name, which improves type inference.

Unlike normal classes and type families, the instances of Has and FldTy are generated automatically by the compiler, and cannot be given by the user. Moreover, these instances are only in scope if the corresponding record projection is in scope. For example, if the name field of Person is in scope, the extension works as if the following declarations had been given:

instance Has Person "name"
type instance FldTy Person "name" = String

Lenses

An excellent way to deal with nested data structures, and to alleviate the shortcomings of the Haskell record system, is to use one of the many Haskell lens libraries, such as lens, data-lens, fclabels and so on. These pair together a "getter" and a "setter" for a field (or more generally, a substructure of a larger type), allowing them to be composed as a single unit. Many of these libraries use Template Haskell to generate lenses corresponding to record fields. A key design question for OverloadedRecordFields was how to fit neatly into the lenses ecosystem.

The solution is to generalise the type of field functions still further. Instead of the desugared

name :: (Has r "name") => r -> FldTy r "name"

we in fact generate

name :: (Accessor p r "name") => p r (FldTy r "name")

where Accessor is a normal typeclass (no instances are generated automatically). In particular, it has an instance

instance Has r n => Accessor (->) r n

so when p = (->), the new field type specialises to the previous type. Thus a field can still be used as a (record-polymorphic) function. Moreover, each lens library can give its own instance of Accessor, allowing fields to be used directly as lenses of that type. (Unfortunately this doesn't quite work for van Laarhoven lenses, as in the lens library, because they do not have the shape p r (FldTy r n). A wrapper datatype can be used to define a combinator that converts fields into van Laarhoven lenses.)

Concluding remarks

For more information, see the detailed discussion of the design and implementation on the GHC wiki. A prototype implementation, that works in GHC 7.6.3, shows how the classes are defined and gives examples of the instances that will be generated by GHC. Keep an eye out for the extension to land in GHC HEAD later this year, and please try it out and give your feedback!

I'm very grateful to Simon Peyton Jones for acting as mentor for this project and as my guide to the GHC codebase. Edward Kmett threw several wrenches in the works, and the final design owes a great deal to his careful scrutiny and helpful suggestions. Anthony Clayden also gave many helpful comments on the design. My thanks go also to the many denizens of the ghc-devs and glasgow-haskell-users mailing lists who gave feedback at various stages in the development process. This work was supported by the Google Summer of Code programme.


(Another) status update on GHC maintenance

Tue, 08 Oct 2013 19:26:48 GMT, by duncan.
Filed under ghc, community, well-typed.

In the previous status update I described how Well-Typed were going through a process of transition in how we help Simon PJ with GHC development and maintenance. This change was due to our long-time colleague Ian Lynagh leaving for new challenges. People who follow GHC development at all will already be aware of our new people and arrangements, but for everyone else here is a brief update.

As I mentioned previously we decided to split the GHC work between two people part-time. This is because GHC work requires a wide range of specialised skills and it also provides a bit of variety for the people working on GHC.

We have been very pleased to welcome Austin Seipp and Adam Gundry to our team at Well-Typed. We decided to assign Austin and our existing colleague Edsko to the task. Austin has been working on GHC as a volunteer for several years, and has become increasingly involved in the last year or so. He has a great understanding of the various technical and social pieces and processes involved in GHC development, so he's been able to hit the ground running. Edsko's academic background (PhD and post-doc in advanced type-system hackery), plus general Haskell wizardry puts him in a good position to help out with some of the more rarefied parts of the compiler. In addition he has been working on GHC itself quite a lot in the last few months as part of other projects at Well-Typed, so he knows his way around the codebase and development processes.

So far this has all been working out very well. The flow of patches has continued and the preparations for GHC 7.8 have been going well. Simon PJ gave a good report on this at the Haskell Implementors Workshop in Boston a couple weeks ago. (Hopefully the video for that will be available soon.) Austin will also shortly post an update here on our blog about the release process and ETA for GHC 7.8.

In summary, GHC development and maintenance is looking pretty healthy. In particular as Simon noted in his talk at the implementors workshop, we've seen quite a few new people working on features, fixing bugs and helping with the release process. An important aspect of Austin and Edsko's job is to help those people and facilitate their contributions.

Other GHC hackery

I mentioned that Adam Gundry had also joined our team. Though we have not initially assigned Adam to GHC HQ, it is worth noting that he has been getting his hands dirty with GHC hackery. After finishing his PhD write-up this summer he took part in the Google Summer of Code (with Simon PJ as his mentor) to implement Simon's proposal for overloaded record fields. The point is to allow different records to share a single field label. This has been a hot topic for years but the huge range of deign options (and no shortage of opinions within the community) had been a drag on getting anything done. Adam and Simon's project is a great step in the right direction. It is both immediately useful and it should also help the design process by getting some practical feedback.


Hackage 2 now available for beta testing

Mon, 09 Sep 2013 18:13:28 GMT, by duncan.
Filed under industrial-haskell-group, well-typed, community.

Well-Typed and the Industrial Haskell Group (IHG) are very pleased to announce that Hackage 2 is now available for public beta testing. The plan is to do the final switchover in late September, to coincide with ICFP.

http://beta.hackage.haskell.org/

Read on for details of how to help with the public beta, an overview of the new features and what the IHG has been doing to help.

Support from the Industrial Haskell Group

The IHG is a consortium of companies that rely on Haskell. The IHG members have funded the effort to get Hackage 2 up to feature parity and get it ready for the switchover. The IHG funded this effort because while the volunteer effort got us the "first 90%" of the way there (including adding a number of new features) there was still the "last 90%" to do to get it production ready.

The IHG members decided to fund Hackage 2 not just because they are good citizens, but out of enlightened self-interest. Hackage has over 5000 packages written by over 1000 people – including the world's best Haskell developers. This is a massive resource. The IHG members recognise that improvements to the tools and infrastructure that the community uses helps the community to produce more and better code. This is a benefit to everyone in the community – including the commercial users.

The IHG is keen to increase its membership so that more resources can be dedicated to improving the Haskell development platform. If your organisation relies on Haskell in some way then you may want to consider joining. See the IHG website for more details or contact info@industry.haskell.org.

Despite the help of the IHG in getting to this point, Hackage is a community project, and its success depends on the community maintaining and further improving the new server. The code is now on github so it is easier to contribute, and now that the server is live there is more immediate gratification for volunteers contributing fixes and new features.

Public beta

We would like to encourage you to take part in the public beta testing. We need help both from package authors as well as other users of the site.

Please report any problems you find using the issue tracker on the hackage-server github site.

We are mirroring packages from the old server (every 30min) so it is suitable to use as your main hackage server with some caveats: we are allowing package authors to upload (as well as doing the mirroring) so you may find a slightly different set of packages on this server.

If you are a package author then you are welcome to poke about and upload packages. We have imported user accounts from the old server (except for a small number of early adopters of the original server who will need to contact an administrator). Note that before we do the final switchover we will delete everything from the beta instance and do a fresh import from the old hackage server.

Configuring cabal-install

Edit your ~/.cabal/config file. Comment-out the existing remote-repo line near the top of the file and add in a new one like this:

--remote-repo: hackage.haskell.org:http://hackage.haskell.org/packages/archive
remote-repo: beta.hackage.haskell.org:http://beta.hackage.haskell.org/

New features

Though our main priority has been feature parity so that we can switch over, volunteers have contributed several new features, including better package search, a new site theme, improved security, the ability to fix package dependencies after a release, changelogs, and a REST-style interface.

See the beta site for more details on these new features, plus details of other features that are partially implemented or that are in need of improvement.

Contributing to the development

The code is on github and we welcome pull requests.

There are open tickets describing existing bugs and features that we want or that are in need of improvement. Help on any of these would be greatly appreciated.

There is some developer and user documentation on the wiki, including a quick guide to getting your own server instance up and running.

You can ask questions on the cabal-devel mailing list or on IRC in the #hackage channel on freenode.


Status update on GHC maintenance

Sun, 04 Aug 2013 14:45:03 GMT, by duncan.
Filed under ghc, community, well-typed.

As many in the community are aware, our long-time colleague and co-founder Ian Lynagh is leaving for new challenges. We are of course sad to see him go and wish him all the best with his new endeavours. Since he and I founded the company together five years ago he has worked tirelessly to make it a success. It has been a privilege to work with him. It has also been great fun.

Ian has of course played an important role in the Haskell community as one of the pillars of GHC development. It is understandable that people in the community have concerns about what his departure means, so I want to give everyone an update on our plans which I hope will allay concerns.

As a company, Well-Typed is fully committed to GHC and helping Simon PJ with its development and maintenance. We are currently in the process of recruiting. In addition, our colleague Edsko has already started working on GHC and he will continue during the transition period and perhaps also in the longer term.

Our plan is actually to have two people work on GHC, each about half-time. This is partly because GHC work requires a wide range of specialised skills and partly to provide a bit of variety for the people working on GHC. Ian worked primarily on GHC for seven years, but he is a bit superhuman in that respect.

Once we have determined who will be working on GHC we will make further announcements. There will inevitably be a lull in activity as Edsko gets up to speed and the new people come on board, but I am reasonably confident that this will not be too significant.

The first big challenge will be to manage the process of the GHC 7.8 release. The target is to have a release candidate available shortly before ICFP in late September, and then a final release a month or two later.

I also want to reiterate what Simon has been saying recently, that we have to see GHC as being owned by the community and that community involvement in GHC development and maintenance is vital. Well-Typed's roll in GHC maintenance is not to do it on behalf of the community. It is better to think of our role as like that of a full-time maintainer. A maintainer can help volunteers get their patches reviewed and merged. They provide continuity, knowledge of the codebase, can help make decisions, and can organise volunteer efforts and processes like releases.

Ian has of course shaped and defined the role, but it is not fixed in stone. Our new people working on GHC maintenance will have some scope to define the role, in consultation with Simon and the community.

I should also say that the number of volunteers working on GHC has increased noticeably since Simon PJ called on the community to get more involved following Simon Marlow's departure. This is a very promising sign and we will play our part to assist those volunteers and to encourage more people.

So all in all, we agree with Simon that the future of the Glorious Haskell Compiler is indeed glorious!


Previous entries