# Fixing foldl

## Duncan Coutts – Tuesday, 01 April 2014

community

The `foldl` function is broken. Everyone knows it’s broken. It’s been broken for nearly a quarter of a century. We should finally fix it!

Today I am proposing that `Prelude.foldl` be redefined using the implementation currently known as `Data.List.foldl'`.

## foldl is broken!

I’m sure you knew that already, but just in case…

Have you ever noticed that Haskellers usually recommend using either `foldr` or `foldl'` but not `foldl`? For example Real World Haskell has this to say:

Due to the thunking behaviour of `foldl`, it is wise to avoid this function in real programs: even if it doesn’t fail outright, it will be unnecessarily inefficient. Instead, import `Data.List` and use `foldl'`.

In the online version of the book the first user comments on that paragraph are

Why isn’t `Data.List foldl` implementation placed in Prelude?

I second the question: Why isn’t `foldl'` the default?

Good question.

Ok, so obviously we’re talking about the difference between `foldl` and `foldl'`:

``````foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f a []     = a
foldl f a (x:xs) = foldl f (f a x) xs

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f a []     = a
foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs``````

The dry technical difference is that `foldl'` evaluates the call to `f` before making the next recursive call. The consequences of that are perhaps not immediately obvious, so we’ll take a step back and look at a slightly bigger picture.

## Folding left and right

When we first learn Haskell we learn that there are two ways to fold a list, from the left or right.

``````foldl f z [x1, x2, ..., xn] = (...((z `f` x1) `f` x2) `f`...) `f` xn

foldr f z [x1, x2, ..., xn] = x1 `f` (x2 `f` ... (xn `f` z)...)``````

Saying “from the left” or “from the right” is a description of what `foldl` and `foldr` calculate, with the parenthesis nesting to the left or to the right. At runtime of course we always have to start from the left (front) of the list.

We later learn other ways of thinking about left and right folds, that a left fold can be used like a classic loop where we go through the whole list eagerly, while a right fold can be used like a demand-driven iterator. For the left fold that means using a function that is strict in its first argument (like `(+)`) while for the right fold that means using a function that is not strict in its second argument (like `(:)`).

Indeed when looking at whether we want `foldl` or `foldr` in any particular case our choice is usually governed by whether we want “all at once” behaviour (`foldl`) or if we want incremental or short-cut behaviour (`foldr`).

## Accumulating thunks

Again, as we are learning Haskell, we get told that `foldl` has this crazy behaviour

``````foldl (+) 0 (1:2:3:[])
=  foldl (+) (0 + 1)             (2:3:[])
=  foldl (+) ((0 + 1) + 2)       (3:[])
=  foldl (+) (((0 + 1) + 2) + 3) []
=            (((0 + 1) + 2) + 3)``````

when what we had in mind when we thought of an accumulating loop was

``````foldl' (+) 0 (1:2:3:[])
=  foldl' (+) 1 (2:3:[])
=  foldl' (+) 3 (3:[])
=  foldl' (+) 6 []
=             6``````

Of course that’s just what `foldl'` does, it evaluates the call to `+` before making the next recursive call.

## When is `foldl` (rather than `foldl'`) useful?

The short answer is “almost never”.

As beginners we often assume that `foldl` must still make sense in some cases (or why have both?) but it turns out that’s not really the case.

When the `f` argument of `foldl` is a strict function then delaying the evaluation does not gain us anything as it all has to be evaluated at the end anyway. The only time when delaying the evaluation could save us anything is when the `f` function is not strict in its first argument – in which case you either don’t care or probably should have been using `foldr` in the first place.

In fact even if our `f` function is non-strict in its first argument, we probably do not gain anything from delaying evaluation and it usually does no harm to evaluate earlier. Remember that we still have to traverse the whole list, we don’t get any short-cutting behaviour like with `foldr`.

We can, if we think about it, construct examples where `foldl'` would be too strict. We could define `last` and `last'` like this:

``````last  = foldl  (\_ y -> y) (error "empty list")

last' = foldl' (\_ y -> y) (error "empty list")``````

Now if we try

``````> last [1,undefined,3]
3
> last' [1,undefined,3]
*** Exception: Prelude.undefined``````

This is because our accumulator is always the latest element that we’ve seen but we don’t actually want to evaluate the elements (except the last one).

So it’s true that `foldl'` fails in this case, but it’s also a silly definition, the usual definition is a lot clearer

``````last [x]    = x
last (_:xs) = last xs
last []     = error "empty list"``````

That goes for pretty much all the other examples you might be able to think of where `foldl` would work but `foldl'` would not: the examples are either artificial or are clearer written in other ways.

People sometimes point out that `sum` is defined using `foldl` and not `foldl'` and claim that this is something to do with Haskell’s designers wanting to allow `Num` instances where `(+)` might be lazy. This is pretty much nonsense. If that were the case then `sum` would have been defined using `foldr` rather than `foldl` to properly take advantage of short-cutting behaviour. A much simpler explanation is that `foldl'` was not available in early versions of Haskell when `sum` was defined.

In nearly 15 years as a Haskell programmer I think I’ve specifically needed `foldl` rather than `foldl'` about three times. I say “about” because I can only actually remember one. That one case was in a slightly cunning bit of code for doing cache updates in a web server. It would almost certainly have been clearer as a local recursion but I was amused to find a real use case for `foldl` and couldn’t help myself from using it just for fun. Of course it needed a comment to say that I was using it on purpose rather than by mistake!

## So why do we have `foldl` and `foldl'`?

If `foldl` is almost always a mistake (or merely benign) then why do we have it in the first place?

I don’t know for sure, but here’s my guess…

When Haskell 1.0 was published on this day 24 years ago there was no `seq` function at all, so there was no choice but to define `foldl` in the “classic” way.

Eventually, six years later after much discussion, we got the `seq` function in Haskell 1.3. Though actually in Haskell 1.3 `seq` was part of an `Eval` class, so you couldn’t use it just anywhere, such as in `foldl`. In Haskell 1.3 you would have had to define `foldl'` with the type:

``foldl' :: Eval b => (b -> a -> b) -> b -> [a] -> b``

Haskell 1.4 and Haskell 98 got rid of the `Eval` class constraint for `seq` but `foldl` was not changed. Hugs and GHC and other implementations added the non-standard `foldl'`.

I suspect that people then considered it a compatibility and inertia issue. It was easy enough to add a non-standard `foldl'` but you can’t so easily change the standard.

I suspect that if we had had `seq` from the beginning then we would have defined `foldl` using it.

Miranda, one of Haskell’s predecessor languages, already had `seq` 5 years before Haskell 1.0.

## A strict `foldl` in Orwell!

Orwell is an interesting case. Orwell was another Haskell predecessor, very similar to Miranda and early Haskell. An informed source told me that Orwell had defined its `foldl` in the way that we now define `foldl'`, ie with strict evaluation. Information on Orwell is a little hard to get ahold of online these days so I asked Philip Wadler. Phil very kindly fished out the manuals and looked up the definitions for me.

In the original version:

``````An Introduction to Orwell
(DRAFT)
1 April 1985``````

In the standard prelude:

``````lred f a []  =  a
lred f a (x:xs) = lred f (f a x) xs``````

But five years later, by the time Haskell 1.0 is being published…

``````An Introduction to Orwell 6.00
revised by Quentin Miller
Copyright 1990 Oxford University Computing Lab``````

In the standard prelude:

``````foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f a []  =  a
foldl f a (x:xs)  =  strict (foldl f) (f a x) xs``````

Note the use of `strict`. Presumably Orwell’s `strict` function was defined as (or equivalent to)

``````strict :: (a -> b) -> a -> b
strict f x = x `seq` f x``````

(These days in Haskell we call this function `(\$!)`.)

So my source was right, Orwell did change `foldl` to be the strict version!

I contend that this was and is the right decision, and that it was just a consequence of the late arrival of `seq` in Haskell and inertia and fears about backwards compatibility that have kept us from fixing `foldl`.

## Just do it!

It’d help all of us who are sometimes tempted to use `foldl` because we can’t be bothered to import `Data.List`. It’d help confused beginners. It’d save teachers from the embarrassment of having to first explain `foldl` and then why you should never use it.

Orwell fixed this mistake at least 24 years ago, probably before Haskell 1.0 was released. Just because it’s an old mistake doesn’t mean we shouldn’t fix it now!

## A postscript: which `foldl'`?

I hate to complicate a simple story but I should admit that there are two plausible definitions of `foldl'` and I’ve never seen any serious discussion of why we use one rather than the other (I suspect it’s another historical accident).

So the version above is the “standard” version, perhaps more clearly written using bang patterns as

``````foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f a []     = a
foldl' f a (x:xs) = let !a' = f a x
in foldl' f a' xs``````

But an equally plausible version is

``````foldl'' :: (b -> a -> b) -> b -> [a] -> b
foldl'' f !a []     = a
foldl'' f !a (x:xs) = foldl'' f (f a x) xs``````

The difference is this: in the first version we evaluate the new accumulating parameter before making the recursive call, while in the second version we ensure that the accumulating parameter is always evaluated (to WHNF) on entry into each call.

These two ways of forcing the evaluation have almost the same effect. It takes a moment or two to find a case where it makes a difference, but here’s one:

``````foldl'  (\_ y -> y) undefined  = 1
foldl'' (\_ y -> y) undefined  = undefined``````

The standard `foldl'` ensures that all the new accumulating parameter values are evaluated, but still allows the initial value to be unevaluated. The alternative version simply says that the accumulating parameter is always evaluated.

The second version is attractive from a code generation point of view. One of the clever things that GHC can do with `foldl'` (and strict function arguments generally) is to unbox the values, so for example an `Int` can be unboxed to a `Int#` and passed in registers rather than on the heap. With the standard `foldl'` it needs a special case for the first iteration of the loop where the initial value of the accumulating parameter which might not be evaluated yet. With `foldl''`, that’s not a problem, we can unbox things right from the start. In practice, GHC can often see that the initial value is evaluated anyway, but not always.

(Don Stewart and I noticed this a few years back when we were working on stream fusion for lists. We had defined `foldl'` on streams in a way that corresponded to the second form above and then got a test failure when doing strictness testing comparing against the standard list functions.)

So if we’re going to fix `foldl` to be the strict version, then perhaps it should be the fully strict version, not just the “strict after the first iteration” version.