Dealing with Asynchronous Exceptions during Resource Acquisition

Thu, 28 Aug 2014 15:48:49 GMT, by edsko, duncan.
Filed under coding.

Introduction

Consider the following code: we open a socket, compute with it, and finally close the socket again. The computation happens inside an exception handler (try), so even when an exception happens we still close the socket:

example1 :: (Socket -> IO a) -> IO a 
example1 compute = do -- WRONG
  s <- openSocket 
  r <- try $ compute s
  closeSocket s
  case r of
    Left ex -> throwIO (ex :: SomeException)
    Right a -> return a

Although this code correctly deals with synchronous exceptions–exceptions that are the direct result of the execution of the program–it does not deal correctly with asynchronous exceptions–exceptions that are raised as the result of an external event, such as a signal from another thread. For example, in

example2 :: (Socket -> IO a) -> IO (Maybe a)
example2 compute = timeout someTimeout $ example1 compute

it is possible that the timeout signal arrives after we have opened the socket but before we have installed the exception handler (or indeed, after we leave the scope of the exception handler but before we close the socket). In order to address this we have to control precisely where asynchronous exceptions can and cannot be delivered:

example3 :: (Socket -> IO a) -> IO a
example3 compute =
  mask $ \restore -> do
    s <- openSocket 
    r <- try $ restore $ compute s
    closeSocket s
    case r of
      Left ex -> throwIO (ex :: SomeException)
      Right a -> return a

We mask asynchronous exceptions, and then restore them only inside the scope of the exception handler. This very common pattern is captured by the higher level combinator bracket, and we might rewrite the example as

example4 :: (Socket -> IO a) -> IO a
example4 = bracket openSocket closeSocket

Allowing asynchronous exceptions during resource acquisition

Suppose that we wanted to define a derived operation that opens a socket and performs some kind of handshake with the server on the other end:

openHandshake :: IO Socket
openHandshake = do
  mask $ \restore -> do
    s <- openSocket
    r <- try $ restore $ handshake s
    case r of
      Left  ex -> closeSocket s >> throwIO (ex :: SomeException)
      Right () -> return s 

(These and the other examples can be defined in terms of bracket and similar, but we use mask directly so that it’s easier to see what is happening.) We might use openHandshake as follows:

example5 :: (Socket -> IO a) -> IO a
example5 compute = do
  mask $ \restore -> do
    s <- openHandshake 
    r <- try $ restore $ compute s
    closeSocket s
    case r of
      Left ex -> throwIO (ex :: SomeException)
      Right a -> return a

There are no resource leaks in this code, but there is a different problem: we call openHandshake with asynchronous exceptions masked. Although openHandshake calls restore before doing the handshake, restore restores the masking state to that of the enclosing context. Hence the handshake with the server cannot be timed out. This may not be what we want–we may want to be able to interrupt example5 with a timeout either during the handshake or during the argument computation.
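Mirroring example2, what we would like to be able to write is something along these lines (example5' is just an illustration):

example5' :: (Socket -> IO a) -> IO (Maybe a)
example5' compute = timeout someTimeout $ example5 compute

with the timeout able to fire during the handshake as well as during the computation; with openHandshake as defined above, it cannot fire during the handshake.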

Note that this is not a solution:

example6 :: (Socket -> IO a) -> IO a 
example6 compute = do
  mask $ \restore -> do
    s <- restore openHandshake -- WRONG
    r <- try $ restore $ compute s
    closeSocket s
    case r of
      Left ex -> throwIO (ex :: SomeException)
      Right a -> return a

Consider what might happen: if an asynchronous exception arrives after openHandshake returns the socket, but before we leave the scope of restore, the exception will be raised and the socket will be leaked. Installing an exception handler does not help: since we don’t have a handle on the socket, we cannot release it.

Interruptible operations

Consider this definition from the standard libraries:

withMVar :: MVar a -> (a -> IO b) -> IO b
withMVar m io =
  mask $ \restore -> do
    a <- takeMVar m
    b <- restore (io a) `onException` putMVar m a
    putMVar m a
    return b

This follows almost exactly the same pattern as the examples we have seen so far; we mask asynchronous exceptions, take the contents of the MVar, and then execute some operation io with the contents of the MVar, finally putting the contents of the MVar back when the computation completes or when an exception is raised.

An MVar acts as a lock, with takeMVar taking the role of acquiring the lock. This may, of course, take a long time if the lock is currently held by another thread. But we call takeMVar with asynchronous exceptions masked. Does this mean that the takeMVar cannot be timed out? No, it does not: takeMVar is a so-called interruptible operation. From the Asynchronous Exceptions in Haskell paper:

Any operation which may need to wait indefinitely for a resource (e.g., takeMVar) may receive asynchronous exceptions even within an enclosing block, but only while the resource is unavailable. Such operations are termed interruptible operations. (..) takeMVar behaves atomically when enclosed in block. The takeMVar may receive asynchronous exceptions right up until the point when it acquires the MVar, but not after.

(block has been replaced by mask since the publication of the paper, but the principle is the same.) Although the existence of interruptible operations makes understanding the semantics of mask harder, they are necessary: like in the previous section, wrapping takeMVar in restore is not safe. If we really want to mask asynchronous exceptions, even across interruptible operations, Control.Exception offers uninterruptibleMask.
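To illustrate, here is a sketch of a hypothetical variant of withMVar (using uninterruptibleMask and onException from Control.Exception) in which the takeMVar can no longer be interrupted, not even by a timeout; note that this also means that a thread blocked on this takeMVar cannot be killed:

withMVarUninterruptible :: MVar a -> (a -> IO b) -> IO b
withMVarUninterruptible m io =
  uninterruptibleMask $ \restore -> do
    a <- takeMVar m
    b <- restore (io a) `onException` putMVar m a
    putMVar m a
    return b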

Custom interruptible operations

So an interruptible operation is one that can be interrupted by an asynchronous exception even when asynchronous exceptions are masked. Can we define our own interruptible operations? Yes, we can:

-- | Open a socket and perform handshake with the server
--
-- Note: this is an interruptible operation.
openHandshake' :: IO Socket
openHandshake' = 
  mask_ $ do 
    s <- openSocket
    r <- try $ unsafeUnmask $ handshake s
    case r of
      Left  ex -> closeSocket s >> throwIO (ex :: SomeException)
      Right () -> return s 

unsafeUnmask is defined in GHC.IO, and unmasks asynchronous exceptions, no matter what the enclosing context is. This is of course somewhat dangerous, because now calling openHandshake' inside a mask suddenly opens up the possibility of an asynchronous exception being raised; and the only way to know is to look at the implementation of openHandshake', or its Haddock documentation. This is somewhat unsatisfactory, but exactly the same goes for takeMVar and any other interruptible operation, or any combinator that uses an interruptible operation under the hood. A sad state of affairs, perhaps, but one that we don’t currently have a better solution for.

Actually, using unsafeUnmask is a bit too crude. Control.Exception does not export it, but does export

allowInterrupt :: IO ()
allowInterrupt = unsafeUnmask $ return ()

with documentation

When invoked inside mask, this function allows a blocked asynchronous exception to be raised, if one exists. It is equivalent to performing an interruptible operation, but does not involve any actual blocking.

When called outside mask, or inside uninterruptibleMask, this function has no effect.

(emphasis mine.) Sadly, this documentation does not reflect the actual semantics: unsafeUnmask, and as a consequence allowInterrupt, unmasks asynchronous exceptions no matter what the enclosing context is: even inside uninterruptibleMask. We can however define our own operator to do this:

interruptible :: IO a -> IO a
interruptible act = do
  st <- getMaskingState
  case st of
    Unmasked              -> act
    MaskedInterruptible   -> unsafeUnmask act
    MaskedUninterruptible -> act 

where we call unsafeUnmask only if the enclosing context is mask, but not if it is uninterruptibleMask (what the semantics should be when the two are nested is a subtler question that we leave aside here). We can use it as follows to define a better version of openHandshake:

-- | Open a socket and perform handshake with the server
--
-- Note: this is an interruptible operation.
openHandshake' :: IO Socket
openHandshake' = 
  mask_ $ do 
    s <- openSocket
    r <- try $ interruptible $ handshake s
    case r of
      Left  ex -> closeSocket s >> throwIO (ex :: SomeException)
      Right () -> return s 

Resource allocation timeout

If we wanted to time out the allocation of the resource only, we might do

example7 :: (Socket -> IO a) -> IO a
example7 compute = do
  mask $ \restore -> do
    ms <- timeout someTimeout $ openHandshake'
    case ms of
      Nothing -> throwIO (userError "Server busy")
      Just s  -> do r <- try $ restore $ compute s
                    closeSocket s
                    case r of
                      Left ex -> throwIO (ex :: SomeException)
                      Right a -> return a

Exceptions are masked when we enter the scope of the timeout, and are unmasked only once we are inside the exception handler in openHandshake'–in other words, if a timeout happens, we are guaranteed to clean up the socket. The surrounding mask is however necessary. For example, suppose we are writing some unit tests and we are testing openHandshake'. This is wrong:

example8 :: IO ()
example8 = do 
  ms <- timeout someTimeout $ openHandshake'
  case ms of
    Just s  -> closeSocket s
    Nothing -> return ()

Even if we are sure that example8 will not be interrupted by asynchronous exceptions, there is still a potential resource leak here: the timeout exception might be raised just after we leave the mask_ scope from openHandshake' but just before we leave the timeout scope. If we are sure we don’t need to worry about other asynchronous exceptions we can write

example8 :: IO ()
example8 = do
  ms <- mask_ $ timeout someTimeout $ openHandshake'
  case ms of
    Just s  -> closeSocket s
    Nothing -> return ()

although of course it might be better to simply write

example8 :: IO ()
example8 = 
  bracket (timeout someTimeout $ openHandshake')
          (\ms -> case ms of Just s  -> closeSocket s
                             Nothing -> return ())
          (\_ -> return ())

Conclusions

Making sure that resources are properly deallocated in the presence of asynchronous exceptions is difficult. It is very important to make sure that asynchronous exceptions are masked at crucial points; unmasking them at the point of calling a resource allocation function is not safe. If you nevertheless want to be able to timeout resource allocation, you need to make your resource allocation function interruptible.

For completeness’ sake, there are some other solutions that avoid the use of unsafeUnmask. One option is to thread the restore argument through (and compose multiple restore arguments if there are multiple nested calls to mask). This requires resource allocations to have a different signature, however, and it is very error prone: a single mask somewhere along the call chain where we forget to thread through the restore argument will mean the code is no longer interruptible. The other option is to run the code that we want to be interruptible in a separate thread, and wait for the thread to finish with, for example, a takeMVar. Getting this right is however no easy task, and it doesn’t change anything fundamentally anyway: rather than using unsafeUnmask we are now using a primitive interruptible operation; either way we introduce the possibility of exceptions even in the scope of mask_.
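A minimal sketch of that second option, for illustration only (the name interruptibleIO is ours; forkIOWithUnmask from Control.Concurrent ensures that the new thread does not inherit the masked state, and a production version would need considerably more care):

interruptibleIO :: IO a -> IO a
interruptibleIO act = do
    mv  <- newEmptyMVar
    -- Run act, unmasked, in its own thread, and report its outcome
    tid <- forkIOWithUnmask $ \unmask -> try (unmask act) >>= putMVar mv
    -- takeMVar is interruptible, even inside mask_
    r   <- takeMVar mv `onException` killThread tid
    case r of
      Left  ex -> throwIO (ex :: SomeException)
      Right a  -> return a

Even when interruptibleIO is called with asynchronous exceptions masked, the takeMVar can be interrupted, so the operation can still be timed out; but as noted above, fundamentally we are still relying on an interruptible operation inside the masked region.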

Finally, when your application does not fit the bracket pattern we have been using (implicitly or explicitly), you may want to have a look at resourcet and pipes or conduit, or my talk Lazy I/O and Alternatives in Haskell.


Debugging Haskell at assembly level
by scripting lldb in Python

Fri, 01 Aug 2014 09:17:34 GMT, by edsko.
Filed under coding.

Haskell programmers tend to spend far less time with debuggers than programmers in other languages. Partly this is because for pure code debuggers are of limited value anyway, and Haskell’s type system is so strong that a lot of bugs are caught at compile time rather than at runtime. Moreover, Haskell is a managed language – like Java, say – and errors are turned into exceptions. Unlike in unmanaged languages such as C, “true” runtime errors such as segmentation faults almost never happen.

I say “almost” because they can happen: either because of bugs in ghc or the Haskell runtime, or because we are doing low level stuff in our own Haskell code. When they do happen we have to drop down to a system debugger such as lldb or gdb, but debugging Haskell at that level can be difficult because Haskell’s execution model is so different from the execution model of imperative languages. In particular, compiled Haskell code barely makes any use of the system stack or function calls, and uses a continuation passing style instead (see my previous blog posts Understanding the Stack and Understanding the RealWorld). In this blog post I will explain a technique I sometimes use to help diagnose low-level problems.

Since I work on OSX I will be using lldb as my debugger. If you are using gdb you can probably use similar techniques; The LLDB Debugger shows how gdb and lldb commands correlate, and the ghc wiki also lists some tips. However, I have no experience with scripting gdb so your mileage may vary.

Description of the problem

As our running example I will use a bug that I was tracking down in a client project. The details of the project don’t matter so much, except that this project happens to use the GHC API to compile Haskell code—at runtime—into bytecode and then run it; moreover, it also—dynamically, at runtime—loads C object files into memory.

In one example run it loads the (compiled) C code

#include <stdio.h>

void hello(void) { 
  printf("hello\n"); 
}

and then compiles and runs this Haskell code:

{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

foreign import ccall "hello" hello :: IO ()

main = hello

Sadly, however, this resulted in a total system crash.

Starting point

By attaching lldb to the running process we got a tiny bit more information about the crash:

* thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010b3b8226
-> 0x10b3b8226:  addb   %al, (%rax)
   0x10b3b8228:  addb   %al, (%rax)
   0x10b3b822a:  addb   %al, (%rax)
   0x10b3b822c:  addb   %al, (%rax)

It turns out we have a null-pointer dereference here. Anybody who has spent any time debugging Intel assembly code, however, will realize that this particular instruction

addb   %al, (%rax)

is in fact the decoding of zero:

(lldb) memory read -c 8 0x10b3b8226
0x10b3b8226: 00 00 00 00 00 00 00 00                          ........

In other words, chances are good we were never meant to execute this instruction at all. Unfortunately, asking lldb for a backtrace tells us absolutely nothing new:

(lldb) bt
* thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010b3b8226

Finding a call chain

The lack of a suitable backtrace in lldb is not surprising, since compiled Haskell code barely makes use of the system stack. Instead, the runtime maintains its own stack, and code is compiled into a continuation passing style. For example, if we have the Haskell code

functionA :: IO ()
functionA = do .. ; functionB ; ..

functionB :: IO ()
functionB = do .. ; functionC ; ..

functionC :: IO ()
functionC = .. crash ..

main :: IO ()
main = functionA

and we step through the execution of this program in lldb, and we ask for a backtrace when we start executing function A, all we get is

(lldb) bt
* thread #1: tid = 0x379731, 0x0000000100000a20 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100000a20 Main`A_functionA_info

with no mention of main. Similarly, the backtraces on entry to functions B and C are

* thread #1: tid = 0x379731, 0x0000000100000b90 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
  * frame #0: 0x0000000100000b90 Main`B_functionB_info

and

* thread #1: tid = 0x379731, 0x0000000100000c88 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
  * frame #0: 0x0000000100000c88 Main`C_functionC_info

none of which is particularly informative. However, stepping manually through the program we do first see function A on the (singleton) call stack, then function B, and finally function C. Thus, by the time we reach function C, we have discovered a call chain A, B, C—it’s just that it involves quite a bit of manual work.

Scripting lldb

Fortunately, lldb can be scripted (see Using Scripting and Python to Debug in LLDB and the LLDB Python Reference). What we want to do is keep stepping through the code, showing the top-level (and quite possibly only) function at the top of the call stack at each step, until we crash.

We can use the following Python script to do this:

import lldb

def step_func(debugger, command, result, internal_dict):
  thread = debugger.GetSelectedTarget().GetProcess().GetSelectedThread()
  
  while True:
    thread.StepOver()

    stream = lldb.SBStream()
    thread.GetStatus(stream)
    description = stream.GetData()
    print description

    if thread.GetStopReason() == lldb.eStopReasonException:
      break

def __lldb_init_module(debugger, dict):
  debugger.HandleCommand('command script add -f %s.step_func sf' % __name__)

For the above example, we might use this as follows: we load our application into lldb

# lldb Main
Current executable set to 'Main' (x86_64).

register our new command sf

(lldb) command script import mystep.py

set a breakpoint where we want to start stepping

(lldb) breakpoint set -n A_functionA_info
Breakpoint 1: where = Main`A_functionA_info, address = 0x0000000100000b90

run to the breakpoint:

(lldb) run
Process 54082 launched: 'Main' (x86_64)
Process 54082 stopped
* thread #1: tid = 0x384510, 0x0000000100000b90 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000b90 Main`A_functionA_info

and then use sf to find a call-trace until we crash:

(lldb) sf
...

* thread #1: tid = 0x384510, 0x0000000100000bf0 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100000bf0 Main`B_functionB_info
Main`B_functionB_info:

...

* thread #1: tid = 0x384510, 0x0000000100000c78 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100000c78 Main`C_functionC_info
Main`C_functionC_info:

...

* thread #1: tid = 0x384510, 0x0000000100000d20 Main`crash + 16 at crash.c:3, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100000d20 Main`crash + 16 at crash.c:3

Note that if you are using the threaded runtime, you may have to select which thread you want to step through before calling sf:

(lldb) thread select 4
(lldb) sf

Tweaking the script

You will probably want to tweak the above script in various ways. For instance, in the application I was debugging, I wanted to step into each assembly language instruction but over each C function call, mostly because lldb was getting confused with the call stack. I also added a maximum step count:

import lldb
import shlex

def step_func(debugger, command, result, internal_dict):
  args = shlex.split(command)
 
  if len(args) > 0:
    maxCount = int(args[0])
  else:
    maxCount = 100

  thread = debugger.GetSelectedTarget().GetProcess().GetSelectedThread()
  i      = 0

  while True:
    frame = thread.GetFrameAtIndex(0)
    file  = frame.GetLineEntry().GetFileSpec().GetFilename()
    inC   = type(file) is str and file.endswith(".c")
   
    if inC:
      thread.StepOver()
    else:
      thread.StepInstruction(False)

    stream = lldb.SBStream()
    thread.GetStatus(stream)
    description = stream.GetData()

    print i
    print description

    i += 1
    if thread.GetStopReason() == lldb.eStopReasonException or i > maxCount:
      break

You may want to tweak this step into/step over behaviour to suit your application; certainly you don’t want to have a call trace involving every step taken in the Haskell RTS or, worse, in the libraries it depends on.

Back to the example

Rather than printing every step along the way, it may also be useful to simply remember the step-before-last and show that on a crash; often it is sufficient to know what happened just before the crash. A sketch of such a variant, using only the lldb API calls we already used above:
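import lldb

def step_func(debugger, command, result, internal_dict):
  thread = debugger.GetSelectedTarget().GetProcess().GetSelectedThread()
  prev = None

  while True:
    thread.StepOver()

    stream = lldb.SBStream()
    thread.GetStatus(stream)
    description = stream.GetData()

    if thread.GetStopReason() == lldb.eStopReasonException:
      if prev is not None:
        print prev        # what happened just before the crash
      print description   # the crash itself
      break

    prev = description

Indeed, in the application I was debugging, the call stack just before the crash was: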

2583
  thread #3: tid = 0x35e743, 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470, stop reason = step over
    frame #0: 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470
   467 	    }
   468 	    
   469 	    case ThreadInterpret:
-> 470 		cap = interpretBCO(cap);
   471 		ret = cap->r.rRet;
   472 		break;
   473 		

2584
  thread #3: tid = 0x35e743, 0x0000000103106226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000103106226
-> 0x103106226:  addb   %al, (%rax)
   0x103106228:  addb   %al, (%rax)
   0x10310622a:  addb   %al, (%rax)
   0x10310622c:  addb   %al, (%rax)

This is a lot more helpful than the backtrace, as we now have a starting point: something went wrong when running the bytecode interpreter (remember that the application was compiling and running some Haskell code at runtime).

To pinpoint the problem further, we can set a breakpoint in interpretBCO and run sf again (the way we defined sf it steps over any C function calls by default). This time we get to:

4272
  thread #4: tid = 0x35f43a, 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463, stop reason = step over
    frame #0: 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463
   1460		    tok = suspendThread(&cap->r, interruptible ? rtsTrue : rtsFalse);
   1461	
   1462		    // We already made a copy of the arguments above.
-> 1463	            ffi_call(cif, fn, ret, argptrs);
   1464	
   1465	            // And restart the thread again, popping the stg_ret_p frame.
   1466		    cap = (Capability *)((void *)((unsigned char*)resumeThread(tok) - STG_FIELD_OFFSET(Capability,r)));

4273
  thread #4: tid = 0x35f43a, 0x0000000107eba226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000107eba226
-> 0x107eba226:  addb   %al, (%rax)
   0x107eba228:  addb   %al, (%rax)
   0x107eba22a:  addb   %al, (%rax)
   0x107eba22c:  addb   %al, (%rax)

Ok, now we are really getting somewhere. Something is going wrong when we are doing a foreign function call. Let’s re-run the application once more, setting a breakpoint at ffi_call:

(lldb) breakpoint set -n ffi_call
Breakpoint 1: where = libffi.dylib`ffi_call + 29 at ffi64.c:421, address = 0x0000000108e098dd
(lldb) cont
Process 51476 resuming
Process 51476 stopped
* thread #4: tid = 0x360fd3, 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421, stop reason = breakpoint 1.1
    frame #0: 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421
   418 	  /* If the return value is a struct and we don't have a return value
   419 	     address then we need to make one.  Note the setting of flags to
   420 	     VOID above in ffi_prep_cif_machdep.  */
-> 421 	  ret_in_memory = (cif->rtype->type == FFI_TYPE_STRUCT
   422 			   && (cif->flags & 0xff) == FFI_TYPE_VOID);
   423 	  if (rvalue == NULL && ret_in_memory)
   424 	    rvalue = alloca (cif->rtype->size);

and let’s take a look at the function we’re about to execute:

(lldb) disassemble -s fn
   0x1024a4210:  pushq  %rbp
   0x1024a4211:  movq   %rsp, %rbp
   0x1024a4214:  leaq   (%rip), %rdi
   0x1024a421b:  popq   %rbp
   0x1024a421c:  jmpq   0x1024a4221
   0x1024a4221:  pushq  $0x6f6c6c65
   0x1024a4226:  addb   %al, (%rax)
   0x1024a4228:  addb   %al, (%rax)
   0x1024a422a:  addb   %al, (%rax)
   0x1024a422c:  addb   %al, (%rax)
   0x1024a422e:  addb   %al, (%rax)

We were expecting to execute hello:

# otool -tV hello.o 
hello.o:
(__TEXT,__text) section
_hello:
0000000000000000	pushq	%rbp
0000000000000001	movq	%rsp, %rbp
0000000000000004	leaq	L_str(%rip), %rdi ## literal pool for: "hello"
000000000000000b	popq	%rbp
000000000000000c	jmpq	_puts

and if you compare this with the code loaded into memory it all becomes clear. The jump instruction in the object file

jmpq _puts

contains a symbolic reference to puts; but the jump in the code that we are about to execute in fact jumps to the next instruction in memory:

0x1024a421c:  jmpq   0x1024a4221
0x1024a4221:  ... 

In other words, the loaded object file has not been properly relocated, and when we try to call puts we end up jumping into nowhere. At this point the bug was easily resolved.

Further Reading

We have barely scratched the surface here of what we can do with lldb or gdb. In particular, ghc maintains quite a bit of runtime information that we can inspect with the debugger. Tim Schröder has an excellent blog post about inspecting ghc’s runtime data structures with gdb, and Nathan Howell has written some extensive lldb scripts to do the same, although they may now be somewhat outdated. See also the reddit discussion about this blog post.


Understanding the RealWorld

Thu, 12 Jun 2014 11:31:19 GMT, by edsko.
Filed under coding.

In my previous blog post Understanding the Stack I mentioned that “RealWorld tokens disappear” in the generated code. This is not entirely accurate. Simon Marlow gives a very brief explanation of this on the old glasgow-haskell-users mailing list. In this blog post we will explore this in more depth and hopefully come to a better understanding of the RealWorld, though perhaps not of the real world.

In order to make some points a bit easier to illustrate, we will be using a 32-bit version of ghc; and since there are no 32-bit builds of ghc for OSX later than version 7.4.2, that’s what we’ll use for most of this blog post.

Warmup exercise

Consider

constant_Int# :: () -> Int#
constant_Int# _ = 1234#

This compiles to the following C-- code:

R1 = 1234;
Sp = Sp + 4;
jump (I32[Sp + 0]) ();

On x86 architectures all Haskell function arguments are passed through the stack (on x86-64 the first few arguments are passed through registers). That means that on entry to constant_Int# the top of the stack contains the () argument, which we discard and pop off the stack by incrementing the stack pointer Sp. (I will explain why I am introducing this unused argument towards the end of this blog post.)

Since 1234# is already in weak head normal form we simply call the continuation on the top of the stack with 1234# as the first argument. If you followed along in the previous blog post this will all be old hat to you.

Heap allocation

What happens if we use a boxed integer instead?

constant_Int :: () -> Int
constant_Int _ = 1234

If we compare the STG code for constant_Int# and constant_Int we see that the only difference is indeed that constant_Int must box its result:

constant_Int  = \r [ds_sf9] I# [1234];
constant_Int# = \r [ds_sfb] 1234;

This means however that we must create a heap object, and hence the C-- is significantly more complicated:

chJ:
    Hp = Hp + 8;
    if (Hp > I32[BaseReg + 92]) goto chO;
    I32[Hp - 4] = I#_con_info;
    I32[Hp + 0] = 1234;
    R1 = Hp - 3;
    Sp = Sp + 4;
    jump (I32[Sp + 0]) ();
chM:
    R1 = constant_Int_closure;
    jump (I32[BaseReg - 4]) ();
chO:
    I32[BaseReg + 116] = 8;
    goto chM;

The function starts with a heap overflow check. This works in much the same way as the stack overflow check that we discussed previously, so we will ignore it for the sake of this blog post, and omit heap and stack overflow checks from further code samples.

The interesting code happens after we conclude that the heap is okay. Every heap object consists of a code pointer followed by a payload. In this case, the code pointer is the one for the I# constructor, and the payload is 1234.

Since I# 1234 is already in weak head normal form, we don’t need to evaluate it any further, and we can call the continuation on the stack with the address of the heap object we just constructed as its argument. Incidentally, if you are confused by the Hp - 3, think of it as Hp - 4 + 1 instead; Hp - 4 is the address of the code pointer (the start of the heap object); the + 1 is pointer tagging at work again.

(Incidentally, ghc will “preallocate” small integers, so that when you write, say, 1 instead of 1234 it doesn’t actually create a new heap object but gives you a reference to a cached Int instead.)

Larger types

What happens if we move from Int to Double?

constant_Double# :: () -> Double#
constant_Double# _ = 1234.0##

constant_Double :: () -> Double
constant_Double _ = 1234.0

The C-- code for constant_Double is

F64[BaseReg + 56] = 1234.0 :: W64;
Sp = Sp + 4;
jump (I32[Sp + 0]) ();

Since a Double is 64 bits it cannot be passed through the 32-bit register R1/esi. On 32-bit machines ghc passes Double values through a “virtual register”. Virtual registers are just memory locations; BaseReg (ebx on x86) is the address of the first of them. (On x86-64 architectures ghc will use the SSE registers for Doubles.)

The code for constant_Double is similar to the code for constant_Int, except that it must allocate 12 bytes, rather than 8, for the heap object:

Hp = Hp + 12;
I32[Hp - 8] = D#_con_info;
F64[Hp - 4] = 1234.0 :: W64;
R1 = Hp - 7;
Sp = Sp + 4;
jump (I32[Sp + 0]) ();

Functions

So far we considered only constants (modulo that extra argument of type ()). Let’s consider some simple functions next:

want_Int# :: Int# -> Int#
want_Int# x = x

translates to

R1 = I32[Sp + 0];
Sp = Sp + 4;
jump (I32[Sp + 0]) ();

Since any Int# is by definition already in weak head normal form, the argument to want_Int# must already be in weak head normal form and hence we can just pop that Int# off the stack and pass it to the continuation through register R1. The situation for Double# is comparable, except that as before we need to use a different register, and pop two words off the stack rather than one. That is,

want_Double# :: Double# -> Double#
want_Double# x = x

translates to

F64[BaseReg + 56] = F64[Sp + 0];
Sp = Sp + 8;
jump (I32[Sp + 0]) ();

The translation for Int is different.

want_Int :: Int -> Int
want_Int x = x

Since we don’t know if x is in weak head normal form yet, we cannot call the continuation; instead, we must enter x itself:

R1 = I32[Sp + 0];
Sp = Sp + 4;
R1 = R1 & (-4);
jump I32[R1] ();

Remember from the previous blog post that when we enter a closure the address of the closure must be in register R1, in case the closure needs to lookup any free variables. So, we pop off the address of the closure from the stack, load it into R1, mask out any pointer tagging bits, and enter the closure. It will be the responsibility of the closure to eventually call the continuation on the stack once it reaches weak head normal form.

The translation of

want_Double :: Double -> Double
want_Double x = x

is precisely identical; indeed, both are identical to the translation of

id :: a -> a
id x = x

That’s the point of boxing, of course: every argument has the same shape, and hence we can universally quantify over any non-primitive type. Incidentally, this is precisely why Int# and Double# have kind # rather than kind *: it means you cannot apply a polymorphic function such as id to an argument of a primitive type.
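For example, this contrived definition (the name is ours) is rejected by the type checker:

-- Rejected: the type variable in the type of id ranges over types of
-- kind * only, and Int# has kind #
want_Int_bad :: Int# -> Int#
want_Int_bad = id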

Calling known functions

We saw how we can define functions; how do we call them? Consider

call_the_want_Int# :: () -> Int#
call_the_want_Int# _ = want_Int# 1234#

This translates to

I32[Sp + 0] = 1234;
jump want_Int#_info ();

This couldn’t be simpler. We overwrite the top of the stack (which contains the () argument that we are ignoring) with the argument for want_Int#, and then jump to the code for want_Int#.

For want_Double# the situation is slightly more complicated.

call_the_want_Double# :: () -> Double#
call_the_want_Double# _ = want_Double# 1234.0##

Since the Double is two words, we have to grow the stack by one word (and deal with a potential stack overflow, which we omit again):

F64[Sp - 4] = 1234.0 :: W64;
Sp = Sp - 4;
jump want_Double#_info ();

To call a function with a boxed Int or Double we have to create the heap object. For instance,

call_the_want_Int :: () -> Int
call_the_want_Int _ = want_Int 1234

translates to

Hp = Hp + 8;
I32[Hp - 4] = I#_con_info;
I32[Hp + 0] = 1234;
I32[Sp + 0] = Hp - 3;
jump want_Int_info ();

No real surprises here. We create the heap object, push the address of the newly created heap object onto the stack, and call want_Int_info. The same happens when we call want_Double, except of course that we need to create a larger heap object (12 bytes rather than 8).

Multiple arguments

Conceptually speaking Haskell does not have functions of multiple arguments. Instead, a function such as

want_ID## :: Int# -> Double# -> Int#
want_ID## x y = x +# double2Int# y

with two arguments is thought of as a function of a single argument of type Int#, which returns a new function, that takes an argument of type Double# and returns an Int#. The compiler does not implement that literally, however, as that would be much too slow. Instead, a function such as want_ID## really does take two arguments, both of which are on the stack (or in registers on architectures where that is possible); want_ID## gets compiled to

R1 = I32[Sp + 0] + %MO_FS_Conv_W64_W32(F64[Sp + 4]);
Sp = Sp + 12;
jump (I32[Sp + 0]) ();

We add together the Int# and Double# on the stack, store the result in R1, pop them off the stack and call the continuation. Similarly, when we call a function of multiple arguments, we push all arguments to the stack before jumping to the function entry code:

call_the_want_ID## :: () -> Int#
call_the_want_ID## _ = want_ID## 1234# 1234.0##

translates to

F64[Sp - 4] = 1234.0 :: W64;
I32[Sp - 8] = 1234;
Sp = Sp - 8;
jump want_ID##_info ();

Calling unknown functions

In the previous section we knew precisely what function we were calling. Things are more difficult if we don’t:

call_some_want_Int# :: (Int# -> a) -> a
call_some_want_Int# f = f 1234#

The reason that this is more difficult is that we do not know how many arguments f takes. If it takes just one, then we can call f in precisely the same way that we were calling want_Int# previously. But what if f takes more than one argument? In that case we must construct a PAP heap object to record the partial application of f. Clearly, the shape of this heap object must depend on the arguments that we have supplied: for an Int# the size of the payload must be one word, for a Double# it must be two, and so on.

We must also deal with the case that f is already such a PAP object. In that case, we must check if we now have all the arguments necessary to call f; if so, we can call f as we did before; if not, we must construct a new PAP object collecting the previously supplied arguments and the arguments we are supplying now.

Finally, if we are providing multiple arguments, we must deal with the case that f actually takes fewer arguments than we are providing and returns a new PAP object. In that case, we must provide all the arguments that f needs, call f, and then continue with the PAP object that f returns.
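To make this case analysis explicit, here is a toy model of it in ordinary Haskell; the Closure type and the names here are purely our own illustration, not GHC’s actual runtime representation:

-- A function of known arity, a partial application, or a plain value
data Closure = Fun Int ([Int] -> Closure)
             | Pap Int ([Int] -> Closure)
             | Val Int

apply :: Closure -> [Int] -> Closure
apply c [] = c
apply (Val _) _ = error "applying a non-function"
apply (Fun n f) args
  | length args >= n = apply (f (take n args)) (drop n args)  -- enough (or too many) args: call f, continue with the rest
  | otherwise        = Pap (n - length args)                  -- too few args: record a partial application
                           (\rest -> f (args ++ rest))
apply (Pap n f) args = apply (Fun n f) args                   -- a PAP behaves as a function of the remaining arity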

Rather than generating code to deal with all these possibilities at every call-to-unknown-function site, ghc delegates this to a bunch of specialized functions which are part of the RTS. The compilation of call_some_want_Int# therefore looks deceptively simple:

R1 = I32[Sp + 0];
I32[Sp + 0] = 1234;
jump stg_ap_n_fast ();

stg_ap_n_fast deals with the application of an unknown function to a single Int#; hence the _n in the name. It finds out how many arguments f has (by looking at the pointer tags, or failing that at the info table for f), and deals with all the cases that we mentioned above (as well as a bunch of others; we haven’t shown the full picture here). To call stg_ap_n_fast we pop the function argument (f) off the stack and store it in R1 and then push the argument that we want to provide to f onto the stack.

The case for Double#

call_some_want_Double# :: (Double# -> a) -> a
call_some_want_Double# f = f 1234.0##

is very similar, except that we need to grow the stack by one word, and use stg_ap_d_fast instead (d for Double#):

R1 = I32[Sp + 0];
F64[Sp - 4] = 1234.0 :: W64;
Sp = Sp - 4;
jump stg_ap_d_fast ();

Finally, for non-primitive arguments there is a generic stg_ap_p_fast (p for pointer);

call_some_want_Int :: (Int -> a) -> a
call_some_want_Int f = f 1234

translates to

Hp = Hp + 8;
I32[Hp - 4] = I#_con_info;
I32[Hp + 0] = 1234;
R1 = I32[Sp + 0];
I32[Sp + 0] = Hp - 3;
jump stg_ap_p_fast ();

No real surprises here; we construct the heap object and call stg_ap_p_fast.

Multiple arguments, again

What happens if we call an unknown function with multiple arguments?

call_some_want_II :: (Int -> Int -> a) -> a
call_some_want_II f = f 1234 5678

This is really no different from supplying just one argument; we still have to deal with the same set of possibilities; the only difference is that we now need a payload for two pointers. This is created by stg_ap_pp_fast:

Hp = Hp + 16;
I32[Hp - 12] = I#_con_info;
I32[Hp - 8] = 5678;
I32[Hp - 4] = I#_con_info;
I32[Hp + 0] = 1234;
R1 = I32[Sp + 0];
I32[Sp + 0] = Hp - 11;
I32[Sp - 4] = Hp - 3;
Sp = Sp - 4;
jump stg_ap_pp_fast ();

We construct two heap objects for the two boxed integers, and then call stg_ap_pp_fast.

If at this point you are wondering “how many variations on stg_ap_XYZ_fast are there?” congratulate yourself, you are paying attention :) Clearly, there cannot be a version of this function for every possible number and type of argument, as there are infinitely many such combinations. Instead, the RTS only contains versions for the most common combinations. For example, there is no version for calling a function with two Int# arguments. So what does

call_some_want_II## :: (Int# -> Int# -> a) -> a
call_some_want_II## f = f 1234# 5678#

compile to?

R1 = I32[Sp + 0];
I32[Sp + 0] = 5678;
I32[Sp - 4] = I32[Lstg_ap_n_info$non_lazy_ptr];
I32[Sp - 8] = 1234;
Sp = Sp - 8;
jump stg_ap_n_fast ();

Let’s consider carefully what’s going on here:

  1. We call stg_ap_n_fast (with a single n) with 1234 on the top of the stack. stg_ap_n_fast will notice that f has (at least) two arguments, and can therefore not yet be called. Instead, it creates a PAP object containing just 1234 (and the address of f).

  2. After it has created the PAP object, it then calls the continuation on the top of the stack (after the argument). This continuation happens to be stg_ap_n_info, which is the “continuation wrapper” of stg_ap_n.

  3. This in turn will pop the next argument off the stack (5678) and the process repeats.

In this way any non-standard version of stg_ap_XYZ can be simulated with a chain of standard stg_ap_XYZ functions.

RealWorld tokens

The main points to take away from all of the above are:

  1. Primitive types of different sizes are passed and returned in different ways: through different (real or virtual) registers, and taking up different amounts of space on the stack.

  2. Calls to unknown functions are compiled to calls to a fixed family of specialized RTS functions (stg_ap_n_fast, stg_ap_d_fast, stg_ap_p_fast, and so on), selected according to the number and types of the arguments; non-standard combinations are handled by chaining standard ones.

So what does all this have to do with the RealWorld tokens that I promised at the start we would look at? Well, the RealWorld tokens have type State# RealWorld, which is yet another primitive type … of size 0. So let us retrace our steps and consider the same examples that we considered for Int# and Double#, but now look at the corresponding translation for State# RealWorld.

We first considered the construction of a constant:

constant_State# :: () -> State# RealWorld
constant_State# _ = realWorld#

This translates to

Sp = Sp + 4;
jump (I32[Sp + 0]) ();

All we need to do is pop the () argument off the stack and call the continuation; since a State# RealWorld type has size zero, we don’t need any register at all to store it!

The translation of

call_the_want_State# :: () -> State# RealWorld
call_the_want_State# _ = want_State# realWorld#

is

Sp = Sp + 4;
jump want_State#_info ();

and finally the translation of

call_some_want_State# :: (State# RealWorld -> a) -> a
call_some_want_State# f = f realWorld#

is

R1 = I32[Sp + 0];
Sp = Sp + 4;
jump stg_ap_v_fast ();

The v here stands for “void”, and here we finally see a trace of a RealWorld token in the compiled code: we are applying a function to an argument of a primitive “void” type; admittedly, this is something of size zero, but it’s still somehow here. Note that stg_ap_v_fast must still deal with all the possible cases (exact number of arguments, too many arguments, too few arguments) that we mentioned above, despite the fact that we are providing no arguments at all.

Proxy#

In the case of RealWorld and the IO monad this doesn’t really matter most of the time. Usually when we repeatedly call an unknown IO function we are providing an argument (think mapM, for instance), and the STG runtime provides specialized versions of stg_ap_XYZ_fast for a function that takes one, two, or three pointers and a single void argument, in which case the additional void parameter does not introduce any additional overhead beyond the indirection through stg_ap. But it is good to be aware that the runtime cost is not quite zero when writing highly performance critical code.

However, as we start to do more and more type level programming in Haskell, we increasingly need to explicitly pass type variables around. For this reason ghc 7.8 introduces a new primitive type

Proxy# :: * -> #

with a single “constructor”

proxy# :: Proxy# a

The sole purpose of a Proxy# argument is to instantiate a type variable. It is used heavily for instance in the new ghc extension for overloaded records. This isn’t a blog post about type level programming—quite the opposite :)—so we will just consider a completely contrived example, along the same lines as the other examples that we considered so far:

call_some_want_Proxies# :: (Proxy# a -> Proxy# b -> Proxy# c 
                              -> (a, b, c) -> (c, b, a)) 
                        -> (a, b, c) -> (c, b, a) 
call_some_want_Proxies# f tup = f proxy# proxy# proxy# tup

Although a Proxy# argument takes up no memory, it does play a role when calling an unknown function, as we have seen. The above function call translates to

R1 = P32[Sp];
I32[Sp - 12] = stg_ap_v_info;
I32[Sp - 8]  = stg_ap_v_info;
I32[Sp - 4]  = stg_ap_v_info;
I32[Sp]      = stg_ap_p_info;
Sp = Sp - 12;
call stg_ap_v_fast(R1) args: 24, res: 0, upd: 4;

Note the stack that we set up here: we apply the function to a single “void” argument; then when that is done, we apply it to the next, and again to the third, and only then we apply it to a “pointer” argument (the actual tuple). The use of Proxy# thus does incur a runtime cost. Of course, one could envision various ways in which ghc could be improved to alleviate this cost, the simplest of which would be to introduce some more stg_ap_XYZ variations targeted at void arguments.

For now, if this really matters it is faster to use a single Proxy# argument, and make sure that it is the last argument so that we can take advantage of the specialized stg_ap functions which were introduced for the IO monad:

fast_call_some_want_Proxies# :: ((a, b, c) -> Proxy# (a, b, c) -> (c, b, a))
                             -> (a, b, c) -> (c, b, a)
fast_call_some_want_Proxies# f tup = f tup proxy#

compiles to just

R1 = P32[Sp];
Sp = Sp + 4;
call stg_ap_pv_fast(R1) args: 8, res: 0, upd: 4;

Footnotes

What is I#_con_info?

We saw that the heap object header for boxed integers is I#_con_info. You might be wondering what exactly is at that code pointer address. We can fire up lldb (or gdb) and find out:

(lldb) disassemble -n ghczmprim_GHCziTypes_Izh_con_info
Main`ghczmprim_GHCziTypes_Izh_con_info:
Main[0x68398]:  incl   %esi
Main[0x68399]:  jmpl   *(%ebp)

ghczmprim_GHCziTypes_Izh is the Z-encoding for ghc-prim_GHC.Types.I#; zm for minus, zh for hash, and zi for dot (“the dot on the i”). So what does the code do? Well, the code pointer associated with every heap object is the code that reduces that heap object to normal form (this is what we mean by “entering a closure”). Since a constructor application already is in normal form, there is almost nothing to do, so we just call the continuation on the stack (jmpl).

The only complication is due to pointer tagging, once again. Remember that when we evaluate a closure, R1 (mapped to esi on x86 architectures) points to the address of the closure. If we entered the closure, that means the pointer wasn’t tagged yet, or we ignored the tag bits (for whatever reason); we make sure to tag it (the incl instruction) before calling the continuation so that we don’t unnecessarily enter the closure again.

Info tables

The discussion of constant_Int above assumed that the “tables next to code” optimization is enabled (which is most of the time). With this optimization enabled the code pointer for a closure also doubles as a pointer to the info table for the closure. The info table for a closure contains information such as the constructor tag (which constructor of a data type is this?), the size of the payload and the types of its elements (int, float, pointer etc) for garbage collection purposes, the static reference table (used for garbage collection of CAFs), profiling information, etc. The info table is located just before the closure entry code in memory.

When the optimization is not enabled the heap object header is a pointer to the info table, which in turn contains a field for the entry code. This means that in order to enter a closure an additional indirection is necessary.

Allocation in STG

I have often seen claims that the only place where heap allocation occurs in STG code is in let expressions (for instance, on StackOverflow and on cvs-ghc). This is not entirely accurate, as we saw when we looked at constant_Int. See cgExpr in compiler/codeGen/StgCmmExpr in the ghc sources, and in particular function cgConApp under “tail calls”.

Boxing state

We don’t normally box State# objects, but we could:

data State a = State# (State# a)

constant_State :: () -> State RealWorld
constant_State _ = State# realWorld#

The translation of constant_State is however a bit surprising:

Hp = Hp + 8;
if (Hp > I32[BaseReg + 92]) goto cm8;
I32[Hp - 4] = State#_con_info;
R1 = Hp - 3;
Sp = Sp + 4;
jump (I32[Sp + 0]) ();

This creates a heap object with a one-word payload, which is subsequently left unused. The reason is that every heap object must be at least two words, so that there is enough room to overwrite it with a forwarding pointer during garbage collection. Thanks to rwbarton on reddit for pointing this out!

That strange () argument

Many of the examples we looked at had an additional () argument:

call_the_want_Int# :: () -> Int#
call_the_want_Int# _ = want_Int# 1234#

What is the purpose of this argument? Well, first of all, a top-level binding to a primitive type (something of kind #) is not allowed; this is illegal Haskell:

caf_the_want_Int# :: Int# -- Not legal Haskell
caf_the_want_Int# = want_Int# 1234#

But even for examples where it would be legal we want to avoid it for the sake of this blog post. Consider

caf_the_want_Int :: Int
caf_the_want_Int = want_Int 1234

The difference between call_the_want_Int and caf_the_want_Int is that the latter is a CAF or “constant applicative form”; ghc must ensure that caf_the_want_Int is reduced to weak head normal form only once in the execution of the program. CAFs are a large topic in their own right; another blog post, perhaps (for now, some interesting links are this Stack Overflow question and ghc ticket #917, an answer from Simon Peyton Jones about how CAFs are garbage collected, and the entry on the ghc wiki). If you are feeling brave, take a look at the generated code for caf_the_want_Int.

In order to avoid CAFs being generated, we introduce the additional argument, and use the compiler flag -fno-full-laziness to make sure that ghc doesn’t attempt to optimize that additional argument away again.


Understanding the Stack

Wed, 21 May 2014 10:08:23 GMT, by edsko.
Filed under coding.

One of our clients sent us a bug report, which consisted of a 700 line Haskell program with a complaint that “it deadlocks”. After studying the code I concluded that, for the sake of the bug report, it could be summarized as

print a bunch of stuff, then crash with a stack overflow

So I wanted to replace the original code with code that did just that: print a bunch of stuff, then crash with a stack overflow. No big deal:

{-# LANGUAGE CPP #-}
module Main (main) where

go :: Int -> IO Int
go n = do
  if (n `rem` THRESHOLD == 0)
    then putChar '.'
    else return ()
  n' <- go (n + 1)
  return (n + n')

main :: IO ()
main = print =<< go 0

The function go prints a dot every THRESHOLD recursive calls; we have a (dummy, since it can never be reached) addition after the recursive call to ensure that go is not tail recursive and will eventually run out of stack space. The THRESHOLD macro variable is there so that we can tweak how quickly the program runs out of stack space, or in other words, how many dots it prints before it crashes. For example, if we compile the code with

ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=23

it prints 22,126 dots before crashing; if we compile with

ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=26

it prints 19,483 dots before crashing. But as I was experimenting to find the right value for this parameter, I noticed something very strange. If we compile with

ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=24

it never crashes! It just prints, and prints, and prints (with GHC 7.6.3). This blog post documents my attempt to understand this behaviour; along the way it explains some aspects of ghc’s runtime, in particular its stack.

Checking core

After type checking ghc translates Haskell to an intermediate language called core, which is the “real life” version of the more “academic” language System FC. We can ask ghc to output the (optimized) core version of our program with

ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=24 -dsuppress-all -ddump-simpl

This is useful, because it allows us to verify that even with a THRESHOLD value of 24 the optimizer did not somehow manage to make the function tail recursive:

$wa =
  \ ww_s1SW w_s1SY ->
    case remInt# ww_s1SW 24 of _ {
      __DEFAULT ->
        case $wa (+# ww_s1SW 1) w_s1SY of _ { (# ipv_aEw, ipv1_aEx #) ->
        (# ipv_aEw,
           case ipv1_aEx of _ { I# y_axO -> I# (+# ww_s1SW y_axO) } #)
        };
      0 ->
        case $wa5 stdout '.' w_s1SY of _ { (# ipv_aEw, _ #) ->
        case $wa (+# ww_s1SW 1) ipv_aEw of _ { (# ipv2_XET, ipv3_XEV #) ->
        (# ipv2_XET,
           case ipv3_XEV of _ { I# y_axO -> I# (+# ww_s1SW y_axO) } #)
        }
        }
    }

This is a bit difficult to read; don’t worry about the details for now, we will start dealing with low level details soon enough. For now, it’s enough to note that $wa is the translation of go, and that the recursive calls to $wa are not in tail position; the optimizer did not somehow manage to make the function tail recursive. Ok. Then what? Why is this function not crashing with a stack overflow?

Simplifying the problem

If we cannot understand the behaviour of the code by looking at core then we need to drop all the way down to assembly language. However, if we want to have any hope of being able to step through the assembly language we need to simplify that call to putChar. Sadly, replacing it with

foreign import ccall "putchar" c_putchar :: Char -> IO ()

made the problem go away: the program now always crashed. So the strange behaviour of our program had something to do with the implementation of putChar. The real putChar is more involved than it might seem; it deals with buffering, character encodings, concurrent access to Handles, etc.

Unfortunately, since I had no idea what aspect of the implementation of putChar was causing the behaviour I was seeing, I could think of no other approach than to inline putChar and the functions it calls, and start simplifying it bit by bit until the strange behaviour disappeared.

Many, many hours later I ended up with this code:

hPutChar :: State# RealWorld -> (# State# RealWorld, () #)
hPutChar w0 = case stdout of () -> (# w0, () #)

go :: State# RealWorld -> (# State# RealWorld, () #)
go w0 =
  case maskAsyncExceptions# hPutChar w0 of
    (# w1, _ #) ->
      case go w1 of
        (# w2, () #) -> (# w2, () #)

stdout :: ()
stdout = ()

main :: IO ()
main = IO go

In order to understand this code, you first have to realize that the IO monad is a state monad with the RealWorld as the state:

newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))

I have written the code with explicit state passing, instead of using the monad, so that the Haskell code is as close as possible to the generated core and beyond.

That out of the way, let’s discuss the actual code. First of all, the real putChar is defined as

putChar = hPutChar stdout

where stdout has type Handle. As it turns out, there are only two aspects of hPutChar that are relevant:

  1. hPutChar does a case analysis on something (actually, of course, it does case analyses on lots of things); as it turns out, all that matters for our example is that it does a case analysis at all, so we have modelled stdout simply by ().

  2. This case analysis happens while asynchronous exceptions have been masked. This happens because a Handle contains an MVar, and most of the I/O operation happens while we have taken the value from this MVar (see withHandle' in GHC.IO.Handle.Internals).

At this point we can simplify no further. Although we are not actually printing any dots anymore, if we run this code it will run forever until you interrupt it manually; but if you remove either the case analysis from hPutChar or the call to maskAsyncExceptions# from go the program will exit with a stack overflow almost immediately. What gives?

Understanding hPutChar

When ghc compiles your program, after lexical analysis and syntax analysis it first removes syntactic sugar and does scope analysis; then, after type checking, it translates your code to core, as we saw. The core gets optimized and then translated to stg (for “Spineless Tagless G-machine”); stg code is very similar to core, but with a handful of additional restrictions; most importantly, all constructor applications must be fully applied (and if not, an explicit lambda must be constructed). Finally, the stg code gets translated to C--, which is a portable assembly language similar in intent to llvm, and the C-- code is then translated to assembly language.

We wrote our code in such a low level way that the core and stg translations are not very interesting; they all look very much alike. The C-- code however is a big step down in level. Recall that we simplified hPutChar to

hPutChar :: State# RealWorld -> (# State# RealWorld, () #)
hPutChar w0 = case stdout of () -> (# w0, () #)

The C-- code for this “hPutChar” is (-ddump-opt-cmm):

Main.hPutChar_info:
        if ((Sp + -8) < SpLim) goto c1ae;
        R1 = PicBaseReg + IO.stdout_closure;
        I64[Sp - 8] = PicBaseReg + s19G_info;
        Sp = Sp - 8;
        if (R1 & 7 != 0) goto c1ah;
        jump I64[R1]; // [R1]
    c1ae:
        R1 = PicBaseReg + Main.hPutChar_closure;
        jump (I64[BaseReg - 8]); // [R1]
    c1ah: jump s19G_info; // [R1]

s19G_info:
    R1 = PicBaseReg + GHC.Tuple.()_closure + 1;
    Sp = Sp + 8;
    jump (I64[Sp + 0]); // [R1]

This might look rather frightening, especially when you are used to Haskell, so let’s take it step by step. We will be pushing a value to the stack, so we first check if there is enough room on the stack to do so:

if ((Sp + -8) < SpLim) goto c1ae;

If not, we will call a function from the GHC runtime called __stg_gc_fun, which will extend the stack, if possible. We will come back to this in detail later.

In order to do case analysis on stdout we first have to make sure that it is in weak head normal form (in this case, it will always be, but it might not be in general). In order to do this, we need to call its definition; but before we can do that, we need to do two more things.

First, in general stdout might be a thunk, with free variables in the closure, and hence it needs to know where it can find the values of those free variables. The convention is therefore that when we call a closure, the address of the closure can always be found in register R1:

R1 = PicBaseReg + IO.stdout_closure;

Secondly, when stdout completes, it needs to know what to do next. It does this by looking at the stack: when it finishes, it loads a “continuation address” from the top of the stack and calls it. Hence we need to push this address to the stack so that stdout will be able to find it later:

I64[Sp - 8] = PicBaseReg + s19G_info;
Sp = Sp - 8;

At this point we could call stdout, but actually we can do slightly better. ghc implements an optimization called pointer tagging. Since addresses are always word aligned, the lower 3 bits (or, on 32-bit machines, the lower 2 bits) of addresses are always 0. For datatype constructors ghc uses these bits to encode, as part of the pointer itself, which constructor of the datatype it is (a non-zero value), or that the closure might not yet be in weak head normal form (zero). So by looking at the pointer we can check whether there is any point in calling stdout at all:

if (R1 & 7 != 0) goto c1ah;

c1ah: jump s19G_info; // [R1]

If the constructor is already in weak head normal form, we call the continuation directly (note that s19G_info is the same address that we pushed to the stack as the continuation address for stdout). Finally, if it turns out that we do still need to evaluate the stdout we call it:

jump I64[R1]; // [R1]

In the continuation we now know that stdout is in weak head normal form; technically speaking, we should now pattern match on it to find out which constructor it is, but of course () only has a single constructor, so we can skip that step. This means that all that is left to do is to call our continuation with the result. We are returning

(# w0, () #)

In general, unboxed tuples are represented as a pair of pointers, either as two consecutive memory locations or, ideally, in two registers. However, real world tokens disappear from generated code, so all we have to return is (). By convention the first few arguments are passed in registers; in this case, that means that we need to load the address of () into register R1 before calling the continuation:

R1 = PicBaseReg + GHC.Tuple.()_closure + 1;
Sp = Sp + 8;
jump (I64[Sp + 0]); // [R1]

The + 1 part is pointer tagging at work: () is already in weak head normal form, and it is the first (and only) constructor of the () type.

Understanding go

At this point you might realize why I said at the start that it would be completely unfeasible to step through the real hPutChar; our simplified version just does a case analysis (on something of unit value, no less) and then returns unit, and it is already complicated enough! What about go?

go :: State# RealWorld -> (# State# RealWorld, () #)
go w0 =
  case maskAsyncExceptions# hPutChar w0 of
    (# w1, _ #) ->
      case go w1 of
        (# w2, () #) -> (# w2, () #)

Thankfully, the code was carefully written to make the translation to lower level code as simple as possible; the C-- translation of go does not introduce any concepts beyond those we already used in hPutChar:

Main.go_info():
    c1aM:
        if ((Sp + -8) < SpLim) goto c1aO;
        R1 = PicBaseReg + Main.hPutChar_closure + 1;
        I64[Sp - 8] = PicBaseReg + s19O_info;
        Sp = Sp - 8;
        jump stg_maskAsyncExceptions#; // [R1]
    c1aO:
        R1 = PicBaseReg + Main.go_closure;
        jump (I64[BaseReg - 8]); // [R1]

s19O_info:
        I64[Sp + 0] = PicBaseReg + s19N_info;
        jump Main.go_info; // []

s19N_info:
        I64[Sp + 0] = PicBaseReg + s19M_info;
        if (R1 & 7 != 0) goto c1aI;
        jump I64[R1]; // [R1]
    c1aI:
        jump s19M_info; // [R1]

s19M_info:
        R1 = PicBaseReg + GHC.Tuple.()_closure + 1;
        Sp = Sp + 8;
        jump (I64[Sp + 0]); // [R1]

The structure is very similar to before, except that we have two (actually, three) case statements.

  1. First we do a stack overflow check again, as before.

  2. We need to evaluate the scrutinee of the first case statement: the call to the primop maskAsyncExceptions# (the equivalent of mask_ in Haskell-land), which expects its argument in R1. In this case, the argument is hPutChar, except that as before we use pointer tagging (in this case, to indicate that the function has been evaluated and that its arity is 1).

  3. The continuation after we are finished with the scrutinee is s19O_info; this is a second case statement, so we push a second continuation (s19N_info) onto the stack and recursively call go.

  4. The second continuation—which is never reached—makes sure that go indeed returned unit, and then returns unit; if you look at the stg translation of go (-ddump-stg) you will notice that go is actually a triply nested case statement:

case maskAsyncExceptions# [Main.hPutChar w0_s19t] of _ {
  (#,#) ipv_s19x _ ->
      case Main.go ipv_s19x of _ {
        (#,#) ipv2_s19D ipv3_s19B ->
            case ipv3_s19B of _ { () -> (#,#) [ipv2_s19D GHC.Tuple.()]; };
      };
};

Three case statements, hence three continuations (s19O_info, s19N_info, s19M_info).

Running the code

All this is pure theory, so far. Let’s confirm it by actually running our code. We can load up the code into a debugger; make sure to compile the code with -debug to link it against the version of the Haskell RTS that has debugging symbols. Since I am working on OSX Mavericks I will be using lldb; gdb will work in a very similar way.

One of the abstractions that C-- offers over real assembly language is virtual registers. When we translate C-- to assembly these are mapped to machine registers by a register allocator in ghc’s native code generator. On my machine, Sp and SpLim are mapped to rbp and r15, and R1 is mapped to rbx.

Let’s load up our code, set a breakpoint in go, ask lldb to print Sp and SpLim whenever we hit a breakpoint, and then start the program (which, during my debugging, I called “weird”; hence the many references to “weird” below).

# lldb weird
Current executable set to 'weird' (x86_64).

(lldb) breakpoint set -n Main_go_info
Breakpoint 1: where = weird`Main_go_info, address = 0x0000000100000f98

(lldb) target stop-hook add --one-liner "register read rbp r15"
Stop hook #1 added.

(lldb) run
Process 35298 launched: '/Users/dev/wt/weird/weird' (x86_64)
     rbp = 0x0000000000000000
     r15 = 0x0000000000000000
Process 35298 stopped
* thread #1: tid = 0xebc474, 0x0000000100000f98 weird`Main_go_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000f98 weird`Main_go_info
weird`Main_go_info:
-> 0x100000f98:  leaq   -0x8(%rbp), %rax
   0x100000f9c:  cmpq   %r15, %rax
   0x100000f9f:  jb     0x100000fc0               ; Main_go_info + 40
   0x100000fa1:  leaq   0xd8818(%rip), %rax       ; Main_hPutChar_closure
     rbp = 0x0000000100405370
     r15 = 0x00000001004050c0

lldb started the program, stopped at the breakpoint, and told us the value of Sp and SpLim, as asked. It also shows us a disassembly of the code. The C-- code started with

if ((Sp + -8) < SpLim) goto c1aO;

which, if you recall, was to check if we might run into a stack overflow. This translates into

leaq   -0x8(%rbp), %rax
cmpq   %r15, %rax
jb     0x100000fc0               ; Main_go_info + 40

in Intel assembly language. The details don’t matter very much, and are beyond the scope of this blog post. If you haven’t seen Intel assembly language, leaq is “load effective address”, cmpq is “compare” and jb “jump if below”. We will not explain such details any further; hopefully you will be able to squint a bit and see that it resembles the C-- rather closely.

Before we do anything else, let’s see what’s on the stack before we start:

(lldb) memory read -format A -count 3 $rbp
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2

So, the stack frame immediately above us when we start is a catch frame. Catch frames are pushed onto the stack when you use catch, and are used by the runtime to find which exception handler to run when an exception occurs. The catch frame has two additional fields in its payload: the handler to run (in this case, the default top-level handler), and a mask which indicates if asynchronous exceptions have been blocked or not (in this case, they haven’t).

We can use the step command from lldb to start stepping through the execution of the code. We check for stack overflow, find that there is none, and then set things up for the first case statement in go: we push a continuation address to the stack, put the argument to maskAsyncExceptions# in rbx (the machine equivalent of R1), and then call maskAsyncExceptions#. At this point the stack therefore looks like

0x100405368: 0x0000000100000f68 weird`s19O_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2

and we can confirm that rbx contains the tagged address of hPutChar

(lldb) image lookup --address $rbx
      Address: weird[0x00000001000d97c1] (weird.__DATA.__data + 1)
      Summary: Main_hPutChar_closure + 1

You can find the implementation of maskAsyncExceptions# as stg_maskAsyncExceptionszh in Exception.cmm in the rts/ directory of the ghc source (“zh” is the z-encoding of “#”). The details are not so important, however. It masks async exceptions by setting a flag in a register, pushes a frame onto the stack to unmask exceptions when we are done, and then calls the function (whose address is in R1). This means that we will end up in hPutChar, at which point the stack looks like

0x100405360: 0x000000010009e540 weird`stg_unmaskAsyncExceptionszh_ret_info
0x100405368: 0x0000000100000f68 weird`s19O_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2

This continues on. hPutChar executes, completes, and runs the continuation, which happens to be stg_unmaskAsyncExceptionszh_ret_info; this unmasks asynchronous exceptions, and continues with the continuation above it, which in this case is the original continuation from go. And after some more steps we end up back in go for the next recursive call, and the whole process repeats. However, the stack now looks like

0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
0x100405388: 0x00000001000a46c0 weird`stg_stop_thread_info

Remember that go is not tail recursive, so once this recursive call completes, we still need to call the continuation from the previous invocation. And this repeats; if we run until the breakpoint again (using cont) and print the stack, we will see

0x100405360: 0x0000000100000f38 weird`s19N_info
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
0x100405388: 0x00000001000a46c0 weird`stg_stop_thread_info

and so on. This is precisely the behaviour that we were expecting: since the function is not tail recursive, we are using up more and more stack space and should eventually run out, and crash with a stack overflow. So why don’t we?

Stack overflow

After precisely 84 recursive calls, the stack looks like

0x1004050d0: 0x0000000100000f38 weird`s19N_info
...
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2

with 84 s19N_info pointers in total. Moreover, the top of the stack (virtual register Sp, real register rbp) is now dangerously close to the stack limit (virtual register SpLim, real register r15):

(lldb) register read rbp r15
     rbp = 0x00000001004050d0
     r15 = 0x00000001004050c0

That is, we have 16 bytes left, or space for two addresses. This means that we have just enough stack space to make it to the entry point for hPutChar; we get there in the same way as before, and the stack now looks like

0x1004050c0: 0x000000010009e540 weird`stg_unmaskAsyncExceptionszh_ret_info
0x1004050c8: 0x0000000100000f68 weird`s19O_info
0x1004050d0: 0x0000000100000f38 weird`s19N_info
...
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2

Note that we pushed two additional addresses onto the stack, which is therefore now full. Remember that hPutChar starts with a check for stack overflow:

if ((Sp + -8) < SpLim) goto c1ae;

c1ae:
    R1 = PicBaseReg + Main.hPutChar_closure;
    jump (I64[BaseReg - 8]); // [R1]

Actually, if we look at the unoptimized C-- instead (-ddump-cmm) we will see

c1ae:
    R1 = Main.hPutChar_closure;
    jump stg_gc_fun; // [R1]

instead, which is a lot clearer: we load the address of hPutChar into register R1 and then run the garbage collector. In the optimized C-- code we find the address of the garbage collector somewhere else, but we can verify in lldb that it’s the same thing:

(lldb) memory read -format A -count 1 $r13-8
0x1000e1290: 0x000000010009fcb0 weird`__stg_gc_fun

Either way, the garbage collector runs. It notices that we are out of stack space (as opposed to out of heap space; we carefully avoided any heap allocation in this test code), creates a new, bigger, stack, copies the old stack over, and then calls back into our function. In other words, we end up back at the start of hPutChar, but now the stack looks like:

0x1004fcd30: 0x000000010009e540 weird`stg_unmaskAsyncExceptionszh_ret_info
0x1004fcd38: 0x0000000100000f68 weird`s19O_info
0x1004fcd40: 0x0000000100000f38 weird`s19N_info
...
0x1004fcfd8: 0x0000000100000f38 weird`s19N_info
0x1004fcfe0: 0x000000010009eb68 weird`stg_catch_frame_info
0x1004fcfe8: 0x0000000000000000
0x1004fcff0: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2

which is the same stack as before, except at a different location; and, crucially, we have space on the stack again:

(lldb) register read rbp r15
rbp = 0x00000001004fcd30
r15 = 0x00000001004f50c0

so we can continue recursing.

Running out of stack space

Of course, we cannot continue increasing the stack forever; eventually we should reach the maximum stack size (8 MB by default in ghc 7.6). So why don’t we? Well, now that we understand the problem at this level of detail, a quick Google search will reveal that this is due to a “bug” in ghc (depending on your definition of bug, I suppose) prior to version 7.8: when a stack overflow happens while asynchronous exceptions are blocked, the stack is grown regardless of the stack size limit.

In this example, it just so happens that we run out of stack space when we try to push the continuation address in hPutChar, at which point asynchronous exceptions are indeed masked. Hence, this program will continue growing the stack unpunished, until we run out of machine memory completely. If we removed the case statement from hPutChar (or indeed, if we modified the program in a myriad of other, minor, ways) we would detect the stack overflow outside of the call to maskAsyncExceptions#, and hence we would crash with a stack overflow exception when we reach the maximum stack size.

In the original program that we started with, the value of THRESHOLD determines how many continuation addresses for go are on the stack when we call (the real) putChar. We run out of stack space either while asynchronous exceptions are masked (somewhere deep inside the bowels of the real hPutChar) or outside it, and hence the program either runs forever or crashes almost immediately, respectively. This is clearly not what one would expect, and with GHC 7.8 the problem has been resolved: a stack overflow is now treated like any other asynchronous exception, and the program will crash with a stack overflow as soon as asynchronous exceptions are no longer masked.

Further Reading

If you want to understand ghc’s runtime execution model, How to make a fast curry: push/enter vs eval/apply is essential reading, although you can ignore the comparison with push/enter, which is not used in ghc. Faster Laziness Using Dynamic Pointer Tagging explains the pointer tagging trick we discussed above. Apart from these two papers, there are various other references that might be helpful. Be aware, however, that ghc is constantly evolving and many of these references may no longer precisely match what is in ghc right now.

Postscript

One of the many variations that I played with while tracking this bug down looked like

module Main (main) where

import Control.Exception
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

counter :: IORef Int
{-# NOINLINE counter #-}
counter = unsafePerformIO $ newIORef 0

hPutChar :: IO ()
hPutChar = mask_ $ do
  n <- readIORef counter
  writeIORef counter (n + 1)

go :: Int -> IO Int
go n =
  case n `rem` 2 of
    0 -> do hPutChar
            go (n + 1)
    _ -> do n' <- go (n + 1)
            return (n + n')

main :: IO ()
main = do  _ <- go 0 ; return ()

When you run this program in 7.8, it will crash with

Weird: internal error: scavenge_stack: weird activation record found on stack: 415597384
    (GHC version 7.8.2 for x86_64_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Abort trap: 6

This turns out to be due to an unrelated bug in 7.8.2; see #9045 (and #8866). It has already been fixed and should be released as part of 7.8.3.


Talk on Overloaded Record Fields for Haskell

Mon, 05 May 2014 14:26:14 GMT, by adam.
Filed under ghc, community, training.

Last week I gave a talk at Skills Matter about my work on a GHC extension to support reusing record field names, about which I have written previously. The video has been available since last week, thanks to the Skills Matter team, and the reddit discussion is already well underway. Now I’m (finally) making the slides available.


Pointwise Lenses

Thu, 24 Apr 2014 13:54:12 GMT, by edsko.
Filed under coding.

Pointwise Lenses

Lenses are a current hot topic in the Haskell community, with a bunch of packages providing implementations (data-accessor, fclabels, lens, amongst others). Although we will recall definitions, this post is not meant as an introduction to lenses. If you have not worked with lenses before, the talk from Simon Peyton Jones or the blog post by Sebastiaan Visser about fclabels are good starting points.

In this blog post we will propose a generalization to the lens representation used in fclabels and in many other packages (with various minor variations); we will consider the relation to the representation used in lens in a separate section.

If you wanted to follow along, this is the header I am using:

{-# LANGUAGE FlexibleInstances, RankNTypes, TupleSections #-}

import Prelude hiding ((.), id, const, curry, uncurry)
import Control.Arrow
import Control.Applicative
import Control.Category
import Control.Monad
import Control.Monad.Free
import Control.Monad.Trans.Class
import Data.Functor.Identity
import Data.Functor.Compose
import Data.Traversable
import qualified Data.Traversable as Traversable

-- We define some Show instances just for the examples

instance Show a => Show (Compose [] Identity a) where
  show (Compose a) = show a

instance Show a => Show (Compose [] (Compose [] Identity) a) where
  show (Compose a) = show a

instance Show a => Show (Identity a) where
  show (Identity a) = show a

Basics

A lens from a to b is a way to get a b from an a, and to modify an a given a modification of b:

data Lens a b = Lens { 
    lensGet    :: a -> b 
  , lensModify :: (b -> b) -> (a -> a)
  }

A simple example is a lens for the first component of a pair:

lensFst :: Lens (a, b) a
lensFst = Lens fst first

Importantly, lenses can be composed—they form a category:

instance Category Lens where
  id = Lens id id
  Lens g m . Lens g' m' = Lens (g . g') (m' . m)
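For example, composing lensFst with itself focuses on the first component of a nested pair (lensFstFst is just an illustrative name):

lensFstFst :: Lens ((a, b), c) a
lensFstFst = lensFst . lensFst

after which we would expect, in ghci:

> lensGet lensFstFst ((1, 'x'), True)
1
> lensModify lensFstFst (+1) ((1, 'x'), True)
((2,'x'),True)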

Motivation

Suppose we have a lens from somewhere to a list of pairs:

lensFromSomewhere :: Lens Somewhere [(Int, Char)]

We would like to be able to somehow compose lensFromSomewhere with lensFst to get a lens from Somewhere to [Int]. The obvious thing to do is to try and define

mapLens :: Lens a b -> Lens [a] [b]
mapLens (Lens g m) = Lens (map g) _ 

The getter is easy enough: we need to get a [b] from a [a], and we have a function of type a -> b, so we can just map. We get stuck in the modifier, however: we need to give something of type

([b] -> [b]) -> [a] -> [a]

given only a modifier of type (b -> b) -> (a -> a), and there is simply no way to do that.

If you think about it, there is a conceptual problem here too. Suppose that we did somehow manage to define a lens of type

weirdLens :: Lens [(Int, Char)] [Int]

This means we would have a modifier of type

weirdModify :: ([Int] -> [Int]) -> [(Int, Char)] -> [(Int, Char)]

What would happen if we tried

weirdModify (1 :)

to insert one Int into the list? Which (Int, Char) pair would we insert into the original list?

Pointwise lenses

What we want, really, is a lens that gives us a [Int] from a [(Int, Char)], and that modifies a [(Int, Char)] given a modifier of type Int -> Int: we want to apply the modifier pointwise at each element of the list. For this we need to generalize the lens datatype with a functor f:

data PLens f a b = PLens {
    plensGet    :: a -> f b
  , plensModify :: (b -> b) -> (a -> a)
  }

It is easy to see that PLens is strictly more general than Lens: every lens is also a pointwise lens by choosing Identity for f. Here’s a lens for the first component of a pair again:

plensFst :: PLens Identity (a, b) a
plensFst = PLens (Identity . fst) first

Note that the type of the modifier is precisely as it was before. As a simple but more interesting example, here is a lens from a list to its elements:

plensList :: PLens [] [a] a
plensList = PLens id map

You can think of plensList as shifting the focus from the list as a whole to the elements of the list, not unlike a zipper.
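As a quick sanity check (an illustrative ghci session):

> plensGet plensList [1,2,3]
[1,2,3]
> plensModify plensList (+1) [1,2,3]
[2,3,4]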

Composition

How does composition work for pointwise lenses?

compose :: Functor f => PLens g b c -> PLens f a b -> PLens (Compose f g) a c
compose (PLens g m) (PLens g' m') = PLens (Compose . fmap g . g') (m' . m)

The modifier is unchanged. For the getter we have a getter from a -> f b and a getter from b -> g c, and we can compose them to get a getter from a -> f (g c).

As a simple example, suppose we have

exampleList :: [[(Int, Char)]]
exampleList = [[(1, 'a'), (2, 'b')], [(3, 'c'), (4, 'd')]]

Then we can define a lens from a list of list of pairs to their first coordinate:

exampleLens :: PLens (Compose [] (Compose [] Identity)) [[(a, b)]] a
exampleLens = plensFst `compose` plensList `compose` plensList

Note that we apply the plensList lens twice and then compose with plensFst. If we get with this lens, we get a list of lists of Ints, as expected:

> plensGet exampleLens exampleList 
[[1,2],[3,4]]

and we modify pointwise:

> plensModify exampleLens (+1) exampleList 
[[(2,'a'),(3,'b')],[(4,'c'),(5,'d')]]

Category Instance

As we saw in the previous section, in general the type of the lens changes as we compose. We can see from the type of a lens where the focus is: shifting our focus from a list of lists, to the inner lists, to the elements of the inner lists:

PLens Identity                           [[a]] [[a]]
PLens (Compose [] Identity)              [[a]]  [a]
PLens (Compose [] (Compose [] Identity)) [[a]]   a

However, if we want to give a Category instance then we need to be able to keep f constant. This means that we need to be able to define a getter of type a -> f c from two getters of type a -> f b and b -> f c; in other words, we need f to be a monad:

instance Monad f => Category (PLens f) where
  id = PLens return id
  PLens g m . PLens g' m' = PLens (g <=< g') (m' . m)

This is, however, less of a restriction than it might at first sight seem. For our examples, we can pick the free monad on the list functor (using Control.Monad.Free from the free package):

plensFst' :: PLens (Free []) (a, b) a
plensFst' = PLens (Pure . fst) first

plensList' :: PLens (Free []) [a] a
plensList' = PLens lift map

We can use these as before:

> plensGet id exampleList :: Free [] [[(Int, Char)]]
Pure [[(1,'a'),(2,'b')],[(3,'c'),(4,'d')]]

> plensGet plensList' exampleList
Free [Pure [(1,'a'),(2,'b')],Pure [(3,'c'),(4,'d')]]

> plensGet (plensList' . plensList') exampleList
Free [Free [Pure (1,'a'),Pure (2,'b')],Free [Pure (3,'c'),Pure (4,'d')]]

> plensGet (plensFst' . plensList' . plensList') exampleList
Free [Free [Pure 1,Pure 2],Free [Pure 3,Pure 4]]

Note that the structure of the original list is still visible, as is the focus of the lens. (If we had chosen [] for f instead of Free [], the original list of lists would have been flattened.) Of course we can still modify the list, too:

> plensModify (plensFst' . plensList' . plensList') (+1) exampleList
[[(2,'a'),(3,'b')],[(4,'c'),(5,'d')]]

Comparison to Traversal

An alternative representation of a lens is the so-called van Laarhoven lens, made popular by the lens package:

type LaarLens a b = forall f. Functor f => (b -> f b) -> (a -> f a)

(this is the representation Simon Peyton Jones mentions in his talk). Lens and LaarLens are isomorphic: we can translate from Lens to LaarLens and back. This isomorphism is a neat result, and not at all obvious. If you haven’t seen it before, you should do the proof. It is illuminating.
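If you want to check your proof, here is a sketch of one way to write the two directions (toLaar and fromLaar are our names; Const comes from Control.Applicative and Identity from Data.Functor.Identity, both imported above):

toLaar :: Lens a b -> LaarLens a b
toLaar (Lens g m) f a = (\b -> m (\_ -> b) a) <$> f (g a)

fromLaar :: LaarLens a b -> Lens a b
fromLaar l = Lens (getConst . l Const)
                  (\f -> runIdentity . l (Identity . f))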

A Traversal is like a van Laarhoven lens, but using Applicative instead of Functor:

type Traversal a b = forall f. Applicative f => (b -> f b) -> (a -> f a)

Traversals have a similar purpose to pointwise lenses. In particular, we can define

tget :: Traversal a b -> a -> [b]
tget t = getConst . t (Const . (:[])) 

tmodify :: Traversal a b -> (b -> b) -> (a -> a)
tmodify t f = runIdentity . t (Identity . f) 

Note that the types of tget and tmodify are similar to types of the getter and modifier of a pointwise lens, and we can use them in a similar fashion:

travFst :: LaarLens (a, b) a
travFst f (a, b) = (, b) <$> f a 

travList :: Traversal [a] a
travList = traverse

exampleTrav :: Traversal [[(Int, Char)]] Int
exampleTrav = travList . travList . travFst

As before, we can use this traversal to modify a list of list of pairs:

> tmodify exampleTrav (+1) exampleList
[[(2,'a'),(3,'b')],[(4,'c'),(5,'d')]]

However, Traversals and pointwise lenses are not the same thing. It is tempting to compare the f parameter of the pointwise lens to the universally quantified f in the type of the Traversal, but they don’t play the same role at all. With pointwise lenses it is possible to define a lens from a list of lists of pairs to a list of lists of Ints, as we saw; similarly, it would be possible to define a lens from a tree of pairs to a tree of Ints, etc. However, the getter from a traversal only ever returns a single, flat, list:

> tget exampleTrav exampleList
[1,2,3,4]

Note that we have lost the structure of the original list. This behaviour is inherent in how Traversals work: every element of the structure is wrapped in a Const constructor, and the elements are then combined in the Applicative instance for Const.

On the other hand, the Traversal type is much more general than a pointwise lens. For instance, we can easily define

mapM :: Applicative m => (a -> m a) -> [a] -> m [a]
mapM = travList

and it is not hard to see that we will never be able to define mapM using a pointwise lens. Traversals and pointwise lenses are thus incomparable: neither is more general than the other.

In a sense the generality of the Traversal type is somewhat accidental, however: its purpose is similar to that of a pointwise lens, but its type also allows the introduction of effectful modifiers. For pointwise lenses (or “normal” lenses) this ability is entirely orthogonal, as we shall see in the next section.

(PS: Yes, traverse, travList and mapM are all just synonyms, with specialized types. This is typical of using the lens package: it defines 14 synonyms for id alone! What you take away from that is up to you :)

Generalizing further

So far we have only considered pure getters and modifiers; what about effectful ones? For instance, we might want to define lenses into a database, so that our getter and modifier live in the IO monad.

If you look at the actual definition of a lens in fclabels you will see that it generalises Lens to use arrows:

data GLens cat a b = GLens {
    glensGet    :: cat a b
  , glensModify :: cat (cat b b, a) a
  }

(Actually, the type is slightly more general still, and allows for polymorphic lenses. Polymorphism is orthogonal to what we are discussing here and we will ignore it for the sake of simplicity.) GLens too forms a category, provided that cat satisfies ArrowApply:

instance ArrowApply cat => Category (GLens cat) where
  id = GLens id app
  (GLens g m) . (GLens g' m') = GLens (g . g') (uncurry (curry m' . curry m))

const :: Arrow arr => c -> arr b c
const a = arr (\_ -> a)

curry :: Arrow cat => cat (a, b) c -> (a -> cat b c)
curry m i = m . (const i &&& id)

uncurry :: ArrowApply cat => (a -> cat b c) -> cat (a, b) c
uncurry a = app . arr (first a)

The ArrowApply constraint effectively means we have only two choices: we can instantiate cat with ->, to get back to Lens, or we can instantiate it with Kleisli m, for some monad m, to get “monadic” functions; i.e. the getter would have type (isomorphic to) a -> m b and the modifier would have type (isomorphic to) (b -> m b) -> (a -> m a).
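For instance, a “monadic” lens into an IORef might look like this (a sketch; ioRefLens is our own illustrative name, and it additionally requires Data.IORef):

ioRefLens :: GLens (Kleisli IO) (IORef a) a
ioRefLens = GLens (Kleisli readIORef) (Kleisli modify)
  where
    -- the modifier receives an effectful function on the contents,
    -- applies it to the current value, and writes the result back
    modify (Kleisli f, ref) = do
      x  <- readIORef ref
      x' <- f x
      writeIORef ref x'
      return ref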

Can we make a similar generalization to pointwise lenses? Defining the datatype is easy:

data GPLens cat f a b = GPLens {
    gplensGet    :: cat a (f b)
  , gplensModify :: cat (cat b b, a) a
  }

The question is if we can still define composition.

Interlude: Working with ArrowApply

I personally find working with arrows horribly confusing. However, if we are working with ArrowApply arrows then we are effectively working with a monad, or so Control.Arrow tells us. It doesn’t however quite tell us how. I find it very convenient to define the following two auxiliary functions:

toMonad :: ArrowApply arr => arr a b -> (a -> ArrowMonad arr b)
toMonad f a = ArrowMonad $ app . (const (f, a))

toArrow :: ArrowApply arr => (a -> ArrowMonad arr b) -> arr a b
toArrow act = app . arr (\a -> (unArrowMonad (act a), ()))
  where
    unArrowMonad (ArrowMonad a) = a

Now I can translate from an arrow to a monadic function and back, and I just write monadic code. Right, now we can continue :)

Category instance for GPLens

Since the type of the modifier has not changed at all from GLens we can concentrate on the getters. For the identity we need an arrow of type cat a (f a), but this is simply arr return, so that is easy.

Composition is trickier. For the getter we have two getters of type cat a (f b) and cat b (f c), and we need a getter of type cat a (f c). As before, it looks like we need some kind of monadic (Kleisli) composition, but now in an arbitrary category cat. If you’re like me at this stage you will search Hoogle for

(ArrowApply cat, Monad f) => cat a (f b) -> cat b (f c) -> cat a (f c)

… and find nothing. So you try Hayoo and again, find nothing. Fine, we’ll have to try it ourselves. Let’s concentrate on the monadic case:

compM :: (Monad m, Monad f) 
      => (a -> m (f b)) -> (b -> m (f c)) -> a -> m (f c)
compM f g a = do fb <- f a
                 _

so far so good; fb has type f b. But now what? We can fmap g over fb to get something of type f (m (f c)), but that’s no use; we want that m on the outside. In general we cannot commute monads like this, but if you are a (very) seasoned Haskell programmer you will realize that if f happens to be a traversable functor then we can flip f and m around to get something of type m (f (f c)). In fact, instead of fmap and then commute we can use mapM from Data.Traversable to do both in one go:

compM :: (Monad m, Monad f, Traversable f) 
      => (a -> m (f b)) -> (b -> m (f c)) -> a -> m (f c)
compM f g a = do fb  <- f a
                 ffc <- Traversable.mapM g fb
                 _

Now we’re almost there: ffc has type f (f c), and we need something of type f c; since f is a monad, we can just use join:

compM :: (Monad m, Monad f, Traversable f) 
      => (a -> m (f b)) -> (b -> m (f c)) -> a -> m (f c)
compM f g a = do fb  <- f a
                 ffc <- Traversable.mapM g fb
                 return (join ffc)

We can use the two auxiliary functions from the previous section to define Kleisli composition on arrows:

compA :: (ArrowApply cat, Monad f, Traversable f) 
      => cat a (f b) -> cat b (f c) -> cat a (f c)
compA f g = toArrow (compM (toMonad f) (toMonad g))

And now we can define our category instance:

instance (ArrowApply cat, Monad f, Traversable f) 
         => Category (GPLens cat f) where
  id = GPLens (arr return) app
  GPLens g m . GPLens g' m' = GPLens (g' `compA` g) 
                                     (uncurry (curry m' . curry m))

Note that the Traversable constraint comes up now because we need to commute the “effects” of two monads: the monad f from the structure that we are returning (be it a list or a tree or..) and the monad m implicit in the arrow. In a Traversal these two are somehow more closely coupled. In particular, if we lift a (pure) pointwise lens PLens to the more general GPLens, by picking Identity for f, the Traversable constraint is trivially satisfied.
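Concretely, such a lifting might look as follows (a sketch; liftLens is our own name, instantiating cat to ordinary functions and f to Identity):

liftLens :: Lens a b -> GPLens (->) Identity a b
liftLens (Lens g m) = GPLens (Identity . g) (\(modB, a) -> m modB a)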


Haskell gets static typing right

Thu, 17 Apr 2014 16:04:23 GMT, by andres.
Filed under community, training.

The following blog post originally appeared as a guest post on the Skills Matter blog. Well-Typed are regularly teaching both introductory or advanced Haskell courses at Skills Matter, and we will also be running a special course on Haskell’s type system.

Statically typed languages are often seen as a relic of the past – old and clunky. Looking at languages such as C and Java, we’re used to writing down a lot of information in a program that just declares certain variables to be of certain types. And what do we get in return? Not all that much. Yes, granted, some errors are caught at compile time. But the price is high: we’ve cluttered up the code with noisy declarations. Often, code has to be duplicated or written in a more complicated way, just to satisfy the type checker. And then, we still have a significant risk of run-time type errors, because type casting is common-place and can fail at unexpected moments.

So it isn’t a big surprise that dynamically typed languages are now very fashionable. They promise to achieve much more in less time, simply by getting rid of static type checking.

However, I want to argue that we shouldn’t be too keen on giving up the advantages of static types, and instead start using programming languages that get static typing right. Many functional languages such as Scala, F#, OCaml and in particular Haskell are examples of programming languages with strong static type systems that try not to get in the way, but instead guide and help the programmer.

In the rest of this post, I want to look at a few of the reasons why Haskell’s type system is so great. Note that some of the features I’m going to discuss are exclusive to Haskell, but most are not. I’m mainly using Haskell as a vehicle to advertise the virtues of good static type systems.

1. Type inference

Type inference makes the compiler apply common sense to your programs. You no longer have to declare the types of your variables, the compiler looks at how you use them and tries to determine what type they have. If any of the uses are inconsistent, then a type error is reported. This removes a lot of noise from your programs, and lets you focus on what’s important.
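A tiny illustration (the signature in the comment is what GHC infers if we leave it out):

-- inferred: average :: Fractional a => a -> a -> a
average x y = (x + y) / 2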

Of course, you are still allowed to provide explicit type signatures, and encouraged to do so in places where it makes sense, for example, when specifying the interface of your code.

2. Code reuse

Nothing is more annoying than having to duplicate code. In the ancient days of statically typed programming, you had to write the same function several times if you wanted it to work for several types. These days, most languages have “generics” that allow you to abstract over type parameters. In Haskell, you just write a piece of code that works for several types, and type inference will tell you that it does, by inferring a type that is “polymorphic”. For example, write code that reverses all the elements of a data structure, and type inference will tell you that your code is independent of the type of the elements of the data structure, so it’ll just work regardless of what element type you use. If you write code that sorts a data structure, type inference will figure out that all you need to know about the elements is that they admit an ordering.
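To make this concrete (a small sketch with our own names; the signatures shown are exactly what type inference produces):

-- reversing is independent of the element type
myReverse :: [a] -> [a]
myReverse = foldl (flip (:)) []

-- sorting only requires that the elements admit an ordering
mySort :: Ord a => [a] -> [a]
mySort []     = []
mySort (x:xs) =
  mySort [y | y <- xs, y <= x] ++ [x] ++ mySort [y | y <- xs, y > x]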

3. No run-time type information by default

Haskell erases all type information after type checking. You may think that this is mainly a performance issue, but it’s much more than that. The absence of run-time type information means that code that’s polymorphic (i.e., type-agnostic, see above) cannot access certain values. This can be a powerful safety net. For example, just the type signature of a function can tell you that the function could reorder, delete or duplicate elements in a data structure, but not otherwise touch them, modify them or operate on them in any other way. Whereas at the beginning of this post I complained that bad static type systems don’t allow you to do what you want because they’re not powerful enough, here we can deliberately introduce restrictions to save us (as well as our colleagues) from accidental mistakes. So polymorphism turns out to be much more than just a way to reduce code duplication.

By the way, Haskell gives you various degrees of selective run-time typing. If you really need it in places, you can explicitly attach run-time type information to values and then make type-based decisions. But you say where and when, making a conscious choice that you gain flexibility at the cost of safety.

4. Introducing new datatypes made easy

In Haskell, it’s a one-liner to define a new datatype with a new name that has the same run-time representation as an existing type, yet is treated as distinct by the type system. (This may sound trivial, but surprisingly many statically typed languages get it wrong.) So for example it’s easy to define lots of different types that are all integers internally: counters, years, quantities, … In Haskell, this simple feature is often used to define safe boundaries: a specific type for URLs, a specific type for SQL queries, a specific type for HTML documents, and so on. Each of these types then comes with specific functions to operate on them. All such operations guarantee that whenever you have a value of this type, it’s well-formed, and whenever you render a value of this type, it’s syntactically correct and properly escaped.
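A minimal sketch of the idea (illustrative types of our own):

newtype Year    = Year Int
newtype Counter = Counter Int

nextYear :: Year -> Year
nextYear (Year y) = Year (y + 1)

-- passing a Counter to nextYear is a compile-time error,
-- even though both are plain Ints at run time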

5. Explicit effects

In virtually all programming languages, a function that performs some calculations on a few numbers and another function that performs the same calculations, but additionally sends a million spam emails to addresses all over the world, have exactly the same type, and therefore the same interface. Not so in Haskell. If a function writes to the screen, reads from the disk, sends messages over the network, accesses the system time, or makes use of any other so-called side effect, this is visible in its type. This has two advantages: first, it makes it much easier to rely on other people’s code. If you look at the interface and a function is effect-free, then you for example automatically know that it is also thread-safe. Second, the language facilitates a design where side effects are isolated into relatively small parts of the code base. This may seem difficult to achieve for highly stateful systems, but surprisingly, it usually is not: even interactive systems can usually be described as pure functions reacting to a series of requests with appropriate responses, and a separate driver that does the actual communication. Such separation not only makes it easier to test the program, but also facilitates evolving the program, for example to adapt it to run in a different environment. Haskell’s type system therefore encourages good design.
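A small sketch of the contrast (names are ours):

-- the type says: no side effects possible
calculate :: Int -> Int
calculate x = x * x + 1

-- the type says: may perform arbitrary I/O (such as sending those emails)
calculateAndSpam :: Int -> IO Int
calculateAndSpam x = do
  putStrLn "sending a million spam emails..."
  return (x * x + 1)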

6. Types as a guide in program development

If you only ever see types as a source of errors, and therefore as enemies on your path to getting your program accepted, you’re doing them an injustice. Types as provided by Haskell are an element of program design. If you give your program precise types and follow systematic design principles, your program almost writes itself. Programming with a strong type system is comparable to playing a puzzle game, where the type system removes many of the wrong choices and helpfully guides you to take the right path. This style of programming is supported by a new language extension called “Typed Holes”, where you leave parts of your program unspecified during development, and obtain feedback from the development environment about what type has to go into the hole, and what program fragments you have available locally to construct a value of the desired type. Playing this kind of puzzle game is actually quite fun!
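For instance, leaving a hole in a definition like this one (a sketch; the exact message format varies between GHC versions):

swap :: (a, b) -> (b, a)
swap (x, y) = (_, x)

-- ghc reports something like:
--   Found hole ‘_’ with type: b
--   Relevant bindings include x :: a, y :: b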

7. Programming on the type level

Haskell’s type system provides many advanced features that you don’t usually have to know about, but that can help you if you want to ensure that some complicated invariants hold in your program. Scarily named concepts such as “higher-ranked polymorphism”, “generalized algebraic datatypes” and “type families” essentially provide you with a way to write programs that compute with types. The possibilities are nearly endless. From playful things such as writing a C-printf-style function where the first argument determines the number of arguments that are expected afterwards as well as their types, you can go on to code that provides useful guarantees such as that mutable references that are available within one thread of control are guaranteed not to be accessed in a completely different context, arrays that can adapt to different internal representations depending on what type of values they contain, working with lists that are guaranteed to be of a specific length, or with trees that are guaranteed to be balanced, or with heterogeneous lists (where each element can be of a different type) in a type-safe way. The goal is always to make illegal inputs impossible to construct. If they’re impossible to construct by the type system, you can isolate sanity tests at the boundary of your code, rather than having to do them over and over again. The good thing is that these features are mostly optional, and often hardly affect the interface of libraries. So as a user, you can benefit from libraries employing such features and having extra safety guarantees internally. As a library writer, you can choose whether you’re happy with the normal level of Haskell type safety (which is already rather a lot), or if you want to spend some extra effort and get even more.
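As a single concrete taste of this, here is the classic length-indexed vector sketch (using the GADTs and DataKinds extensions; the names are ours):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Zero | Suc Nat

data Vec (n :: Nat) a where
  Nil  :: Vec 'Zero a
  Cons :: a -> Vec n a -> Vec ('Suc n) a

-- the type guarantees the vector is non-empty: no run-time check,
-- and no way to call this on Nil by accident
vhead :: Vec ('Suc n) a -> a
vhead (Cons x _) = x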

If my overview has tempted you and you now want to learn more about Haskell, you’re welcome to follow one of my introductory or advanced Haskell courses that I (together with my colleagues at Well-Typed) regularly teach at Skills Matter. These courses do not just focus on the type system of Haskell (although that’s a significant part). They introduce the entire language in a hands-on way with lots of examples and exercises, as well as providing guidelines on how to write idiomatic Haskell and how to follow good development practices.

If you already know some Haskell and are particularly interested in the advanced type system features mentioned in point 7, we also offer a new one-day course on Haskell’s type system that specifically focuses on how far you can push it.


Fixing foldl

Tue, 01 Apr 2014 10:38:20 GMT, by duncan.
Filed under community.

The foldl function is broken. Everyone knows it’s broken. It’s been broken for nearly a quarter of a century. We should finally fix it!

Today I am proposing that Prelude.foldl be redefined using the implementation currently known as Data.List.foldl'.

foldl is broken!

I’m sure you knew that already, but just in case…

Have you ever noticed that Haskellers usually recommend using either foldr or foldl' but not foldl? For example Real World Haskell has this to say:

Due to the thunking behaviour of foldl, it is wise to avoid this function in real programs: even if it doesn’t fail outright, it will be unnecessarily inefficient. Instead, import Data.List and use foldl'.

In the online version of the book the first user comments on that paragraph are

Why isn’t Data.List foldl implementation placed in Prelude?

I second the question: Why isn’t foldl' the default?

Good question.

Ok, so obviously we’re talking about the difference between foldl and foldl':

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f a []     = a
foldl f a (x:xs) = foldl f (f a x) xs

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f a []     = a
foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs

The dry technical difference is that foldl' evaluates the call to f before making the next recursive call. The consequences of that are perhaps not immediately obvious, so we’ll take a step back and look at a slightly bigger picture.

Folding left and right

When we first learn Haskell we learn that there are two ways to fold a list, from the left or right.

foldl f z [x1, x2, ..., xn] = (...((z `f` x1) `f` x2) `f`...) `f` xn

foldr f z [x1, x2, ..., xn] = x1 `f` (x2 `f` ... (xn `f` z)...)

Saying “from the left” or “from the right” is a description of what foldl and foldr calculate, with the parentheses nesting to the left or to the right. At runtime, of course, we always have to start from the left (front) of the list.

We later learn other ways of thinking about left and right folds, that a left fold can be used like a classic loop where we go through the whole list eagerly, while a right fold can be used like a demand-driven iterator. For the left fold that means using a function that is strict in its first argument (like (+)) while for the right fold that means using a function that is not strict in its second argument (like (:)).

Indeed, when deciding whether we want foldl or foldr in any particular case, our choice is usually governed by whether we want “all at once” behaviour (foldl) or incremental or short-cut behaviour (foldr).
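The short-cut behaviour of foldr is easy to demonstrate (a quick ghci check):

> take 3 (foldr (:) [] [1..])
[1,2,3]

whereas the corresponding left fold would loop forever, since it has to reach the end of the (infinite) list before producing anything.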

Accumulating thunks

Again, as we are learning Haskell, we get told that foldl has this crazy behaviour

foldl (+) 0 (1:2:3:[])
          =  foldl (+) (0 + 1)             (2:3:[])
          =  foldl (+) ((0 + 1) + 2)       (3:[])
          =  foldl (+) (((0 + 1) + 2) + 3) []
          =            (((0 + 1) + 2) + 3)

when what we had in mind when we thought of an accumulating loop was

foldl' (+) 0 (1:2:3:[])
          =  foldl' (+) 1 (2:3:[])
          =  foldl' (+) 3 (3:[])
          =  foldl' (+) 6 []
          =             6

Of course that’s just what foldl' does, it evaluates the call to + before making the next recursive call.
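The practical consequence is easy to demonstrate in ghci (the exact point at which foldl falls over depends on the stack limit and ghc version):

> foldl' (+) 0 [1..10^7]
50000005000000
> foldl (+) 0 [1..10^7]
*** Exception: stack overflow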

When is foldl (rather than foldl') useful?

The short answer is “almost never”.

As beginners we often assume that foldl must still make sense in some cases (or why have both?) but it turns out that’s not really the case.

When the f argument of foldl is a strict function then delaying the evaluation does not gain us anything as it all has to be evaluated at the end anyway. The only time when delaying the evaluation could save us anything is when the f function is not strict in its first argument – in which case you either don’t care or probably should have been using foldr in the first place.

In fact even if our f function is non-strict in its first argument, we probably do not gain anything from delaying evaluation and it usually does no harm to evaluate earlier. Remember that we still have to traverse the whole list, we don’t get any short-cutting behaviour like with foldr.

We can, if we think about it, construct examples where foldl' would be too strict. We could define last and last' like this:

last  = foldl  (\_ y -> y) (error "empty list")

last' = foldl' (\_ y -> y) (error "empty list")

Now if we try

> last [1,undefined,3]
3
> last' [1,undefined,3]
*** Exception: Prelude.undefined

This is because our accumulator is always the latest element that we’ve seen but we don’t actually want to evaluate the elements (except the last one).

So it’s true that foldl' fails in this case, but it’s also a silly definition, the usual definition is a lot clearer

last [x]    = x
last (_:xs) = last xs
last []     = error "empty list"

That goes for pretty much all the other examples you might be able to think of where foldl would work but foldl' would not: the examples are either artificial or clearer when written in other ways.

People sometimes point out that sum is defined using foldl and not foldl' and claim that this is something to do with Haskell’s designers wanting to allow Num instances where (+) might be lazy. This is pretty much nonsense. If that were the case then sum would have been defined using foldr rather than foldl to properly take advantage of short-cutting behaviour. A much simpler explanation is that foldl' was not available in early versions of Haskell when sum was defined.

In nearly 15 years as a Haskell programmer I think I’ve specifically needed foldl rather than foldl' about three times. I say “about” because I can only actually remember one. That one case was in a slightly cunning bit of code for doing cache updates in a web server. It would almost certainly have been clearer as a local recursion but I was amused to find a real use case for foldl and couldn’t help myself from using it just for fun. Of course it needed a comment to say that I was using it on purpose rather than by mistake!

So why do we have foldl and foldl'?

If foldl is almost always a mistake (or merely benign) then why do we have it in the first place?

I don’t know for sure, but here’s my guess…

When Haskell 1.0 was published on this day 24 years ago there was no seq function at all, so there was no choice but to define foldl in the “classic” way.

Eventually, six years later, after much discussion, we got the seq function in Haskell 1.3. Though actually in Haskell 1.3 seq was part of an Eval class, so you couldn’t use it just anywhere, such as in foldl. In Haskell 1.3 you would have had to define foldl' with the type:

foldl' :: Eval b => (b -> a -> b) -> b -> [a] -> b

Haskell 1.4 and Haskell 98 got rid of the Eval class constraint for seq but foldl was not changed. Hugs and GHC and other implementations added the non-standard foldl'.

I suspect that people then considered it a compatibility and inertia issue. It was easy enough to add a non-standard foldl' but you can’t so easily change the standard.

I suspect that if we had had seq from the beginning then we would have defined foldl using it.

Miranda, one of Haskell’s predecessor languages, already had seq 5 years before Haskell 1.0.

A strict foldl in Orwell!

Orwell is an interesting case. Orwell was another Haskell predecessor, very similar to Miranda and early Haskell. An informed source told me that Orwell had defined its foldl in the way that we now define foldl', i.e. with strict evaluation. Information on Orwell is a little hard to get hold of online these days, so I asked Philip Wadler. Phil very kindly fished out the manuals and looked up the definitions for me.

In the original version:

An Introduction to Orwell
(DRAFT)
Philip Wadler
1 April 1985

In the standard prelude:

lred f a []  =  a
lred f a (x:xs) = lred f (f a x) xs

But five years later, by the time Haskell 1.0 is being published…

An Introduction to Orwell 6.00
by Philip Wadler
revised by Quentin Miller
Copyright 1990 Oxford University Computing Lab

In the standard prelude:

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f a []  =  a
foldl f a (x:xs)  =  strict (foldl f) (f a x) xs

Note the use of strict. Presumably Orwell’s strict function was defined as (or equivalent to)

strict :: (a -> b) -> a -> b
strict f x = x `seq` f x

(These days in Haskell we call this function ($!).)

So my source was right, Orwell did change foldl to be the strict version!

I contend that this was and is the right decision, and that it was just a consequence of the late arrival of seq in Haskell and inertia and fears about backwards compatibility that have kept us from fixing foldl.

Just do it!

It’d help all of us who are sometimes tempted to use foldl because we can’t be bothered to import Data.List. It’d help confused beginners. It’d save teachers from the embarrassment of having to first explain foldl and then why you should never use it.

Orwell fixed this mistake at least 24 years ago, probably before Haskell 1.0 was released. Just because it’s an old mistake doesn’t mean we shouldn’t fix it now!

A postscript: which foldl'?

I hate to complicate a simple story but I should admit that there are two plausible definitions of foldl' and I’ve never seen any serious discussion of why we use one rather than the other (I suspect it’s another historical accident).

So the version above is the “standard” version, perhaps more clearly written using bang patterns as

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f a []     = a
foldl' f a (x:xs) = let !a' = f a x
                     in foldl' f a' xs

But an equally plausible version is

foldl'' :: (b -> a -> b) -> b -> [a] -> b
foldl'' f !a []     = a
foldl'' f !a (x:xs) = foldl'' f (f a x) xs

The difference is this: in the first version we evaluate the new accumulating parameter before making the recursive call, while in the second version we ensure that the accumulating parameter is always evaluated (to WHNF) on entry into each call.

These two ways of forcing the evaluation have almost the same effect. It takes a moment or two to find a case where it makes a difference, but here’s one:

foldl'  (\_ y -> y) undefined [1] = 1
foldl'' (\_ y -> y) undefined [1] = undefined

The standard foldl' ensures that all the new accumulating parameter values are evaluated, but still allows the initial value to be unevaluated. The alternative version simply says that the accumulating parameter is always evaluated.

The second version is attractive from a code generation point of view. One of the clever things that GHC can do with foldl' (and strict function arguments generally) is to unbox the values, so for example an Int can be unboxed to an Int# and passed in registers rather than on the heap. With the standard foldl' it needs a special case for the first iteration of the loop, where the initial value of the accumulating parameter might not be evaluated yet. With foldl'', that’s not a problem: we can unbox things right from the start. In practice, GHC can often see that the initial value is evaluated anyway, but not always.

(Don Stewart and I noticed this a few years back when we were working on stream fusion for lists. We had defined foldl' on streams in a way that corresponded to the second form above, and then got a test failure when doing strictness testing, comparing against the standard list functions.)

So if we’re going to fix foldl to be the strict version, then perhaps it should be the fully strict version, not just the “strict after the first iteration” version.


Fun and Profit with Strongly-Typed Data Schemas

Tue, 01 Apr 2014 07:24:01 GMT, by adam.
Filed under community, training, well-typed.

Over the past few months, Duncan and I have been working with Chris Dornan and Alfredo Di Napoli on api-tools, a DSL for specifying data schemas for REST-like APIs in Haskell. If you’re interested in the real-world use of Haskell, static types and DSLs, why not come along to hear Chris talk about it?

Wednesday 9th April, 6:30pm, London

Find out more and register for free over at Skills Matter:

Typical business apps store structured data, transform it and send it hither and thither. They are typically made of multiple components that have to agree on the schema of the data they exchange. So a large part of what it means to be “flexible” is for it to be easy to modify and extend the schema of the data that the system handles.

Strong typing can help with this, ensuring that the code that accesses the data is consistent with the schema. One idea that has been tried with databases is to generate Haskell types from the database schema, or generate the database schema from the Haskell types, or both from some standalone schema.

In this talk we will describe how we have applied this same idea but for REST APIs. We have a DSL that we use to specify the schema of the data available via a REST API. With a machine description of the schema we can generate the code and documentation for many parts of the system, and of course those parts are then easily adaptable when the schema changes. We generate Haskell types, JSON conversion functions, REST API documentation, disk persistence and more. We also have a story for managing schema changes and migrating data. This system is in use at Iris Connect and is now available on Hackage.

This talk will also discuss further applications we have found for these tools and some of the experience from the development project at Iris Connect, including problems we have had to solve building the Haskell components of the system and the wider challenge of integrating it into the overall development project.

And if that isn’t enough for you, Well-Typed’s Haskell courses are back at the end of April, with an all-new course on advanced features of the Haskell type system. Stay tuned for more events coming soon…


(New) Haskell Courses in London, April/May 2014

Wed, 19 Mar 2014 13:51:39 GMT, by andres.
Filed under well-typed, training.

On May 2, I’m going to teach an all new course at Skills Matter in London:

Well-Typed’s Guide to the Haskell Type System

May 2, 2014 in London (1 day)
£595 + VAT

This course is going to cover advanced type system features of Haskell and GHC in detail. It is aimed at developers who are fascinated by Haskell and the power of its type system, and want to know what’s possible and learn how to get even more out of it.

Among other topics, we’re going to cover

The course includes exercises, and we’re going to look at actual code on Hackage and see how these advanced features of the type system are being used in practice.

You don’t need to be a Haskell expert in order to participate, but familiarity with the basics will be assumed.

The course is open for registration via the Skills Matter site.

Regular courses

In the same week, we’re also offering our regular two-day introductory and advanced Haskell courses again:

Well-Typed’s Fast Track to Haskell

April 28–29 in London (2 days)
£1195 + VAT

This highly practical course teaches you the basics of Haskell. Functions and datatypes, pattern matching, higher-order functions, input and output, and (of course) monads.

Well-Typed’s Advanced Haskell

April 30–May 1 in London (2 days)
£1195 + VAT

In this hands-on course, you’ll learn about data structures, how data is represented in memory, reasoning about lazy evaluation and performance, parallelism and concurrency, and design patterns such as monads, applicative functors, and monad transformers.

Questions?

If you have any questions about which course might be the right one for you, or if you’d be interested in a course that isn’t listed here, please check our training page for further information or email us.

