Debugging Haskell at assembly level
by scripting lldb in Python

Edsko de Vries – Friday, 01 August 2014

Haskell programmers tend to spend far less time with debuggers than programmers in other languages. Partly this is because for pure code debuggers are of limited value anyway, and Haskell’s type system is so strong that a lot of bugs are caught at compile time rather than at runtime. Moreover, Haskell is a managed language – like Java, say – and errors are turned into exceptions. Unlike in unmanaged languages such as C “true” runtime errors such as segmentation faults almost never happen.

I say “almost” because they can happen: either because of bugs in ghc or the Haskell runtime, or because we are doing low level stuff in our own Haskell code. When they do happen we have to drop down to a system debugger such as lldb or gdb, but debugging Haskell at that level can be difficult because Haskell’s execution model is so different from the execution model of imperative languages. In particular, compiled Haskell code barely makes any use of the system stack or function calls, and uses a continuation passing style instead (see my previous blog posts Understanding the Stack and Understanding the RealWorld). In this blog post I will explain a technique I sometimes use to help diagnose low-level problems.

Since I work on OSX I will be using lldb as my debugger. if you are using gdb you can probably use similar techniques; The LLDB Debugger shows how gdb and lldb commands correlate, and the ghc wiki also lists some tips. However, I have no experience with scripting gdb so your mileage may vary.

Description of the problem

As our running example I will use a bug that I was tracking down in a client project. The details of the project don’t matter so much, except that this project happens to use the GHC API to compile Haskell code—at runtime—into bytecode and then run it; moreover, it also—dynamically, at runtime—loads C object files into memory.

In one example run it loads the (compiled) C code

#include <stdio.h>

void hello(void) { 
  printf("hello\n"); 
}

and then compiles and runs this Haskell code:

{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

foreign import ccall "hello" hello :: IO ()

main = hello

Sadly, however, this resulted in total system crash.

Starting point

By attaching lldb to the running process we got a tiny bit more information about the crash:

* thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010b3b8226
-> 0x10b3b8226:  addb   %al, (%rax)
   0x10b3b8228:  addb   %al, (%rax)
   0x10b3b822a:  addb   %al, (%rax)
   0x10b3b822c:  addb   %al, (%rax)

It turns out we have a null-pointer dereferencing here. Anybody who has spent any time debugging Intel assembly code however will realize that this particular instruction

addb   %al, (%rax)

is in fact the decoding of zero:

(lldb) memory read -c 8 0x10b3b8226
0x10b3b8226: 00 00 00 00 00 00 00 00                          ........

In other words, chances are good we were never meant to execute this instruction at all. Unfortunately, asking lldb for a backtrace tells us absolutely nothing new:

(lldb) bt
* thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010b3b8226

Finding a call chain

The lack of a suitable backtrace in lldb is not surprising, since compiled Haskell code barely makes use of the system stack. Instead, the runtime maintains its own stack, and code is compiled into a continuation passing style. For example, if we have the Haskell code

functionA :: IO ()
functionA = do .. ; functionB ; ..

functionB :: ()
functionB = do .. ; functionC ; ..

functionC :: IO ()
functionC = .. crash ..

main :: IO ()
main = functionA

and we step through the execution of this program in lldb, and we ask for a backtrace when we start executing function A all we get is

(lldb) bt
* thread #1: tid = 0x379731, 0x0000000100000a20 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100000a20 Main`A_functionA_info

with no mention of main. Similarly, the backtraces on entry to functions B and C are

* thread #1: tid = 0x379731, 0x0000000100000b90 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
  * frame #0: 0x0000000100000b90 Main`B_functionB_info

and

* thread #1: tid = 0x379731, 0x0000000100000c88 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
  * frame #0: 0x0000000100000c88 Main`C_functionC_info

none of which is particularly informative. However, stepping manually through the program we do first see function A on the (singleton) call stack, then function B, and finally function C. Thus, by the time we reach function C, we have discovered a call chain A, B, C—it’s just that it involves quite a bit of manual work.

Scripting `lldb`

Fortunately, lldb can be scripted (see Using Scripting and Python to Debug in LLDB and the LLDB Python Reference). What we want to do is keep stepping through the code, showing the top-level (and quite possibly only) function at the top of the call stack at each step, until we crash.

We can use the following Python script to do this:

import lldb

def step_func(debugger, command, result, internal_dict):
  thread = debugger.GetSelectedTarget().GetProcess().GetSelectedThread()
  
  while True:
    thread.StepOver()

    stream = lldb.SBStream()
    thread.GetStatus(stream)
    description = stream.GetData()
    print description

    if thread.GetStopReason() == lldb.eStopReasonException:
      break

def __lldb_init_module(debugger, dict):
  debugger.HandleCommand('command script add -f %s.step_func sf' % __name__)

For the above example, we might use this as follows: we load our application into lldb

# lldb Main
Current executable set to 'Main' (x86_64).

(lldb) command script import mystep.py

set a breakpoint where we want to start stepping

(lldb) breakpoint set -n A_functionA_info
Breakpoint 1: where = Main`A_functionA_info, address = 0x0000000100000b90

run to the breakpoint:

(lldb) run
Process 54082 launched: 'Main' (x86_64)
Process 54082 stopped
* thread #1: tid = 0x384510, 0x0000000100000b90 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000b90 Main`A_functionA_info

and then use sf to find a call-trace until we crash:

(lldb) sf
...

* thread #1: tid = 0x384510, 0x0000000100000bf0 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100000bf0 Main`B_functionB_info
Main`B_functionB_info:

...

* thread #1: tid = 0x384510, 0x0000000100000c78 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100000c78 Main`C_functionC_info
Main`C_functionC_info:

...

* thread #1: tid = 0x384510, 0x0000000100000d20 Main`crash + 16 at crash.c:3, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100000d20 Main`crash + 16 at crash.c:3

Note that if you are using the threaded runtime, you may have to select which thread you want to step through before calling sf:

(lldb) thread select 4
(lldb) sf

Tweaking the script

You will probably want to tweak the above script in various ways. For instance, in the application I was debugging, I wanted to step into each assembly language instruction but over each C function call, mostly because lldb was getting confused with the call stack. I also added a maximum step count:

def step_func(debugger, command, result, internal_dict):
  args = shlex.split(command)
 
  if len(args) > 0:
    maxCount = int(args[0])
  else:
    maxCount = 100

  thread = debugger.GetSelectedTarget().GetProcess().GetSelectedThread()
  i      = 0;

  while True:
    frame = thread.GetFrameAtIndex(0)
    file  = frame.GetLineEntry().GetFileSpec().GetFilename()
    inC   = type(file) is str and file.endswith(".c")
   
    if inC:
      thread.StepOver()
    else:
      thread.StepInstruction(False)

    stream = lldb.SBStream()
    thread.GetStatus(stream)
    description = stream.GetData()

    print i
    print description

    i += 1;
    if thread.GetStopReason() == lldb.eStopReasonException or i > maxCount:
      break

You may want to tweak this step into/step over behaviour to suit your application; certainly you don’t want to have a call trace involving every step taken in the Haskell RTS or worse, in the libraries it depends on.

Back to the example

Rather than printing every step along the way, it may also be useful to simply remember the step-before-last and show that on a crash; often it is sufficient to know what happened just before the crash. Indeed, in the application I was debugging the call stack just before the crash was:

2583
  thread #3: tid = 0x35e743, 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470, stop reason = step over
    frame #0: 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470
   467 	    }
   468 	    
   469 	    case ThreadInterpret:
-> 470 		cap = interpretBCO(cap);
   471 		ret = cap->r.rRet;
   472 		break;
   473 		

2584
  thread #3: tid = 0x35e743, 0x0000000103106226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000103106226
-> 0x103106226:  addb   %al, (%rax)
   0x103106228:  addb   %al, (%rax)
   0x10310622a:  addb   %al, (%rax)
   0x10310622c:  addb   %al, (%rax)

Which is a lot more helpful than the backtrace, as we now have a starting point: something went wrong when running the bytecode interpreter (remember that the application was compiling and running some Haskell code at runtime).

To pinpoint the problem further, we can set a breakpoint in interpretBCO and run sf again (the way we defined sf it steps over any C function calls by default). This time we get to:

4272
  thread #4: tid = 0x35f43a, 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463, stop reason = step over
    frame #0: 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463
   1460		    tok = suspendThread(&cap->r, interruptible ? rtsTrue : rtsFalse);
   1461	
   1462		    // We already made a copy of the arguments above.
-> 1463	            ffi_call(cif, fn, ret, argptrs);
   1464	
   1465	            // And restart the thread again, popping the stg_ret_p frame.
   1466		    cap = (Capability *)((void *)((unsigned char*)resumeThread(tok) - STG_FIELD_OFFSET(Capability,r)));

4273
  thread #4: tid = 0x35f43a, 0x0000000107eba226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000107eba226
-> 0x107eba226:  addb   %al, (%rax)
   0x107eba228:  addb   %al, (%rax)
   0x107eba22a:  addb   %al, (%rax)
   0x107eba22c:  addb   %al, (%rax)

Ok, now we are really getting somewhere. Something is going wrong when are doing a foreign function call. Let’s re-run the application once more, setting a breakpoint at ffi_call:

(lldb) breakpoint set -n ffi_call
Breakpoint 1: where = libffi.dylib`ffi_call + 29 at ffi64.c:421, address = 0x0000000108e098dd
(lldb) cont
Process 51476 resuming
Process 51476 stopped
* thread #4: tid = 0x360fd3, 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421, stop reason = breakpoint 1.1
    frame #0: 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421
   418 	  /* If the return value is a struct and we don't have a return value
   419 	     address then we need to make one.  Note the setting of flags to
   420 	     VOID above in ffi_prep_cif_machdep.  */
-> 421 	  ret_in_memory = (cif->rtype->type == FFI_TYPE_STRUCT
   422 			   && (cif->flags & 0xff) == FFI_TYPE_VOID);
   423 	  if (rvalue == NULL && ret_in_memory)
   424 	    rvalue = alloca (cif->rtype->size);

and let’s take a look at the function we’re about to execute:

(lldb) disassemble -s fn
   0x1024a4210:  pushq  %rbp
   0x1024a4211:  movq   %rsp, %rbp
   0x1024a4214:  leaq   (%rip), %rdi
   0x1024a421b:  popq   %rbp
   0x1024a421c:  jmpq   0x1024a4221
   0x1024a4221:  pushq  $0x6f6c6c65
   0x1024a4226:  addb   %al, (%rax)
   0x1024a4228:  addb   %al, (%rax)
   0x1024a422a:  addb   %al, (%rax)
   0x1024a422c:  addb   %al, (%rax)
   0x1024a422e:  addb   %al, (%rax)

We were expecting to execute hello:

# otool -tV hello.o 
hello.o:
(__TEXT,__text) section
_hello:
0000000000000000	pushq	%rbp
0000000000000001	movq	%rsp, %rbp
0000000000000004	leaq	L_str(%rip), %rdi ## literal pool for: "hello"
000000000000000b	popq	%rbp
000000000000000c	jmpq	_puts

and if you compare this with the code loaded into memory it all becomes clear. The jump instruction in the object file

jmpq _puts

contains a symbolic reference to puts; but the jump in the code that we are about to execute in fact jumps to the next instruction in memory:

0x1024a421c:  jmpq   0x1024a4221
0x1024a4221:  ...

In other words, the loaded object file has not been properly relocated, and when we try to call puts we end up jumping into nowhere. At this point the bug was easily resolved.