DWARF support in GHC (part 3)

Ben Gamari – Sunday, 05 April 2020

This post is the third of a series examining GHC’s support for DWARF debug information and the tooling that this support enables:

Part 1 introduces DWARF debugging information and explains how its generation can be enabled in GHC.
Part 2 looks at a DWARF-enabled program in gdb and examines some of the limitations of this style of debug information.
Part 3 looks at the backtrace support of GHC’s runtime system and how it can be used from Haskell.
Part 4 examines how the Linux perf utility can be used on GHC-compiled programs.
Part 5 concludes the series by describing future work, related projects, and ways in which you can help.

Getting backtraces from the runtime

We saw in the last post that GHC’s debug information can be used by the gdb interactive debugger to provide meaningful backtraces of running Haskell programs. However, debuggers are not the only consumer of these backtraces. For several releases now the GHC RTS has itself supported stack backtraces. This support can be invoked in two ways:

via the SIGQUIT signal
via the GHC.ExecutionStack interface in base

In the first case, programs built with debug symbols and a libdw-enabled compiler can be sent the SIGQUIT signal ¹, resulting in a stack trace being blurted to stderr:

$ vector-tests-O0 >/dev/null & sleep 0.2; kill -QUIT %1
Caught SIGQUIT; Backtrace:
                 0x1387442    set_initial_registers (rts/Libdw.c:288.0)
            0x7feecbf2b0e0    dwfl_thread_getframes (/nix/store/35vnzk39hwsx18d1bkcd30r5xrx026mr-elfutils-0.176/lib/libdw-0.176.so)
            0x7feecbf2ab4e    get_one_thread_cb (/nix/store/35vnzk39hwsx18d1bkcd30r5xrx026mr-elfutils-0.176/lib/libdw-0.176.so)
            0x7feecbf2aea3    dwfl_getthreads (/nix/store/35vnzk39hwsx18d1bkcd30r5xrx026mr-elfutils-0.176/lib/libdw-0.176.so)
            0x7feecbf2b479    dwfl_getthread_frames (/nix/store/35vnzk39hwsx18d1bkcd30r5xrx026mr-elfutils-0.176/lib/libdw-0.176.so)
                 0x1387abd    libdwGetBacktrace (rts/Libdw.c:259.0)
                 0x1373b26    backtrace_handler (rts/posix/Signals.c:534.0)
            0x7feecbf7185f    (null) (/nix/store/g2p6fwjc995jrq3d8vph7k45l9zhdf8f-glibc-2.27/lib/libpthread-2.27.so)
                 0x137f24f    _rts_stgzuapzup_ret (_build/stage1/rts/build/cmm/AutoApply.cmm:654.18)
                 0x137de10    stg_upd_frame_info (rts/Updates.cmm:31.1)
                  0xa7cc80    _randomzm1zi1zmc60864d5616c60090371cdf8e600240f388e8a9bd87aa769d8045bda89826ee2_SystemziRandom_lvl6_siHP_entry (System/Random.hs:489.70)
                 0x12c52d8    integerzmwiredzmin_GHCziIntegerziType_minusInteger_info (libraries/integer-gmp/src/GHC/Integer/Type.hs:437.1)
                  0xa7d098    randomzm1zi1zmc60864d5616c60090371cdf8e600240f388e8a9bd87aa769d8045bda89826ee2_SystemziRandom_zdwrandomIvalInteger_info (System/Random.hs:487.20)
                  0x98da50    _QuickCheckzm2zi13zi2zmac90a2a0d9e0dd2c227d795a9d4d9de22a119c3781b679f3b245300e1b658c43_TestziQuickCheckziArbitrary_sat_sx8e_entry (Test/QuickCheck/Arbitrary.hs:988.26)
                 0x137de10    stg_upd_frame_info (rts/Updates.cmm:31.1)
...
                 0x137c810    stg_stop_thread_info (rts/StgStartup.cmm:42.1)
                 0x136571b    StgRunJmp (rts/StgCRun.c:370.0)
                 0x136241b    scheduleWaitThread (rts/Capability.h:219.0)
                 0x135f35e    hs_main (rts/RtsMain.c:73.0)
                  0x455df4    (null) (/opt/exp/ghc/ghc-8.10/vector/dist-newstyle/build/x86_64-linux/ghc-8.10.0.20191231/vector-0.13.0.1/t/vector-tests-O0/build/vector-tests-O0/vector-tests-O0)
            0x7feecbd4ab8e    __libc_start_main (/nix/store/g2p6fwjc995jrq3d8vph7k45l9zhdf8f-glibc-2.27/lib/libc-2.27.so)
                  0x40a82a    _start (../sysdeps/x86_64/start.S:122.0)

This can be especially useful in diagnosing unexpected CPU usage or latency in long-running tasks (e.g. a server stuck in a loop).

Note, however, that this currently only provides a backtrace of the program’s main capability. Backtrace support for multiple capabilities is an outstanding task.

Getting backtraces from Haskell

The runtime’s unwinding support can also be invoked from Haskell programs via the GHC.ExecutionStack interface. This provides:

-- | A source location.
data Location = {- ... -}

-- | Returns a stack trace of the calling thread or 'Nothing'
-- if the runtime system lacks libdw support.
getStackTrace :: IO (Maybe [Location])

In the future we would also like to provide

getThreadStackTrace :: ThreadId -> IO (Maybe [Location])

although this is an outstanding task.

This could be used in a number of ways:

when throwing an exception, one could capture the current stack for use in diagnostics output.
with getThreadStackTrace a monitoring library like ekg might provide the ability to enumerate the program’s threads and introspect on what they are up to.

We’ll look at (1) in greater detail below.

Providing backtraces for exceptions

Attaching backtrace information to exceptions is fairly straightforward. For instance, one could provide

data WithStack e = WithStack (Maybe [Location]) e
instance Exception (WithStack e)

throwIOWithStack :: e -> IO a
throwIOWithStack exc = do
 stack <- getStackTrace
 throwM $ WithStack stack exc

throwWithStack :: e -> a
throwWithStack = unsafePerformIO . throwIOWithStack

-- | Attach a stack trace to any exception thrown by the enclosed action.
-- Note that this is idempotent.
addStack :: IO a -> IO a
addStack = handle f
  where
    f :: SomeException -> IO b
    f exc | WithStack{} <- fromException exc =
      throwIO exc  -- ensure idempotency
    f (SomeException exc) =
      throwIOWithStack exc

Keep in mind that DWARF stack unwinding can incur a significant overhead (being linear in the depth of the stack with a significant constant factor). Consequently, it would be unwise to use throwIOWithStack indiscriminantly (e.g. when throwing an asynchronous exception to kill another thread). However, for truly “exceptional” cases (e.g. failing due to a non-existent file), it would offer quite some value.

Unfortunately, the untyped nature of Haskell exceptions complicates the migration path for existing code. Specifically, if a library provides a function which throws MyException, users catching MyException would break if the library started throwing WithStack MyException. While this may be manageable in the case of user libraries, for packages at the heart of the Haskell ecosystem (e.g. base) this is a significant hurdle.

Another design which avoids this migration problem is to incorporate backtraces directly into the base SomeException type, which is used to represent all thrown exceptions. Specifically, Control.Exception could then expose a variety of throwing functions, reflecting the many call stack mechanisms GHC now offers:

data SomeException where
    SomeException :: forall e. Exception e
                  => Maybe [Location]  -- ^ backtrace, if available
                  -> e                 -- ^ the exception
                  -> SomeException

-- | A representation of source locations consolidating 'GHC.Stack.SrcLoc',
-- 'GHC.Stack.CostCentre', and 'GHC.ExecutionStack.Location'.
data Location = {- ... -}

-- | Throws an exception with no stack trace.
throwIO :: e -> IO a

-- | Throws an exception with a stack trace captured via
-- 'GHC.Stack.getStackTrace'.
throwIOWithExecutionStack :: e -> IO a

-- | Throws an exception with a `HasCallStack` stack trace.
throwIOWithCallStack :: HasCallStack => e -> IO a

Of course, this raises the question of which call-stack method a particular exception ought to use. This is often unknowable, depending upon the user’s build configuration. Consequently, we might consider exposing something of the form:

-- | Throws an exception with a stack trace using the most
-- precise method available in the current build configuration.
throwIOWithStack :: HasCallStack => e -> IO a
throwIOWithStack
  | profiling_enabled = throwIOWithCostCentreStack
  | dwarf_enabled     = throwIOWithExecutionStack
  | otherwise         = throwIOWithCallStack

Finally, new “catch” operations could be introduced providing the handler access to the exception’s stack:

catchWithLocation :: IO a -> (e -> Maybe [Location] -> IO a) -> IO a

Above are just two possible designs; I’m sure there are other points worthy of exploration. Do let me know if you are interesting in picking up this line of work.

The next post will look at using the Linux perf utility to profile Haskell executables.

The unfortunate choice of the SIGQUIT signal to dump a backtrace originates from the Java virtual machine implementations, where this has been long available. GHC currently follows this precedent although some people believe that SIGQUIT should be used for… quitting. Do let us know on #17451 if you feel should we should reconsider the choice to follow Java on this point.↩︎