This is the fifth and final post of a series examining GHC’s support for DWARF debug information and the tooling that this support enables:
- Part 1 introduces DWARF debugging information and explains how its generation can be enabled in GHC.
- Part 2 looks at a DWARF-enabled program in gdb and examines some of the limitations of this style of debug information.
- Part 3 looks at the backtrace support of GHC’s runtime system and how it can be used from Haskell.
- Part 4 examines how the Linux perf utility can be used on GHC-compiled programs.
- Part 5 concludes the series by describing future work, related projects, and ways in which you can help.
In the previous four posts we saw some of the functionality enabled by DWARF debug information. As of GHC 8.10.2, everything we saw in those posts should be possible with the standard DWARF-enabled GHC binary distributions.
However, there is still a great deal of untapped potential and much remains to be done. Here is a sampling of tasks in no particular order:
- Merge the fruits of my latest push on DWARF support upstream (!2380, !2373, !2387)
- Make GHC-generated symbols (e.g. 59fw_info) more reflective of their origin in the source program
- Preserve call-stacks in exceptions (as discussed in part 3)
- Reduce the size of debug information through more concise representation (see #17609)
- Some RTS symbols (e.g. stg_PAP_apply) don’t have accurate unwind information, leading to truncated backtraces in some cases (#17627)
- Implement a native (i.e. non-DWARF-based) stack unwinder in the GHC runtime system, allowing faster unwinding of Haskell code
- Windows PDB support (#12397)
- Try moving GHC’s stack pointer to the native stack pointer register, enabling call-graph profiling via DWARF unwinding (as discussed in part 4, #8272)
- Build statistical profiling support into the GHC runtime system (#10915)
- Add support for expressing local variables in C--, enabling allocation profiling
- Add support for tracking register value semantics in STG-to-C-- and DWARF type information, enabling local variable introspection.
- Implement thread support in
- Make better use of GHC-specific source-note information (mentioned briefly in part 1)
- Symbol demangling support in the GHC RTS,
- Analysis tools
As always, we are looking for people to help with this effort. If any of the above tasks sound enticing to you, do let us know. Deep compiler experience is quite unnecessary for many of these tasks, especially those in the area of analysis tools.
Below I will describe in greater detail a few of the tasks which I think hold the greatest potential.
Profile analysis tools
In his thesis, Peter Wortmann shows that the one-to-one correspondence between instructions and line numbers required by DWARF (see part 1) can result in rather unhelpful profiles. He shows that one can do significantly better by splitting the attribution of an instruction across the full set of source locations that gave rise to it. This is not something that existing tools can do. One could implement this approach on top of the sample data produced by perf record (e.g. exporting the samples via the perf script tool or the linux-perf Haskell library) together with the extended DWARF annotations produced by GHC.
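To illustrate the idea, here is a minimal sketch of the attribution step. It assumes the samples have already been grouped by instruction address (e.g. from perf script output) and that each address has been mapped to the source locations GHC's annotations associate with it. The names SrcLoc, Addr, and attribute are hypothetical, and the even split is a simplifying assumption, not necessarily the weighting Peter's thesis proposes.

```haskell
-- Hypothetical sketch: split each instruction's samples evenly across
-- every source location that gave rise to it. The even weighting is an
-- assumption made for illustration only.
import qualified Data.Map.Strict as M

-- A source location, e.g. "Main.hs:12:5-20" (hypothetical representation).
type SrcLoc = String

-- An instruction address, as recorded by perf record.
type Addr = Integer

attribute :: M.Map Addr [SrcLoc]   -- address -> contributing source locations
          -> M.Map Addr Int        -- address -> number of samples
          -> M.Map SrcLoc Double   -- location -> (fractional) sample count
attribute locs samples =
  M.fromListWith (+)
    [ (loc, fromIntegral n / fromIntegral (length ls))
    | (addr, n) <- M.toList samples
      -- Samples at addresses with no known source locations are dropped.
    , Just ls <- [M.lookup addr locs]
    , not (null ls)
    , loc <- ls
    ]
```

For instance, an instruction with 4 samples that arose from two source locations contributes 2 samples to each of them, whereas a DWARF line table would have to credit all 4 to a single line.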
Peter’s Haskell Implementors’ Workshop demonstration showed one possible interface for such an analysis tool, marrying Haskell source and Core with sample data in the ThreadScope interface. It would be great to continue exploration down this path.
Using native stack pointer register
As noted in part 4, GHC’s current execution model on x86 precludes use of perf record’s call-graph profiling functionality. The most promising avenue to fix this would be to rework GHC to use the native stack pointer register to track the Haskell stack (#8272). This would potentially carry a few benefits:
- it would enable use of native profiling tools
- the native code generator could use the POP instructions, which may be more concise or better optimised in the microarchitecture than our current stack manipulation strategy
However, there are also a few tricky points:
- LLVM makes very strong assumptions about the nature of the stack; consequently, moving the LLVM backend to this scheme may be non-trivial.
- the x86-64 System V ABI reserves a 128-byte region just beyond (i.e. below, in address terms) the stack pointer, called the “red zone”, which code can use for temporary storage without adjusting the stack pointer. GHC would need to ensure this region is available before calling into foreign code.
Building sampling profiling into the GHC runtime
Without fixing the stack register issue described above, perf’s call-graph profiling functionality is unusable. However, nothing is stopping GHC from providing its own sampling infrastructure in the runtime (#10915). In 2016 I started a branch doing exactly this using perf_events’ signal-based sampling interface, dumping samples to GHC’s eventlog.
As far as I can recall, the wip/libdw-prof branch can readily collect samples; the work that remains primarily revolves around developing analysis tools.
One approach would be to build a tool to convert the GHC-eventlog-based output from the wip/libdw-prof branch into a perf.data file for use with perf report. However, one could no doubt do much better with a more specialised tool, as described in the “Profile analysis tools” section above.
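As a taste of what such analysis tooling involves, the following sketch computes a flat profile (self and inclusive sample counts per symbol) from a list of symbolised call stacks. The Sample type and the flatProfile function are hypothetical; they do not reflect the format that the wip/libdw-prof branch actually writes to the eventlog, and decoding that format is left out entirely.

```haskell
-- Hypothetical sketch of a flat-profile aggregation over sampled call
-- stacks. The Sample type is illustrative, not the eventlog's format.
import qualified Data.Map.Strict as M
import Data.List (nub)

-- One sample: a symbolised call stack, innermost frame first.
newtype Sample = Sample [String]

-- For each symbol, count (self, inclusive) samples: "self" when it is the
-- innermost frame, "inclusive" when it appears anywhere on the stack.
-- Recursive frames are counted only once per sample.
flatProfile :: [Sample] -> M.Map String (Int, Int)
flatProfile samples = M.unionWith add selfCounts inclCounts
  where
    add (s1, i1) (s2, i2) = (s1 + s2, i1 + i2)
    selfCounts = M.fromListWith add
      [ (top, (1, 0)) | Sample (top : _) <- samples ]
    inclCounts = M.fromListWith add
      [ (sym, (0, 1)) | Sample frames <- samples, sym <- nub frames ]
```

A specialised tool would go further, e.g. by attributing samples to source locations rather than symbols, but the shape of the computation is similar.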
While simple, this signal-based approach does imply slightly more overhead (in the form of context switches) than necessary. A more efficient approach might involve the Linux eBPF mechanism, which can be triggered from a
Most imperative compilers produce debug information that allows debuggers to display and modify in-scope variables and their values. In principle GHC could also provide such support. However, doing so in a way that remains useful in programs that have been transformed by GHC’s simplifier would be quite non-trivial. For instance, consider the program:
f :: (Int, String) -> Int
f (x, _) = x + 4
GHC’s worker-wrapper transformation would likely transform this to:
f :: (Int, String) -> Int
f pair = case pair of (x, _) ->
  case x of I# x# ->
    case $wf x# of result -> I# result
$wf :: Int# -> Int#
$wf x# = x# + 4
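To make the example above concrete, here is a hand-written, compilable approximation of the wrapper and worker. It is a sketch rather than GHC’s actual output: GHC names the worker $wf, which is not a legal source identifier, so it is called wf here, and unboxed addition is written with +# from GHC.Exts.

```haskell
{-# LANGUAGE MagicHash #-}
-- Hand-written approximation of the worker/wrapper split; GHC's real
-- worker would be named $wf, which cannot be written in source code.
import GHC.Exts (Int (I#), Int#, (+#))

-- The wrapper: unpacks the tuple and the boxed Int, calls the worker,
-- then re-boxes the unboxed result.
f :: (Int, String) -> Int
f pair = case pair of
  (x, _) -> case x of
    I# x# -> case wf x# of
      result -> I# result

-- The worker: operates directly on the unboxed Int#.
wf :: Int# -> Int#
wf x# = x# +# 4#
```

Note that nothing in wf itself records that x# originated as the first component of f’s tuple argument; that relationship exists only in the wrapper.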
This sort of transformation is ubiquitous and critical to the quality of the code GHC produces. Naturally, we would want to ensure that the debug information of $wf can represent the fact that x# is the unboxed first element of the argument of f. I suspect that the best way to accomplish this would be to propagate value provenance information through binders’ (e.g. in this case
This would involve:
- Adding syntax in C-- to encode local variable information
- Producing such syntax in the STG-to-C-- code generator
- Adding information in Core to propagate value provenance, as discussed above
- Populating this information in worker-wrapper
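To give a feel for what “value provenance” might mean in practice, the following sketch models it as a small data type. None of these types exist in GHC; Provenance, describe, and xHashProvenance are hypothetical names chosen purely to illustrate the information worker-wrapper would need to record for the x# binder of $wf.

```haskell
-- Hypothetical model of value provenance: where a binder's value came
-- from, relative to the original function's arguments. Not a GHC type.
data Provenance
  = Arg String Int            -- the n-th argument of the named function
  | Field Int Provenance      -- the n-th field of a constructor value
  | Unboxed Provenance        -- the unboxed payload of a box such as I#
  deriving (Eq, Show)

-- Render a provenance chain roughly as a debugger might display it.
describe :: Provenance -> String
describe (Arg fun n) = "argument " ++ show n ++ " of " ++ fun
describe (Field n p) = "field " ++ show n ++ " of " ++ describe p
describe (Unboxed p) = "unboxed " ++ describe p

-- x# in $wf: the unboxed first field of f's first argument.
xHashProvenance :: Provenance
xHashProvenance = Unboxed (Field 0 (Arg "f" 0))
```

Here describe xHashProvenance yields "unboxed field 0 of argument 0 of f". A real implementation would also need to carry such information through later Core-to-Core passes and down to the DWARF location expressions.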
While being able to poke around at Haskell values in gdb is perhaps a tempting proposition, all-in-all I suspect that the costs of such support (both in implementation time and complexity) would outweigh the benefits it would bring. This is especially true given that GHC already has the GHCi debugger for cases where such interactive debugging is necessary.
Aside: Event tracing
Some users have told me that they have sometimes wished that GHC programs were as “traceable” as programs written in other languages. In particular, tools like dtrace provide robust, minimal-overhead, language-agnostic tracing infrastructure which can be invaluable in production settings. It would be great if Haskell programs could benefit from these same tools.
The easiest on-ramp to tracing support is via the User-space Statically-Defined Tracepoint (USDT) mechanism supported by all of the aforementioned tools. Under this scheme, the traced program embeds a bit of metadata describing the available tracepoints, the information they provide, and how they are enabled.
It turns out that GHC’s runtime system already defines a number of USDT tracepoints (although they need to be enabled when configuring GHC with the configure flag). However, it is possible that this support has bit-rotted (#15543).
It may also be useful to be able to define USDT tracepoints in Haskell programs. A simple implementation would be a Template Haskell splice which generates the necessary C stubs and splices a foreign function import, along with a call to it, into the program.
Aside: LLVM and XRay
It should also be noted that LLVM provides another, rather different approach to the tracing/profiling problem with its XRay instrumentation infrastructure. This approach seeks to introduce low-cost tracing instrumentation in generated code, allowing precise and highly detailed accounting of runtime costs.
Matthew Pickering tried adding XRay support to GHC’s LLVM backend (#15929). Unfortunately, this effort ended up being rather stunted, in part due to limitations of LLVM itself (specifically difficulties with tail-calls) and in part due to limitations of GHC’s LLVM backend (namely, we rely on the LLVM IR alias mechanism to convince LLVM that our type annotations are correct; this confuses the XRay logic).
This work has been a multi-year (off-and-on) effort for me, but it would not have been possible without a number of others.
In particular, this work would never have even started without the efforts of Peter Wortmann. Not only does the causality formalism he described in his dissertation provide the theoretical foundation for all of this functionality, but his initial implementation kick-started the effort, and the promising results he demonstrated at the Haskell Implementors’ Workshop provided me with the motivation to keep picking away at the seemingly endless stream of details which arose as I refined the feature over the years.
In general, Well-Typed’s work on GHC (and, therefore, my own work) would not have been possible without the support of Microsoft Research, IOHK, and others who have supported the position which has allowed me to work on GHC for many years. In addition, some of my early work in 2015 to clean up the original DWARF implementation was supported directly by funding from Microsoft Research.