Editorial note: This is a cross-post of a post originally published on the Hasura blog.
Well-Typed and Hasura have been working together since 2020 to improve Haskell tooling for commercial Haskell users, taking advantage of Well-Typed’s expertise maintaining the Glasgow Haskell Compiler and Hasura’s experience using Haskell in production at scale. Over the last two years we have continued our productive relationship, working on a wide variety of projects, in particular the profiling and debugging capabilities of the compiler, many of which have been reinvented or spruced up. In this post we’ll look back at the progress we have made together.
Memory profiling and heap analysis
One of the first big projects we worked on was ghc-debug, a new heap analysis tool that can gather detailed information about the heap of a running process or analyse a snapshot. This tool gives precise results so it can be used to reliably investigate memory usage issues, and we have used it numerous times to fix bugs in the GHC code base. Within Hasura we have used it to investigate fragmentation issues more closely and also to diagnose a critical memory leak regression before a release.
Since GHC 9.2, ghc-debug has been supported natively in GHC. All the libraries and executables are on Hackage, so it can be installed and used like any normal Haskell library.
- Memory Fragmentation: A Deeper Look With ghc-debug
- ZuriHac21: Understanding Memory Usage with eventlog2html and ghc-debug
Info table profiling
Also in GHC 9.2 we introduced “info table profiling” (or -hi profiling), a new heap profiling mode that analyses memory usage over time and relates it to source code locations. Crucially, it does not require introducing cost centres and recompiling with profiling enabled (which may distort performance). It works by storing a map from info tables to meta-information, such as the source location each closure originated from and its type. The resulting profile can be viewed using eventlog2html to give a detailed table of the memory behaviour of each closure type over the course of a program.
We have used info table profiling extensively on GHC itself to find and resolve memory issues, meaning that GHC 9.2 and 9.4 bring significant reductions in compile-time memory usage.
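To make this concrete, here is a minimal sketch of a program you could try -hi profiling on. The program and its function names are invented for illustration and do not come from Hasura’s or GHC’s code base; the build and run commands in the comments use the flags documented in the GHC user’s guide.

```haskell
-- Leak.hs: a deliberately leaky toy program (hypothetical example).
--
-- Build with the info table map enabled (GHC 9.2 or later; the -eventlog
-- link flag is only needed before GHC 9.4):
--   ghc -finfo-table-map -fdistinct-constructor-tables -eventlog -rtsopts Leak.hs
-- Run with info table heap profiling, writing samples to Leak.eventlog:
--   ./Leak +RTS -hi -l -RTS
-- Render the profile:
--   eventlog2html Leak.eventlog
module Main (main) where

import Data.List (foldl')

-- | Lazy running (sum, count). Each step wraps the previous fields in new
-- thunks, so a chain of unevaluated additions builds up on the heap.
leakyMean :: [Double] -> Double
leakyMean xs = total / count
  where
    (total, count) = foldl step (0, 0) xs
    step (s, c) x = (s + x, c + 1)

-- | The same computation with both accumulator fields forced at every step.
strictMean :: [Double] -> Double
strictMean xs = total / count
  where
    (total, count) = foldl' step (0, 0) xs
    step (s, c) x =
      let s' = s + x
          c' = c + 1
      in s' `seq` c' `seq` (s', c')

-- | Input generator, kept separate so each run builds its own list.
inputs :: Int -> [Double]
inputs n = map fromIntegral [1 .. n]

main :: IO ()
main = do
  print (leakyMean (inputs 2000000))
  print (strictMean (inputs 2000000))
```

In the rendered profile, the thunks created by the lazy version should be attributed to the source location of its step function, which is the kind of information that previously required a profiled build.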
Understanding memory fragmentation
Our early work with Hasura investigated why there was a large discrepancy between the memory usage reported by the operating system and the Haskell runtime. The initial hypothesis was that, due to the extensive use of pinned bytestrings in Hasura’s code base, we were losing memory due to heap fragmentation.
We developed an understanding of how exactly fragmentation could occur on a Haskell heap, tooling to analyse the extent of fragmentation and ultimately some fixes to GHC’s memory allocation strategy to reduce fragmentation caused by short-lived bytestring allocations.
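The allocation pattern behind this hypothesis can be sketched with a small experiment. This is an illustration only, not the tooling we built: it uses mallocForeignPtrBytes from base, which allocates pinned memory on the Haskell heap, as a stand-in for bytestring allocation, and the function name and sizes are made up. Pinned objects cannot be moved by the garbage collector, so a block stays allocated for as long as any single buffer inside it is alive.

```haskell
-- Fragment.hs: many short-lived pinned buffers, a few long-lived ones.
--
-- Build and run with RTS statistics enabled:
--   ghc -rtsopts Fragment.hs
--   ./Fragment +RTS -T -RTS
module Main (main) where

import Control.Monad (forM)
import Data.Maybe (catMaybes)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, touchForeignPtr)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performMajorGC)

-- | Allocate `total` pinned buffers of `size` bytes each, retaining only
-- every `keepEvery`-th one. The retained buffers end up scattered across
-- many blocks, each of which must then stay allocated.
allocateSparse :: Int -> Int -> Int -> IO [ForeignPtr Word8]
allocateSparse total size keepEvery = do
  slots <- forM [1 .. total] $ \i -> do
    buf <- mallocForeignPtrBytes size
    -- Decide strictly, so dropped buffers are not kept alive by a thunk.
    pure $! if i `mod` keepEvery == 0 then Just buf else Nothing
  pure (catMaybes slots)

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  kept <- allocateSparse 200000 64 40
  performMajorGC
  if enabled
    then do
      stats <- getRTSStats
      let details = gc stats
      putStrLn ("retained buffers: " ++ show (length kept))
      putStrLn ("live bytes:       " ++ show (gcdetails_live_bytes details))
      putStrLn ("mem in use bytes: " ++ show (gcdetails_mem_in_use_bytes details))
    else putStrLn "Run with +RTS -T to collect RTS statistics."
  -- Keep the retained buffers alive until after the measurement.
  mapM_ touchForeignPtr kept
```

Comparing the payload actually retained (number of buffers times 64 bytes) with the figures the runtime reports gives a rough feel for the overhead; the exact numbers depend on the GHC version and RTS settings, so treat them as indicative only.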
This investigation also led to a much deeper understanding of the memory retention behaviour of the GHC runtime and led to some additional improvements in how much memory the runtime will optimistically retain. For long-lived server applications the amount of memory used should return to a steady baseline after being idle for a long period.
This work also highlighted how other compilers trigger idle garbage collections. In particular, we may want to investigate triggering idle collections by allocation rate rather than simple idleness, as applications may still do a small amount of work in their idle periods.
- Understanding Memory Fragmentation
- Memory Fragmentation: A Deeper Look With ghc-debug
- Improvements to memory usage in GHC 9.2
Runtime performance profiling and monitoring
Late cost centre profiling
Cost centre profiling, the normal tool recommended for GHC users profiling their Haskell programs, allows recording both time/allocation and heap profiles. It requires compiling the project in profiling mode, which inserts cost centres into the compiled code. Traditionally, the issue with cost centre profiling has been that adding cost centres severely affects how your program is optimised. This means that the existing strategies for automatically inserting cost centres (such as -fprof-auto) can lead to major skew if cost centres are inserted in an inopportune place.
We have implemented a new cost centre insertion mode in GHC 9.4, -fprof-late, which inserts cost centres after the optimiser has finished running. Therefore the cost centres will not affect how your code is optimised and the profile gives a more accurate view of how your unprofiled program would perform. The trade-off is that the names of the cost centres contain internal names, but they are nearly always easily understandable.
The utility of this mode cannot be overstated: you now get a very fine-grained profile that accurately reflects the actual runtime behaviour of your program. It’s made me start using the cost-centre profiler again!
We also developed a plugin which can be used to approximate this mode if you are using GHC 9.0 or 9.2.
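As a small illustration of the difference, consider the toy program below (the function names are invented for this sketch). With -fprof-auto, a cost centre attached to a tiny helper such as square may prevent it from being inlined into its caller, so the profiled program can behave quite differently from the optimised one. With -fprof-late, cost centres are added only after optimisation has run.

```haskell
-- Squares.hs: a toy program for comparing cost centre insertion modes.
--
-- Late cost centres (GHC 9.4 or later, profiling libraries installed):
--   ghc -O2 -prof -fprof-late -rtsopts Squares.hs
--   ./Squares +RTS -p -RTS     -- writes Squares.prof
--
-- For comparison, the traditional automatic mode:
--   ghc -O2 -prof -fprof-auto -rtsopts Squares.hs
module Main (main) where

-- | A small helper that the optimiser would normally inline.
square :: Int -> Int
square x = x * x

-- | Sum of the squares of 1..n, written so that list fusion and inlining
-- have something to do.
sumSquares :: Int -> Int
sumSquares n = sum (map square [1 .. n])

main :: IO ()
main = print (sumSquares 1000000)
```

Comparing the two resulting .prof files on a real project is a reasonable way to see how much the insertion strategy skews the picture.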
Ticky profiling
Hasura have a suite of benchmarks that track different runtime metrics, such as bytes allocated.[1] Investigating regressions in these benchmarks requires a profiling tool geared towards profiling allocations. GHC has long had support for ticky profiling, which gives a low-level view of which functions are allocating. However, in the past ticky profiling has been used almost exclusively by GHC developers, not users, and profiles were only consumable in a rudimentary text-based format.
We added support to emit ticky samples via the eventlog in GHC 9.4, and support for rendering the information in the profile to an interactive HTML table using eventlog2html. In addition, we integrated the new info table mapping (as used by -hi profiling) to give precise locations for each ticky counter, making it easier to interpret the profile.
Live profiling and monitoring via the eventlog
For a long time we have been interested in unifying GHC’s various profiling mechanisms via the eventlog, and making them easier to monitor. We developed a prototype live monitoring setup for Hasura, eventlog-live, that could attach to the eventlog and read events whilst the program was running. This prototype was subsequently extended thanks to funding from IOG.
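For readers who have not used the eventlog before, the sketch below shows the producing side only: an application writing its own events into the eventlog using Debug.Trace from base. It does not use or depict the eventlog-live API, and handleRequest is a hypothetical stand-in for real work.

```haskell
-- Events.hs: emitting user events and markers into the eventlog.
--
-- Build and run (the -eventlog link flag is only needed before GHC 9.4):
--   ghc -eventlog -rtsopts Events.hs
--   ./Events +RTS -l -RTS      -- writes Events.eventlog
module Main (main) where

import Control.Monad (forM_)
import Debug.Trace (traceEventIO, traceMarkerIO)

-- | Hypothetical unit of work standing in for a real request handler.
handleRequest :: Int -> Int
handleRequest n = sum [1 .. n]

main :: IO ()
main = do
  -- Markers label points in time and show up in eventlog viewers.
  traceMarkerIO "startup complete"
  forM_ [1 .. 5 :: Int] $ \i -> do
    traceEventIO ("request " ++ show i ++ " start")
    let result = handleRequest (i * 100000)
    -- Force the result before logging, so the event marks completed work.
    result `seq` traceEventIO ("request " ++ show i ++ " done: " ++ show result)
```

These user events land in the same stream as the runtime’s own GC and heap profiling events, which is what makes the eventlog an attractive single source for monitoring tools.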
Native Stack Pointer register
GHC-compiled programs use separate registers for the C stack and Haskell stack. One consequence of this is that native Linux debugging and statistical profiling tools (such as perf) see only the C stack pointer, and hence provide a very limited window into the behaviour of Haskell programs.
Hasura commissioned some exploratory work to see whether it would be possible to use the native stack pointer register for the Haskell stack, and hence get more useful output from off-the-shelf debugging tools. Unfortunately we ran into issues getting perf to understand the debugging information generated by GHC, and there are challenges related to maintaining LLVM compatibility, but we remain interested in exploring this further.
- Consider using native stack pointer register for STG stack (#8272)
- Towards system profiler support for GHC
Haskell Language Server
Lately we have started to support maintenance of the Haskell Language Server (HLS). The language server is now a key part of many developers’ workflows, so it is a priority to make sure it is kept up-to-date and works reliably, and sponsorship from companies like Hasura is crucial to enabling this.
Recently our work on HLS has included:
- Supporting the GHC 9.2 release series, as Hasura were keen to upgrade and have access to all the improved functionality we discussed in this post.
- Diagnosing and resolving difficult-to-reproduce segfaults experienced by HLS users. It turned out that the version compatibility checks were not strict enough, and HLS could load incompatible object files when running Template Haskell. In particular, you must build haskell-language-server with exactly the same version of GHC with which you compiled your package dependencies, so that object files for dependencies have the correct ABI.
- Starting to take advantage of the recently completed support for Multiple Home Units in GHC to make HLS work more robustly for projects consisting of multiple components.
Well-Typed are grateful to Hasura for funding this work, as it will benefit the whole Haskell community. With their help we have made significant progress in the last two years improving debugging and profiling capabilities of the compiler, and improving the developer experience using HLS. We look forward to continuing our productive collaboration in the future.
As well as experimenting with all these tools on Hasura’s code base, we have also been using them to great effect on GHC’s code base, in order to reduce memory usage and increase performance of the compiler itself (e.g. by profiling GHC compiling Hasura’s graphql-engine). The new profiling tools have been useful in finding places to optimise: -hi profiling made eliminating memory leaks straightforward, the late cost centre patch gives a great overview of where GHC spends time, and ticky profiling gives a low-level overview of the allocations. They have also been very helpful for our work on improving HLS performance.
Well-Typed are actively looking for funding to continue maintaining and enhancing GHC and HLS. If your company relies on robust Haskell tooling, and you could support this work, or would like help improving the developer experience for your Haskell engineers, please get in touch with us via email@example.com!
[1] The number of bytes allocated acts as a proxy for the amount of computation performed, since Haskell programs tend to allocate frequently, and allocations are more consistent than CPU or wall clock time.