This is the eleventh edition of our GHC activities report, which describes the work on GHC and related projects that we are doing at Well-Typed. The current edition covers roughly the months of February and March 2022.

You can find the previous editions collected under the ghc-activities-report tag.

A bit of background: One aspect of our work at Well-Typed is to support GHC and the Haskell core infrastructure. Several companies, including IOHK, Meta, and GitHub via the Haskell Foundation, are providing us with funding to do this work. We are also working with Hasura on better debugging tools. We are very grateful on behalf of the whole Haskell community for the support these companies provide.

If you are interested in also contributing funding to ensure we can continue or even scale up this kind of work, please get in touch.

Of course, GHC is a large community effort, and Well-Typed’s contributions are just a small part of this. This report does not aim to give an exhaustive picture of all GHC work that is ongoing, and there are many fantastic features currently being worked on that are omitted here simply because none of us are currently involved in them in any way. Furthermore, the aspects we do mention are still the work of many people. In many cases, we have just been helping with the last few steps of integration. We are immensely grateful to everyone contributing to GHC. Please keep doing so (or start)!

Team

The current GHC team consists of Ben Gamari, Andreas Klebinger, Matthew Pickering, Zubin Duggal and Sam Derbyshire.

Many others within Well-Typed, including Adam Gundry, Alfredo Di Napoli, Alp Mestanogullari, Douglas Wilson and Oleg Grenrus, are contributing to GHC more occasionally.

Releases

  • Ben finished backports to GHC 9.2.2 and cut the release.

  • Matt worked on preparing the 9.4 release.

  • Zubin has started preparing the 9.2.3 release.

Typechecker

  • Sam has been implementing syntactic unification, which allows two types to be checked for equality syntactically. This is useful in several places in the typechecker when we don’t want to emit an equality constraint to be processed by the constraint solver (thus giving less work to the constraint solver). This work will allow us to progress towards fixing #13105 (allowing rewriting in RuntimeReps). (!7812)

  • Sam fixed the implementation of isLiftedType_maybe, an internal function in GHC used to determine whether something is definitely lifted (e.g. Int :: Type), definitely unlifted (e.g. Int# :: TYPE IntRep), or unknown (e.g. a :: TYPE r for a type variable r). This function did not correctly account for type families or levity variables, as noted in #20837. This was hiding several bugs, e.g. in strictness analysis and in pattern matching inhabitation tests.

  • Sam allowed HasField constraints to appear in quantified constraints (#20989).

  • Sam added a check preventing users from deriving KnownNat instances, which could otherwise be used to cause segfaults, as shown in #21087.

  • Sam made the output of GHCi’s :type command more user-friendly, by improving instantiation of types involving out-of-order inferred type variables (#21088) and skipping normalisation for types that aren’t fully instantiated (#20974).
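The three-way answer that isLiftedType_maybe gives, and the cases the fix for #20837 had to account for, can be illustrated with a toy model. This is a minimal sketch, not GHC's code: the real function inspects the kind of a GHC Type, whereas here Levity and RuntimeRep are simplified stand-in data types and isLiftedRep_maybe is a hypothetical name. The key point is that a levity variable or an unreduced type family application must produce Nothing ("unknown") rather than a definite answer.

```haskell
-- Toy model only: these are NOT GHC's real RuntimeRep/Levity types.
data Levity = Lifted | Unlifted | LevityVar String
  deriving (Show, Eq)

data RuntimeRep
  = BoxedRep Levity   -- like 'BoxedRep l; LiftedRep is 'BoxedRep 'Lifted
  | IntRep            -- the representation of Int#
  | RepVar String     -- a RuntimeRep variable, as in a :: TYPE r
  | RepFamApp String  -- an unreduced type family application
  deriving (Show, Eq)

-- | Hypothetical helper in the spirit of isLiftedType_maybe:
--   Just True  = definitely lifted   (e.g. Int  :: Type)
--   Just False = definitely unlifted (e.g. Int# :: TYPE IntRep)
--   Nothing    = cannot tell yet     (e.g. a    :: TYPE r)
isLiftedRep_maybe :: RuntimeRep -> Maybe Bool
isLiftedRep_maybe rep = case rep of
  BoxedRep Lifted        -> Just True
  BoxedRep Unlifted      -> Just False
  BoxedRep (LevityVar _) -> Nothing  -- a levity variable could be either
  IntRep                 -> Just False
  RepVar _               -> Nothing
  RepFamApp _            -> Nothing  -- the family might reduce to anything
```

Answering Just True or Just False for the last three cases is the kind of mistake that lets downstream passes, such as strictness analysis, draw unjustified conclusions.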

Code generation

  • Ben reworked the x86-64 native code generator to produce more position-independent code where possible. This enables use of GHC with new Windows toolchains, which enable address-space layout randomization by default (#16780).

  • Ben fixed a slew of bugs in GHC’s code generation for unaligned array accesses, revealed by recent work on the bytestring package (#20987, #21015).

  • Ben characterised and fixed a rather tricky bug in the generation of static reference tables for programs with cyclic binding groups that contain CAFs, static functions, and static data constructor applications (#20959).

  • Ben debugged and fixed a recently introduced regression where GHC miscompiled code involving jump tables with sub-word discriminants (#21186).

  • Ben debugged a tricky non-deterministic crash due to a set of missing GC roots (#21141).

  • Spurred by insights from #21141, Ben started investigating how we can reduce the impact of error paths on SRT sizes. Sadly, there are some tricky challenges in this area that will require further work (#21169, #21183).

  • Ben migrated GHC’s Windows distribution towards a fully Clang/LLVM-based toolchain, eliminating a good number of bugs attributable to the previous GNU toolchain. This was a major undertaking involving changes in code generation (!7449), linking (!7774, !7528), the RTS (!7511, !7512, !7446), Cabal (Cabal #8062), the driver (!7448), and packaging and should significantly improve GHC’s maintainability and reliability on Windows platforms. See !7448 for a full overview of all of the moving parts involved in this migration.

  • Sam fixed a bug with code generation of keepAlive#, which was incorrectly being eta reduced even though it is supposed to always be kept eta-expanded (#21090).

Core

  • Sam added a check that prevents unboxed float literals from occurring in patterns, to avoid case expressions needing to implement complicated floating-point equality rules. This didn’t affect any packages on head.hackage.
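To see why case alternatives on floating-point literals are awkward, note that IEEE 754 equality does not coincide with "is exactly this value": two zeros with different bit patterns compare equal, and NaN is not equal to itself. The small sketch below demonstrates both facts on boxed Double using GHC.Float.castDoubleToWord64; the helper names are hypothetical, and the same behaviour applies to the unboxed Double# that the Core check is concerned with.

```haskell
import GHC.Float (castDoubleToWord64)

-- | True when x and y compare equal under IEEE 754 (==) yet have
-- different bit patterns, so "matching a literal" is ambiguous.
equalButDifferentBits :: Double -> Double -> Bool
equalButDifferentBits x y =
  x == y && castDoubleToWord64 x /= castDoubleToWord64 y

-- | NaN never compares equal to itself, so a NaN literal could never be
-- matched by an equality test.
nanIsReflexive :: Bool
nanIsReflexive = let nan = 0 / 0 :: Double in nan == nan
```

For example, `equalButDifferentBits 0.0 (-0.0)` is True and `nanIsReflexive` is False, which is why a literal alternative cannot simply be treated as either a bitwise match or a (==) test without extra care.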

Runtime system

  • Ben refactored the handling of adjustor thunks, an implementation detail of GHC’s foreign function interface implementation. The new representation significantly reduces their size and contribution to address-space fragmentation, eliminating a known memory leak in GHCi (#20349) and fixing a source of testsuite fragility on Windows and i386 (#21132).

  • With help from Matt, Ben at long last finished and merged his refactoring of GHC’s eventlog initialization logic, eliminating a measurable source of RTS startup overhead and removing the last barrier to enabling eventlog support by default (!4477).

  • Ben rewrote the RTS linker used on Windows platforms, greatly improving link robustness by extending and employing the RTS’s m32 allocator for mapping object code (!7447).

  • Ben identified a GC bug, revealed by recent improvements in pointer tagging consistency, where the GC failed to untag function closures referenced from PAPs (#21254).

  • Matt identified and fixed a discrepancy in the runtime stats calculations which would lead to incorrect CPU time calculations when using multiple GC threads (!7890).
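The adjustor thunks mentioned above are what GHC creates when Haskell code hands a closure to C as a function pointer, using a "wrapper" foreign import. The sketch below shows that user-facing side: it builds an adjustor, calls it through a "dynamic" import as C code would, and releases it with freeHaskellFunPtr. The foreign import forms and freeHaskellFunPtr are standard Haskell FFI; the names mkCallback, callCallback and doubleViaAdjustor are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Exception (bracket)
import Foreign.C.Types (CInt)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- A "wrapper" import makes GHC allocate an adjustor: a small piece of
-- code that C can call and that enters the given Haskell closure.
foreign import ccall "wrapper"
  mkCallback :: (CInt -> IO CInt) -> IO (FunPtr (CInt -> IO CInt))

-- A "dynamic" import calls a C function pointer from Haskell; here it
-- invokes the adjustor the same way C code would.
foreign import ccall "dynamic"
  callCallback :: FunPtr (CInt -> IO CInt) -> CInt -> IO CInt

-- | Allocate an adjustor for a doubling function, call it, and always
-- free it afterwards (unfreed adjustors are never reclaimed).
doubleViaAdjustor :: CInt -> IO CInt
doubleViaAdjustor x =
  bracket (mkCallback (\n -> pure (n * 2)))
          freeHaskellFunPtr
          (\fp -> callCallback fp x)
```

Programs that create many such callbacks allocate many adjustors, which is why their size and placement in the address space matter.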

Error messages

  • Sam migrated more error messages to use the diagnostic infrastructure, such as “Missing signature” errors, and illegal wildcard errors (!7033).

  • Sam improved the treatment of promotion ticks on symbolic operators, which means that GHC now correctly reports unticked promoted symbolic constructors when compiling with -Wunticked-promoted-constructors (#19984).

  • Zubin added warnings that get triggered when file header pragmas like LANGUAGE pragmas are found in the body of the module where they would usually be ignored (#20385).
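As an illustration of the promotion-tick change described above, the sketch below uses promoted symbolic constructors with explicit ticks, which is the form -Wunticked-promoted-constructors asks for. The data type Pair, its constructor :*:, and the synonyms are hypothetical examples; with the fix for #19984, writing `Int :*: Bool` without the tick should be reported by the warning.

```haskell
{-# LANGUAGE DataKinds, TypeOperators #-}
{-# OPTIONS_GHC -Wunticked-promoted-constructors #-}
import Data.Proxy (Proxy (..))

-- A hypothetical data type with a symbolic (operator) constructor.
data Pair a b = a :*: b

-- Promoted symbolic constructors written with explicit ticks.
type TickedPair = Int ':*: Bool   -- kind: Pair Type Type
type TickedList = 'True ': '[]    -- kind: [Bool]
```

The Proxy import is only there so the synonyms can be used at the term level, for instance `Proxy :: Proxy TickedPair`.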

Parser

  • Sam allowed COMPLETE pragmas involving qualified constructor names to be parsed correctly (!7645).

Driver

  • Ben rebased and extended work by Tamar Christina to improve GHC’s support for linking against C++ libraries, addressing pains revealed by the text library’s recent addition of a dependency on simdutf (#20010).

  • Ben introduced response file support into GHC’s command-line parser, making it possible to circumvent the restrictive limits on command-line-length imposed by some platforms (#16476).

  • Zubin and Matt added more fine grained recompilation checking for modules using Template Haskell, so that they are only recompiled when a dependency actually used in a splice is changed (#20605, !7353, blog post).

  • Matt fixed some more bugs in the dependency calculations in the driver. In particular, some situations involving redundant hs-boot files are now handled correctly.

  • Matt modified the driver to store a cached transitive dependency calculation, which allows the work of computing transitive dependencies to be shared across modules. This is used, for example, when computing which instances are in scope.

  • Matt once again improved the output of the -Wunused-packages warning to now display some more information about the redundant package imports (!7883).

  • Matt fixed a long-standing bug where in one-shot mode, the compiler would look for interface files in the -i dirs even if the -hidir was set (!7851).
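The idea behind the cached transitive dependency calculation mentioned above can be sketched in a few lines. This is not GHC's driver code: it assumes the import graph is acyclic and given as a plain Data.Map from (hypothetical) module names to their direct imports. Because Data.Map is lazy in its values, each module's closure is a thunk that is computed at most once and then reused by every module that depends on it.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type ModuleName = String

-- | Map every module to the set of all modules it imports, directly or
-- transitively. Assumes an acyclic graph; a cycle would make the
-- self-referential 'closure' loop.
transitiveDeps :: M.Map ModuleName [ModuleName] -> M.Map ModuleName (S.Set ModuleName)
transitiveDeps direct = closure
  where
    -- 'closure' refers to itself: looking up a dependency's entry forces
    -- its thunk once, and later lookups reuse the computed set.
    closure = M.map depsOf direct
    -- Modules missing from the map (e.g. external ones) contribute nothing.
    depsOf imps =
      S.unions (S.fromList imps : [ M.findWithDefault S.empty m closure | m <- imps ])
```

For a graph where A imports B and C, and B imports C and D, the entry for A is {B, C, D}, and B's set is computed only once even though it is needed for A as well.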

API features

  • Zubin fixed a few bugs with HIE file support, one where certain variable scopes weren’t being calculated properly (#18425) and another one where relationships between derived typeclass instances weren’t being recorded (#20341).

  • Matt and Zubin finished the hi-haddock patch which makes GHC lex and rename Haddock documentation strings. These are then stored in interface files so downstream tools such as HLS and GHCi can directly read this information without having to process the doc strings themselves.

Template Haskell

  • Sam fixed a compiler panic triggered by illegal occurrences of type wildcards resulting from splicing in a Template Haskell type (#15433).

  • Zubin fixed a few issues with the pretty printing of TH syntax (#20868, #20842).

  • Zubin added support for quoting patterns containing negative numeric literals (#20711).

Profiling

  • Andreas added a new profiling mode, -fprof-late, which adds cost centres only after the simplifier has had a chance to run. Because most optimizations can still fire with profiling enabled in this mode, the performance of profiled code is much more in line with regular builds. The user guide has more information and we encourage people to try it out!

  • Andreas and Matt made various changes to the ticky-profiling infrastructure which allow it to be used with the eventlog and, in turn, with eventlog2html. Thanks to Hasura for funding this work!

  • Matt made a few improvements to Source Notes, which are used to give source locations to expressions. This should result in more accurate location information in more situations (!7536).

Libraries

  • Ben fixed a regression in the process library due to the recently-introduced support for posix_spawnp (process #224).

  • Andreas opened up a proposal to export MutableByteArray from Data.Array.Byte. This makes the treatment of MutableByteArray consistent with the treatment of ByteArray with regard to being exported by base. The proposal has been accepted and implemented, and the change will be in GHC 9.4.
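With both types exported from Data.Array.Byte, the usual pattern of filling a mutable array and then freezing it can use the boxed wrappers directly. The sketch below assumes GHC 9.4 (base 4.17), where Data.Array.Byte exports ByteArray and MutableByteArray together with their constructors; the helpers newMutableByteArray, writeWord8, unsafeFreeze, indexWord8 and fromBytes are hypothetical names, not part of base.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Data.Array.Byte (ByteArray (..), MutableByteArray (..))
import GHC.Exts
import GHC.ST (ST (..), runST)
import GHC.Word (Word8 (..))

-- Allocate an uninitialised mutable byte array of n bytes.
newMutableByteArray :: Int -> ST s (MutableByteArray s)
newMutableByteArray (I# n) = ST $ \s ->
  case newByteArray# n s of
    (# s', mba #) -> (# s', MutableByteArray mba #)

-- Write one byte at index i (no bounds checking).
writeWord8 :: MutableByteArray s -> Int -> Word8 -> ST s ()
writeWord8 (MutableByteArray mba) (I# i) (W8# w) = ST $ \s ->
  (# writeWord8Array# mba i w s, () #)

-- Freeze in place; the mutable array must not be used afterwards.
unsafeFreeze :: MutableByteArray s -> ST s ByteArray
unsafeFreeze (MutableByteArray mba) = ST $ \s ->
  case unsafeFreezeByteArray# mba s of
    (# s', ba #) -> (# s', ByteArray ba #)

-- Read one byte at index i (no bounds checking).
indexWord8 :: ByteArray -> Int -> Word8
indexWord8 (ByteArray ba) (I# i) = W8# (indexWord8Array# ba i)

-- Build an immutable ByteArray from a list of bytes.
fromBytes :: [Word8] -> ByteArray
fromBytes ws = runST $ do
  mba <- newMutableByteArray (length ws)
  mapM_ (uncurry (writeWord8 mba)) (zip [0 ..] ws)
  unsafeFreeze mba
```

Before this change, code like this typically had to define its own MutableByteArray wrapper or depend on the primitive package for one.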

Compiler performance

  • Sam continued work on directed coercions, which avoid producing large coercions when rewriting type families (#8095). Unfortunately, benchmarking revealed some severe regressions, for example when compiling the singletons package, because coercion optimisation became less effective than it was previously. Sam implemented a workaround (coercion zapping in the coercion optimiser), but given the complexity of the patch, it was decided that it would be better to investigate zapping without directed coercions, as the directed coercions work suggested a way forward that could avoid previous pitfalls. The directed coercions patch has been put on hold until a better approach to coercion optimisation can be found. Sam wrote up an overview of the difficulties encountered during the implementation on the GHC wiki, which should be useful to future implementors.

Packaging

  • All the release bindists are now produced by Hadrian. Starting from the 9.4 release, all the bindists that we distribute will be built in this manner.

  • Zubin and Matt finished the reinstallable GHC patch, which allows GHC and all its libraries to be built using normal cabal-install commands. In the future, we hope this will allow GHC to be rebuilt as part of cabal build plans, but some issues relating to Template Haskell still need to be worked out (#20742).

Runtime performance

  • After a long time, tag inference has finally landed in !5614, which implements #16970.

    This is a new optimization which can allow the compiler to omit branches checking for the presence of a pointer tag if we can infer from the context that a tag must always be present.

    On the nofib benchmark suite, the best result was obtained with -fworker-wrapper-cbv enabled: a 4.01% decrease in instructions executed, with a similar benefit in runtime. Sadly, because of #20364, -fworker-wrapper-cbv can prevent RULEs from firing when INLINE[ABLE] pragmas are not used correctly, which turned out to be a fairly common problem!

    For this reason, -fworker-wrapper-cbv is off by default, in which case the improvement is “only” a 1.53% reduction in instructions executed. In general, however, -fworker-wrapper-cbv can be safely enabled for all modules except those which define RULE-relevant functions that are currently not subject to a W/W split. The user guide has some guidance about when -fworker-wrapper-cbv can be safely enabled.

    The main goal of this optimization was to improve tight loops like the ones performed by the lookup operations in containers. There, a significant amount of performance was lost to redundant pointer-tag checks, and we saw runtime improvements of up to 15% for some of the lookup operations.
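The kind of loop that tag inference targets looks like the sketch below: a lookup over a tree with strict fields, similar in shape to the internals of Data.Map but using a hypothetical Tree type and lookupTree function. The source code does not change; the optimization happens inside the compiler. As we understand it, GHC 9.4 maintains the invariant that strict fields hold evaluated, tagged pointers, so values read out of fields like l and r can be treated as already tagged, and with -fworker-wrapper-cbv the worker for go may also assume its argument is tagged, avoiding redundant tag checks on each iteration.

```haskell
-- A hypothetical strict binary search tree, in the style of containers.
data Tree k v = Tip | Bin !k v !(Tree k v) !(Tree k v)

-- | Tight lookup loop: each step scrutinises a subtree taken from a
-- strict field, which is where redundant pointer-tag checks used to arise.
lookupTree :: Ord k => k -> Tree k v -> Maybe v
lookupTree k = go
  where
    go Tip = Nothing
    go (Bin kx x l r) = case compare k kx of
      LT -> go l
      GT -> go r
      EQ -> Just x
```

Whether a particular loop benefits depends on what GHC can infer in context, so measuring with and without -fworker-wrapper-cbv is the reliable way to check.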

Infrastructure

  • Ben and Matt introduced a lint to verify references between GHC’s long-form comments (so-called “Notes”). This helps eliminate a long-standing problem where note references grow stale across refactorings of the compiler (!7482).

  • Sam migrated the linting infrastructure to allow all linting steps to be run locally, so that developers can be confident their merge requests won’t fail during the linting stage in CI (!7578).

  • Matt created a script which generates the main CI pipelines for all supported build combinations. The script is a simple Haskell file, which is easier to understand and modify than the old GitLab YAML file; that file had led to various inconsistencies between the build configurations and to bugs such as missing build artifacts (!7753).
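The Note-reference lint described above checks that every reference to a Note points at a Note that actually exists. A minimal single-file sketch of that check follows. It assumes GHC's usual convention that a Note header line containing "Note [Title]" is immediately followed by a line of "~" characters, treats any other occurrence of "Note [...]" as a reference, and ignores cross-file references and titles broken across lines. All function names are hypothetical; this is not the lint's actual implementation.

```haskell
import Data.List (isPrefixOf, tails)
import qualified Data.Set as S

-- | Every "Note [Title]" occurring in a line, as the bare title.
noteTitles :: String -> [String]
noteTitles line =
  [ takeWhile (/= ']') rest
  | suffix <- tails line
  , "Note [" `isPrefixOf` suffix
  , let rest = drop (length "Note [") suffix
  , ']' `elem` rest
  ]

-- | A line of '~' characters (after indentation) underlines a Note header.
isUnderline :: String -> Bool
isUnderline l = "~~~" `isPrefixOf` dropWhile (== ' ') l

-- | Pair each line with the line that follows it ("" after the last).
withNext :: [String] -> [(String, String)]
withNext ls = zip ls (drop 1 ls ++ [""])

-- | Titles defined in the source: a header directly above an underline.
definedNotes :: [String] -> S.Set String
definedNotes ls =
  S.fromList [ t | (l, next) <- withNext ls, isUnderline next, t <- noteTitles l ]

-- | Titles mentioned anywhere other than in a Note header.
referencedNotes :: [String] -> [String]
referencedNotes ls =
  [ t | (l, next) <- withNext ls, not (isUnderline next), t <- noteTitles l ]

-- | References whose target Note is never defined (stale references).
danglingRefs :: [String] -> [String]
danglingRefs ls = filter (`S.notMember` defined) (referencedNotes ls)
  where
    defined = definedNotes ls
```

For a file that defines Note [Foo] but also says "See Note [Bar]", danglingRefs reports ["Bar"], which is exactly the kind of stale reference that accumulates across refactorings.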