Exploring Cloud Builds in Hadrian

Thursday, 01 August 2019, by David Eichmann.
Filed under ghc, well-typed.

GHC’s new build system, Hadrian, now supports a cloud build option: --share. With that enabled, build artifacts will be cached and shared between builds. This can improve build times, but getting this feature to function correctly is still a work in progress. This blog post explores the requirements for correctness, some examples in Hadrian, the current state of the project, and how developers can use cloud builds to improve iteration time.

Background

For much of its life GHC has had a build system based around Gnu Make. Though it has served its purpose well, the make based build system is being replaced by a new and improved project called Hadrian. Hadrian is a build system written in Haskell and based on the Shake library. With Hadrian comes cleaner code and better maintainability, but also some interesting features of Shake such as cloud builds. Cloud builds are enabled with the --share option. A cloud build keeps a cache of build artifacts that can be shared between builds eliminating a lot of duplicate work that had already been done on a previous build. This feature seems particularly useful in the context of continuous integration (CI). Merge Requests made via GitLab must all pass a comprehensive set of tests on the CI server (the CI pipeline). This involves building GHC multiple times and running the test suite. Due to limited resources, long build times, and the large number of tested build configurations, the CI pipeline typically takes 6 to 8 hours to complete. Keep in mind that each merge request is typically only making a small change to the compiler. By utilizing Shake’s cloud build system to cache build artifacts between CI runs, one would expect a large speedup.

After an initial effort, Andrey Mokhov achieved a successful cloud build and rebuild of GHC with Hadrian. After Andrey’s changes were merged I did a cloud build (using the --share option) of GHC to fill up the cloud build cache, deleted the build directory, then timed a rebuild. This resulted in a 10x speedup of the rebuild compared to a clean non-cloud build, a very positive result. On my machine this took a 25 minute build down to only 2.5 minutes! However, this is an overly optimistic test: no source files have changed, so we expect full cache utilization. In practice we expect source files to have changed and so should expect less cache utilization. Regardless, the prospect of using cloud builds to improve CI times looked very promising. Unfortunately, the correctness of cloud builds came into question, and the last thing we want in CI is an incorrect build. We needed a better understanding of the requirements for correct cloud builds.

Build rules

To understand the problems at hand, one must understand the basics of how a build system like Hadrian is implemented using Shake. Put simply, you must create a set of build rules that map file name patterns to Actions that perform some IO to generate the corresponding file or files i.e. the “targets” of the rule. In particular, Action is a monad and the need :: [FilePath] -> Action () function is used to dynamically specify the rule’s inputs (i.e. dependencies) that must be built before the action continues. Here is an example of a typical build rule that generates a file with the extension .output:

Evaluating Cloud Build Correctness

Let’s consider what it means to have a correct build system. A correct build system is one that always produces a correct set of build artifacts according to some specification. In the case of Hadrian, while not formally specified, this would include producing a functioning ghc binary that reflects the build inputs (e.g. source files and build flavour). We can break correctness down into three specific definitions:

  1. Clean Build Correctness: Start with no existing build artifacts. Running a build will succeed and produce correct build artifacts.
  2. Incremental Build Correctness: Start with existing build artifacts (e.g. by doing a full or partial build). Possibly change some build inputs. Running a build will succeed and produce correct build artifacts.
  3. Cloud Build Correctness: Start with an existing and populated cloud cache and possibly existing build artifacts (e.g. by doing a cloud build then possibly deleting build artifacts). Possibly change some build inputs. Running a cloud build will succeed and produce correct build artifacts.

For each definition above, a requirement for correctness is that the build produces correct build artifacts. Another way of thinking about this is that correct clean, incremental, and cloud builds should always produce an equal set of build artifacts given the same build inputs. In practice, “equal” ignores some insignificant differences due to nondeterminism in GHC and other build tools.

A main source of incremental and cloud build correctness bugs relates to build rules declaring an incorrect set of inputs via need. In an incremental build, the rule inputs indicate when the rule needs to be rerun to produce fresh build artifacts. If an input is undeclared and that input changes, it could result in the rule failing to rerun and hence stale build artifacts. In cloud builds, the rule inputs are used as keys into the cache. If an input is undeclared and that input changes, it could result in an (incorrect) cache hit and again stale build artifacts.

In this blog post we only focus on declaring the correct set of inputs. Rules are assumed to otherwise correctly call produces and generate build artifacts for that rule. Note, Andrey Mokhov’s initial effort largely focused on the use of produces.

Let’s take a closer look at what we mean by “input”. The natural definition of inputs, which we call “direct inputs”, is the set of all files used (i.e. read) by the rule. We can also define an “indicating inputs set”. Consider comparing build artifacts of two clean builds where some build inputs where changed. Here “changed” means the files differ between the first and second build. An indicating inputs set for a rule is a set of files such that:

a rule’s output may change only if at least one of the rule’s indicating inputs has changed.

The direct inputs is always a valid indicating inputs set. There can be other valid indicating inputs sets that need not be a subset or equal to the direct inputs. An indicating inputs set may include files outside the set of direct inputs (the libffi rules are a good example). Generally we try to pick a minimal or close to minimal set of indicating inputs in order to keep the rule implementation simple.

On that note, the invariants to achieve the three levels of correctness defined above are as follows. Understanding these invariants is fundamental to evaluating incremental and cloud build correctness.

  1. Clean Build: All rules need enough to build all direct inputs (this can be via direct or transitively triggered rules).
  2. Incremental Build: Clean build invariants + All rules need a valid indicating inputs set.
  3. Cloud Build: Clean and incremental invariants + All rules produces all outputs (excluding the rule targets(s)).

Dependency Lint Errors

Now that we understand the requirements for correctness, how can we find correctness issues in Hadrian? Shake has a great linting feature (enabled with --lint-fsatrace) that utilizes the Filesystem Access Tracer tool (fsatrace) to observe file accesses within a rule. This feature results in a linting error when any file is read without first being declared with need. In other words, we get linting errors when there is an omitted need for a direct input. Zero linting errors would imply that all direct inputs are needed and since the direct inputs are a valid indicating input set, this would imply incremental build correctness. Great! So how many lint errors do we have? At the start of my endeavor, about 400,000 lint errors! That’s 400,000 files used by various rules that forgot to need them first.

Are all lint errors indicative of a correctness issue? Not necessarily. Remember, lint errors identify omitted direct inputs, but we are concerned with indicating inputs. As such, lint errors must be investigated individually. The general workflow is:

  1. Pick a lint error. The error which contains the rule and the omitted input file paths.
  2. Choose an indicating input set for the rule.
  3. Ensure all indicating inputs are needed in that rule.
  4. The rule is now correct, so trackAllow any direct inputs not in the indicating inputs set (this disables lint errors for those files).

Challenges

So far there is nothing too daunting about this task. Even the large number of lint errors is not too bad when you consider that individual fixes can resolve large numbers of lint errors if the rule is run many times. It sounds like we have our work cut out for us, but let’s look at two cases that prove tricky to resolve.

GHC Dependency Generation and CPP Inputs

Imagine we have a Haskell module that uses CPP:

Now when we build A.hs we must make sure to need the CPP includes i.e. the rule for A.hs has direct inputs A.hs, B.h, and C.h. We ideally want to discover the .h inputs dynamically so that if we, e.g., add new #include directives, we don’t need to update our build system rules. In Hadrian we do this by generating a .dependencies file generated using GHC’s dependency generation options -M -include-cpp-deps. This is done in a rule for the .dependencies file. Now the rule to build A.hs first needs the .dependencies file, then dynamically needs the dependencies (i.e. inputs) listed within. In this case A.hs, B.h, and C.h. The rules look something like this:

So what is the problem here? The rule for ["A.o", "A.hi"] is fine as it tracks the correct inputs. What about the ".dependencies" rule? Running ghc -M -include-cpp-deps A.hs ... reads A.hs then traverses all the #includeed files, hence the direct inputs of .dependencies are A.hs, B.h, and C.h. This can’t be reduced to a smaller indicating inputs set; any of those files can change the output of .dependencies. Well, this is awkward! The inputs of .dependencies are exactly the inputs that we are trying to discover in the first place. If we now include D.h from C.h and recompile, A.hs will recompile (it needs C.h which has changed), but .dependencies will not be recalculated so ["A.o", "A.hi"] will not track the new D.h input. Now change D.h and rebuild. A.hs will not be rebuilt (i.e. the ["A.o", "A.hi"] rulewill not rerun) even though the D.h input has changed.

A possible solution to this is to use Shake’s needed action in the .dependencies rule to declare the inputs after running GHC’s dependency generation. This has a couple of issues:

Currently this is an unresolved issue in Hadrian. It is hard to say how likely this issue is to surface as a bug in incremental/cloud builds.

Compiling Haskell Modules

Consider a rule for a file X.o that compiles X.hs with ghc. We use ghc -M which returns the following Makefile fragment:

The Y.hi is there because module X imports module Y. We conclude that direct inputs = indicating inputs = { X.hs, X.hi, Y.hi } So we implement the rule like this (normally this is done dynamically using .dependencies as described above, but we will list the dependencies statically here to illustrate the problem better):

This seems correct, but running the rule with --lint-fsatrace complains that the rule used a file Z.hi without needing it. Upon further inspection, module Y imports module Z. It turns out ghc often uses a subset of transitively imported modules. This means the set of direct inputs is larger than what we originally though, it contains transitive interface files. Unfortunately there doesn’t seem to be any feasible way to predict which transitive interface files ghc will use at compile time. Fortunately we only need to need indicating inputs, so we can stay optimistic and try to show that transitive interface files (.hi files) are not indicating inputs. This can be done by showing that a change in any transitive interface files results in a change in the non-transitive interface files (this is explained more in more detail on the wiki:

  change({transitive interface files})
        → change({indicating inputs} \ {transitive interface files})
= change({ Z.hi })
        → change({ X.hs, X.hi, Y.hi })

Where means “logically implies”. With some insight from an experienced GHC developer we see that all interface files contain “a list of the fingerprints of everything it used when it last compiled the file” (from the user guide). Making the practical assumption that the fingerprint’s hashing function is injective, we know that change({ transitive interface files }) which implies change({ X.hi }) and trivially change({ X.hs, X.hi, Y.hi }) and hence it is safe to not need transitive interface files. Hence, we can need the direct interface files reported by ghc -M and trackAllow all other interface files.

In this particular case, the lint errors were a non-issue: there were no omitted indicating inputs. This is good, but establishing correctness required in-depth knowledge of the contents and invariants of interface files as well as a solid understanding of how to manipulate the indicating inputs set.

Limitations

Caching Staged Builds

Having seen two examples, we can see that fixing these lint errors isn’t entirely trivial. So the cost of enabling caching has increased, but the benefits still sound great: a large reduction in CI build times. Unfortunately this expectation is compromised by the fact that GHC is usually built in two stages. The majority of the second stage is built using the GHC binary produced in the first stage. Stage 1 GHC is less featureful than stage 2 and so takes only one-quarter to one-third of the total build time. The key observation is that stage 1 GHC is a dependency for most of stage 2. This means that any change to GHC will result in cache misses for the majority of stage 2, severely degrading cache utilization. This should not be understated.

A possible solution is to bypass the 2 stage build process with the help of Hadrian’s --freeze1 option. This freezes the stage 1 GHC. By keeping stage 1 frozen across multiple CI runs, stage 2 can now better utilize the cache. While this sounds like a good idea at first, we strongly want to avoid incorrect builds on CI. The correctness freezing stage 1 is questionable:

On a positive note, stage 1 still takes a significant portion of time to build, and does not suffer from this cache utilization issue. A reduction in stage 1 compile times is already something to be celebrated.

Changes in Hadrian Build Rules

An important note is that Hadrian is currently seeing regular updates and changes to its build rules. Hadrian is under version control with the rest of GHC, so when changing between commits you’re also potentially changing the version of Hadrian. Cloud/incremental builds may be incorrect or fail when Hadrian rules change between builds. While there are ways to protect against this i.e. by making the Hadrian version a dependency of all rules, Hadrian currently doesn’t do this. As a result it is not safe to run cloud/incremental builds across changes to Hadrian. As an example, doing a cloud build of commit A then commit B gives a failed build complaining of a missing GHC.Version Haskell module. The build rules surrounding that module were changed between commit A and B. Hence the cache directory should be cleared when Hadrian has changed compared to previous cloud builds. Alternatively you can maintain separate cache directories based on the version of hadrian e.g. the latest commit that changed the hadrian directory: git rev-list -n1 HEAD -- ./hadrian.

Relocatable Build Artifacts

We want to take advantage of the cloud cache regardless of the absolute filesystem path of the build directory and GHC source directory. Unfortunately some build artifacts depend on these paths. In other words, the build artifacts are not relocatable. This makes cloud builds impractical when the build and source paths are not held constant: it would result in poor cache utilization. Issue #16956 documents this problem further.

What This Means for GHC Developers

Compared with the make build system, potential issues are much more visible due to the fsatrace linting feature. Considering Hadrian is a port of the make system, now with may lint errors fixed, it’s reasonable to expect similar if not more confidence in Hadrian’s incremental builds as with make’s incremental builds. More confidence in incremental builds is a welcome improvement and the most impactful benefit of this work.

We can expect a similar level of confidence in cloud builds assuming that hadrian rules are unchanged between builds. If you’re planning on running the same builds multiple times then using the cloud build feature can benefit you. Simply use the --share command line option when running Hadrian (or --share=DIR to specify a cache location). Consider the use case of swapping between build flavours. A simple solution would be to maintain separate build directories per flavour, but you may not want to manage these directories manually or perhaps you expect the flavours to share a significant amount of build artifacts from the cloud build cache. Here I’ve run through all steps in order with cloud build enabled then all steps again with cloud build disabled (i.e. just as incremental builds). Before step 1, I always start with a clean build and cache directory. You can see that cloud builds are much faster on the second build:

Action (flavour) Cloud Build Time Non-Cloud Build Time
1 build a39a3cd663 (default) 24m37.775s 23m49.780s
2 build a39a3cd663 (quick) 11m55.601s 11m16.476s
3 build a39a3cd663 (default) 1m23.450s 18m58.530s
4 build a39a3cd663 (quick) 1m 2.805s 11m23.408s

Conclusion

The project of supporting correct cloud builds resulted in numerous accomplishments. The number of fsatrace lint error has been reduced from 400,000 to 90,000 thanks numerous bug fixes in Hadrian. We now have greater confidence in incremental and cloud builds. GHC developers can benefit from cloud builds locally in some narrow use cases, and they frequently benefit from improved incremental builds. We’ve developed a clear vocabulary for reasoning about the degrees of build system correctness and what these require. Use of the fsatrace linting feature will be invaluable to understanding and fixing newly surfaced Hadrian bugs in the future. GHC’s -include-cpp-deps option was implemented as part of this project, enabling more precise dependency tracking in Hadrian and users’ build systems. This project also had a part to play in recent ideas to simplify Haskell interface files (see issue #16885).

Still, there is plenty of room to further improve incremental and cloud builds. This blog post is in part a call for help from the community. The remaining work is challenging and interesting, requires little compiler expertise, and yet can have a significant impact on GHC developers. #16926 is a top level “getting started” issue for continuing this project. Most of the remaining work is focused on resolving fsatrace lint errors summarized in issue #16400. That issue also explains how to get started using fsatrace linting. Specific issues of interest include:

It’s highly recommended that you read through and contribute to the Developing Hadrian wiki page which goes into more detail about incremental and cloud build correctness.