What's wrong with make?

Duncan Coutts – Thursday, 21 August 2008


Actually, make is really pretty good. Well, GNU make is pretty good. When used correctly it can rebuild large projects in the minimum number of steps, in parallel.

So what is wrong with it? I'm not complaining about the simple annoyances like insane use of tabs or being based on two inflexible programming languages (shell script and make macros). No, it's got two fundamental problems: it cannot enforce correct dependency tracking and its dependency language is insufficiently expressive for common tasks.

It is possible to write incorrect make rules

It's far too easy to write incorrect rules -- rules where the named dependencies and targets do not match up with the action:

foo : bar
    cat bar baz > foo

So obviously that's a totally contrived example, and it is easy to spot that it has an untracked dependency on baz. However, it gets much harder with more complex rules, and when the programs you call read or write files that you did not know about.
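
For comparison, here is a version of the rule where the declared dependencies actually match what the action reads:

foo : bar baz
    cat bar baz > foo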

So the problem is that make cannot check that you've written a bogus rule, and yet if you do, your makefile is broken. What makes it worse is that it is not immediately obvious that the makefile is broken. It will work initially, but you'll end up not trusting parallel make or running into quirks that are solved by make clean. If you have to use make clean, your makefile is broken.

The reason it cannot check rules for correctness is of course that the actions are opaque blobs of shell script. The decision to use shell script made make quicker to implement, and users were already familiar with the language, but the cost is that makefiles are more or less impossible to check automatically.

Dynamic dependencies cannot be expressed sanely in make

Make takes the view that there is a single massive static dependency graph. The single and massive aspects are not a problem but the static one is. The problem is that in many cases you do actually need to run some build actions before you know all the dependencies of each file in a project. The canonical example is running a tool to discover the dependencies of .hs or .c files.

GNU make has a clever extension which makes it possible to express this kind of dependency in simple cases. It lets one write include deps.mk, which reads in the deps.mk file and extends the dependency graph with all the rules found there. The clever thing is that deps.mk can itself depend on other things and can have a rule to generate it. For the example of the dependencies of .hs or .c files, this lets us write a rule that calls the compiler with a flag to dump make-format dependencies into a file. This works quite nicely, in this simple case.
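
As a rough illustration -- the standard pattern for C files with a gcc-style compiler, not the Gtk2Hs makefile itself -- the whole trick is only a few lines:

# gcc -MM emits make-format rules like 'foo.o: foo.c foo.h'; GNU make
# notices that deps.mk is missing or out of date, runs this rule to
# rebuild it, and then restarts with the extra rules included.
deps.mk : $(wildcard *.c)
    $(CC) -MM $^ > $@

include deps.mk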

The problem with doing include deps.mk is that the file can specify any rules at all. It can add edges between any existing nodes in make's dependency graph. That means that make cannot be confident that it knows everything unless it has got deps.mk up to date and included it. So before anything else can be done, make has to update deps.mk, even if it turns out in the end that the deps given in the include file are irrelevant to the task at hand.

In practice we do not need to be able to add edges willy-nilly all over the place, and there is a great benefit to being more disciplined. Suppose for the sake of argument that for each .hs file I have a .deps file listing the files that the .hs file depends on. Now if I am trying to update foo.hs then I only need to look at the suffix rule for .hs files and the rules in foo.deps. I know that I do not need to look in bar.deps (unless I discover from foo.deps that foo.hs depends on bar.hi). This is a massive saving. It means I do not have to bring all the .deps files up to date. That also means I do not have to bring the corresponding .hs files up to date. That also means I do not have to run any pre-processors or code generators for those .hs files. That also means I do not have to compile the code generators.

You might think this is a contrived situation but it is not. The Gtk2Hs build system uses GNU make (non-recursive of course). It has lots of .chs.pp files which get pre-processed with cpp and then c2hs to generate a .hs file. For each .chs file we look for all the imports and generate a .deps file which gets included into the top level makefile. So just to calculate all the dependencies we actually have to do about two thirds of the build. That's because we first have to build c2hs and two other code generators and then run cpp and c2hs on a hundred or so files, just so we can generate all the .hs files and find their dependencies. This means that from a clean state I cannot build just one particular .hs file because the very makefile itself depends on all the other .hs files. Amusingly, just to make clean, make has to build half the project. Obviously that is insane, so we have to hack it using includes that are conditional on the make target. That is not a technique that scales, however.

In any non-trivial build system it is necessary to interleave build actions with discovering dependencies. Classic make ignores this fact completely. GNU make notes it but forces us to do all the actions to discover the full dependency graph up front before we can get round to useful work. A better build system would be built around the notion of interleaving graph exploration with graph reduction. Saizan's make-like framework is built around this principle.

Could we do better?

Looking around at previous work, like the classic "Recursive Make Considered Harmful" paper or the Vesta or Maak build systems, one lesson comes up again and again: track dependencies absolutely precisely or you lose.

I am convinced that the solution to the first problem is not to specify the dependencies and targets separately from the action. They should be specified together so that there is simply no way to express an untracked dependency. This is also the approach taken by Vesta and Maak.

If we have a bunch of primitive actions that correctly declare their dependencies, and ways of combining actions that remember dependencies, then we should never end up with untracked dependencies. For example, consider the contrived rule from above again. Let's suppose we have two primitives:

readFile  :: FilePath -> Rule String
writeFile :: FilePath -> Rule (String -> ())

Clearly one has a single dependency and the other a single target. Let's write that contrived rule again:

rule = write <*> read
  where
    write = writeFile "foo"
    read  = (++) <$> readFile "bar" <*> readFile "baz"

So the intuition here is that applicative composition gives us static rule-graph composition. As for rules with dynamic dependencies, this corresponds to monadic composition:

(<**>) :: Rule a -> Rule (a -> b) -> Rule b
(>>=)  :: Rule a -> (a -> Rule b) -> Rule b

This lets us express the cases where the very structure of the dependency graph depends on a value calculated from another part of the graph.
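
For example, here is a purely hypothetical rule, reusing the readFile primitive from above and assuming that foo.deps contains the name of one further file to read. Which file gets read second cannot be known until the first action has run, so the composition has to be monadic:

-- hypothetical: foo.deps names the one file we read next
rule :: Rule String
rule = readFile "foo.deps" >>= readFile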

With static composition we can look at the resulting rule and 'see' down both branches of the composition to discover the graph structure without running any actions. With dynamic composition we can look at the left branch statically but the right branch requires the value produced by the left before we can even discover its structure.
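
To make that concrete, here is a toy sketch of what such a Rule type could look like -- purely an illustration for this post, not a real implementation (see below) -- in which the Applicative instance keeps the dependency list inspectable without running anything:

import Prelude hiding (readFile)
import qualified Prelude
import Control.Applicative

-- a rule records the files it reads alongside the action that runs it
data Rule a = Rule { ruleDeps :: [FilePath], runRule :: IO a }

instance Functor Rule where
  fmap f (Rule ds act) = Rule ds (fmap f act)

instance Applicative Rule where
  pure x                   = Rule [] (Prelude.return x)
  Rule ds f <*> Rule ds' x = Rule (ds ++ ds') (f <*> x)

readFile :: FilePath -> Rule String
readFile path = Rule [path] (Prelude.readFile path)

-- ruleDeps ((++) <$> readFile "bar" <*> readFile "baz") gives
-- ["bar","baz"] without running either action, whereas a (>>=) for
-- this representation could not: the right-hand rule's dependencies
-- are hidden inside a function until the left-hand value is available.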

So we should use static composition where possible because it tells us more about the graph which gives us more opportunities for parallelism and for caching intermediate results to avoid needless recompilation. Where necessary we can use dynamic dependencies but we should not over-use them or our dependency 'graph' would just become one long chain.

So, that's the idea. I should make clear: there is no corresponding implementation! Unlike my vaporware, Saizan's GSoC project has a working implementation of a slightly lower level API. We have talked about putting a higher level API on top, though it is not completely clear how to do that yet.

There are also undoubtedly connections to FRP that could do with further exploration. I mean it is quite clearly an FRP style problem, but an ugly grungy one where many parts work by side effects in the file system.