No untracked dependencies!

Years ago, back when Isaac Potoczny-Jones and others were defining the Cabal specification, the big idea was to make Haskell software portable to different environments. One of the mantras was “no untracked dependencies!”

The problem at the time was that Haskell code had all kinds of implicit dependencies which meant that while it worked for you, it wouldn’t build for me. For example, I might not have some other module that it needed, or the right version of the module.

So of course that’s what the build-depends in .cabal files is all about, requiring that the author of the code declare just what the code requires of its environment. The other important part is that the build system only lets your code see the dependencies you’ve declared, so that you don’t accidentally end up with these untracked dependencies.

This mantra of no untracked dependencies is still sound. If we look at a system like nix, part of what enables it to work so well is that it is absolutely fanatical about having no untracked dependencies.

Untracked dependencies?!

One weakness in the original Cabal specification is with Setup.hs scripts. These scripts are defined in the spec to be the entry point for the system. According to the Cabal spec, to build a package you’re required to compile the Setup.hs script and then use its command line interface to get things done. Because in the original spec the Setup.hs is the first entry point, it’s vital that it be possible to compile Setup.hs without any extra fuss (the runhaskell tool was invented just to make this possible, and to make it portable across compilers).

But by having the Setup.hs as the primary entry point, it meant that it’s impossible to reliably use external code in a Setup.hs script, because you cannot guarantee that that code is pre-installed. Going back to the “no untracked dependencies” mantra, we can see of course that all dependencies of Setup.hs scripts are in fact untracked!

This isn’t just a theoretical problem. Haskell users that do have complex Setup.hs scripts often run into versioning problems, or need external tools to help them get the pre-requisite packages installed. Or as another example: Michael Snoyman noted earlier this year in a diagnosis of an annoying packaging bug that:

As an aside, this points to another problematic aspect of our toolchain: there is no way to specify constraints on dependencies used in custom Setup.hs files. That’s actually caused more difficulty than it may sound like, but I’ll skip diving into it for now.

The solution: track dependencies!

As I said, the mantra of no untracked dependencies is still sound, we just need to apply it more widely.

These days the Setup.hs is effectively no longer a human interface, it is now a machine interface used by other tools like cabal or by distro’s install scripts. So we no longer have to worry so much about Setup.hs scripts always compiling out of the box. It would be acceptable now to say that the first entry point for a tool interacting with a package is the .cabal file, which might list the dependencies of the Setup.hs. The tool would then have to ensure that those dependencies are available when compiling the Setup.hs.

So this is exactly what we have now done. Members of the Industrial Haskell Group have funded us to fix this long standing problem and we have recently merged the solution into the development version of Cabal and cabal-install.

From a package author’s point of view, the solution looks like this: in your .cabal file you can now say:

build-type: Custom

custom-setup
  setup-depends: base >= 4.6,
                 directory >= 1.0,
                 Cabal >= 1.18 && < 1.22,
                 acme-setup-tools == 0.2.*

So it’s a new stanza, like libraries or executables, and like these you can specify the library dependencies of the Setup.hs script.

Now tools like cabal will compile the Setup.hs script with these and only these dependencies, just like it does normally for executables. So no more untracked dependencies in Setup.hs scripts. Newer cabal versions will warn about not using this new section. Older cabal versions will ignore the new section (albeit with a warning). So over time we hope to encourage all packages with custom setup scripts to switch over to this.

In addition, the Setup.hs script gets built with CPP version macros (MIN_VERSION_{pkgname}) available so that the code can be made to work with a wider range of versions of its dependencies.

In the solver…

So on the surface this is all very simple and straightforward, a rather minor feature even. In fact it’s been remarkably hard to implement fully for reasons I’ll explain, but the good news is that it works and the hard work has also gotten us solutions to a couple other irksome problems.

Firstly, why isn’t it trivial? It’s inevitable that sooner or later you will find that your application depends on one package that has setup deps like Cabal == 1.18.* and another with setup deps like Cabal == 1.20.*. At that point we have a problem. Classically we aim to produce a build plan that uses at most one version of each package. We do that because otherwise there’s a danger of type errors from using multiple versions of the same package. Here with setup dependencies there is no such danger: it’s perfectly possible for me to build one setup script with one version of the Cabal library and another script with a different Cabal version. Because these are executables and not libraries, the use of these dependencies does not “leak,” and so we would be safe to use different versions in different places.

So we have extended the cabal solver to allow for limited controlled use of multiple versions of the same package. The constraint is that all the “normal” libraries and exes all use the same single version, just as before, but setup scripts are allowed to introduce their own little world where independent choices about package versions are allowed. To keep things sane, the solver tries as far as possible not to use multiple versions unless it really has to.

If you’re interested in the details in the solver, see Edsko’s recent blog post.

Extra goodies

This work in the solver has some extra benefits.

Improve Cabal lib API without breaking everything

In places the Cabal library is a little crufty, and the API it exposes was never really designed as an API. It has been very hard to fix this because changes in the Cabal library interface break Setup.hs scripts, and there was no way for packages to insulate themselves from this.

So now that we can have packages have proper dependencies for their custom Setup.hs, the flip side is that we have an opportunity to make breaking changes to the Cabal library API. We have an opportunity to throw out the accumulated cruft, clean up the code base and make a library API that’s not so painful to use in Setup.hs scripts.

Shim (or compat) packages for base

Another benefit is that the new solver is finally able to cope with having “base shim” packages, as we used in the base 3.x to 4.x transition. For two GHC releases, GHC came with both base-3.x and base-4.x. The base-4 was the “true” base, while the base-3 was a thin wrapper that re-exported most of base-4 (and syb), but with some changes to implement the old base-3 API. At the time we adapted cabal to cope with this situation of having two versions of a package in a single solution.

When the new solver was implemented however support for this situation was not added (and the old solver implementation was retained to work with GHC 6.12 and older).

This work for setup deps has made it relatively straightforward to add support for these base shims. So next time GHC needs to make a major bump to the version of base then we can use the same trick of using a shim package. Indeed this might also be a good solution in other cases, perhaps cleaner than all these *-compat packages we’ve been accumulating.

It has also finally allowed us to retire the old solver implementation.

Package cycles involving test suites and benchmarks

Another feature that is now easy to implement (though not actually implemented yet) is dealing with the dependency cycles in packages’ test suites and benchmarks.

Think of a core package like bytestring, or even less core like Johan’s cassava csv library. These packages have benchmarks that use the excellent criterion library. But of course criterion is a complex beast and itself depends on bytestring, cassava and a couple dozen other packages.

This introduces an apparent cycle and cabal will fail to find an install plan. I say apparent cycle because there isn’t really a cycle: it’s only the benchmark component that uses criterion, and nothing really depends on that.

Here’s another observation: when benchmarking a new bytestring or cassava, it does not matter one bit that criterion might be built against an older stable version of bytestring or cassava. Indeed it’s probably sensible that we use a stable version. It certainly involves less rebuilding: I don’t really want to rebuild criterion against each minor change in bytestring while I’m doing optimisation work.

So here’s the trick: we break the cycle by building criterion (or say QuickCheck or tasty) against another version of bytestring, typically some existing pre-installed one. So again this means that our install plan has two versions of bytestring in it: the one we mean to build, and the one we use as a dependency for criterion. And again this is ok, just as with setup dependencies, because dependencies of test suites and benchmarks do not “leak out” and cause diamond dependency style type errors.

One technical restriction is that the test suite or benchmark must not depend on the library within the same package, but must instead use the source files directly. Otherwise there would genuinely be a cycle.

Now in general when we have multiple components in a .cabal file we want them to all use the same versions of their dependencies. It would be deeply confusing if a library and an executable within the same package ended up using different versions of some dependency that might have different behaviour. Cabal has always enforced this, and we’re not relaxing it now. The rule is that if there are dependencies of a test suite or benchmark that are not shared with the library or executable components in the package, then we are free to pick different versions for those than we might pick elsewhere within the same solution.

As another example – that’s nothing to do with cycles – we might pick different versions of QuickCheck for different test suites in different packages (though only where necessary). This helps with the problem that one old package might require QuickCheck == 2.5.* while another requires QuickCheck == 2.8.*. But it’d also be a boon if we ever went through another major QC-2 vs QC-3 style of transition. We would be able to have both QC-2 and QC-3 installed and build each package’s test suite against the version it requires, rather than freaking out that they’re not the same version.

Private dependencies in general

Technically, this work opens the door to allowing private dependencies more generally. We’re not pursuing that at this stage, in part because it is not clear that it’s actually a good idea in general.

Mark Lentczner has pointed out the not-unreasonable fear that once you allow multiple versions of packages within the same solution it will in practice become impossible to re-establish the situation where there is just one version of each package, which is what distros want and what most people want in production systems.

So that’s something we should consider carefully as a community before opening those flood gates.