Regression testing with Hackage

Sat, 21 Mar 2009 23:41:15 GMT, by duncan.
Filed under haskell-platform.

Suppose you wanted to do something rash like release a new version of some important piece of infrastructure like Cabal, haddock or indeed ghc itself. Of course you worry that your sparkling new release might have hidden regressions. If only you could check that you're not breaking anyone's code. Well, you can!

We can use the cabal command line tool to do regression testing. Basically we build all of Hackage with the old and new releases and then we compare the build reports to find regressions. Simple!

Let's look at the details...

The consistent subset of Hackage

At first you might think that we can just build all of Hackage by making a list and asking cabal to install them all. It's not quite that simple. For one thing, some packages on Hackage are simply impossible to install. Some are just missing dependencies (there are a small number of packages on Hackage that are best described as completely borked).

A bigger problem, however, is that in general it is not possible to install all the packages together consistently. Remember that cabal tries really hard to make sure that in each set of packages it installs there is at most one version of each package. But on Hackage there are some packages that need HaXml-1.13.x and some that need HaXml-1.19.x. What are we to do? Sadly, we just have to pick one and chuck out the packages that require the version we didn't pick. But there are over 1,000 packages on Hackage! How do we find the set that we can install consistently? Yeah, it's a bit time consuming. Sorry. I'll show you how to do it though. I talked recently about how we might be able to automate this. Might even be doable as part of a GSoC project, who knows.

The algorithm is to start with all packages on Hackage:

cabal update
cabal list --simple-output | \
  cut -d' ' -f1 | uniq > pkgs

Now we ask cabal to try installing all of these:

cabal install $(cat pkgs) --dry-run -v

In truth, cabal list also lists packages that you've got installed but that are not on Hackage. So the first thing you'll have to do is remove any of those. For example:

There is no package named rts

Well, there is, but it's not on Hackage, so remove it from the pkgs file and try again.

After that you'll get the packages that simply cannot be installed because they're borked. It's an iterative process. You cast out packages that cannot be installed. It soon gets onto the more interesting cases: dependency conflicts. That's like the HaXml-1.13.x vs HaXml-1.19.x issue I mentioned earlier.
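
The casting-out step is easy to script. Here's a minimal sketch, assuming pkgs holds one bare package name per line as produced above. The helper name drop_pkgs is made up for illustration and is not part of cabal.

```shell
# drop_pkgs LIST NAME...
# Print LIST (one package name per line) without the named packages.
# Hypothetical helper; it only edits the text file, it never calls cabal.
drop_pkgs() {
  list=$1
  shift
  awk -v drop="$*" '
    BEGIN {
      # Build a lookup table of the names to throw away.
      n = split(drop, names, " ")
      for (i = 1; i <= n; i++) skip[names[i]] = 1
    }
    # Keep every line that is not in the table.
    !($0 in skip)
  ' "$list"
}
```

For example, after the "There is no package named rts" complaint you could run: drop_pkgs pkgs rts > pkgs.new && mv pkgs.new pkgs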

You can add constraints in your package list file for example:

HaXml==1.13.*

When I did this recently for the Cabal-1.6.0.2 release it looked to me like more packages would build if we picked the stable 1.13.x release rather than the development 1.19.x release. Similarly I ended up adding these constraints:

HTTP<4000
funsat<0.5.1
parse-dimacs<1.2
parsec<3
regex-base<0.80
regex-compat<0.80
regex-posix<0.80
rss<3000.1
xml-parsec<1.0.3
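
If you keep the constraints in a file of their own, you can merge them into the package list mechanically. This is only a sketch under my own assumptions: that the list has one bare name per line, that a constrained entry should replace the bare name, and that the constraints file (any name you like) has one constraint per line. apply_constraints is a made-up helper name.

```shell
# apply_constraints CONSTRAINTS LIST
# Print LIST with each bare package name replaced by its constrained
# form (e.g. "parsec" becomes "parsec<3") when CONSTRAINTS has one.
# Hypothetical helper; both file layouts are assumptions.
apply_constraints() {
  awk '
    FILENAME == ARGV[1] {
      name = $0
      sub(/[ \t]*[<>=].*$/, "", name)   # strip the version part
      if (name != "") constrained[name] = $0
      next
    }
    ($0 in constrained) { print constrained[$0]; next }
    { print }
  ' "$1" "$2"
}
```

Keeping the constraints separate means you can regenerate the full list from a fresh cabal list and reapply the same choices.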

The list of packages I had to chuck out is rather longer. When I did this for Cabal-1.6.0.2 there were 1072 packages in my initial list and in the end I had to whittle it down to 985 before I could get a consistent solution.

If you actually try this then you'll notice that it soon gets annoying waiting for the cabal dependency resolver. I'll let you in on a secret. If you turn off the assertion checking in the resolver then it's about a bazillion times quicker. See the -fno-ignore-asserts in the cabal-install.cabal file? Leave that out and compile with -O and it's really much quicker. But shhh! Don't tell everyone! I sleep much better at night knowing that all the people using cabal-install at the moment are not getting completely bogus results due to some hidden internal error in the resolver (yeah yeah, they're getting bogus results for other reasons, but that's not the point right now).

I've got the list of packages that I used when I tested Cabal-1.6.0.2. I wanted to tell you how I got it first though, because it includes some essentially arbitrary choices on my part and it does not necessarily work with the current Hackage. It was only a snapshot. So, with all those caveats in mind, here's my list. You can diff it with the full list and make any alterations you like.
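
A plain diff gets noisy because the trimmed list carries version constraints. Here's a rough sketch that prints just the names that were chucked out, assuming both files have one entry per line. chucked_out is a made-up helper name.

```shell
# chucked_out FULL TRIMMED
# Print the package names in FULL that no longer appear in TRIMMED.
# Entries in TRIMMED may carry constraints such as "HaXml==1.13.*";
# only the name part is compared. Hypothetical helper.
chucked_out() {
  awk '
    FILENAME == ARGV[1] {
      name = $0
      sub(/[ \t]*[<>=].*$/, "", name)   # strip the version part
      kept[name] = 1
      next
    }
    !($0 in kept)
  ' "$2" "$1"
}
```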

Isolated builds

We obviously want to be able to build all these packages without messing up your normal set of user packages. We also want to build two ways without the builds getting in each other's way. So what we want to do is have two (or more) isolated sets of builds. That means a separate install prefix but also separate ghc package DBs. We will also want separate locations for build reports of course.

So, here are the important flags:

--prefix=
--package-db=
--build-summary=
--build-log=

The --prefix and --package-db are needed to get the isolated builds. Note that if you do this then it's quite possible to run two sets of builds in parallel. They will not trample over each other because they're installing to different prefixes and registering into different package DBs.

The --build-summary and --build-log stuff is quite nice. The build summary is the machine readable information about the outcome of building a package. For example:

package: xmonad-0.8.1
os: linux
arch: x86_64
compiler: ghc-6.10.1
client: cabal-install-0.6.2
flags: -testing small_base
dependencies: X11-1.4.5
    base-3.0.3.0 containers-0.2.0.0
    directory-1.0.0.2 mtl-1.1.0.2
    process-1.0.1.0 unix-2.3.1.0
install-outcome: InstallOk
docs-outcome: NotTried
tests-outcome: NotTried

This is what we'll use at the end to compare and check for regressions. So we want a single build summary file. For build logs however it's much more convenient to have a log file per-package. That will let us go and look at the details of any regressions that we find. There are a few template variables that we can use in the log file name, in particular $pkgid.
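
Since the summary is plain "field: value" text, you can already get a quick overview with awk before reaching for anything fancier. A minimal sketch, assuming every report has a package: line followed later by an install-outcome: line, as in the example above. outcomes is a made-up helper name.

```shell
# outcomes REPORTS
# Print "pkgid outcome" for every report in a build summary file.
# Assumes the field layout shown in the xmonad example; only the
# first word of the outcome is printed. Hypothetical helper.
outcomes() {
  awk '
    $1 == "package:"         { pkg = $2 }
    $1 == "install-outcome:" { print pkg, $2 }
  ' "$1"
}
```

Piping the result through cut -d' ' -f2 | sort | uniq -c gives a count per outcome.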

So let's put things together. Let's suppose that you want to test two versions of ghc. I mentioned that I used this technique for comparing two versions of Cabal. It should also work for haddock because we can --enable-documentation and the build reports, I think, do correctly record whether the docs built ok (whereas tests are currently ignored).

So you'll want two dirs, let's say ~/survey/ghc-6.10.1 and ~/survey/ghc-6.10.2. It's probably wise to use absolute paths.

export PREFIX=$HOME/survey/ghc-6.10.1
# make sure the prefix exists, then start with an empty package db
mkdir -p $PREFIX
echo '[]' > $PREFIX/package.conf
cabal install \
  --with-compiler=ghc-6.10.1 \
  -v $(cat pkgs) \
  --prefix=$PREFIX \
  --package-db=$PREFIX/package.conf \
  --build-log=$PREFIX'/logs/$pkgid.log' \
  --build-summary=$PREFIX/build.reports

Note that the $pkgid bit has to be in quotes because the template variable has to be passed to cabal and not expanded by the shell.

Obviously you then do this again with a different $PREFIX and this time you specify --with-compiler=ghc-6.10.2. Or if you were testing two versions of haddock you'd use --with-haddock. For different Cabal lib versions use the --cabal-lib-version flag.
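
To avoid typing all that twice, you could wrap it in a function. This is a sketch only: run_survey, SURVEY_ROOT and CABAL are names I've made up for illustration, and the cabal flags are exactly the ones used above. Creating the logs directory up front is a precaution on my part.

```shell
# run_survey COMPILER PKGS_FILE
# Build every package listed in PKGS_FILE with COMPILER into its own
# isolated prefix and package db. Hypothetical wrapper.
# SURVEY_ROOT (default ~/survey) and CABAL (default cabal) are
# assumed overrides, handy for trying the function out safely.
run_survey() {
  compiler=$1
  pkgs_file=$2
  prefix="${SURVEY_ROOT:-$HOME/survey}/$compiler"

  mkdir -p "$prefix/logs"
  # Start from an empty package db unless one is already there.
  [ -f "$prefix/package.conf" ] || echo '[]' > "$prefix/package.conf"

  # $pkgid stays in single quotes so that cabal, not the shell,
  # expands it. $(cat ...) is left unquoted on purpose so each
  # package becomes its own argument.
  ${CABAL:-cabal} install \
    --with-compiler="$compiler" \
    -v $(cat "$pkgs_file") \
    --prefix="$prefix" \
    --package-db="$prefix/package.conf" \
    --build-log="$prefix"'/logs/$pkgid.log' \
    --build-summary="$prefix/build.reports"
}
```

Then it's just run_survey ghc-6.10.1 pkgs followed by run_survey ghc-6.10.2 pkgs. Setting CABAL=echo first prints the command line instead of running it, which is a cheap way to check the quoting.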

If you're testing ghc itself then you probably want to build all these packages with -O, but if you're testing something else like Cabal or haddock then you'll save loads of time by using -O0. Even so, a full build of 900+ packages on Hackage will take many hours. Run it overnight. Use nohup too.

Some more tricky details...

I would also recommend backing up and removing your per-user ghc package db. At the moment I don't think it's possible to do these kinds of isolated builds completely ignoring the per-user package db. It should be possible, and maybe it does work, but I've not tried it, so be on the safe side. Note that if you do this then you'll also want to register packages like Gtk2Hs into both isolated package DBs. Gtk2Hs isn't on Hackage yet so you have to register mtl then build Gtk2Hs twice. Sorry about that. Either that or you have to exclude all the packages on Hackage that use Gtk2Hs.

Analysing the results

The build reports are machine readable; however, the code to read them isn't part of a nice library yet. One day I'll make a hackage-client package but until then we just have to load one of the cabal-install modules in ghci:

cd cabal-install
cabal configure && cabal build
ghci -idist/build/autogen \
  -package base-3.0.3.0 \
  Distribution.Client.BuildReports.Anonymous

Now we can load up the two logs and compare them:

> :m +Distribution.Client.Utils
> :m +Data.Function
> :m +Distribution.Text
> reports1 <- fmap parseList $ readFile
      "survey/ghc-6.10.1/build.reports"
> reports2 <- fmap parseList $ readFile
      "survey/ghc-6.10.2/build.reports"
> let merged = mergeBy
      (compare `on` package)
      reports1 reports2
> let regressions =
      [ b | InBoth a b <- merged
      , installOutcome a == InstallOk
      , installOutcome b /= InstallOk ]
> mapM_ (putStrLn . display . package)
      regressions

Sorry about the multi-line formatting here. Width restrictions.

So that prints the list of packages that built fine before and now fail for some reason. You can then go and look in the per-package log files that we made and see what you can see. The cabal unpack command is really handy for trying to reproduce the problems.
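
If you'd rather not fire up ghci, the same comparison can be roughed out with awk. This is not the cabal-install parser, just a sketch that assumes each report has a package: line followed by an install-outcome: line, as in the example report earlier. regressions is a made-up helper name.

```shell
# regressions OLD_REPORTS NEW_REPORTS
# Print the package ids that were InstallOk in OLD_REPORTS but have
# some other install outcome in NEW_REPORTS. Hypothetical helper that
# mimics the list comprehension, matching on the full package id.
regressions() {
  awk '
    $1 == "package:" { pkg = $2 }
    $1 == "install-outcome:" {
      if (FILENAME == ARGV[1]) {
        # First file: remember what built fine before.
        if ($2 == "InstallOk") ok[pkg] = 1
      } else if ($2 != "InstallOk" && (pkg in ok)) {
        # Second file: it used to build and now it does not.
        print pkg
      }
    }
  ' "$1" "$2"
}
```

Like the ghci version, this only pairs up reports with identical package ids, so a package whose chosen version differs between the two runs won't show up.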

There are obviously more things you can do here. You've got the data and you've got list comprehensions. There are lots of ad-hoc queries you can easily do. For example, you can look for the packages that cause the most knock-on failures. Just look for DependencyFailed and group by the package id of the failing package.
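
As a rough illustration of that last query, here's a shell sketch. It assumes a knock-on failure is written as "install-outcome: DependencyFailed" followed by the id of the package that failed. I believe that's the format, but do check it against your own build.reports. knock_on_failures is a made-up helper name.

```shell
# knock_on_failures REPORTS
# Count how many packages failed because of each broken dependency,
# most damaging first. Assumes lines of the form
#   install-outcome: DependencyFailed <pkgid>
# Hypothetical helper.
knock_on_failures() {
  awk '$1 == "install-outcome:" && $2 == "DependencyFailed" { print $3 }' "$1" |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'   # tidy up the padding from uniq -c
}
```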

Actually, you might want to look more generally at which packages are causing many knock-on failures, not just regressions. Often there are C libs that you could install which would make these packages work and then you're getting better coverage in your regression testing. You'd be surprised by how many packages are bindings to obscure C libs that you probably don't have installed.

Note that if you are actually comparing ghc-6.10.1 vs 6.10.2 then you will also pick up all the regressions due to Cabal-1.6.0.1 vs 1.6.0.2. I described all these in an email to the libraries list. A lot of the regressions you find will actually be due to C libraries that you do not have installed. With Cabal-1.6.0.1 many such packages will build (if they're using pure FFI and no hsc2hs) whereas now Cabal-1.6.0.2 checks that the C lib actually exists and so you'll get a ConfigureFailed outcome. For this reason, if you're testing ghc-6.10.1 vs 6.10.2 you might be tempted to use ghc-6.10.1 using Cabal-1.6.0.2 to eliminate that set of issues. That's up to you of course.

These instructions are somewhat from memory so feel free to get in touch if you run into problems.

Have fun!