Cabal & Hackage hacking at ZuriHac

Duncan Coutts – Monday, 01 June 2015

At ZuriHac this weekend we had eight people hacking on Cabal or Hackage, many of whom are new contributors. There were a number of projects started as well as a number of smaller fixes completed.

In addition, there are three Google Summer of Code students working on Cabal and Hackage projects this summer:

They’re all just getting started, so more news about them later. All in all there seems to be a decent amount of progress at the moment across a range of issues. In particular we’re getting closer to solving some of the thornier “Cabal Hell” problems.

Heroic bug squashing

Oleg Grenrus was a bit of a hero in that as a new Cabal contributor, over two days of the hackathon, he managed to send in pull requests to fix five open tickets.

Another couple chaps (whose names to my shame have slipped my mind) dived in to fix old tickets on sanity checking absolute/relative paths for library directories in .cabal files and config files, and on passing GHC env vars to sub-commands like in cabal run/exec/test.

These in addition to the flurry of pull requests in recent weeks, and others from the hackathon, has given the regular Cabal hackers quite a pile of patches to get through. We hope to review and merge them in the next week or so.

Integrating package security for Cabal/Hackage

The work on securing the package download process that we announced a while ago is nearing the integration phase. While it’s been useful to have a couple people concentrate on implementing the core hackage-security library, at this stage it makes sense to open the process up and get more people involved.

Matthias Fischmann had proposed it as a ZuriHac project and organised a group of people who were interested. We discussed some of the issues involved with using the new hackage-security code in the cabal-install tool, and got started on some of the tasks.

Bootstrapping repository security

With public key crypto systems there’s always a need to somehow bootstrap the trust chains. For example with the public web certificate system (used by TLS / HTTPS) the root of the trust chains is the certificate authorities. We must know and trust their public keys to be able to verify the chain of trust for any particular website. But how should we establish trust in the certificate authorities’ keys in the first place? With web browsers this bootstrapping problem is resolved by the browser (or OS) shipping pre-installed with all the CA public keys.

For hackage servers we face a similar bootstrapping problem. Hackage security does not use public certificate authorities but there is a similar root of trust in the form of a set of root keys. For the central community hackage.haskell.org we can of course do the same thing as the web browsers and ship the server’s public root keys with cabal-install. But we need to support people making their own repositories and obviously we can’t ship all the public keys. So we need a convenient way for people to configure cabal-install to establish trust in a particular repository. The obvious thing is to specify the trusted public keys in the cabal configuration, where you specify the repository to use.

Currently in a cabal configuration file that part looks like:

remote-repo: hackage.haskell.org:http://hackage.haskell.org/

This syntax is too limited to support adding extra attributes like keys. So what people were working on at ZuriHac was supporting something like this:

remote-repo hackage.haskell.org
  url: http://hackage.haskell.org/
  keys: ed25519:9fc1007af2baff7088d082295e755102c1593cdd24b5282adbfa0613f30423f6
        ed25519:7cd11f018d5211f49b2fb965f18577d7f45c0c9be2a79f5467e18d0105ac1feb
        ed25519:26443e74981d5b528ef481909a208178371173ff7ccee8009d4ebe82ddb09e1e

So someone hosting their own hackage repo can provide instructions with a sample cabal.config or a block of text like the above to copy and paste into their config file, for people to use to get started.

This more flexible syntax will also allow miscellaneous other repository settings such as specific mirrors to use, or the ability to turn off security entirely.

Mirroring

Another couple people got started on writing a mirror client using the hackage-security library. While mirroring does not need a dedicated tool it is a rather convenient and efficient approach. It means we can use the ordinary HTTP interface rather than providing rsync or another interface and we can still do very bandwidth-efficient synchronisation. The advantage over a generic HTTP mirroring tool is that we have an index of the available packages and we know that existing packages are immutable, so we can simply diff the source and target indexes and copy over the extra packages.

In fact there are already two hackage mirror clients that do this. One of them pulls from one repo and pushes to a “smart” hackage-server. The other pulls from a repo and pushes to a repo hosted via S3. What is missing is the ability to mirror to a simple local set of files. Mirrors don’t have to be full hackage-server instances or S3, they can be ordinary HTTP servers like Apache or nginx that point at a set of files in the right layout. The hackage-security library is a convenient tool to use to write this kind of mirror since it handles all the details of the repository layout, and it supports doing incremental updates of the repository index. In this use case the security checks are merely sanity checks, as in the end, clients downloading from this mirror do their own checks.

So the work started by taking the existing hackage-server mirror and hackage-security demo client with the goal of replacing (or extending) the guts of the mirror client to use the hackage-security lib to download and to be able to manage a target repo as a set of local files.

Once the security work is integrated it will become much more useful to have public mirrors because clients then don’t need to trust the mirrors (we’re safe from MITM attacks). And hackage will distribute a list of public mirrors that the clients will use automatically. So having a decent mirroring client will become rather important. It’s also useful for the synchronisation to be very efficient so that the public mirrors can be nearly live copies.

Solving the cabal sandbox / global packages problem

A problem that people have been complaining about recently is that the Haskell Platform ships with lots of packages in the global package database, making it hard to compile packages that require non-standard versions of the platform packages.

This isn’t really a problem with the Haskell Platform at all, it’s really a problem with how cabal-install constructs its sandboxes, and fortunately it’s one that seems relatively easy to fix. Good progress was made on this ticket over the hackathon and hopefully it will be completed within the next couple weeks.

The problem is that cabal sandbox init makes an environment where with a package database stack consisting of the global one, plus a new empty local one. This means all the global packages are implicitly inside the sandbox already. That’s not so useful when you want to start with a minimal sandbox.

Originally this was a GHC limitation, that we always had to use the global package DB, however that has been fixed now for a couple GHC releases. So the solution we went for is to use only a local empty package DB, and to copy the registration information for a certain set of core packages into the local package DB. Ultimately we would like GHC to come supplied with the list of core packages, but for now we can just hard code the list.

Improving the tagging feature on Hackage

One new contributor started work on reimplementing the Hackage website’s tagging feature to make it more flexible and useful. The key idea is to make package categories into tags and make it easier to curate tags by solving the problem that lots of tags are essentially aliases for each other. This happens because each package author picks their tags themselves. So we will have sets of tag aliases, each with a canonical representative. Then any package using any alias will be assigned the canonical tag. The data model and user interface will make it possible for trustees to decide which existing tags ought to be aliased together and then do it. Ultimately, the tags and aliases should be useful in the existing hackage search.

Supporting curated package collections in Cabal and Hackage

Curated package collections are one of the two major parts to solving Cabal Hell.

People started work on supporting these natively in Cabal and Hackage. The idea is that proper integration will make them easier to use, more flexible and easier for people to make and distribute curated collections. Examples of curated collections include stackage (LTS and nightly snapshots) and the sets of versions distributed by Linux distros. Integration will allow simpler and shorter configurations, easier switching between collections and the ability to easily define collections either to distribute on Hackage or to use locally. By teaching cabal about collections it can give better error messages (e.g. when something cannot be installed because it’s not consistent with the collection(s) in use). Making collections easier to distribute via Hackage, and easier to combine, might open up new possibilities. For example we might see collections specifically to help people work with the popular web stacks (e.g. if those cannot always fit into a large general purpose collection). Or we might see collections of things you might like to avoid such as deprecated or known-broken packages. Combining collections would then allow you to configure cabal to use a large collection intersected with the negation of the collection of deprecated packages.