The dreaded diamond dependency problem

Thu, 24 Apr 2008 12:55:10 GMT, by duncan.
Filed under coding, cabal.

One issue that came up several times at the hackathon, especially when talking to the Yi hackers, is the dreaded diamond dependency problem. This happens when you have four packages in a diamond-shaped dependency graph:
[Figure: diamond dependency graph. Package A depends on B and C, which both depend on D.]
The problem arises when we have package B and C already installed but built against different versions of D and then we try to use both packages B and C together in package A:
[Figure: diamond dependency problem. B and C are built against different versions of D.]
This can work OK, but only if packages B and C do not expose types defined in D in their interfaces. If they do, then package A will not be able to use functions from B and C together because they will not be working with the same type. That is, you'll get a type error.

To pick a concrete example, suppose package D is bytestring and we have both bytestring-0.9.0.1 and bytestring-0.9.0.4 installed. Let's say B is utf8-string, C is regex-base and A is the Yi editor program. At some place in the Yi code we want to pass a ByteString produced by UTF-8 decoding as input to one of the regex functions. But this does not work, because the functions in utf8-string use the ByteString type from bytestring-0.9.0.1 while the regex functions in regex-base use the ByteString type from bytestring-0.9.0.4. So we get a type error when we try to compile Yi:

Couldn't match expected type `bytestring-0.9.0.4:Data.ByteString.ByteString'
against inferred type `bytestring-0.9.0.1:Data.ByteString.ByteString'

As far as GHC knows, these two types are totally unrelated!

This is obviously extremely annoying. There is also no easy solution. In this example we're assuming that packages B and C have already been built, so there is no sensible way to use the two packages together without rebuilding one of them against a different version of package D. In this case the obvious solution is to rebuild B against D-1.1 rather than D-1.0. The problem with rebuilding a package, of course, is that it breaks all the other packages that have already been built against it. It isn't clear that you want a package manager to go automatically rebuilding lots of apparently unrelated packages.

In the longer term the best solution would seem to be to do what Nix does. In the above example, instead of replacing package B built against D-1.0 with B built against D-1.1, Nix would add another instance of B built against D-1.1. So the original instance of B would remain unchanged and nothing would break. It's the functional approach: we never mutate values (installed packages), we just create new ones and garbage collect old ones when they are no longer needed.

In practice it means we have to identify installed packages using some hash of the package and the hashes of all the packages it depends on. jhc already does this and there are moves afoot to do something similar for GHC, though aimed more at tracking API/ABI changes. For sane source-based package management, I think it's the right direction to take.
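To make the hashing idea concrete, here is a minimal sketch, not how Nix, jhc or GHC actually do it. The `Instance` type, the `instanceId` format and the toy FNV-1a-style hash are all assumptions made up for illustration; a real system would use a cryptographic hash over much more than the name and version.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', sort)
import Data.Word (Word64)
import Numeric (showHex)

-- Hypothetical model of an installed package instance: its name, its
-- version, and the exact instances it was built against.
data Instance = Instance
  { instName    :: String
  , instVersion :: String
  , instDeps    :: [Instance]
  }

-- Toy FNV-1a-style hash over characters. Non-cryptographic and used
-- here only to keep the sketch free of extra dependencies.
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- The hash covers the package itself plus the ids of its dependencies.
-- Those ids already contain the dependencies' own hashes, so a change
-- anywhere below a package changes that package's hash too.
instanceHash :: Instance -> Word64
instanceHash i =
  fnv1a (unwords (instName i : instVersion i : sort (map instanceId (instDeps i))))

-- Identifier of the form name-version-hash (format is an assumption).
instanceId :: Instance -> String
instanceId i =
  instName i ++ "-" ++ instVersion i ++ "-" ++ showHex (instanceHash i) ""

-- The two instances of B from the example above.
d10, d11, bOld, bNew :: Instance
d10  = Instance "D" "1.0" []
d11  = Instance "D" "1.1" []
bOld = Instance "B" "1.0" [d10]
bNew = Instance "B" "1.0" [d11]

main :: IO ()
main = mapM_ (putStrLn . instanceId) [bOld, bNew]
```

Both instances are B-1.0, but they get different identifiers because they were built against different versions of D, so they can sit side by side in the package database without one replacing the other.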

I should note that this is not a new problem. You've been able to construct this problem ever since GHC started to allow multiple versions of the same package to be installed at once. We are just noticing it a lot more frequently now because we split up the base package and allow those split-out packages to be upgraded.

The current state of play is that Cabal warns of this problem but doesn't really help you solve it. For the above example we'd get:

$ cabal configure
Configuring A-1.0...
Warning: This package indirectly depends on multiple versions of
the same package. This is highly likely to cause a compile failure.
package B-1.0 requires D-1.0
package C-1.0 requires D-1.1
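
The check behind a warning like this can be sketched in a few lines. This is only an illustration of the idea, not Cabal's actual implementation; the `Pkg` type and the `closure` and `conflicts` functions are hypothetical names. It walks the transitive dependencies of a package and reports any package name that shows up with more than one version.

```haskell
import Data.List (nub, sort)

-- Hypothetical model of an installed package: name, version, and the
-- exact installed packages it was built against.
data Pkg = Pkg
  { pkgName    :: String
  , pkgVersion :: String
  , pkgDeps    :: [Pkg]
  }

-- Every (name, version) pair reachable from a package, not counting
-- the package itself.
closure :: Pkg -> [(String, String)]
closure p = nub (concatMap visit (pkgDeps p))
  where
    visit d = (pkgName d, pkgVersion d) : closure d

-- Package names that occur in the closure with more than one version,
-- together with the sorted list of those versions.
conflicts :: Pkg -> [(String, [String])]
conflicts p =
  [ (n, vs)
  | n <- nub (map fst deps)
  , let vs = sort [v | (n', v) <- deps, n' == n]
  , length vs > 1
  ]
  where
    deps = closure p

-- The diamond from the example: B needs D-1.0, C needs D-1.1.
exampleA :: Pkg
exampleA = Pkg "A" "1.0" [b, c]
  where
    b = Pkg "B" "1.0" [Pkg "D" "1.0" []]
    c = Pkg "C" "1.0" [Pkg "D" "1.1" []]

main :: IO ()
main = mapM_ print (conflicts exampleA)
```

For the diamond above, `conflicts exampleA` evaluates to `[("D",["1.0","1.1"])]`. Detecting the conflict is the easy part; as discussed, deciding what to rebuild is where it gets hard.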