GHC, primops and exorcising GMP

Duncan Coutts – Tuesday, 09 June 2009


GHC uses GMP to implement the Haskell arbitrary-precision Integer type. It's been this way for ages.

For various reasons, using GMP is a slight problem for some users. Some users don't really make use of Integer and don't like having to link to GMP. Since GMP uses the LGPL, if you want to ship closed source programs then you have to link to it dynamically. On Windows static linking is the default, so you have to jump through hoops to link it dynamically. Then there are also users who make heavy use of GMP and find that the Integer library is far too limited an interface to GMP. However, binding extra GMP functions is complicated by the way that the GHC RTS uses it already (especially the memory management).

So what these people want is a way to build GHC such that the RTS does not directly link to GMP. Then the implementation of Integer should be in a library that is replaceable so that one can use a simple slow implementation, a super-duper binding to GMP or some other "big num" library.

Daniel Peebles, Ian Lynagh and I have been working on this problem recently. Ian's and my contributions to this are supported by the IHG.

Getting GMP out of the RTS

Before we can think about replacements, however, we need to disentangle GMP from the RTS and at least move the existing GMP-based Integer implementation into a library. This Integer implementation would remain the default, so it still has to be fast. Daniel has managed to rip GMP out of the RTS and we're now focusing on how to move the GMP binding into its own library.

The difficulty of moving it out of the RTS is that currently almost all the GMP operations are bound as GHC "primops", as opposed to using the FFI. This is partly a historical accident (the FFI arrived on the scene relatively late) and partly because, due to certain FFI restrictions, the primop route is simpler and faster. The issue is that the wrapper code (around the actual GMP calls) needs to return several results to Haskell land, in particular things like (# Int, ByteArray# #). Using the FFI it is possible to return several results, but one has to do it in the time-honoured tradition of C and emulate "out" parameters by passing pointers. The problem with doing that is we would need to do a lot of marshaling: temporarily allocate some memory, pass pointers and read back the results. All this just to return a few integers and pointers. It's actually more tricky because at the level in the library stack where we have to implement Integer we do not actually have access to the FFI libraries (in fact currently we do not even have access to the IO type).
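To make the marshaling cost concrete, here is a minimal sketch of the "out" parameter pattern through the ordinary C FFI. It does not use GMP at all: the C standard library's frexp is just a convenient stand-in for a function that returns one result directly and a second one through a pointer. The wrapper name frexpPair is made up for this illustration.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CDouble (..), CInt (..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

-- C prototype: double frexp(double x, int *exp);
-- The mantissa is the return value, the exponent comes back via the pointer.
foreign import ccall unsafe "math.h frexp"
  c_frexp :: CDouble -> Ptr CInt -> IO CDouble

-- Hypothetical wrapper (not part of any library): allocate temporary
-- storage, pass its address to C, then read the second result back.
frexpPair :: Double -> IO (Double, Int)
frexpPair x =
  alloca $ \expPtr -> do
    mantissa <- c_frexp (realToFrac x) expPtr
    expo     <- peek expPtr
    return (realToFrac mantissa, fromIntegral expo)

main :: IO ()
main = frexpPair 8.0 >>= print  -- prints (0.5,4), since 8 = 0.5 * 2^4
```

Every call pays for an alloca, a pointer pass and a peek, and the whole thing lives in IO, which is exactly the machinery that is not available at the level where Integer has to be defined.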

GHC primops

Primops bypass the single-result restrictions inherited from the C calling convention. We can write primops that directly return unboxed tuples, like (# Int, ByteArray# #). Primops (at least out-of-line primops) are implemented in Cmm, which is GHC's low level intermediate language based on the C-- language. These Cmm functions have to know exactly the internal calling convention that GHC uses, but there is no excess marshaling.
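As a small illustration of a primop handing back several results at once, the sketch below uses quotRemInt# as exported by GHC.Exts in present-day GHC. It is an inline primop rather than one of the out-of-line GMP ones discussed here, and the boxed wrapper quotRemBoxed is a made-up name for this example, but the unboxed-tuple result is consumed in the same way.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main where

import GHC.Exts (Int (I#), quotRemInt#)

-- quotRemInt# :: Int# -> Int# -> (# Int#, Int# #)
-- Both results come back directly in an unboxed tuple: no pointers,
-- no temporary allocation and no IO.
quotRemBoxed :: Int -> Int -> (Int, Int)
quotRemBoxed (I# x) (I# y) =
  case quotRemInt# x y of
    (# q, r #) -> (I# q, I# r)

main :: IO ()
main = print (quotRemBoxed 17 5)  -- prints (3,2)
```

The wrapper is a pure function, in contrast to the pointer-passing style that the C calling convention forces on ordinary foreign imports.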

Unfortunately, knowledge of the primops has to be baked into the compiler and the Cmm code has to be compiled into the RTS. So that's no good for implementing Integer in a library separate from the RTS.

What if we could use the FFI to import Cmm functions...

foreign import prim

That would make it possible to have out-of-line primops in a library. The library would contain the compiled .cmm files and the .hs code in the same library would "foreign import" the Cmm functions. In particular, we could then just move the .cmm code we use for wrapping the GMP library calls from the RTS into the integer-gmp package. Then instead of getting primops like plusInteger# from the GHC.Prim module, we would just foreign import them, e.g.:

foreign import prim "plusInteger" plusInteger#
  :: Int# -> ByteArray#
  -> Int# -> ByteArray#
  -> (# Int#, ByteArray# #)

So that's what I started implementing today, "foreign import prim". It needs a slight extension in the lexer, parser, type checker, desugarer, core->stg, and stg->cmm phases. That sounds like a lot but the changes in each bit are pretty small. As a feature it is very similar to foreign C calls and also to primops, so fortunately it can share most code with those existing features. So far it's going ok: I've got it producing convincing-looking core, stg and cmm code. Tomorrow I'll test it and review the design and changes with Simon Marlow.

If this works out ok then it should mean we're still using the same well-tested GMP binding code, without any extra marshaling overhead. Correctness testing is mostly covered by the existing GHC testsuite. We still want to check the performance of course. To that end, Daniel has been working on an Integer performance benchmark. He's tried it already using the simple pure-Haskell implementation of Integer. Apparently it does respectably but takes ages to calculate 10000 factorial.
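For a rough idea of what a factorial measurement looks like, here is a minimal sketch. It is not Daniel's benchmark, just a hypothetical stand-in that times 10000 factorial with whatever Integer implementation the program is built against; the factorial helper is defined here for the example.

```haskell
module Main where

import System.CPUTime (getCPUTime)

-- Hypothetical micro-benchmark helper: n! as one long chain of
-- Integer multiplications.
factorial :: Integer -> Integer
factorial n = product [1 .. n]

main :: IO ()
main = do
  start <- getCPUTime
  -- Printing the digit count forces the whole computation, including
  -- the conversion of the result to decimal.
  print (length (show (factorial 10000)))  -- prints 35660
  end <- getCPUTime
  -- getCPUTime reports picoseconds; 10^9 picoseconds is one millisecond.
  putStrLn ("CPU time (ms): " ++ show ((end - start) `div` 1000000000))
```

Note that a single run like this measures decimal conversion as well as multiplication, so a serious benchmark would want to separate the two and repeat the measurement.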