GHC and Windows DLLs

Fri, 03 Jul 2009 06:50:43 GMT, by ben.
Filed under coding, industrial-haskell-group.

Following on from Duncan's work on Building plugins as Haskell shared libs, I've been working on supporting the same functionality on Windows. The end goal is to have an rts.dll, libHsBase.dll and myPlugin.dll, and to be able to write things like Excel plugins in Haskell without statically linking the whole runtime system and set of libraries into each one.

Windows uses the Portable Executable (PE) format, so the hoops that must be jumped through are different from those for Linux, which uses ELF as its object format, and Mac OS X, which uses Mach-O. Toolchain programs such as linkers and object file viewers are also different.

One of the immediate issues is dealing with mutually recursive imports between the Haskell libraries and the GHC Run Time System (RTS). Clearly, the code for a Haskell library will call the RTS to perform tasks such as allocating memory, throwing exceptions, forking threads and so on. However, the runtime system also calls back into the base library. For example, here is a function from the RTS which helps to create parallel threads:

void createSparkThread (Capability *cap) {
    StgTSO *tso;
    tso = createIOThread (cap, RtsFlags.GcFlags.initialStkSize, 
                                  &base_GHCziConc_runSparks_closure);
    postEvent(cap, EVENT_CREATE_SPARK_THREAD, 0, tso->id);
    appendToRunQueue(cap,tso);
}

The symbol base_GHCziConc_runSparks_closure names a function closure in the GHC.Conc module of the base library, and we won't have the code for it when we're linking the RTS.
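To make the cycle concrete, here is a minimal sketch in plain C. Everything in it is a hypothetical stand-in: the Closure struct is far simpler than a real RTS closure, and rts_allocate, rts_create_spark_thread and base_runSparks_closure are invented names modelled loosely on the code above. Both halves sit in one file so that the sketch compiles, whereas in the real build they would end up in separate DLLs.

```c
#include <assert.h>

/* ---- Would live in rts.dll --------------------------------------- */

/* Hypothetical, heavily simplified closure: just an entry point. */
typedef struct {
    int (*entry)(void);
} Closure;

/* Declared here but defined by the base library. When the RTS is
   linked on its own, this reference has nothing to resolve against. */
extern Closure base_runSparks_closure;

/* A service the RTS provides to library code (stand-in for allocation). */
int rts_allocate_calls = 0;
int rts_allocate(void) {
    return ++rts_allocate_calls;
}

/* The RTS calling back into the base library through the closure. */
int rts_create_spark_thread(void) {
    assert(base_runSparks_closure.entry != 0);
    return base_runSparks_closure.entry();
}

/* ---- Would live in libHsBase.dll --------------------------------- */

/* Library code that in turn calls the RTS. */
static int run_sparks(void) {
    return rts_allocate();
}

/* The definition that the RTS half refers to. */
Closure base_runSparks_closure = { run_sparks };
```

Calling rts_create_spark_thread() goes RTS, then base, then back into the RTS. Split the file at the second banner and neither half links by itself, which is the situation the two DLLs are in.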

One of the quirks of Windows is the need to generate so-called "import libraries". These contain stub code that is used to call a function in a DLL. For example, if code in module main.o wants to call a function fun in a library base.dll, the picture looks something like this:

## in main.o ##################### (linked into main.exe)
main:
    call fun
    ....
    call dword ptr [__imp_fun]
    .... 


## in base.lib ###################### (linked into main.exe)
fun:
    jmp dword ptr [__imp_fun]

.data
__imp_fun:
    .dword fun        # address of the real fun in base.dll


## in base.dll ######################
fun:
    .. actual code for fun   

In Windows, all calls to a function in a DLL go via the Import Address Table (IAT). This is a table of pointers, and in the example above it has one entry, named __imp_fun. There are two ways to use this table. The first is illustrated by the first call to fun in main.o: the call targets stub code that looks up the pointer in the table and then jumps to it. The second, illustrated by the second call in main.o, is to call through the pointer in the table directly, but to do this we need to know at code generation time that the function lives in an external DLL. A call fun instruction uses a PC-relative offset and is physically shorter than a call dword ptr [] instruction, so it's not practical to change one into the other at link time.
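The two call paths can be mimicked in portable C with an ordinary function pointer. This is only a simulation under stated assumptions: imp_fun stands in for the __imp_fun slot, fake_loader_bind stands in for the Windows loader filling that slot in, and fun_in_dll stands in for the real code in base.dll. All of these names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real code of fun inside base.dll. */
static int fun_in_dll(int x) {
    return x + 1;
}

/* Stand-in for the IAT slot __imp_fun: one pointer, empty until bound. */
typedef int (*fun_ptr)(int);
static fun_ptr imp_fun = NULL;

/* Stand-in for the loader, which writes the real address into the slot. */
void fake_loader_bind(void) {
    imp_fun = fun_in_dll;
}

/* Stand-in for the stub in base.lib ("fun: jmp dword ptr [__imp_fun]"). */
static int fun(int x) {
    assert(imp_fun != NULL);   /* the slot must have been bound first */
    return imp_fun(x);
}

/* First way: the caller emits a plain "call fun" and lands in the stub. */
int call_via_stub(int x) {
    return fun(x);
}

/* Second way: the caller goes through the table slot itself,
   like "call dword ptr [__imp_fun]". */
int call_via_iat(int x) {
    assert(imp_fun != NULL);
    return imp_fun(x);
}
```

Both paths reach the same code; the stub version just pays for one extra jump. With C compilers that target Windows, marking a declaration with __declspec(dllimport) is the usual way to tell the compiler at code generation time that a function lives in a DLL, so that it can emit the second form.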

The file base.lib is the "import library", which contains the call stub and the IAT. Import libraries need to be generated independently of the main compiling and linking process, using Windows-specific tools. The import library for a particular DLL is then linked into every executable (or other DLL) that uses it.

Anyway, I've spent the last few days wading through MSDN and the GHC build system, and I think I've cataloged all the major hoops, at least. I'll let you know how the jumping goes in the next post.