Overloaded record fields for GHC

Fri, 29 Nov 2013 16:38:21 GMT, by adam.
Filed under ghc.

TL;DR: GHC HEAD (but not GHC 7.8) will soon support OverloadedRecordFields, an extension to permit datatypes to reuse field labels and even turn them into lenses.

Introduction

The Haskell records system is a frequent source of frustration to Haskell programmers working on large projects. On the face of it, it is simple: a datatype such as

data Person = Person { id :: Int, name :: String }

gives rise to top-level projection functions:

id   :: Person -> Int
name :: Person -> String

However, this means that using multiple records with the same field label leads to name clashes. Workarounds are possible, such as declaring one datatype per module (so the module prefix can be used to disambiguate) or adding a per-datatype prefix to all the field labels, but they are somewhat clumsy.

Over the summer before I started working for Well-Typed, I implemented a new GHC extension, OverloadedRecordFields, that allows the same name to be used for multiple fields. This is not a whole new records system for Haskell, and does not include everything one might want (such as anonymous records), but it is a small step forward in a notorious quagmire. Proposals for better systems are welcome, but while it is easy to propose a more powerful design in isolation, integrating it with other extensions and syntax in GHC is another matter!

Unfortunately, the extension will not be ready for GHC 7.8, to allow time for the design to settle and the codebase changes to mature. However, it should land in HEAD soon after the 7.8 release is cut, so the adventurous are encouraged to build GHC and try it out. Feedback from users will let us polish the design before it is finally released in 7.10.

Record projection

The essential point of the Haskell records system is unchanged by this extension: record projections are still functions, except now they are polymorphic in the choice of datatype. Instead of

name :: Person -> String

we have

name :: (r { name :: t }) => r -> t

where r { name :: t } is a constraint meaning that the type r has a field name of type t. The typechecker will automatically solve such constraints, based on the datatypes in scope. This allows module abstraction boundaries to be maintained just as at present. If a module does not export a field, clients of that module cannot use the OverloadedRecordFields machinery to access it anyway.

For example, the following code will be valid:

data Company { name :: String, employees :: [Person] }
companyNames :: Company -> [String]
companyNames c = name c : map name (employees c)

Notice that name can be used at two different record types, and the typechecker will figure out which type is meant, just like any other polymorphic function. Similarly, functions can be polymorphic in the record type:

nameDefined :: (r { name :: [a] }) => r -> Bool
nameDefined = not . null . name

Record update

The traditional Haskell record update syntax is very powerful: an expression like

e { id = 3, name = "Me" }

can update (and potentially change the types of) multiple fields. The OverloadedRecordFields extension does not attempt to generalise this syntax, so a record update expression will always refer to a single datatype, and if the field names do not determine the type uniquely, a type signature may be required. With the Person and Company types as defined above, the previous expression is accepted but the definition

f x = x { name = "Me" }

is ambiguous, so a type signature must be given to f (or a local signature given on x or the update expression).

On the other hand, type-changing update of single fields is possible in some circumstances using lenses, discussed below.

Technical details

The constraint r { name :: t } introduced above is in fact syntactic sugar for a pair of constraints

(Has r "name", t ~ FldTy r "name")

where the Has r n class constraint means that the type r has a field named n, and the FldTy r n type family gives the type of the field. They use type-level strings (with the DataKinds extension) to name fields. The type family is used so that the field type is a function of the data type and field name, which improves type inference.

Unlike normal classes and type families, the instances of Has and FldTy are generated automatically by the compiler, and cannot be given by the user. Moreover, these instances are only in scope if the corresponding record projection is in scope. For example, if the name field of Person is in scope, the extension works as if the following declarations had been given:

instance Has Person "name"
type instance FldTy Person "name" = String

Lenses

An excellent way to deal with nested data structures, and to alleviate the shortcomings of the Haskell record system, is to use one of the many Haskell lens libraries, such as lens, data-lens, fclabels and so on. These pair together a "getter" and a "setter" for a field (or more generally, a substructure of a larger type), allowing them to be composed as a single unit. Many of these libraries use Template Haskell to generate lenses corresponding to record fields. A key design question for OverloadedRecordFields was how to fit neatly into the lenses ecosystem.

The solution is to generalise the type of field functions still further. Instead of the desugared

name :: (Has r "name") => r -> FldTy r "name"

we in fact generate

name :: (Accessor p r "name") => p r (FldTy r "name")

where Accessor is a normal typeclass (no instances are generated automatically). In particular, it has an instance

instance Has r n => Accessor (->) r n

so when p = (->), the new field type specialises to the previous type. Thus a field can still be used as a (record-polymorphic) function. Moreover, each lens library can give its own instance of Accessor, allowing fields to be used directly as lenses of that type. (Unfortunately this doesn't quite work for van Laarhoven lenses, as in the lens library, because they do not have the shape p r (FldTy r n). A wrapper datatype can be used to define a combinator that converts fields into van Laarhoven lenses.)

Concluding remarks

For more information, see the detailed discussion of the design and implementation on the GHC wiki. A prototype implementation, that works in GHC 7.6.3, shows how the classes are defined and gives examples of the instances that will be generated by GHC. Keep an eye out for the extension to land in GHC HEAD later this year, and please try it out and give your feedback!

I'm very grateful to Simon Peyton Jones for acting as mentor for this project and as my guide to the GHC codebase. Edward Kmett threw several wrenches in the works, and the final design owes a great deal to his careful scrutiny and helpful suggestions. Anthony Clayden also gave many helpful comments on the design. My thanks go also to the many denizens of the ghc-devs and glasgow-haskell-users mailing lists who gave feedback at various stages in the development process. This work was supported by the Google Summer of Code programme.