OverloadedRecordFields revived

Monday, 30 March 2015, by Adam Gundry.
Filed under coding, ghc, well-typed.

Way back in the summer of 2013, with support from the Google Summer of Code programme, I implemented a GHC extension called OverloadedRecordFields to address the oft-stated desire to improve Haskell’s record system. This didn’t get merged into GHC HEAD at the time, because the implementation cost outweighed the benefits. Now, however, I’m happy to report that Well-Typed are sponsoring the work required to get an improved version into GHC. Moreover, the first part of the work is already up for review on Phabricator.

The crucial thing that has enabled OverloadedRecordFields to get going again is that we’ve found a way to factor the design differently, so that we get a much better power-to-weight ratio for the extension by splitting it into two parts.

Part 1: Records with duplicate field labels

The first step is to cut down the core OverloadedRecordFields extension as much as possible. The essential idea is the same as it ever was, namely that a single module should be able to use the same field name in multiple datatypes, as in this example:

data Person  = Person  { personId :: Int, name :: String }
data Address = Address { personId :: Int, address :: String }

These definitions are forbidden in normal Haskell, because personId is defined twice, but the OverloadedRecordFields extension will permit them and instead postpone name conflict checking to use sites. The basic extension will require that fields are used in such a way that the relevant datatype is always unambiguously determined, and the meanings of record construction, pattern matching, selection and update will not change. This means that the extension can always be enabled for an existing module and it will continue to compile unchanged, an important property that was not true of the previous design.

The Haskell syntax for record construction and pattern-matching is always unambiguous, because it mentions the data constructor, which means that code like this is perfectly fine:

p = Person { personId = 1, name = "Donald" }
getId (Person { personId = i }) = i

On the other hand, record selector functions are potentially ambiguous. The name and address selectors can be used without restrictions, and with their usual types, but it will not be possible to use personId as a record selector if both versions are in scope (although we will shortly see an alternative).

Record update is a slightly more interesting case, because it may or may not be ambiguous depending on the fields being updated. In addition, since updates are a special syntactic form, the ambiguity can be resolved using a type signature. For example, this update would be ambiguous and hence rejected by the compiler:

f x = x { personId = 0 } -- is x a Person or an Address?

On the other hand, all these updates are unambiguous:

g :: Person -> Person
g x = x { personId = 0 }                 -- top-level type signature

h x = x { personId = 0 } :: Person       -- type signature outside

k x = (x :: Person) { personId = 0 }     -- type signature inside

l x = x { personId = 0, name = "Daffy" } -- only Person has both fields

Overall, this extension requires quite a bit of rewiring inside GHC to distinguish between field labels, which may be overloaded, and record selector function names, which are always unambiguous. However, it requires nothing conceptually complicated. As mentioned above, the implementation of this part is available for review on Phabricator.

Part 2: Polymorphism over record fields

While the OverloadedRecordFields extension described in part 1 is already useful, even though it is a relatively minor relaxation of the Haskell scoping rules, another important piece of the jigsaw is some way to refer to fields that may belong to multiple datatypes. For example, we would like to be able to write a function that selects the personId field from any type that has such a field, rather than being restricted to a single datatype. Much of the unavoidable complexity of the previous OverloadedRecordFields design came from treating all record selectors in an overloaded way.

But since this is new functionality, it can use a new syntax, tentatively a prefix # sign (meaning that use of # as an operator will require a space afterwards when the extension is enabled). This means that it will be possible to write #personId for the overloaded selector function. Since we have a syntactic cue, it is easy to identify such overloaded uses of selector functions, without looking at the field names that are in scope.

Typeclasses and type families will be used to implement polymorphism over fields belonging to record types, though the details are beyond the scope of this blog post. For example, the following definition is polymorphic over all types r that have a personId :: Int field:

getId :: r { personId :: Int } => r -> Int
getId x = #personId x

Moreover, we are not limited to using #personId as a selector function. The same syntax can also be given additional interpretations, allowing overloaded updates and making it possible to produce lenses for fields without needing Template Haskell. In fact, the syntax is potentially useful for things that have nothing to do with records, so it will be available as a separate extension (implied by, but distinct from, OverloadedRecordFields).

Further reading

More details of the redesigned extensions are available on the GHC wiki, along with implementation notes for GHC hackers. Last year, I gave a talk about the previous design which is still a good guide to how the types work under the hood, even though it predates the redesign.