Edsko will be talking about the problems discussed in this blog post in his Haskell Implementors’ Workshop talk this Sunday, Aug 22. The talk will be broadcast live on YouTube.
Consider a module that contains nothing but the definition of a single large record and some type class instances:
{-# OPTIONS_GHC -fplugin=RecordDotPreprocessor #-}
module Example where
import Data.Aeson
import Generics.SOP.JSON
import Generics.SOP.TH
import GHC.TypeLits
newtype T (i :: Nat) = MkT Word
deriving (Show, Eq, ToJSON)
data R = MkR {
f00 :: T 00
f01 :: T 01
,-- .. lots more ..
f98 :: T 98
, f99 :: T 99
,
}deriving (Eq, Show)
'R
deriveGeneric '
instance ToJSON R where
= gtoJSON defaultJsonOptions toJSON
As it stands—using ghc
’s standard representation for records, along with the code generated by the RecordDotPreprocessor
plugin and the Generic
instance generated by generics-sop
—this results in a core
representation of a whopping 450,000 terms/types/coercions, takes 3 seconds to compile, and requires 500M of RAM to compile.1
If we change this module to
module Example where
import Data.Aeson (ToJSON(..))
import Data.Record.Generic.JSON
import Data.Record.TH
import Test.Record.Size.Infra
largeRecord defaultLazyOptions [d|
data R = MkR {
f00 :: T 00
, f01 :: T 01
-- .. lots more ..
, f98 :: T 98
, f99 :: T 99
}
deriving (Eq, Show)
|]
instance ToJSON R where
= gtoJSON toJSON
we get a module with essentially the same functionality, but with a core
size of a mere 14,000 terms/types/coercions, which compiles within 1 second and requires roughly 100M of RAM.
In this blog post we describe why this simple module generates so much code, and how the large-records
library manages to reduce this by more than an order of magnitude.
We wrote this library because Juspay recently engaged Well-Typed’s services, and one of their requests to us was to try and improve compilation time and compilation memory requirements for their code base. Juspay very generously allowed us to make large-records
open source, and it is now available on Hackage
.
Quadratic code size at every level
The reason the core
representation of our example module is so large is that unfortunately there are many examples of ghc
and other libraries being accidentally quadratic. Before we look at some concrete examples, let’s first investigate where this quadratic code size is coming from. As we will see, it arises at every level: terms, types, type classes, and type level programming.
Warmup: terms
For our running example, we will want to have a record with lots of fields. To avoid some “accidental optimizations,” we’ll give each of those fields a different type. To make that a little bit easier, we’ll just introduce a single type that is indexed by a natural number, so that this one type definition gives us as many different types as we need:
data T (n :: Nat) = MkT Int
That out of the way, consider a record with lots of fields, such as
data R = MkR {
f00 :: T 00
f01 :: T 01
, f02 :: T 02
,-- .. lots more ..
f98 :: T 98
, f99 :: T 99
, }
When we define a record, ghc
will generate field accessors for all fields in the record. In other words, it will derive functions such as
f00 :: R -> T 0
These functions are not difficult to generate, of course. Each function is just a simple case
statement:
= \(r :: R) ->
f00 case r of
MkR x00 x01 x02 x03 x04 x05 x06 x07 x08 x09
x10 x11 x12 x13 x14 x15 x16 x17 x18 x19-- .. lots more ..
->
x90 x91 x92 x93 x94 x95 x96 x97 x98 x99 x00
Although simple, this case statement mentions a lot of unused variables (99 of them, in fact). Moreover, each of those variables is annotated with their type. This means that this one function is actually rather large; ghc
reports that it contains 5 terms and 202 types. The size of this function is clearly linear in the number n
of fields we have; moreover, ghc
will generate one function for each of those n
fields; that means that simply declaring the record will already generate code that is O(n²)
in size.
More subtle: types
Suppose we define an applicative “zip
” function for R
, something like
zipMyRecordWith ::
Applicative f
=> (forall n. T n -> T n -> f (T n))
-> R -> R -> f R
=
zipMyRecordWith f r r' pure MkR
<*> f (f00 r) (f00 r')
<*> f (f01 r) (f01 r')
<*> f (f02 r) (f02 r')
-- .. lots more ..
<*> f (f98 r) (f98 r')
<*> f (f99 r) (f99 r')
Clearly the size of this function is at least linear in the number of record fields, that much is expected. However, -ddump-simpl
tells us that this function contains 50,818 types! Where are all of those coming from?
Recall the type of (<*>)
:
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
Those type variables need to be instantiated; f
is always instantiated to the f
type parameter passed to zipMyRecordWith
, a
is the type of the next field we’re applying, but what about b
? Let’s annotate zipMyRecordWith
with the types of a
and b
in pseudo-Haskell:
zipMyRecordWith ::
Applicative f
=> (forall n. T n -> T n -> f (T n))
-> R -> R -> f R
=
zipMyRecordWith f r r' pure MkR
<*> @(T 00) @(T 01 -> T 02 -> T 03 -> .. -> T 99 -> R) f (f00 r) (f00 r')
<*> @(T 01) @( T 02 -> T 03 -> .. -> T 99 -> R) f (f01 r) (f01 r')
<*> @(T 02) @( T 03 -> .. -> T 99 -> R) f (f02 r) (f02 r')
-- .. lots more ..
<*> @(T 98) @( T 99 -> R) f (f98 r) (f98 r')
<*> @(T 99) @( R) f (f99 r) (f99 r')
The first instantiation of (<*>)
mentions the type of every single field; the second mentions the types of all-but-one field, the next of all-but-two, etc. This means that the size of this single function is once again O(n²)
in the number of record fields.
Type class dictionaries
Suppose we wanted to capture the concept “some constraints c
applied to the types of all fields in our record”:
class (
T 00)
c (T 01)
, c (T 02)
, c (-- .. lots more ..
T 98)
, c (T 99)
, c (=> Constraints_R c )
This should be fine right? Right? Wrong.
When we declare a type class, we’re effectively constructing a record type with fields for each of the methods of the class; this record is known as the “dictionary” for the class. Superclass constraints translate to “subdictionaries”: when a class such as Ord
has a superclass constraint on Eq
, then the dictionary for Ord
will have a field for an Eq
dictionary.
This means that this definition of Constraints_R
is actually of a very similar nature to the definition of R
itself: it defines a record with 100 fields. And just like for the record, ghc
will generate “field accessors” to extract the fields of this dictionary; put another way, those field accessors “prove” that if we know Constraints_R c
, we also know c (T 00)
, c (T 01)
, etc. What do those field accessors look like? You guessed it, a big pattern match; in pseudo-Haskell:
$p1Constraints_R :: Constraints_R c => c (T 0)
$p1Constraints_R = \dict ->
case dict of
Constraints_R d00 d01 d02 d03 d04 d05 d06 d07 d08 d09
d10 d11 d12 d13 d14 d15 d16 d17 d18 d19-- .. lots more ..
->
d90 d91 d92 d93 d94 d95 d96 d97 d98 d99 d00
Since ghc
generates a projection like this for each superclass constraint, this once again results in code of size that is O(n²)
in the number of record fields.
Type level induction
So far all our examples have been simple Haskell; for our next example, we’ll get a bit more advanced. Just like we can have list values, we can also have list types; for example, here is a type-level list of the indices of the T
types used inside our running example record R
:
type IndicesR = '[
00, 01, 02, 03, 04, 05, 06, 07, 08, 09
10, 11, 12, 13, 14, 15, 16, 17, 18, 19
, -- .. lots more ..
90, 91, 92, 93, 94, 95, 96, 97, 98, 99
, ]
There are many use cases for type level lists. For example, we can define a type NP
such that NP f [x, y, .., z]
is basically the same as (f x, f y, .., f z)
:
data NP (f :: k -> Type) (xs :: [k]) where
Nil :: NP f '[]
(:*) :: f x -> NP f xs -> NP f (x ': xs)
If we have a T
for every index in IndicesR
, we can construct a value of our record:
npToR :: NP T IndicesR -> R
:* f01 :* f02 :* f03 :* f04 :* f05 :* f06 :* f07 :* f08 :* f09
npToR ( f00 :* f10 :* f11 :* f12 :* f13 :* f14 :* f15 :* f16 :* f17 :* f18 :* f19
-- .. lots more ..
:* f90 :* f91 :* f92 :* f93 :* f94 :* f95 :* f96 :* f97 :* f98 :* f99
:* Nil ) = MkR {..}
The compiled size of npToR
is large, but it is linear in size (total size of 4441 terms, types and coercions for 100 fields, and a total size of 2241 for 50 fields). So far so good.
In order to get to the problem I’d like to illustrate in this section, we need one more concept. Suppose wanted to write a function that can construct a value of NP T xs
for any xs
:
mkNP :: NP T xs
This should be possible, since T
is just a wrapper around an Int
, and so all we need to do is generate as many T
s as there are elements in the type level list xs
. However, xs
is a type level list, and we cannot pattern match on types in Haskell; indeed, they do not even exist at all at run-time.
Therefore we somehow need to reflect the type level list at the term level: we need a value that corresponds exactly to the type. We do this by introducing a new type, indexed by a type level list, so that given a type level list of a particular length, our new type has exactly one value. Such a type—a type with exactly one value—is known as a singleton type:
data SList :: [k] -> Type where
SNil :: SList '[]
SCons :: SList xs -> SList (x ': xs)
SList
gives us a value that we can pattern match on, and when we do we discover something about the shape of the type level list xs
:
mkNP' :: SList xs -> NP T xs
SNil = Nil
mkNP' SCons s) = MkT 0 :* mkNP' s mkNP' (
The closest we can come to mkNP
is to make this singleton value implicit:
class SListI (xs :: [k]) where
sList :: SList xs
mkNP :: SListI xs => NP T xs
= mkNP' sList mkNP
If we now try to use mkNP
to construct a value of our record
r0 :: R
= npToR mkNP r0
we will of course find that we need an instance of SListI
for IndicesR
. Our first instinct might be to write something like
instance SListI IndicesR where
=
sList SCons
$ SCons
$ SCons
-- .. lots more ..
$ SCons
$ SCons
$ SNil
but if we do that, we will soon discover that the compiled code is quadratic in size. We could have predicted that: it’s the same problem as in the “Types” section above, with ($)
playing the role of (<*>)
. But even if we write it as
instance SListI IndicesR where
=
sList SCons (
SCons (
SCons (
-- .. lots more ..
SCons (
SCons (
SNil
{- .. lots more brackets .. -} )) )))
we’re still in trouble: each of those SCons
applications has two type argument x
and xs
(the type level list of the tail). So with some type annotations, this code is
instance SListI IndicesR where
=
sList SCons @00 @'[01, 02, 03, .., 99] (
SCons @01 @'[ 02, 03, .., 99] (
SCons @02 @'[ 03, .., 99] (
-- .. lots more ..
SCons @98 @'[ 99] (
SCons @99 @'[ ] (
SNil
{- .. lots more brackets .. -} )) )))
So this code is again O(n²)
in size (actually, the real code generated by ghc
is much worse than this, due to the fact that SList
is a GADT; after desugaring, the function has a total size of 15,352, and after the simplifier runs (in -O0
) that expands to a whopping 46,151).
Experienced type-level Haskellers might be surprised that we’d try to write this SListI
instance by hand. After all, the definition of a singleton type is that it is a type with only a single value, and so we should be able to just derive it automatically. Indeed we can:
instance SListI '[] where
= SNil
sList
instance SListI xs => SListI (x ': xs) where
= SCons sList sList
Surely we should be good now, right? These definitions are small, and don’t deal with concrete large lists, and so we avoid quadratic code size. Right? Wrong.
Although it is true that the two instances for SListI
are unproblematic, the moment that we use npToR mkNP
, ghc
needs to prove SListI '[00, 01, .. 99]
. In other words, it must generate code that produces a dictionary for SListI [00, 01, .. 99]
. Since SListI
for (x ': xs)
has a superclass constraint SListI xs
, the dictionary for SListI [00, 01, .., 99]
will have a field for the dictionary for SListI [01, .., 99]
, all the way down to the empty type level list. This means that ghc
will generate 100 dictionaries; each of those dictionaries contains an SCons
application with the same type annotation as in hand-written code above. This means that we still have code that is O(n²)
in size.
Concrete examples
In the previous section we discussed the ways in which we might end up with accidentally quadratic code size. In this section we will consider some examples of code generated by specific libraries. We will start with GHC generics, which is actually a good example: it generates code of size O(n log n)
rather than O(n²)
. After that we will discuss record-dot-preprocessor
and generics-sop
, both of which do generate code of O(n²)
size.
GHC Generics
The goal of generic programming is to be able to write a single function that can be applied to values of lots of different types. Generics libraries such as GHC.Generics
and generics-sop
(discussed below) do this by translating the value to a representation type; since every type can be translated to only a handful of different representation types, it suffices to write a function over all of those representation types.
Here we will discuss a simplified form of GHC.Generics
that still illustrates the same point. The generic representation of a record such as the one above is essentially just a large nested tuple. For the GHC library itself it does not actually matter terribly how this tuple is created; for example, this would work:
type family GHC_Rep (a :: Type) :: Type
type instance GHC_Rep R = (T 00, (T 01, (T02, ... (T98, T99))))
Although it would work, it would not be great from a code size perspective. Consider the function that would translate R
to GHC_Rep R
:
ghcTo :: R -> GHC_Rep R
MkR{..} =
ghcTo @(T 00) @(T 01, (T02, ... (T98, T99))) f00 (
(,) @(T 01) @(T02, ... (T98, T99)) f01 (
(,) @(T 02) @(... (T98, T99)) f02 (
(,) -- .. lots more ..
@(T 99) @(()) f99 (
(,) () ))))
This pattern is starting to get familiar at this point; with this representation, ghcTo
would be O(n²)
in size. Fortunately, GHC generics avoids this problem by instead generating a balanced representation, something like
type instance GHC_Rep R =
T 00, ( T 01, T 02 ) )
( ( ( ( ( ( T 03, ( T 04, T 05 ) )
, (
)-- .. lots more ..
T 46, T 47 )
, ( ( T 48, T 49 )
, (
) ) ) ) )T 50, ( T 51, T 52 ) )
, ( ( ( ( ( T 53, ( T 54, T 55 ) )
, (
)-- .. lots more ..
T 96, T 97 )
, ( ( T 98, T 99 )
, ( ) ) ) ) ) )
With this representation, the number of branches is still the same, but in the translation function the type annotations are now halved in size at each branch, rather than reduced by 1:
ghcTo :: R -> GHC_Rep R
MkR{..} =
ghcTo
( ( ( ( ( ( f00, ( f01, f02 ) )
, ( f03 , ( f04, f05 ) )
)-- .. lots more ..
, ( ( f96, f97 )
, ( f98, f99 ) ) ) ) ) ) )
As a consequence, the size of this version, and the cost of GHC generics in general, is actually O(n log n)
in the number of record fields, although the constant factor is reasonably high.
It is worth emphasizing how much better O(n log n)
is than O(n²)
. Here is a plot of the cost of GHC generics (in terms of AST size: terms, types and coercions) as we vary the number of record fields from 100 fields to 1000 fields:
This almost looks linear. It isn’t; the cost per field is roughly 460 terms/types/coercions when we have 100 fields, and that increases to roughly 625 when we get to 1000 fields, but the cost only goes up very slowly.
The RecordDotSyntax
preprocessor
The record-dot-preprocessor
is a preprocessor and GHC plugin for the RecordDotSyntax GHC proposal. The preprocessor interprets specialized “record syntax”; for example, it translates
.lbl2 = val} expr{lbl1
to
@"lbl1" expr $ setField @"lbl2" (getField @"lbl1" expr) val setField
These two functions getField
and hasField
come from a HasField
class currently provided by record-hasfield
(although now that the Add setField
to HasField
proposal is accepted, this should eventually move to base
):
class HasField x r a | x r -> a where
hasField :: r -> (a -> r, a)
When you include
{-# OPTIONS_GHC -fplugin=RecordDotPreprocessor #-}
at the top of your Haskell file, the RecordDotPreprocessor
plugin will generate HasField
instances for you. They look innocuous enough:
instance HasField "f00" R (T 00) where
= (\x -> r { f00 = x }, f00 r) hasField r
Unfortunately, once we get to ghc
’s internal representation, this is much less innocent:
hasField_f00 :: R -> (T 0 -> R, T 0)
= (
hasField_f00 r -> case r of
\new MkR x00 x01 x02 x03 x04 x05 x06 x07 x08 x09
x10 x11 x12 x13 x14 x15 x16 x17 x18 x19-- .. lots more ..
->
x90 x91 x92 x93 x94 x95 x96 x97 x98 x99 MkR new x01 x02 x03 x04 x05 x06 x07 x08 x09
x10 x11 x12 x13 x14 x15 x16 x17 x18 x19-- .. lots more ..
x90 x91 x92 x93 x94 x95 x96 x97 x98 x99case r of
, MkR x00 x01 x02 x03 x04 x05 x06 x07 x08 x09
x10 x11 x12 x13 x14 x15 x16 x17 x18 x19-- .. lots more ..
->
x90 x91 x92 x93 x94 x95 x96 x97 x98 x99
x00 )
We saw this before when we discussed the record field accessors; the same linear cost that a field accessor induces is induced here as well, with a larger constant factor. As for field accessors, we need to generate a HasField
instance for every field, and hence altogether we once again have code that is O(n²)
in size.
This really matters: a module containing just the declaration of our record size has a total size of 22,065 terms/types/coercions after desugaring, and 22,277 after the simplifier (with -O0
). This is already much bigger than it should be, due to the quadratic nature fo the field accessors. Generating HasField
instances for all fields results in a total code size of 58,665 after desugaring and 78,977 after the simplifier. And all we’ve done is define the record! (For comparison, with large-records
, a module containing a single record with 100 fields has a mere total size of 8,305 after desugaring, expanding to 13,958 after the simplifier, and that is including support for generics).
SOP generics
The generics-sop
library is similar in nature to GHC generics, but it uses a different generic representation. It is described in detail in the paper True Sums of Products; here we give a simplified presentation.
In fact, we have already seen most ingredients. The generics-sop
representation for a record is essentially the NP
type that we discussed in “Type level induction,” above. In that section we saw that the function npToR
that translates from the generic representation to R
is linear in size. Unfortunately, the same is not true for the inverse function:
npFromR :: R -> NP T IndicesR
MkR{..} = (
npFromR :* f01 :* f02 :* f03 :* f04 :* f05 :* f06 :* f07 :* f08 :* f09
f00 :* f10 :* f11 :* f12 :* f13 :* f14 :* f15 :* f16 :* f17 :* f18 :* f19
-- .. lots more ..
:* f90 :* f91 :* f92 :* f93 :* f94 :* f95 :* f96 :* f97 :* f98 :* f99
#endif
:* Nil
)
That wildcard pattern match MkR{..}
will expand to a pattern match for a variable for every field, but that doesn’t matter here: we only generate one translation function, and it’s fine if that is linear in size. The problem however is in the body of this function. After the previous examples, perhaps you can spot the problem already: (:*)
has a bunch of type arguments, and one of those is the list of indices at the tail; so this code looks something like
MkR{..} =
npFromR :*) @00 @'[1, 2, .., 98, 99] f00 (
(:*) @01 @'[ 2, .., 98, 99] f01 (
(:*) @02 @'[ .., 98, 99] f02 (
(-- .. lots more ..
:*) @98 @'[ 99] f98 (
(:*) @99 @'[ ] f99 (
(Nil )))))
a depressingly familiar sight at this point (and again, the real code is worse, due to the fact that NP
is a GADT). This matters: the size of npToR
is 4,441 terms/types/coercions, the size of npFromR
is 46,459.
The generics-sop
library suffers from quadratic code size in other places as well. It makes heavy use of type-level lists, in a similar style to SListI
above, and with the same kinds of problems. It also represents metadata at the type-level, rather than just at the term level, which are more type-level lists. These are not fundamental problems; generics-sop
simply wasn’t designed with the goal to optimize for code size reduction in mind. As we saw in the section on GHC generics, these costs can probably be brought down to O(n log n)
, though this will require careful thought.
The large-records
library
When you use the large-records
library and define
largeRecord defaultLazyOptions [d|
data R = MkR {
f00 :: T 00
, f01 :: T 01
, f02 :: T 02
-- .. lots more ..
, f98 :: T 98
, f99 :: T 99
} |]
You get the definition of a type R
with field accessors, HasField
instances for every field, and a Generic
instance (albeit for a custom generics library), but the code will be entirely linear—O(n)
—in the size of the record. In this section we will see how large-records
achieves this for the basic definitions; we will discuss generics separately in the next section.
Representation
As we saw, the moment that we declare a record, ghc
will generate field accessors for each field of the record, resulting in code that is O(n²)
in size. It follows that we cannot use the normal representation of the record. Instead, large-records
generates the following:
newtype R = LR__MkR { vectorFromR :: Vector Any }
That is, we are representing the record basically by an untyped vector which will have an entry for every field in the record.2 Of course, typically users will never deal with this untyped representation directly, but use the field accessors or HasField
instances, which we will discuss next.
Field accessors
Along with the definition of R
, large-records
generates an unsafe function that can return any element of the vector at any type:
unsafeGetIndexR :: Int -> R -> a
= noInlineUnsafeCo $ vectorFromR t n unsafeGetIndexR n t
where noInlineUnsafeCo
is a non-inlinable form of unsafeCoerce
3. Just like the internal representation of R
, this function is not intended for normal use. Instead, it is used to define field accessors for each field. For example, here is the definition of the accessor f00
:
f00 :: R -> T 0
= unsafeGetIndexR 0 f00
One of these accessors is generated for every field, but the size of each accessor is constant (and tiny), so the generation of all accessors results in code that is O(n)
in size.
HasField
instance
The HasField
instance is very similar. Along with the unsafe accessor, we also define an unsafe update function:
unsafeSetIndexR :: Int -> R -> a -> R
= LR__MkR $
unsafeSetIndexR n r x unsafeUpd (vectorFromR r) [(n, noInlineUnsafeCo x)]
The HasField
instance is now easy, and once again constant in size and tiny:
instance HasField "f00" R (T 0) where
= (unsafeSetIndexR 0 r, unsafeGetIndexR 0 r) hasField r
Pattern synonym
By default large-records
does not generate a pattern synonym for R
. It can do, if requested:
= True}) [d|
largeRecord (defaultLazyOptions {generatePatternSynonym
data R = MkR {
f00 :: T 00
, f01 :: T 01
, f02 :: T 02
-- .. lots more ..
, f98 :: T 98
, f99 :: T 99
} |]
With the generatePatternSynonym
option, large-records
generates two new definitions. First, a function that constructs a tuple containing all fields of the record:
tupleFromR :: R
-> ( (T 00, T 01, T 02, {- .. lots more .. -}, T 60, T 61)
T 62, T 63, T 64, {- .. lots more .. -}, T 98, T 99)
, (
)= (
tupleFromR r 00 r
unsafeGetIndexR 01 r
, unsafeGetIndexR 02 r
, unsafeGetIndexR -- .. lots more ..
60 r
, unsafeGetIndexR 61 r
, unsafeGetIndexR
)62 r
, ( unsafeGetIndexR 63 r
, unsafeGetIndexR 64 r
, unsafeGetIndexR -- .. lots more ..
98 r
, unsafeGetIndexR 99 r
, unsafeGetIndexR ) )
(It was careful to use a nested tuple, because ghc
does not support tuples with more than 62 fields.) It then uses this function as a view pattern in an explicitly bidirectional pattern synonym:
pattern MkR :: T 0 -> T 1 -> T 2 -> {- .. lots more .. -} -> T 98 -> T99 -> R
pattern MkR{f00, f01, f02, {- .. lots more .. -}, f98, f99} <-
-> ( ( f00, f01, f02, f03, f04, f05, f06, f07, f08, f09
tupleFromR
, f10, f11, f12, f13, f14, f15, f16, f17, f18, f19-- .. lots more ..
, f60, f61
)
, ( f62, f63, f64, f65, f66, f67, f68, f69-- .. lots more ..
, f90, f91, f92, f93, f94, f95, f96, f97, f98, f99
) )where
MkR x00 x01 x02 {- .. lots more .. -} x98 x99 = RFromVector $ fromList [
unsafeCoerce x00
, unsafeCoerce x01
, unsafeCoerce x02-- .. lots more ..
, unsafeCoerce x98
, unsafeCoerce x99 ]
The pattern synonym makes it possible to pattern match on R
or construct R
values as if it was defined in the normal way. The reason that we don’t generate it by default is an annoying one: when we declare a record pattern synonym, ghc
very “helpfully” generates field accessors, which was precisely what we were trying to avoid.
So the option is there for code bases that need it; when enabled, we reintroduce a quadratic component, albeit a reasonably small one, and the rest of the generated code is still linear. Fortunately, many code bases that deal with large records never actually construct such records directly. They are instead populated from databases or from REST requests; in such cases, the pattern synonym is not required. In addition, large-records
provides a quasi-quoter that can be used in lieu of the pattern synonym so that you can write, for example,
firstTwo :: R -> (T 0, T1)
= (x0, x1) firstTwo [lr| MkR { f00 = x0, f01 = x1 } |]
In ghc- 9.2
we have the new NoFieldSelectors
language pragma (ghc proposal, merge request) which might improve the situation here, but I haven’t experimented with that yet.
Generics
The large-records
library comes with its own Generic
class. Although the class and its instances are defined rather differently, usage of the class—that is, the way you’d write generic functions—is very similar in style to generics-sop
. The class is defined as
class Generic a where
-- Translation
from :: a -> Rep I a
to :: Rep I a -> a
-- Constraints
type Constraints a :: (Type -> Constraint) -> Constraint
dict :: Constraints a c => Proxy c -> Rep (Dict c) a
-- Metadata
metadata :: Proxy a -> Metadata a
We will discuss the various parts of this class separately, and finish with an example of a generic function that ties all this together.
Translation to and from the generic representation
The generic representation used by large-records
is
newtype Rep f a = Rep (Vector (f Any))
In other words, if we pick the identity functor for f
, then Rep f a
is just Vector Any
; this means that from
and to
can just be coerce
:4
instance Generic R where
= coerce
from = coerce
to -- .. more ..
Constraints
The Constraints
type family is meant to capture the concept of “a constraint applied to all fields in the record.” We encountered this idea above in the section “Type class dictionaries,” where we noticed we have to be careful. Here is how large-records
instantiates Constraints
for our running example:
class Constraints_R c where
dictConstraints_R :: Proxy c -> Rep (Dict c) R
instance Generic R where
type Constraints R = Constraints_R
= dictConstraints_R
dict -- .. more ..
Constraints_R
does not have any superclass constraints, thus avoiding the quadratic cost of all the dictionary projections. Instead, it is constructing a vector of dictionaries. Essentially, we are constructing our own representation of a dictionary, just like we constructed our own representation for a record. Unlike the GHC representation of dictionaries, we can project from our representation by just doing a vector index.
Fortunately, we can give an instance for Constraints_R
:
instance ( c (T 0)
T 1)
, c (T 2)
, c (-- .. lots more ..
T 98)
, c (T 99)
, c (=> Constraints_R c where
) = Rep $ fromList [
dictConstraints_R p Proxy @(T 0))
unsafeCoerce (dictFor p) (Proxy @(T 1))
, unsafeCoerce (dictFor p) (Proxy @(T 2))
, unsafeCoerce (dictFor p) (-- .. lots more ..
Proxy @(T 98))
, unsafeCoerce (dictFor p) (Proxy @(T 99))
, unsafeCoerce (dictFor p) ( ]
In core, this results in a single function with 100 arguments, that constructs our custom dictionary representation. This function is reasonably large (3523 terms/types/coercions), but not excessively so, and moreover, it is O(n)
in the number of record fields.
Metadata
The Metadata
provided by large-records
is currently fairly minimal; it tells us the names of the record, its constructor, and the fields, as well as the strictness of each field:
data Metadata a = Metadata {
recordName :: String
recordConstructor :: String
, recordSize :: Int
, recordFieldMetadata :: Rep FieldMetadata a
,
}
data FieldStrictness = FieldStrict | FieldLazy
data FieldMetadata x where
FieldMetadata ::
KnownSymbol name
=> Proxy name
-> FieldStrictness
-> FieldMetadata x
Metadata is generated automatically by the largeRecord
invocation.
Note that the name of each field is at the type level; this is primarily to help with integration with GHC generics. This is essential for the integration of large-records
with other libraries, but a detailed discussion of this is outside the scope of this blog post.
Writing generic functions
An in-depth discussion on how to write generic functions in the style of generics-sop
is beyond the scope of this blog post; interested readers may wish to refer to the True Sums of Products paper or watch Andres Löh’s generics-sop
tutorial at ZuriHac 2016 or read his SSGEP summer school lecture notes. Here we will discuss one (typical) example only.
Generic functions in generics-sop
style are written using a set of combinators on the generic representation. As a simple example, we can zip two Rep
s:
zipWith :: (forall x. f x -> g x -> h x)
-> Rep f a -> Rep g a -> Rep h a
It is worth spelling out what this function is doing: it is zipping together all the fields of two records, albeit in their generic representation (Rep
). We therefore require that the function we’re zipping with is polymorphic: we must be able to zip fields no matter their type.
As another example, we can map a function over all fields in the record; there is a function map
, as you’d expect:
map :: (forall x. f x -> g x) -> Rep f a -> Rep g a
but there is also a slightly more useful version of this:
cmap :: (Generic a, Constraints a c)
=> Proxy c
-> (forall x. c x => f x -> g x)
-> Rep f a -> Rep g a
This constrained map still requires the function we’re applying to all fields to be polymorphic, but it is given the guarantee that the types it is applied to all satisfy constraint c
. Of course, in order to be able to do that, we need constraint c
to hold for all fields, which is why this function requires Constraints a c
. There is a similar generalization of zipWith
called czipWith
, as well as applicative versions of both of these functions.
The final combinator we will discuss applies when we have a vector in which every element has the same type: in this case, we can collapse the vector to an ordinary list:
collapse :: Rep (K a) b -> [a]
Here is an example of a generic function that can translate any value to JSON, provided it has a Generic
instance and provided we have ToJSON
instances for all fields:
gtoJSON :: forall a. (Generic a, Constraints a ToJSON) => a -> Value
=
gtoJSON
Aeson.object. Rep.collapse
. Rep.zipWith (mapKKK $ \n x -> (Text.pack n, x)) recordFieldNames
. Rep.cmap (Proxy @ToJSON) (K . toJSON . unI)
. from
where
Metadata{..} = metadata (Proxy @a)
If you are new to generics-sop
style programming, this function may require careful reading, but the essence of it is simple:
- Translate the value to its generic representation using
from
- Apply
toJSON
to all fields - Zip it with the metadata to get a vector of
(name, value)
pairs - Collapse that vector to a list
- Use Aeson’s standard
object
function to create the JSON object.
Once you are used to the paradigm, generics-sop
style programming allows for the very succinct definition of generic functions (I’d go so far as calling it “elegant,” but then I am biased). Typically a function like gtoJSON
would be used to derive a ToJSON
instance:
instance ToJSON R where
= gtoJSON toJSON
Indeed, the large-records
library can generate such instances automatically for a handful of classes.
Transforms
The only aspect of the generics programming aspect of large-records
we have not covered here is what is known as “transforms” in generics-sop
style programming. Transforms allow you to generically modify type indices; for example, suppose we have a HKD-style definition of a table, as used in the beam
database library:
data Table f = MkTable {
tableField1 :: f Int
tableField2 :: f Maybe
, }
We could write a generic function that could, say, turn a Table Identity
into a Table Maybe
. Doing this without using type-level lists anywhere is surprisingly difficult, and a discussion of how this is done in large-records
is probably a good topic for a separate blog post. For now we just refer any interested readers to the examples in the test suite.
Talking of beam
, large-records
integrates smoothly with beam-core
by means of an auxiliary library beam-large-records
. This library is on GitHub but is not available on Hackage yet because it depends on a small patch to beam
. Hopefully it will be released soon.
Benchmarks
For the purpose of these benchmarks, we are interested in compile-time, not run-time performance. I have not yet done any benchmarks of run-time performance. There might be some run-time benefits compared to other generics approaches because there is no conversion to and from a generic representation, but there are probably some performance penalties for normal functions because the record representation used internally in large-records
is much less transparent to the GHC optimizer. Measuring this is future work.
In terms of compile time performance, however, the results are pretty clear. Here is a graph of the size of the core AST of the module (the sum of the terms, types and coercions in the module) plotted against the number of record fields, after desugaring:
As you can see, for vanilla ghc
the size grows quadratically to nearly 450,000 nodes, but stays nice and flat for large-records
. The GHC simplifier reduces the size of the AST a bit in the case of vanilla ghc
and increases it a bit for large-records
, but large-records
is still the clear winner:
In terms of “real” performance measurements, here is a plot of ghc
memory usage as reported by the OS (in kB), against the number of record fields:
As you can see, ghc
climbs to roughly 500 MB for 100 fields, whereas large-records
stays near the 100 MB mark. There is a similar graph for compilation time (in terms of elapsed wall-clock time in seconds):
For vanilla ghc
compilation takes roughly 3 seconds for 100 fields, whereas large-records
is 3x times faster and stays under one second.
(These numbers have been normalized against compiling an empty module. The error bars here indicate standard error; each module has been compiled 100 times.)
Conclusions
The large-records
library generates code that is strictly O(n)
in size in the number of record fields. It does so at the cost of generating basically untyped core
. This does not matter for code using large-records
, which is still presented with a perfectly safe, typed interface. It might however reduce the applicability of the techniques employed by large-records
more generally within ghc
: the History of Haskell paper (Section 9.1) emphasizes the importance of having a typed internal ghc
language to catch and help debug compiler bugs.
Perhaps aiming for linear cost however is not necessary. We saw in the section on GHC generics that if we are careful how we set things up, we can generate code that is O(n log n)
in size. Similar techniques can be applied in libraries such as generics-sop
as well, provided that we are careful at every step; for example, the use of type-level lists should be avoided and we should be using type-level (balanced) trees instead.
In order to tackle the problem at a more fundamental level, we might need to be able to introduce and control sharing at the type level; Richard Eisenberg opened ghc ticket #20264 after my HIW talk on large-records
with some thoughts on the topic. As a completely alternative approach, the work on Scrap your type applications removes type arguments altogether. This may well address many of the concerns in this blog post, but it’s such a radically different internal language that it’s hard to foresee quite all the consequences.
Addressing the overabundance with types is however only one aspect of the problem, albeit an important one. Specifically, it does not prevent record pattern matches and record updates (either on term level records or on type class dictionaries) in core
always being linear in the size of the record. Perhaps an extension to core
could alleviate this problem, though changes to core
affect to the entire compiler backend and so are rightly treated with extreme prejudice.
When such problems are addressed in ghc
, a library like large-records
may no longer be required. Until such time, however, the large-records
library can be used to significantly reduce compile time and memory requirements for code bases that contain many large records.
Compilation time and memory usage normalised to an empty module.↩︎
We will not discuss it in this blog post, but the library currently has support for records where all fields are strict and records where all fields are lazy. It would not be too difficult to extend it to records where some fields are lazy and some are strict. It does not currently support
UNPACK
pragmas; adding support for that would be more difficult, we would need to add an unboxed vector for each type of unpacked field.↩︎This is to avoid a segfault with
ghc
prior to version 9.0. See GHC ticket #16893.↩︎The actual code generated uses an indirection through another function to deal with strictness: if the record has strict fields, then those fields are forced in the
to
translation.↩︎