In this blog post we consider the problem of defining `Binary`

instances for
GADTs such as

```
data Val :: * -> * where
VI :: Int -> Val Int
VD :: Double -> Val Double
```

If you want to play along the full source code for the examples in this blog post can be found on github.

## Failed attempt

The “obvious” way in which you might attempt to serialize and deserialize `Val`

could look something like

```
instance Binary (Val a) where
VI i) = putWord8 0 >> put i
put (VD d) = putWord8 1 >> put d
put (
= do
get <- getWord8
tag case tag of
0 -> VI <$> get -- Couldn't match type ‘a’ with ‘Int’
1 -> VD <$> get -- Couldn't match type ‘a’ with ‘Double’
-> error "invalid tag" _
```

However, this does not work. The definition of `put`

is type correct (but
dubious), but the definition of `get`

is not type correct. And actually this
makes sense: we are claiming that we can define `Binary (Val a)`

*for any*
`a`

; but if the tag is 0, then that `a`

can only be `Int`

, and if the tag is
1, then that `a`

can only be `Double`

.

One option is to instead give a `Binary (Some Val)`

instance with `Some`

defined
as

```
data Some :: (* -> *) -> * where
Exists :: forall f x. f x -> Some f
```

That is often independently useful, but is a different goal: in such a case
we are *discovering* type information when we deserialize. That’s not what
we’re trying to achieve in this blog post; we want to write a `Binary`

instance
that can be used when we know from the context what the type must be.

## Working, but inconvenient

The next thing we might try is to introduce `Binary`

instances for the specific
instantiations of that `a`

type variable:

```
instance Binary (Val Int) where
VI i) = put i
put (= VI <$> get
get
instance Binary (Val Double) where
VD d) = put d
put (= VD <$> get get
```

Note that there is no need to worry about any tags in the encoded bytestring; we always know the type. Although this works, it’s not very convenient; for example, we cannot define

```
encodeVal :: Val a -> ByteString
= encode encodeVal
```

because we don’t have a polymorphic instance `Binary (Val a)`

. Instead we’d have
to define

```
encodeVal :: Binary (Val a) => Val a -> ByteString
= encode encodeVal
```

but that’s annoying: we *know* that that `a`

can only be `Int`

or `Double`

,
and we have `Binary`

instances for both of those cases. Can’t we do better?

## Introducing RTTI

Although *we* know that `a`

can only be `Int`

or `Double`

, we cannot take
advantage of this information in the code. Haskell types are erased at compile
time, and hence we cannot do any kind of pattern matching on them. The key to
solving this problem then is to introduce some explicit runtime type information
(RTTI).

We start by introducing a data family associating with each indexed datatype a corresponding datatype with RTTI:

`data family RTTI (f :: k -> *) :: (k -> *)`

For the example `Val`

this runtime type information tells us whether we’re
dealing with `Int`

or `Double`

:

```
data instance RTTI Val a where
RttiValInt :: RTTI Val Int
RttiValDouble :: RTTI Val Double
```

For serialization we don’t need to make use of this:

```
putVal :: Val a -> Put
VI i) = put i
putVal (VD d) = put d putVal (
```

but for *deserialization* we can now pattern match on the RTTI to figure out
what kind of value we’re expecting:

```
getVal :: RTTI Val a -> Get (Val a)
RttiValInt = VI <$> get
getVal RttiValDouble = VD <$> get getVal
```

We’re now almost done: the last thing we need to express is that if we know
*at the type level* that we have some RTTI available, *then* we can serialize.
For this purpose we introduce a type class that returns the RTTI:

```
class HasRTTI f a where
rtti :: RTTI f a
```

which we can use as follows:

```
instance HasRTTI Val a => Binary (Val a) where
= putVal
put = getVal rtti get
```

This states precisely what we described in words above: as long as we have some
RTTI available, we can serialize and deserialize any kind of `Val`

value.

The last piece of the puzzle is to define some instances for `HasRTTI`

; right
now, if we try to do `encode (VI 1234)`

ghc will complain

`No instance for (HasRTTI Val Int)`

Fortunately, these instances are easily defined:

```
instance HasRTTI Val Int where rtti = RttiValInt
instance HasRTTI Val Double where rtti = RttiValDouble
```

and the good news is that this means that whenever we construct specific `Val`

s
we never have to construct the RTTI by hand; ghc’s type class resolution takes
care of it for us.

## Taking stock

Instead of writing

```
encodeVal :: Binary (Val a) => Val a -> ByteString
= encode encodeVal
```

we can now write

```
encodeVal :: HasRTTI Val a => Val a -> ByteString
= encode encodeVal
```

While it may seem we haven’t gained very much, `HasRTTI`

is a much more
fine-grained constraint than `Binary`

; from `HasRTTI`

we can derive `Binary`

constraints, like we have done here, but also other constraints that rely on
RTTI. So while we do still have to carry these RTTI constraints around, those
are – ideally – the *only* constraints that we still need to carry around.
Moreover, as we shall see a little bit further down, RTTI also scales nicely to
composite type-level structures such as type-level lists.

## Another example: heterogeneous lists

As a second—slightly more involved—example, lets consider heterogeneous
lists or *n*-ary products:

```
data NP (f :: k -> *) (xs :: [k]) where
Nil :: NP f '[]
(:*) :: f x -> NP f xs -> NP f (x ': xs)
```

An example of such a heterogeneous list is

`VI 12 :* VD 34.56 :* Nil :: NP Val '[Int, Double]`

The type here says that this is a list of two `Val`

s, the first `Val`

being
indexed by `Int`

and the second `Val`

being indexed by `Double`

. If that makes
zero sense to you, you may wish to study Well-Typed’s
Applying Type-Level and Generic Programming in Haskell
lecture notes.

As was the case for `Val`

, we always *statically* know how long such a list is,
so there should be no need to include any kind of length information in the
encoded bytestring. Again, for serialization we don’t need to do anything very
special:

```
putNP :: All Binary f xs => NP f xs -> Put
Nil = return ()
putNP :* xs) = put x >> putNP xs putNP (x
```

The only minor complication here is that we need `Binary`

instances for all the
elements of the list; we guarantee this using the `All`

type family (which is a
minor generalization of the `All`

type family explained in the same set of
lecture notes linked above):

```
type family All p f xs :: Constraint where
All p f '[] = ()
All p f (x ': xs) = (p (f x), All p f xs)
```

Deserialization however needs to make use of RTTI again. This means we need to define what we mean by RTTI for these heterogenous lists:

```
data instance RTTI (NP f) xs where
RttiNpNil :: RTTI (NP f) '[]
RttiNpCons :: (HasRTTI f x, HasRTTI (NP f) xs)
=> RTTI (NP f) (x ': xs)
instance HasRTTI (NP f) '[] where
= RttiNpNil
rtti instance (HasRTTI f x, HasRTTI (NP f) xs)
=> HasRTTI (NP f) (x ': xs) where
= RttiNpCons rtti
```

In this case the RTTI gives us the shape of the list. We can take advantage of this during deserialization:

```
getNP :: All Binary f xs => RTTI (NP f) xs -> Get (NP f xs)
RttiNpNil = return Nil
getNP RttiNpCons = (:*) <$> get <*> getNP rtti getNP
```

allowing us to give the `Binary`

instance as follows:

```
instance (All Binary f xs, HasRTTI (NP f) xs)
=> Binary (NP f xs) where
= putNP
put = getNP rtti get
```

## Serializing lists of `Val`

s

If we use this `Binary`

instance to serialize a list of `Val`

s, we would end
up with a type such as

```
decodeVals :: (HasRTTI (NP Val) xs, All Binary Val xs)
=> ByteString -> NP Val xs
= decode decodeVals
```

This `All Binary Val xs`

constraint however is unfortunate, because we *know*
that all `Val`

s can be deserialized! Fortunately, we can do better. The RTTI for
the `(:*)`

case (`RttiNpCons`

) included RTTI for the *elements* of the list.
We made no use of that above, but we *can* make use of that when giving a
specialized instance for lists of `Val`

s:

```
putNpVal :: NP Val xs -> Put
Nil = return ()
putNpVal :* xs) = putVal x >> putNpVal xs
putNpVal (x
getNpVal :: RTTI (NP Val) xs -> Get (NP Val xs)
RttiNpNil = return Nil
getNpVal RttiNpCons = (:*) <$> get <*> getNpVal rtti
getNpVal
instance {-# OVERLAPPING #-} HasRTTI (NP Val) xs
=> Binary (NP Val xs) where
= putNpVal
put = getNpVal rtti get
```

This allows us to define

```
decodeVals :: HasRTTI (NP Val) xs => ByteString -> NP Val xs
= decode decodeVals
```

Note that this use of overlapping type classes instances is perfectly safe: the overlapping instance is fully compatible with the overlapped instance, so it doesn’t make a difference which one gets picked. The overlapped instance just allows us to be more economical with our constraints.

Here we can appreciate the choice of `RTTI`

being a data family indexed by `f`

;
indeed the constraint `HasRTTI f x`

in `RttiNpCons`

is generic as possible.
Concretely, `decodeVals`

required *only* a single `HasRTTI`

constraint, as
promised above. It is this compositionality, along with the fact that we can
derive many type classes from just having RTTI around, that gives this approach
its strength.

## Advanced example

To show how all this might work in a more advanced example, consider the following EDSL describing simple functions:

```
data Fn :: (*,*) -> * where
Exp :: Fn '(Double, Double)
Sqrt :: Fn '(Double, Double)
Mod :: Int -> Fn '(Int, Int)
Round :: Fn '(Double, Int)
Comp :: (HasRTTI Fn '(b,c), HasRTTI Fn '(a,b))
=> Fn '(b,c) -> Fn '(a,b) -> Fn '(a,c)
```

If you are new to EDSLs (embedded languages) in Haskell, you way wish to watch
the Well-Typed talk
Haskell for embedded domain-specific languages.
However, hopefully the intent behind `Fn`

is not too difficult to see: we have a
datatype that describes functions: exponentiation, square root, integer modules,
rounding, and function composition. The two type indices of `Fn`

describe the
function input and output types. A simple interpreter for `Fn`

would be

```
eval :: Fn '(a,b) -> a -> b
Exp = exp
eval Sqrt = sqrt
eval Mod m) = (`mod` m)
eval (Round = round
eval `Comp` f) = eval g . eval f eval (g
```

In the remainder of this blog post we will consider how we can define a `Binary`

instance for `Fn`

. Compared to the previous examples, `Fn`

poses two new challenges:

- The type index does not uniquely determine which constructor is used; if the
type is
`(Double, Double)`

then it could be`Exp`

,`Sqrt`

or indeed the composition of some functions. - Trickier still,
`Comp`

actually introduces an existential type: the type “in the middle”`b`

. This means that when we serialize and deserialize we*do*need to include*some*type information in the encoded bytestring.

### RTTI for `Fn`

To start with, let’s define the RTTI for `Fn`

:

```
data instance RTTI Fn ab where
RttiFnDD :: RTTI Fn '(Double, Double)
RttiFnII :: RTTI Fn '(Int, Int)
RttiFnDI :: RTTI Fn '(Double, Int)
instance HasRTTI Fn '(Double, Double) where rtti = RttiFnDD
instance HasRTTI Fn '(Int, Int) where rtti = RttiFnII
instance HasRTTI Fn '(Double, Int) where rtti = RttiFnDI
```

For our DSL of functions, we only have functions from `Double`

to `Double`

,
from `Int`

to `Int`

, and from `Double`

to `Int`

(and this is closed under
composition).

### Serializing type information

The next question is: when we serialize a `Comp`

constructor, how much
information do we need to serialize about that existential type? To bring this
into focus, let’s consider the type information we have when we are dealing
with composition:

```
data RttiComp :: (*,*) -> * where
RttiComp :: RTTI Fn '(b,c) -> RTTI Fn '(a,b) -> RttiComp '(a,c)
```

Whenever we are deserializing a `Fn`

, if that `Fn`

happens to be the composition
of two other functions we *know* RTTI about the composition; but since the “type
in the middle” is unknown, we have no information about that at all. So what do
we need to store? Let’s start with serialization:

`putRttiComp :: RTTI Fn '(a,c) -> RttiComp '(a,c) -> Put`

The first argument here is the RTTI about the composition as a whole, and sets the context. We can look at that context to determine what we need to output:

```
putRttiComp :: RTTI Fn '(a,c) -> RttiComp '(a,c) -> Put
RttiComp rbc rab) = go rac rbc rab
putRttiComp rac (where
go :: RTTI Fn '(a,c) -> RTTI Fn '(b,c) -> RTTI Fn '(a,b) -> Put
RttiFnDD RttiFnDD RttiFnDD = return ()
go
RttiFnII RttiFnII RttiFnII = return ()
go RttiFnII RttiFnDI rAB = case rAB of {}
go
RttiFnDI RttiFnII RttiFnDI = putWord8 0
go RttiFnDI RttiFnDI RttiFnDD = putWord8 1 go
```

Let’s take a look at what’s going on here. When we know from the context that
the composition has type `Double -> Double`

, then we *know* that the types of
both functions in the composition must also be `Double -> Double`

, and hence we
don’t need to output *any* type information. The same goes when the composition
has type `Int -> Int`

, although we need to work a bit harder to convince `ghc`

in this case. However, when the composition has type `Double -> Int`

then the
first function might be `Double -> Int`

and the second might be `Int -> Int`

, or
the first function might be `Double -> Double`

and the second might be `Double -> Int`

. Thus, we need to distinguish between these two cases (in principle a
single bit would suffice).

Having gone through this thought process, deserialization is now easy: remember
that we *know* the context (the RTTI for the composition):

```
getRttiComp :: RTTI Fn '(a,c) -> Get (RttiComp '(a,c))
RttiFnDD = return $ RttiComp RttiFnDD RttiFnDD
getRttiComp RttiFnII = return $ RttiComp RttiFnII RttiFnII
getRttiComp RttiFnDI = do
getRttiComp <- getWord8
tag case tag of
0 -> return $ RttiComp RttiFnII RttiFnDI
1 -> return $ RttiComp RttiFnDI RttiFnDD
-> fail "invalid tag" _
```

`Binary`

instance for `Fn`

The hard work is now mostly done. Although it is probably not essential, during
serialization we can clarify the code by looking at the RTTI context to know
which possibilities we need to consider at each type index. For example, if we
are serializing a function of type `Double -> Double`

, there are three
possibilities (`Exp`

, `Sqrt`

, `Comp`

). We did something similar in the previous
section.

```
putAct :: RTTI Fn a -> Fn a -> Put
= go
putAct where
go :: RTTI Fn a -> Fn a -> Put
@RttiFnDD fn =
go rcase fn of
Exp -> putWord8 0
Sqrt -> putWord8 1
Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f)
@RttiFnII fn =
go rcase fn of
Mod m -> putWord8 0 >> put m
Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f)
@RttiFnDI fn =
go rcase fn of
Round -> putWord8 0
Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f)
goComp :: RTTI Fn '(a,c)
-> (RTTI Fn '(b,c), Fn '(b,c))
-> (RTTI Fn '(a,b), Fn '(a,b))
-> Put
= do
goComp rAC (rBC, g) (rAB, f) RttiComp rBC rAB)
putRttiComp rAC (
go rBC g go rAB f
```

Deserialization proceeds along very similar lines; the only difficulty is that
when we *deserialize* RTTI using `getRttiComp`

we somehow need to reflect that
to the type level; for this purpose we can provide a function

`reflectRTTI :: RTTI f a -> (HasRTTI f a => b) -> b`

It’s definition is beyond the scope of this blog post; refer to the source code on github instead. With this function in hand however deserialization is no longer difficult:

```
getAct :: RTTI Fn a -> Get (Fn a)
= go
getAct where
go :: RTTI Fn a -> Get (Fn a)
@RttiFnDD = do
go r<- getWord8
tag case tag of
0 -> return Exp
1 -> return Sqrt
255 -> goComp r
-> error "invalid tag"
_ @RttiFnII = do
go r<- getWord8
tag case tag of
0 -> Mod <$> get
255 -> goComp r
-> error "invalid tag"
_ @RttiFnDI = do
go r<- getWord8
tag case tag of
0 -> return Round
255 -> goComp r
-> error "invalid tag"
_
goComp :: RTTI Fn '(a,c) -> Get (Fn '(a,c))
= do
goComp rAC RttiComp rBC rAB <- getRttiComp rAC
$ reflectRTTI rAB $
reflectRTTI rBC Comp <$> go rBC <*> go rAB
```

We can define the corresponding `Binary`

instance for `Fn`

simply using

```
instance HasRTTI Fn a => Binary (Fn a) where
= putAct rtti
put = getAct rtti get
```

If desired, a specialized instance for `HList Fn`

can be defined that relies
only on RTTI, just like we did for `Val`

(left as exercise for the reader).

## Conclusion

Giving type class instances for GADTs, in particular for type classes that
*produce* values of these GADTs (deserialization, translation from Java values,
etc.) can be tricky. If not kept in check, this can result in a code base with
a lot of unnecessarily complicated function signatures or frequent use of
explicit computation of evidence of type class instances. By using run-time
type information we can avoid this, keeping the code clean and allowing
programmers to focus at the problems at hand rather than worry about
type classes instances.

#### PS: Singletons

RTTI looks a lot like singletons, and indeed things can be set up in such a way that singletons would do the job. The key here is to define a new kind for the type indices; for example, instead of

```
data Val :: * -> * where
VI :: Int -> Val Int
VD :: Double -> Val Double
```

we’d write something like

```
data U = Int | Double
data instance Sing (u :: U) where
SI :: Sing 'Int
SD :: Sing 'Double
data Val :: U -> * where
VI :: Int -> Val 'Int
VD :: Double -> Val 'Double
instance SingI u => Binary (Val u) where
VI i) = put i
put (VD d) = put d
put (
= case sing :: Sing u of
get SI -> VI <$> get
SD -> VD <$> get
```

In such a setup singletons can be used as RTTI. Which approach is preferable
depends on questions such as are singletons already in use in the project, how
much of their infrastructure can be reused, etc. A downside of using singletons
rather than a more direct encoding using RTTI as I’ve presented it in this blog
post is that using singletons probably means that some kind of type level
decoding needs to be introduced (in this example, a type family `U -> *`

); on
the other side, having specific kinds for specific purposes may also clarify the
code. Either way the main ideas are the same.