A co-worker of mine had a random thought on IRC:

Is data type “unrolling” a valid performance optimization? Used anywhere?

Something like:

`data List a = Nil | Cons a {-# Unroll 10 #-} (List a)`

I thought, that sounds like the `vec`

package I wrote. One module there uses the GADT definition

```
data Vec (n :: Nat) a where
VNil :: Vec 'Z a
(:::) :: a -> Vec n a -> Vec ('S n) a
```

and a clever way to define functions so GHC unrolls them. Another module uses the data family

```
data family Vec (n :: Nat) a
data instance Vec 'Z a = VNil
data instance Vec ('S n) a = a ::: !(Vec n a)
```

which in some cases optimizes better than the GADT version. However in neither version we can `{-# UNPACK #-}`

the tail. GHC simply says

`Ignoring unusable UNPACK pragma on the second argument of ‘:::’`

It’s not “obvious” to the compiler that tail is a *single constructor data type*: technically it’s not even a single constructor in the GADT; and it’s not clear in data family case either, because of the degenerate `Vec ('S Any)`

case. But a single contructor data type is a requirement for UNPACK to work.

There’s however another trick we can use: **Backpack**.

Backpack is a system for retrofitting Haskell with an applicative, mix-in module system. In short, we are able to write more efficient *monomorphic* code. For example, the description of `unpacked-containers`

says:

This backpack mixin package supplies unpacked sets and maps exploiting backpack’s ability to unpack through signatures.

Unpacking is exactly what we want to do.

### Backpack

Backpack is an underdocumented feature of GHC and Cabal. There are Edward Yang’s blog posts, his PhD thesis (which I only scanned through), and in addition to what’s in the source code, that’s all. We won’t let that stop us from experimenting!

The idea of Backpack is that you can write modules which are parametrised by other modules.

Short glossary:

**Module**is what you think it is.**Signature**is a “type signature of module.”**Unit**is a collection of modules and signatures. Think`library`

.- Units can be
**open**, contain “holes” (specified by signatures) to be filled by other modules. See`OpenUnitId`

in`Cabal`

docs. **Mixin**is a unit with “inputs” and “outputs” possibly renamed.

One advice I can give: give types and signatures different names (i.e. avoid `signature Str where data Str`

), until you are comfortable with Backpack concepts. That’s similar to giving your types and constructors different names (i.e. avoid `data Unit = Unit`

). Rather say:

```
Sig where
signature data S
module Impl where
data S = MkS
```

Let’s write the Backpack version of `Vec`

next …

### Nil

List-like structure have two cases: *nil* and *cons*. We’ll start with a simple one: nil. We create an ordinary Haskell-module in an internal library (unit) of the package:

```
library v0
exposed-modules: V0
```

And the Haskell source is below:

```
module V0 where
data Vec a = Nil
deriving (Eq, Ord, Show, Functor, Foldable, Traversable)
instance Applicative Vec where
pure _ = Nil
<*> _ = Nil _
```

`Nil`

is essentially a new name for `Proxy`

, if you are familiar with that.

### Cons

The cons-case is a recursive case. The `cons`

unit requires a module, and exposes a module. In other words it’s a mixin.

I suggestively named the signature of the required input module `In`

, and exported output module `Out`

.

```
library cons
signatures: In
exposed-modules: Out
```

Signature `In`

is like a module without any implementations or definitions. The module `Out`

is not different from an ordinary Haskell module. We `import In`

and can use whatever is defined by its signature. To be used for vectors, we are careful to implement `Out`

in such a way that it itself satisfies the `In`

signature again, but that fact is not checked until the later instantiation.

#### In.hsig

The `In`

signature specifies an API that we expect the tail of the vector, and therefore the whole vector to satisfy. In this example it as simple as specifying:

- the type of vectors,
`Vec`

, - a list of instances that
`Vec`

should satisfy.

```
In where
signature
data Vec a
instance Eq a => Eq (Vec a)
instance Ord a => Ord (Vec a)
instance Show a => Show (Vec a)
instance Functor Vec
instance Foldable Vec
instance Traversable Vec
instance Applicative Vec
```

#### Out.hs

The output of the mixin is implementation of *cons* on vectors.

Note: at this stage nothing checks that `Out`

implements `In`

signature. The compatibility checks are done at instantiation phase. This reminds me of C++ templates, which are properly checked only when instantiated too.

Also note that we use `UNPACK`

on the tail field `!(In.Vec a)`

. This actually works, as we’ll see later.

```
module Out where
import qualified In
data Vec a = !a ::: {-# UNPACK #-} !(In.Vec a)
deriving (Eq, Ord, Functor, Foldable, Traversable)
infixr 5 :::
instance Applicative Vec where
pure x = x ::: pure x
::: g <*> x ::: y = f x ::: (g <*> y) f
```

We also write a `Show`

instance manually for a bit prettier output:

```
instance Show a => Show (Vec a) where
showsPrec d (x ::: xs) = showParen (d > 5)
$ showsPrec 6 x
. showString " ::: "
. showsPrec 5 xs
```

### Non-empty vectors: v1, v2, v3

Now it’s time to instantiate the `cons`

mixin. We already have a `v0`

vector. We can instantiate `cons`

with `V0`

to get `V1`

:

```
library v1
mixins: cons (Out as V1) requires (In as V0)
reexported-modules: V1
build-depends: cons, v0
```

The `v1`

library depends on `v0`

(one smaller vector), and `cons`

. The `mixins`

field says:

- take
`cons`

- rename its exported module from
`Out`

to`V1`

- and instantiate the required signature
`In`

to`V0`

.

Then, the `Cabal`

*Backpack* magic will connect the dots, as `v0`

exposes `V0`

.

And finally, we make `V1`

available by *re-exporting* it, using `reexported-modules`

.

Note: for larger vectors, we require no more Haskell code to be written; we are only connecting existing units.

While larger vectors follow the same pattern, we need to pre-instantiate them manually: they **cannot be created at the run-time**.

```
library v2
mixins: cons (Out as V2) requires (In as V1)
reexported-modules: V2
build-depends: cons, v1
library v3
mixins: cons (Out as V3) requires (In as V2)
reexported-modules: V3
build-depends: cons, v2
```

### Example: Some calculations

Before an actual example, we can define a library which will re-export all the modules we defined:

```
library
build-depends: v0, v1, v2, v3
reexported-modules: V0, V1, V2, V3
```

Finally we can write a program that will use all these vectors:

```
executable main
main-is: Main.hs
build-depends:
, backpack-vector
, base
```

#### Main.hs

First the necessary imports:

```
module Main (main) where
import qualified V0
import qualified V1
import qualified V2
import qualified V3
import Control.Applicative (liftA2)
import Data.List (mapAccumL)
```

Next we define a few values. We use `pure`

and `mapAccumL`

. Using actual constructors would be even less pretty, as we would have to use qualified names.

```
v0 :: V0.Vec Double
= pure 0.0
v0
v1 :: V1.Vec Double
= pure 1.0
v1
v2 :: V2.Vec Double
= pure 2.0
v2
-- 0, 1, 2
v3 :: V3.Vec Double
= snd $ mapAccumL (\n _ -> (succ n, n)) 0 (pure ()) v3
```

```
main :: IO ()
= do
main print v0
print v1
print v2
print $ liftA2 (*) v3 v3
```

Running this will print

```
Nil
1.0 ::: Nil
2.0 ::: 2.0 ::: Nil
0.0 ::: 1.0 ::: 4.0 ::: Nil
```

If we dump the Core of this module, we’ll see bindings like

```
-- RHS size: {terms: 4, types: 0, coercions: 0, joins: 0/0}
main11
main11 = $w$sshowSignedFloat $fShowDouble2 $fShowVec3 0.0##
-- RHS size: {terms: 4, types: 0, coercions: 0, joins: 0/0}
main10
main10 = $w$sshowSignedFloat $fShowDouble2 $fShowVec3 1.0##
-- RHS size: {terms: 4, types: 0, coercions: 0, joins: 0/0}
main9
main9 = $w$sshowSignedFloat $fShowDouble2 $fShowVec3 4.0##
```

A strong hint that computations are unrolled, and perfomed at compile-time.

### Excursion: Prettier syntax

In example above we had to jump through a hoop to define an interesting (= non-uniform value) `V3.Vec`

. It’s not nice.

One solution is to use pattern synonyms. Pattern synonyms cannot be defined in a type-class, but the helper functions could. Here’s a type-class:

```
class ConsPattern v t | v -> t, t -> v where
match :: v a -> (a, t a)
build :: a -> t a -> v a
```

Then we can add an instance of this class to each `cons`

case. Note, we shouldn’t require the instance in the signature `In`

(think why!)

```
instance ConsPattern Vec In.Vec where
= (:::)
build ::: xs) = (x, xs) match (x
```

Given `ConsPattern`

type-class we can define an actual bidirectional pattern synonym.

```
pattern (:::) :: ConsPattern v t => a -> t a -> v a
pattern x ::: xs <- (match -> (x, xs))
where x ::: xs = build x xs
```

With the pattern synonym we have recovered a nice syntax for constructing vectors:

```
v3a :: V3.Vec Double
= 1 ::: 2 ::: 3 ::: Nil v3a
```

But why stop here? Using source plugins, we can overload the list syntax! This is similar to what the `OverloadedLists`

extension does but using a different desugaring. This approach is implemented in my `overloaded`

library.

By adding few more instances

```
-- in V0.hs
instance Overloaded.Lists.Nil (Vec a) where
= Nil
nil
-- in Out.hs of cons
instance Overloaded.Lists.Cons a (In.Vec a) (Vec a) where
= (:::) cons
```

and enabling the source plugin in `Main`

module, we can write a vector as simply as

```
v3b :: V3.Vec Double
= [5, 6, 7] v3b
```

This small syntax digression is here to show that modern GHC has ways to improve the syntax of certain constructs considerably. Pattern synonyms are great and worth using where appropriate, but I’d not recommend using `overloaded`

library outside of blog posts.

### Memory representation

GHC starting with version 8.6 bundles a library called `ghc-heap`

which allows to look into heap representation of the values. `ghc-heap`

documentation for a version bundled with GHC-8.8.1 can be found next to GHC documentation. We are particularly interested in `GHC.Exts.Heap`

module.

Using the module functionality we can write a small function to compute a size of fully evaluated values. From all of possible `Closure`

constructors, we are only interested in `ConstrClosure`

– data constructors. Data constructor closures can have data directly there (non-pointer arguments, `dargs`

), or data referred to by pointers (pointer-arguments, `pargs`

). Also the info table takes one `Word`

.

```
calculateSize :: HasHeapRep a => a -> IO (Maybe Int)
= do
calculateSize val <- getClosureData val
c
runMaybeT (go c)where
go :: Closure -> MaybeT IO Int
ConstrClosure _ pargs dargs _ _ _) = do
go (<- traverse (liftIO . getBoxedClosureData >=> go) pargs
ds return
$ 1 -- info table
+ length dargs -- data words
+ length pargs -- pointers
+ sum ds -- size of pointed data
= do
go x $ print x -- something unexpected
liftIO MaybeT (return Nothing)
```

If we ask for sizes of `v0`

, `v1`

, `v2`

and `v3`

we get following results:

value | size in words |
---|---|

`v0 :: V0 Double` |
2 |

`v1 :: V1 Double` |
4 |

`v2 :: V2 Double` |
7 |

`v3 :: V3 Double` |
10 |

To understand these numbers, let’s draw a memory representation of `V3.Vec`

:

```
V3.Vec a
+-------+-------+-------+-------+
| | | | |
| hdr | * | * | * |
| | | | | | | |
+-------+---|---+---|---+---|---+
| | |
v v v
a a a
```

`V3.Vec Double`

has

- 1 word for info table
- 3 pointers (= words) to
- 3 boxed doubles, 2 words each: info table and an actual number.

In total 10 words. Similarly, `V2.Vec Double`

takes 1 + 2 * 3 = 7 words, and `V1.Vec Double`

1 + 1 * 3 = 4 words. `V0.Vec Double`

is 2 words, as all GHC heap values have to be at least 2 words in size (header + 1 word of payload). It’s not a problem here, as `V0.Vec a`

should be a static value, i.e. pre-allocated and shared.

I wish there were a library which would draw that kind of diagrams, using `ghc-heap`

for inspection.

We should also ask how many words the GADT or data family representations take. The answer is 17 for both. For each element, there are three words as with Backpack version, but also each cell has own info table and a pointer to the next one: it’s a linked list. Thus 5 words per element, and additional 2 words for the nil. The nil should be shared value, so effectively the size of the `V3.Vec Double`

is “only” 15 words.

```
V3.Vec a V2.Vec a
+-------+-------+-------+ +-------+-------+-------+
| | | | | | | |
| hdr | * | *------> | hdr | * | *------> V1.Vec a...
| | | | | | | | | |
+-------+---|---+-------+ +-------+---|---+-------+
| |
v v
a a
```

More memory, more pointers to chase, those must be slower. Next some benchmarks.

### Benchmarks

We might have made strong claims that Backpack’d structure is efficient. Let’s compare it with other similar data-structures in a toy benchmark.

Given a hundred of vectors of size 3, sum them element-wise and multiply the elements at the end. For `V3`

defined above, that would be trivially:

```
benchV3 :: [V3.Vec Double] -> Double
= foldl' (*) 1.0 . foldl' (liftA2 (+)) (pure 0.0) benchV3
```

where the right `foldl'`

is over the list, and the left `foldl'`

is over the `Vec`

structure.

We can compare that with using bare lists, and maintaining the length variant. The code is still quite the same. We only change `liftA2`

to `zipWith`

.

```
benchList :: [[Double]] -> Double
= foldl' (*) 1.0 . foldl' (zipWith (+)) [0, 0, 0] benchList
```

Another comparison is with `Vec`

from the `vec`

library. There are two variants, GADT- and data-family-based. We need to cheat a bit and use `foldr`

while folding over vector itself. There is no `foldl'`

, as `foldr`

should be as fast when the size is known.

```
benchGADT :: [GADT Double] -> Double
= GADT.foldr (*) 1.0 . foldl' (GADT.zipWith (+)) (pure 0.0)
benchGADT
benchDF :: [DF Double] -> Double
= DF.foldr (*) 1.0 . foldl' (DF.zipWith (+)) (pure 0.0) benchDF
```

And the final contestant is manually written vector, the definition is straight-forward.

```
data Manual a = Manual !a !a !a
deriving (Functor, Foldable, Traversable)
instance Applicative Manual where
pure x = Manual x x x
Manual x1 x2 x3) (Manual y1 y2 y3) =
liftA2 f (Manual (f x1 y1) (f x2 y2) (f x3 y3)
```

```
benchManual :: [Manual Double] -> Double
= foldl' (*) 1.0 . foldl' (liftA2 (+)) (pure 0.0) benchManual
```

Next the results.

#### Results

The results of criterion benchmarks are below, we can observe the following:

**Backpack**run takes 500 nanoseconds, this is the baseline.**List**version is about 10 times slower. The benchmark is chosen so GHC cannot fuse lists away, we see their costs.**GADT**version is a little faster than List version, but still 8x slower than backpack version.**Data-family**version is only 35% slower than Backpack, and 6 times faster than GADT version. I suspect that index-driven code-generation just works better for data-families. Note: that backpack and manual version have strict fields, where GADT and data-family version have lazy ones. Maybe the data-family could be made a little faster.- And finally, the
**manual**version … exactly the same performance as with Backpack.

```
benchmarking Backpack
time 501.2 ns (500.9 ns .. 501.5 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 501.3 ns (501.0 ns .. 501.9 ns)
std dev 1.257 ns (675.9 ps .. 2.228 ns)
benchmarking List
time 4.714 μs (4.710 μs .. 4.719 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 4.724 μs (4.716 μs .. 4.746 μs)
std dev 41.35 ns (16.52 ns .. 80.19 ns)
benchmarking GADT
time 4.023 μs (4.021 μs .. 4.025 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 4.030 μs (4.027 μs .. 4.037 μs)
std dev 15.62 ns (6.974 ns .. 29.17 ns)
benchmarking DF
time 677.9 ns (677.3 ns .. 678.7 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 678.7 ns (677.7 ns .. 680.5 ns)
std dev 4.253 ns (2.624 ns .. 7.070 ns)
benchmarking Manual
time 500.8 ns (498.5 ns .. 505.0 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 500.7 ns (498.9 ns .. 506.0 ns)
std dev 9.150 ns (1.279 ns .. 18.06 ns)
```

### Conclusion

We have now seen how one can use Backpack to implement `Vec`

. The approach is different from GADTs and data family way, as the tail can be `UNPACK`

ed. One other variant, is to backpack the element, like `Key`

is backpacked in `unpacked-containers`

.

There are non-Backpack implementations of `V0`

… `V4`

in `linear`

package. If you’d ever need `V5`

, Backpack approach would give it with less copy & paste. Would you ever need `V6`

?

One downside of Backpack approach is that we have to explicitly instantiate structures. With the GADT or data-family approach, whatever the run-time size argument is, the program will run (maybe very slowly, but still …). With the Backpack approach, there’s some maximum limit determined at compile time, with no way to surpass it at run-time.

The accompanying code for this post is on GitHub at phadej/vec-backpack.