Thanks to the organisers for a wonderful event!

The videos of our sessions as well as the materials used in them are available online, so those who could not attend during the event itself can still do so by following the links below.

**Watch video on YouTube** — **Read lecture notes on Github** — **Repository with source code**

Datatype-Generic programming is a powerful tool that allows the implementation of functions that adapt themselves to a large class of datatypes and can be made available on new datatypes easily by means such as “deriving”.

In this workshop, we focus on the ideas at the core of two popular approaches to generic programming: `GHC.Generics` and `generics-sop`. Both can be difficult to understand at first. We build simpler versions of both approaches to illustrate some of the design decisions taken. This exploration will lead to a better understanding of the trade-offs and ultimately also make using these libraries easier.

This presentation involves various type-level programming concepts such as type families, data kinds, GADTs and higher-rank types. It’s *not* a requirement to be an expert in using these features, but I do not focus on explaining them in detail either; so having some basic familiarity with the syntax and semantics is helpful.

**Watch video on YouTube** — **View slides on Google Drive**

In this workshop, we look at Haskell from an information security point of view. How do security vulnerabilities such as SQL Injection (SQLi) or Cross-Site Scripting (XSS) work? How does a hacker exploit them? What can we, as Haskell programmers, do to prevent them, and how can Haskell help us with that? And what principles can we extract from this to develop more security-aware coding habits?

No knowledge of or experience with information security is required for this course, but participants are expected to have a working knowledge of practical Haskell. If you can write a simple web application with, e.g., Scotty, you should be fine.

Of course, there was more to ZuriHac than just the Advanced Track. If you haven’t yet, you might want to have a look at the YouTube playlist, which also contains all the keynotes as well as the lectures of the GHC track.

If you are interested in our courses or other services, check our Training page, Services page, or just send us an email.

In particular, we will turn the initialization of a fully static data structure into a compile-time operation. This pattern works for many data structures, but we will look at `IntSet` specifically.

As an example, consider a function such as:
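A minimal sketch of such a function, assuming a plain list lookup with `elem`. The names `isStaticId` and `staticIds` follow the surrounding text, and the list contents mirror the set-based version further down; the exact original listing is an assumption.

```haskell
-- Hypothetical reconstruction of the list-based lookup.
isStaticId :: Int -> Bool
isStaticId x =
  x `elem` staticIds
  where
    -- A small, statically known list of ids.
    staticIds = [1, 2, 3, 5, 7 :: Int]
```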

We have a set of known things here, represented by a list named `staticIds`.

We use `Int` as it makes the example easier. But these could be Strings or all kinds of things. In particular, I was inspired by GHC’s list of known builtin functions.

The advantage of the code as written above is that the list is statically known. As a result the list will be built into the final object code as static data, and accessing it will not require any allocation/computation.

You can check this by looking at the Core dump (`-ddump-simpl`). Don’t forget to enable optimizations or this might not work as expected. In the Core there should be a number of definitions like the one below.

```
-- RHS size: {terms: 3, types: 1, coercions: 0, joins: 0/0}
isStaticId3
isStaticId3 = : isStaticId8 isStaticId4
```

Note that the above is Core syntax for `isStaticId3 = isStaticId8 : isStaticId4`, i.e., this is just denoting a part of the list, and each element gets its own identifier. All these definitions will be compiled to static data, and will eventually be represented as just a number of words encoding the constructor and its fields.

We can confirm this by looking at the Cmm output where the corresponding fragment will look like this:

```
[section ""data" . isStaticId3_closure" {
isStaticId3_closure:
const :_con_info;
const isStaticId8_closure+1;
const isStaticId4_closure+2;
const 3;
}]
```

I won’t go into the details of how to read Cmm, but it shows us that the binding will end up in the data section. The constant `:_con_info` tells us that we are dealing with a cons cell, and then we have the actual data stored in the cell.

What is important here is that this is *static* data. The GC won’t have to traverse it so having the data around does not affect GC performance. We also don’t need to compute it at runtime as it’s present in the object file in its fully evaluated form.

`IntSet`

What if we aggregate more data? If we blow up the list to a hundred, a thousand or more elements, it’s likely that performing a linear search will become a bottleneck for performance.

So we rewrite our function to use a set as follows:

```
isStaticIdSet :: Int -> Bool
isStaticIdSet x =
  x `S.member` staticIds
  where
    staticIds = S.fromList [1,2,3,5,7 :: Int] :: IntSet
```

This looks perfectly fine on the surface. Instead of having `O(n)` lookups we should get `O(log(n))` lookups, right?

However, what happens at runtime? In order to query the set we have to first convert the list into a set. This is where disaster strikes. We are no longer querying static data, as the list argument has to be converted into a set. The `S.fromList` function call will not be evaluated at compilation time.

In many cases, GHC may manage to at least share our created set `staticIds` across calls. But this is fragile, and depending on the exact code in question, it might not. Then we can end up paying the cost of set construction for each call to `isStaticIdSet`.

So while we reduced the lookup cost from `O(n)` to `O(log(n))`, the total cost is now `O(n*min(n,W) + log(n))`, where `n*min(n,W)` is the cost of constructing the set from a list. We could optimize this somewhat by making sure the list is sorted and has no duplicates. But we would still end up worse than with the list-based code we started out with.

It’s a shame that GHC can’t evaluate `S.fromList` at compile time … or can it?

What we really want to do is to force GHC to fully evaluate our input data to an `IntSet`, and then ensure that the `IntSet` is stored as static data, just as happens for the list in our initial example.

Template Haskell allows us to specify parts of the program to compute at compile time.

So we “simply” tell GHC to compute the set at compile time and are done.

Like so:

```
{-# NOINLINE isStaticIdSet #-}
isStaticIdSet :: Int -> Bool
isStaticIdSet x =
  x `S.member` staticIds
  where
    staticIds = $$( liftTyped (S.fromList [1,2,3,5,7] :: IntSet))
```

This results in Core that is even simpler than in the `[Int]` example above:

```
-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
isStaticIdSet1
isStaticIdSet1 = Tip 0# 174##
-- RHS size: {terms: 7, types: 3, coercions: 0, joins: 0/0}
isStaticIdSet
isStaticIdSet
= \ x_a5ar ->
case x_a5ar of { I# ww1_i5r2 -> $wmember ww1_i5r2 isStaticIdSet1 }
```

No longer will we allocate the set at runtime; instead the whole set is encoded in `isStaticIdSet1`. We only get a single constructor because `IntSet` can encode small sets using a single constructor.

From the outside in:

`$$( .. )` is TH syntax for a *typed splice*.^{1} *Splicing* is the process of inserting generated syntax into our program. The splice construct takes an expression denoting a syntax tree, evaluates it and inserts the resulting syntax at the place where the splice occurs.

The next piece of magic is `liftTyped`. It takes a regular *Haskell* expression, evaluates it at *compile time*, and turns the result into an abstract syntax tree that, when spliced, denotes the evaluated value of the Haskell expression.

This leaves `S.fromList [1,2,3,5,7]`, which is regular set creation.

Putting these together, during compilation GHC will:

- Evaluate `S.fromList [1,2,3,5,7]`.
- Turn the resulting set into an abstract syntax tree using `liftTyped`.
- Splice that abstract syntax tree into our program using `$$(..)`, effectively inserting the fully evaluated set expression into our program.

The resulting code will be compiled like any other, in this case resulting in fully static data.

Now you might think this was too easy, and you are partially right. The main issue is that `liftTyped` requires an instance of the `Lift` typeclass.

But for the case of `IntSet`, we can have GHC derive one for us. So all it costs us is slightly more boilerplate.

Here is a full working example for you to play around with:

```
-- First module
{-# LANGUAGE TemplateHaskell #-} -- Enable TH
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DeriveLift #-}
module TH_Lift where

import Language.Haskell.TH.Syntax
import Data.IntSet.Internal

deriving instance Lift (IntSet)

---------------------------------
-- Second module
{-# LANGUAGE TemplateHaskell #-}
module M (isStaticIdSet) where

import TH_Lift
import Data.IntSet as S
import Language.Haskell.TH
import Language.Haskell.TH.Syntax

type Id = Int

isStaticIdSet :: Int -> Bool
isStaticIdSet x =
  x `S.member` staticSet
  where
    staticSet = $$(liftTyped (S.fromList [1,2,3,5,7] :: IntSet))
```

We translate `liftTyped (S.fromList [1,2,3,5,7] :: IntSet)` into a TH expression at compile time. For this, GHC will call the (already compiled) lift method of the `Lift` instance.

However, if we define `isStaticIdSet` and the `Lift` instance in the same module, GHC can’t call `liftTyped` as it’s not yet compiled by the time we need it.

In practice most packages have companions which already offer `Lift` instances. For example, `th-lift-instances` offers instances for the `containers` package.^{2}

For many data types the result of `liftTyped` will be an expression that can be compiled to static data as long as the contents are known.

This is in particular true for “simple” ADTs like the ones used by `IntSet` or `Set`.

However, certain primitives like arrays can’t be allocated at compile time. This sadly means this trick won’t currently work for `Array`s or `Vector`s. There is a ticket about removing this restriction on arrays on GHC’s issue tracker. So hopefully, we will be able to lift arrays at some point in the future.

Note furthermore that lifting won’t work for infinite data structures, as it usually requires its argument to be evaluated completely if we want it to result in static data.

^{1} We are using Typed Template Haskell here in order to advertise it a bit better. Typed Template Haskell ensures that we are building type-correct code. In an example as simple as this, it hardly makes a difference, because even normal Template Haskell does type-check the *generated* code. We could equally well have used an untyped splice, `$( .. )`, together with `lift`.

^{2} Unfortunately, some of the instances defined in `th-lift-instances` up to version 0.1.16 are “wrong” for the purposes of this post. For example, the `IntSet` instance is based on a call to `fromList`, not statically building the internal representation. Make sure that you use `th-lift-instances` version 0.1.17 or later.

If you want to ask questions during the sessions or otherwise discuss the workshops in more detail, you should sign up for ZuriHac and join its Discord server if you have not already. More details on registration, the program, and how the setup with YouTube and Discord works will be available via the ZuriHac site.

The Advanced Track sessions are as follows:

On Saturday, Andres Löh will lead a session on Datatype-Generic Programming in Haskell.

Datatype-Generic programming is a powerful tool that allows the implementation of functions that adapt themselves to a large class of datatypes and can be made available on new datatypes easily by means such as “deriving”.

In this workshop, we focus on the ideas at the core of two popular approaches to generic programming: `GHC.Generics` and `generics-sop`. Both can be difficult to understand at first. We will build simpler versions of both approaches to illustrate some of the design decisions taken. This exploration will lead to a better understanding of the trade-offs and ultimately also make using these libraries easier.

This presentation will involve various type-level programming concepts such as type families, data kinds, GADTs and higher-rank types. It’s *not* a requirement to be an expert in using these features, but I will not focus on explaining them in detail either; so having some basic familiarity with the syntax and semantics will be helpful.

On Sunday afternoon, Tobias Dammers will lead a session on Haskell and Infosec.

In this workshop, we will look at Haskell from an information security point of view. How do security vulnerabilities such as SQL Injection (SQLi) or Cross-Site Scripting (XSS) work? How does a hacker exploit them? What can we, as Haskell programmers, do to prevent them, and how can Haskell help us with that? And what principles can we extract from this to develop more security-aware coding habits?

No knowledge of or experience with information security is required for this course, but participants are expected to have a working knowledge of practical Haskell. If you can write a simple web application with, e.g., Scotty, you should be fine.

We would be delighted to welcome you at these sessions or to discuss any interesting topics with you at ZuriHac.

If you cannot make it to ZuriHac but are still interested in our courses or other services, check our Training page, Services page, or just send us an email.

It’s been almost a year since I touched the `kleene` library, and almost two years since I published it – a good time to write a little about regular expressions.

I like regular expressions very much. They are a truly declarative way to write down a grammar… as long as the grammar is expressible in regular expressions.

Matthew Might, David Darais and Daniel Spiewak have written a paper *Functional Pearl: Parsing with Derivatives* published in ICFP ’11 Proceedings [Might2011] in which regular expressions are extended to handle context-free languages. However, they rely on *memoization*, and – as structures are infinite – also on reference equality. In short, their approach is not implementable in idiomatic Haskell.^{1}

There’s another technique that works for a subset of context-free languages. In my opinion, it is very elegant, and it is at least not painfully slow. The result is available on Hackage: the `rere` library. The idea is to treat regular expressions as a proper programming language, and add the constructions that proper languages should have: *variables* and *recursion*.

This blog post will describe the approach taken by `rere` in more detail.

The abstract syntax of a regular expression (over the alphabet of Unicode characters) is given by the following “constructors”:

- Null regexp: $\emptyset$
- Empty string: $\varepsilon$
- Characters: $a$, $b$, etc
- Concatenation: $r_1 r_2$
- Alternation: $r_1 \mid r_2$
- Kleene star: $r^\ast$

The above can be translated directly into Haskell:
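A minimal sketch of that translation, with one constructor per item in the list above. The constructor names are taken from the code later in this post; the `deriving` clause is an addition for illustration.

```haskell
-- Plain regular expressions over single characters.
data RE
  = Empty        -- null regexp, matches nothing
  | Eps          -- empty string
  | Ch Char      -- a single character
  | App RE RE    -- concatenation
  | Alt RE RE    -- alternation
  | Star RE      -- Kleene star
  deriving (Eq, Show)
```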

In the `rere` implementation, instead of bare `Char` we use a set of characters, `CharSet`, as recommended by Owens et al. in *Regular-expression derivatives re-examined* [Owens2009]. This makes the implementation more efficient, as a common case of character sets is explicitly taken into account. We write them in curly braces.

We can give *declarative* semantics to these constructors. These will look like typing rules. A judgment denotes that the regular expression $r$ successfully recognises the string $s$.

For example, the rule for application now looks like:

This rule states that if $r_1$ recognises $s_1$, and $r_2$ recognises $s_2$, then the concatenation expression $r_1 r_2$ recognises the concatenated string $s_1 s_2$.

For alternation we have two rules, one for each of the alternatives:

The rules resemble the structure of *non-commutative intuitionistic linear logic*, if you are into such stuff. Not only do you have to use everything exactly once; you have to use everything in order. There aren’t any substructural rules: no weakening, no contraction and even no exchange. I will omit the rest of the rules; look them up (and think about how the rules for Kleene star resemble those of the ‘why not’ exponential `?`).

It’s a good idea to define smart versions of the constructors, which simplify regular expressions as they are created. For example, in the following `Semigroup` instance for concatenation, `<>` is a smart version of `App`:

```
instance Semigroup RE where
    -- Empty annihilates
    Empty <> _     = Empty
    _     <> Empty = Empty
    -- Eps is unit of <>
    Eps   <> r     = r
    r     <> Eps   = r
    -- otherwise use App
    r     <> s     = App r s
```

The smart version of `Alt` is called `\/`, and the smart version of `Star` is called `star`.

We can check that the simplifications performed by the smart constructors are sound, by using the semantic rules. For example, the simplification `Eps <> r = r` is justified by the following equivalence of derivation trees:

If string $s$ is matched by $\varepsilon r$, then “the match” can be constructed in only one way, by applying the rule for concatenation. Therefore $s$ is also matched by bare $r$. If we introduced *proof terms*, we’d have concrete evidence of the match as terms in this language.

There is, however, a problem: matching using declarative rules is not practical. At several points in these rules, we have to guess. We have to guess whether we should pick the left or the right branch, or where we should split the string to match a concatenated regular expression. For a practical implementation, we need a *syntax-directed* approach. Interestingly, we then need just two rules:

In the above rules, we use two operations: the decision procedure “nullable” that tells whether a regular expression can recognise the empty string, and a mapping that, given a single character $c$ and a regular expression $r$, computes a new regular expression called the *derivative* of $r$ with respect to $c$.

Both operations are quite easy to map to Haskell. The function `nullable` is defined as a straightforward recursive function:

```
nullable :: RE -> Bool
nullable Empty = False
nullable Eps = True
nullable (Ch _) = False
nullable (App r s) = nullable r && nullable s
nullable (Alt r s) = nullable r || nullable s
nullable (Star _) = True
```

The *Brzozowski derivative* is best understood by considering the *formal language* that regular expressions represent. In Haskell terms: `derivative c r` matches string `str` if and only if `r` matches `c : str`. From this equivalence, we can more or less directly infer an implementation:

```
derivative :: Char -> RE -> RE
derivative _ Empty = Empty
derivative _ Eps   = Empty
derivative c (Ch x)
    | c == x    = Eps
    | otherwise = Empty
derivative c (App r s)
    | nullable r = derivative c s \/ derivative c r <> s
    | otherwise  = derivative c r <> s
derivative c (Alt r s)   = derivative c r \/ derivative c s
derivative c r0@(Star r) = derivative c r <> r0
```

We could try to show that the *declarative* and *syntax directed* systems are equivalent, but I omit it here, because it’s been done often enough in the literature (though probably not in exactly this way and notation).
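To make the syntax-directed approach concrete, here is a self-contained sketch of a matcher built from `nullable` and `derivative`. It is an illustration under assumptions, not the `rere` or `kleene` implementation: it uses the plain constructors `App` and `Alt` instead of the smart constructors, the definition of `match` is a hypothetical helper, and `ex1` is assumed to denote `(ab)*`.

```haskell
import Data.List (foldl')

data RE
  = Empty | Eps | Ch Char | App RE RE | Alt RE RE | Star RE
  deriving (Eq, Show)

-- Can the expression recognise the empty string?
nullable :: RE -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Ch _)    = False
nullable (App r s) = nullable r && nullable s
nullable (Alt r s) = nullable r || nullable s
nullable (Star _)  = True

-- Brzozowski derivative, without any simplification of the result.
derivative :: Char -> RE -> RE
derivative _ Empty = Empty
derivative _ Eps   = Empty
derivative c (Ch x)
  | c == x    = Eps
  | otherwise = Empty
derivative c (App r s)
  | nullable r = Alt (derivative c s) (App (derivative c r) s)
  | otherwise  = App (derivative c r) s
derivative c (Alt r s)   = Alt (derivative c r) (derivative c s)
derivative c r0@(Star r) = App (derivative c r) r0

-- Hypothetical helper: take the derivative for each input character,
-- then check whether what remains accepts the empty string.
match :: RE -> String -> Bool
match r = nullable . foldl' (flip derivative) r

-- (ab)* written with the plain constructors (assumed meaning of ex1).
ex1 :: RE
ex1 = Star (App (Ch 'a') (Ch 'b'))
```

Without smart constructors the intermediate expressions grow with every step, which is exactly the problem the simplifying versions of `<>` and `\/` address.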

We can now watch how a regular expression “evolves” while matching a string. For example, if we take the regular expression $(ab)^\ast$, which in code looks like

then the following is how `match ex1 "abab"` proceeds:

We can see that there’s implicitly a small finite state automaton, with two states: an initial state $(ab)^\ast$ and a secondary state $b(ab)^\ast$. This is the approach taken by the `kleene` package to transform regular expressions into finite state machines. There is an additional *character set* optimization from *Regular-expression derivatives re-examined* [Owens2009] by Owens, Reppy and Turon, but in essence, the approach works as follows: Try all possible derivatives, and in the process collect all the states and construct a transition function.^{2}

The string is accepted as the matching process stops at the $(ab)^\ast$ state, which is nullable.

The first new construct we add to regular expressions is *let expressions*. On their own they do not add any matching power, but they are a prerequisite for allowing recursive expressions.

We already used meta-variables in the rules in the previous section. Let expressions allow us to internalise this notion. The declarative rule for let expressions is:

Here, the notation $s[x \mapsto r]$ denotes substituting the variable $x$ by the regular expression $r$ in the regular expression $s$.

To have let expressions in our implementation, we need to represent *variables* and we must be able to perform substitution. My tool of choice for handling variables and substitution in general is the `bound` library. But for the sake of keeping the blog post self-contained, we’ll define the needed bits inline. We’ll reproduce a simple variant of `bound`, which amounts to using de Bruijn indices and polymorphic recursion.

We define our own datatype to represent variables (which is isomorphic to `Maybe`):
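A minimal sketch of such a datatype. The constructor names `B` and `F` are taken from the code later in this post; the comments and the `deriving` clause are additions for illustration.

```haskell
-- A variable is either the one bound most recently (B) or a variable
-- from the enclosing scope (F); compare Nothing and Just.
data Var a
  = B      -- the bound variable
  | F a    -- a free variable
  deriving (Eq, Ord, Show)
```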

With this, we can extend regular expressions with let. First we make it a functor, i.e., change `RE` to `RE a`, and then also add two new constructors: `Var` and `Let`:

```
data RE a
= Empty
| Eps
| Ch Char
| App (RE a) (RE a)
| Alt (RE a) (RE a)
| Star (RE a)
| Var a
| Let (RE a) (RE (Var a))
```

Note that we keep the argument `a` unchanged in all recursive occurrences of `RE`, with the exception of the body of the `Let`, where we use `Var a` instead, indicating that we can use `B` in that body to refer to the variable bound by the `Let`.

In the actual `rere` library, the `Let` (and later `Fix`) constructors additionally have an irrelevant `Name` field, which allows us to retain the variable names and use them for pretty-printing. I omit them from the presentation in this blog post.

Now, we can write a regular expression with repetitions like $\mathbf{let}\ r = a^\ast\ \mathbf{in}\ r\,r$ instead of $a^\ast a^\ast$; or in Haskell:

The use of `Void` as parameter tells us that the expression is *closed*, i.e., doesn’t contain any free variables.
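As a sketch, a closed expression with one let binding could be written as follows. The datatypes repeat the definitions from above so that the block stands alone; the name `ex2` and the use of plain constructors instead of smart ones are assumptions made for illustration.

```haskell
import Data.Void (Void)

data Var a = B | F a
  deriving (Eq, Show)

data RE a
  = Empty
  | Eps
  | Ch Char
  | App (RE a) (RE a)
  | Alt (RE a) (RE a)
  | Star (RE a)
  | Var a
  | Let (RE a) (RE (Var a))
  deriving (Eq, Show)

-- let r = a* in r r
-- Inside the body, B refers to the let-bound expression.
ex2 :: RE Void
ex2 = Let (Star (Ch 'a')) (App (Var B) (Var B))
```

Because the type parameter is `Void`, there is no way to construct an `F` variable at the top level, so the only variables that can occur are the ones bound by an enclosing `Let`.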

We still need to extend `nullable` and `derivative` to work with the new constructors. For `nullable`, we’ll simply pass a function telling whether variables in context are nullable. The existing constructors just pass a context around:

```
nullable :: RE Void -> Bool
nullable = nullable' absurd
nullable' :: (a -> Bool) -> RE a -> Bool
nullable' _ Empty = False
nullable' _ Eps = True
nullable' _ (Ch _) = False
nullable' f (App r s) = nullable' f r && nullable' f s
nullable' f (Alt r s) = nullable' f r || nullable' f s
nullable' _ (Star _) = True
```

The cases for `Var` and `Let` use and extend the context, respectively:

```
-- Var: look in the context
nullable' f (Var a) = f a
-- Let: - compute `nullable r`
-- - extend the context
-- - continue with `s`
nullable' f (Let r s) = nullable' (unvar (nullable' f r) f) s
```

The `unvar` function corresponds to `maybe`, but transported to our `Var` type:
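A minimal sketch of `unvar`, assuming the same argument order as `maybe` (default value first, then the function for the `F` case), which is consistent with how it is called in the `Let` case of `nullable'` above. The `Var` type is repeated so that the block stands alone.

```haskell
data Var a = B | F a
  deriving (Eq, Show)

-- Case analysis on Var, mirroring `maybe` for Maybe:
-- the first argument is used for B, the second for F.
unvar :: r -> (a -> r) -> Var a -> r
unvar n _ B     = n
unvar _ j (F x) = j x
```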

How to extend `derivative` to cover the new cases requires a bit more thinking. The idea is similar: we want to add to the context whatever we need to know about the variables. The key insight is to replace every `Let` binding by two `Let` bindings, one copying the original, and one binding to the `derivative` of the let-bound variable. Because the number of let bindings changes, we have to carefully re-index variables as we go.

Therefore, the context for `derivative` consists of three pieces of information per variable:

- whether the variable is `nullable` (we need it for `derivative` of `App`),
- the variable denoting the derivative of the original variable,
- the re-indexed variable denoting the original value.

The top-level function `derivative :: Char -> RE Void -> RE Void` now makes use of a local helper function, `derivative'`, which takes this context. Note that, as discussed above, `derivative'` changes the indices of the variables. However, at the top-level, both `a` and `b` are `Void`, and the environment can be trivially instantiated to the function with empty domain.

The `derivative'` case for `Var` is simple: we just look up the derivative of the `Var` in the context.

The case for `Let` is quite interesting:

```
derivative' f (Let r s)
    = let_ (fmap (trdOf3 . f) r)       -- rename variables in r
    $ let_ (fmap F (derivative' f r))  -- binding for derivative of r
    $ derivative' (\case
        B   -> (nullable' (fstOf3 . f) r, B, F B)
        F x -> bimap (F . F) (F . F) (f x))
    $ s
...
```

As a formula it looks like:

For our running example $\mathbf{let}\ r = a^\ast\ \mathbf{in}\ r\,r$, or `Let (star (Ch 'a')) (Var B <> Var B)`, we call `derivative'` recursively with an argument of type `RE (Var a)`, corresponding to the one variable $r$, and we get back an `RE (Var (Var b))`, corresponding to the two variables: $r$ and its derivative.

The careful reader will also have noticed the smart constructor `let_`, which does a number of standard rewritings on the fly (which I explain in a *Do you have a problem? Write a compiler!* talk). These are justified by the properties of substitution:

```
-- inlining of cheap bindings
let x = a in b
-- ==>
b [ x -> a ] -- when a is cheap, i.e. Empty, Eps, Ch or Var
```

And importantly, we employ a quick form of common-subexpression elimination (CSE):

This form of CSE is easy and fast to implement, as we don’t introduce new `let`s; we only consider what we have already bound and try to increase sharing.

It’s time for examples: Recall again `ex2`, which was defined as $\mathbf{let}\ r = a^\ast\ \mathbf{in}\ r\,r$, or `Let (star (Ch 'a')) (Var B <> Var B)` in code.

Let’s try to observe the match of the string step by step:

Our smart constructors are quite smart, so the automaton stays in its single state. The union comes from the `derivative` of `App`: as `r` is nullable, we get `derivative 'a' r \/ derivative 'a' r <> r`. And as `derivative 'a' r = r`, we don’t see any additional `let` bindings.

Now we are ready for the main topic of the post: *recursion*. We add one more constructor to our datatype of regular expressions:
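A sketch of the extended datatype, assuming that `Fix` binds a variable in its body in the same way as the body of `Let` does (this matches how `Fix r` is used in the code further down). The other constructors are repeated from above so that the block stands alone.

```haskell
data Var a = B | F a
  deriving (Eq, Show)

data RE a
  = Empty
  | Eps
  | Ch Char
  | App (RE a) (RE a)
  | Alt (RE a) (RE a)
  | Star (RE a)
  | Var a
  | Let (RE a) (RE (Var a))
  | Fix (RE (Var a))   -- new: inside the body, B refers to the whole Fix
  deriving (Eq, Show)
```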

The `Fix` construct looks similar to `Let`, except that the bound variable is semantically equivalent to the whole expression. We can *unroll* each `Fix` expression by substituting it into itself:

The `Fix` constructor subsumes the Kleene star, as $r^\ast$ can now be expressed as $\mathbf{fix}\ x = \varepsilon \mid r\,x$, which feels like a very natural definition indeed. For example, `ex1`, previously defined using Kleene star as $(ab)^\ast$, could also be re-defined as $\mathbf{fix}\ x = \varepsilon \mid a\,b\,x$. That looks like

in code.

The problem is now the same as with `Let`: How to define `nullable` and `derivative`? Fortunately, we have most of the required machinery already in place from the addition of `Var` and `Let`.

Nullability of `Fix` relies on Kleene’s theorem to compute the least fixed point of a monotonic recursive definition, like in *Parsing with Derivatives*. The idea is to unroll `Fix` once, and to pretend that the nullability of the recursive occurrence of the bound variable in `Fix` is `False`:
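The idea can be sketched as one additional equation for `nullable'`, which extends the context with `False` for the variable bound by `Fix`. The block below repeats the earlier definitions so that it stands alone; the `Fix` equation and the example `anbn` (a hypothetical name) are reconstructions from the description, not necessarily the exact code in `rere`.

```haskell
import Data.Void (Void, absurd)

data Var a = B | F a

unvar :: r -> (a -> r) -> Var a -> r
unvar n _ B     = n
unvar _ j (F x) = j x

data RE a
  = Empty | Eps | Ch Char
  | App (RE a) (RE a)
  | Alt (RE a) (RE a)
  | Star (RE a)
  | Var a
  | Let (RE a) (RE (Var a))
  | Fix (RE (Var a))

nullable :: RE Void -> Bool
nullable = nullable' absurd

nullable' :: (a -> Bool) -> RE a -> Bool
nullable' _ Empty     = False
nullable' _ Eps       = True
nullable' _ (Ch _)    = False
nullable' f (App r s) = nullable' f r && nullable' f s
nullable' f (Alt r s) = nullable' f r || nullable' f s
nullable' _ (Star _)  = True
nullable' f (Var a)   = f a
nullable' f (Let r s) = nullable' (unvar (nullable' f r) f) s
-- Fix: assume the recursive occurrence is not nullable
-- and inspect the body once.
nullable' f (Fix r)   = nullable' (unvar False f) r

-- fix x = eps | a x b, written with plain constructors
anbn :: RE Void
anbn = Fix (Alt Eps (App (Ch 'a') (App (Var B) (Ch 'b'))))
```

Here `nullable anbn` is `True` because of the `Eps` alternative, while a body such as `App (Ch 'a') (Var B)` is not nullable under the `False` assumption and never becomes nullable by further unrolling.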

In other words, we literally assume that the nullability of the new binding is `False`, and see what comes out. We don’t need to iterate more than once, as `False` will flip to `True` right away, or will never do so even with further unrollings.

Following a similar idea, our smart constructor `fix_` is capable of recognising an `Empty` fixed point by substituting `Empty` for the recursive occurrence in the unrolling:

This works because `Empty` is a bottom of the language-inclusion lattice (just as `False` is a bottom of the `Bool` lattice).

The extension of `derivative` is again a bit more involved, but it resembles what we did for `Let`: As the body of a `Fix` contains self-references $x$, the derivative of a `Fix` will also be a `Fix`. Thus, when we need to compute the derivative of $x$, we’ll use the variable bound by the new `Fix`. It is important that not all occurrences of $x$ in the body of a `Fix` will turn into references to its derivative (e.g., if they appear to the right of an `App`, or in a `Star`), so we *need to save the value of $x$ in a let binding* – how fortunate that we just introduced those … Schematically, the transformation looks as follows:

In the rest, we will use a shorthand notation for a let binding to a `Fix`. We will write such a binding more succinctly with a subscript on the binding, indicating that the binding is recursive. We prefer this notation over introducing a separate letrec construct, because in a cascade of let expressions, we can have individual bindings being recursive, but we still *cannot* forward-reference to later bindings.

Applying the abbreviation to our derivation rule above yields

Let’s compare this to the let case, rearranged slightly, to establish the similarity:

Consequently, the implementation in Haskell also looks similar to the `Let` case:

```
derivative' f r0@(Fix r)
    = let_ (fmap (trdOf3 . f) r0)
    $ fix_
    $ derivative' (\case
        B   -> (nullable' (fstOf3 . f) r0, B, F B)
        F x -> bimap (F . F) (F . F) (f x))
    $ r
```

Let’s see how it works in practice. We observe the step-by-step matching of `ex3` on `abab`, which was `ex1` defined using a fixed point rather than the Kleene star:

We couldn’t wish for a better outcome. We see the same two-state ping-pong behavior as we got using the Kleene star.

`Fix` is a much more powerful construction than the Kleene star. Let’s look at some examples …

Probably the simplest non-regular language is some amount of $a$s followed by the *same* amount of $b$s: $\{ a^n b^n \mid n \in \mathbb{N} \}$.

We can describe that language using our library, thanks to the presence of fixed points: $\mathbf{fix}\ x = \varepsilon \mid a\,x\,b$ (note the variable $x$ in between the literal symbols). Transcribed to Haskell code, this is:

And we can test the expression on a string in the language, for example `"aaaabbbb"`:

Now things become more interesting. We can see how in the trace of this not-so-regular expression, we obtain let bindings resembling the stack of a pushdown automaton.

From the trace one can relatively easily see that if we “forget” one `b` at the end of the input string, then the “state” `b` isn’t `nullable`, so the string won’t be recognized.

Previously in this post, we have rewritten $r^\ast$ as $\mathbf{fix}\ x = \varepsilon \mid r\,x$. But another option is to use recursion on the left, i.e., to write $\mathbf{fix}\ x = \varepsilon \mid x\,r$ instead:

This automaton works as well. In fact, in some sense it works better than the right-recursive one: we can see (as an artifact of variable naming) that we get the derivatives as output of each step. We do save the original expression in a let binding, but as it is unused in the result, our smart constructors will drop it:

Another go-to example of context free grammars is arithmetic expressions:

The Haskell version is slightly more inconvenient to write due to the use of de Bruijn indices, but otherwise straight-forward:

```
ex6 :: RE Void
ex6 = let_ (Ch "0123456789")
    $ let_ (Var B <> star_ (Var B))
    $ fix_
    $  ch_ '(' <> Var B <> ch_ ')'
    \/ Var (F B)
    \/ Var B <> ch_ '+' <> Var B
    \/ Var B <> ch_ '*' <> Var B
```

Here is an (abbreviated) trace of matching the input string :