Quasi-quoting DSLs for free

Sunday, 19 October 2014, by Edsko de Vries.
Filed under coding.

Suppose you are writing a compiler for some programming language or DSL. If you are doing source to source transformations in your compiler, perhaps as part of an optimization pass, you will need to construct and deconstruct bits of abstract syntax. It would be very convenient if we could write that abstract syntax using the syntax of your language. In this blog post we show how you can reuse your existing compiler infrastructure to make this possible by writing a quasi-quoter with support for metavariables. As we will see, a key insight is that we can reuse object variables as meta variables.

Toy Language “Imp

For the sake of this blog post we will be working with a toy language called Imp. The abstract syntax for Imp is defined by

type VarName = String

data Expr =
    Var VarName
  | Add Expr Expr
  | Sub Expr Expr
  | Int Integer
  | Read
  deriving (Data, Typeable, Show, Eq)

data Cmd =
    Write Expr
  | Assign VarName Expr
  | Decl VarName
  deriving (Data, Typeable, Show)

data Prog = Prog [Cmd]
  deriving (Data, Typeable, Show)

and we will assume that we have some parsec parsers

parseExpr :: Parser Expr
parseProg :: Parser Prog

We will also make use of

topLevel :: Parser a -> Parser a
topLevel p = whiteSpace *> p <* eof

and the following useful combinator for running a parser:

parseIO :: Parser a -> String -> IO a

The details of these parsers are beyond the scope of this post. There are plenty of parsec tutorials online; for instance, you could start with the parsec chapter in Real World Haskell. Moreover, the full code for this blog post, including a simple interpreter for the language, is available on github if you want to play with it. Here is a simple example of an Imp program:

var x ;
x := read ;
write (x + x + 1)

A simple quasi-quoter

We want to be able to write something like

prog1 :: Prog
prog1 = [prog|
    var x ;
    x := read ;
    write (x + x + 1)
  |]

where the intention is that the [prog| ... |] quasi-quote will expand to something like

prog1 = Prog [
      Decl "x"
    , Assign "x" Read
    , Write (Add (Add (Var "x") (Var "x")) (Int 1))
    ]

To achieve this, we have to write a quasi-quoter. A quasi-quoter is an instance of the following data type:

data QuasiQuoter = QuasiQuoter {
    quoteExp  :: String -> Q Exp
  , quotePat  :: String -> Q Pat
  , quoteType :: String -> Q Type
  , quoteDec  :: String -> Q [Dec]
  }

The different fields are used when using the quasi-quoter in different places in your Haskell program: at a position where we expect a (Haskell) expression, a pattern (we will see an example of that later), a type or a declaration; we will not consider the latter two at all in this blog post.

In order to make the above example (prog1) work, we need to implement quoteExp but we can leave the other fields undefined:

prog :: QuasiQuoter
prog = QuasiQuoter {
      quoteExp = \str -> do
        l <- location'
        c <- runIO $ parseIO (setPosition l *> topLevel parseProg) str
        dataToExpQ (const Nothing) c
    , quotePat  = undefined
    , quoteType = undefined
    , quoteDec  = undefined
    }

Let’s see what’s going on here. The quasi-quoter gets as argument the string in the quasi-quote brackets, and must return a Haskell expression in the Template-Haskell Q monad. This monad supports, amongst other things, getting the current location in the Haskell file. It also supports IO.

Location

The first thing that we do is find the current location in the Haskell source file and convert it to parsec format:

location' :: Q SourcePos
location' = aux <$> location
  where
    aux :: Loc -> SourcePos
    aux loc = uncurry (newPos (loc_filename loc)) (loc_start loc)

Running the parser

Once we have the location we then parse the input string to a term in our abstract syntax (something of type Prog). We use parsec’s setPosition to tell parsec where we are in the Haskell source file, so that if we make a mistake such as

prog1 :: Prog
prog1 = [prog|
    var x ;
    x := read ;
    write (x + x + )
  |]

we get an error that points to the correct location in our Haskell file:

TestQQAST.hs:6:9:
    Exception when trying to run compile-time code:
      user error ("TestQQAST.hs" (line 9, column 20):
unexpected ")"
expecting "(", "read", identifier or integer)

Converting to Haskell abstract syntax

The parser returns something of type Prog, but we want something of type Exp; Exp is defined in Template Haskell and reifies the abstract syntax of Haskell. For example, we would have to translate the Imp abstract syntax term

Var "x" :: Prog

to its reflection as a piece of abstract Haskell syntax as

AppE (ConE 'Var) (LitE (StringL "x")) :: TH.Exp

which, when spliced into the Haskell source, yields the original Prog value. Fortunately, we don’t have to write this translation by hand, but we can make use of the following Template Haskell function:

dataToExpQ :: Data a
           => (forall b. Data b => b -> Maybe (Q Exp))
           -> a -> Q Exp

This function can translate any term to a reified Haskell expression, as long as the type of the term derives Data (Data instances can be auto-derived by ghc if you enable the DeriveDataTypeable language extension). The first argument allows you to override the behaviour of the function for specific cases; we will see an example use case in the next section. In our quasi-quoter so far we don’t want to override anything, so we pass a function that always returns Nothing.

Once we have defined this quasi-quoter we can write

prog1 :: Prog
prog1 = [prog|
    var x ;
    x := read ;
    write (x + x + 1)
  |]

and ghc will run our quasi-quoter and splice in the Haskell expression corresponding to the abstract syntax tree of this program (provided that we enable the QuasiQuotes language extension).

Meta-variables

Consider this function:

prog2 :: VarName -> Integer -> Prog
prog2 y n = [prog|
    var x ;
    x := read ;
    write (x + y + n)
  |]

As mentioned, in the source code for this blog post we also have an interpreter for the language. What happens if we try to run (prog2 "x" 1)?

*Main> intIO $ intProg (prog2 "x" 2)
5
*** Exception: user error (Unbound variable "y")

Indeed, when we look at the syntax tree that got spliced in for prog2 we see

Prog [ Decl "x"
     , Assign "x" Read
     , Write (Add (Add (Var "x") (Var "y")) (Var "n"))
     ]

What happened? Didn’t we pass in "x" as the argument y? Actually, on second thought, this makes perfect sense: this is after all what our string parses to. The fact that y and n also happen to be Haskell variables, and happen to be in scope at the point of the quasi-quote, is really irrelevant. But we would still like prog2 to do what we expected it to do.

Meta-variables in Template Haskell

To do that, we have to support meta variables: variables from the “meta” language (Haskell) instead of the object language (Imp). Template Haskell supports this out of the box. For example, we can define

ex :: Lift a => a -> Q Exp
ex x = [| id x |]

Given any argument that supports Lift, ex constructs a piece of abstract Haskell syntax which corresponds to the application of the identity function to x. (Don’t confuse this with anti-quotation; see Brief Intro to Quasi-Quotation.) Lift is a type class with a single method

class Lift t where
  lift :: t -> Q Exp

For example, here is the instance for Integer:

instance Lift Integer where
  lift x = return (LitE (IntegerL x))

Meta-variables in quasi-quotes

Quasi-quotes don’t have automatic support for meta-variables. This makes sense: Template Haskell is for quoting Haskell so it has a specific concrete syntax to work with, where as quasi-quotes are for arbitrary custom syntaxes and so we have to decide what the syntax and behaviour of meta-variables is going to be.

For Imp we want to translate any unbound Imp (object-level) variable in the quasi-quote to a reference to a Haskell (meta-level) variable. To do that, we will introduce a similar type class to Lift:

class ToExpr a where
  toExpr :: a -> Expr

and provide instances for variables and integers:

instance ToExpr VarName where
  toExpr = Var

instance ToExpr Integer where
  toExpr = Int

We will also need to know which variables in an Imp program are bound and unbound; in the source code you will find a function which returns the set of free variables in an Imp program:

fvProg :: Prog -> Set VarName

Overriding the behaviour of dataToExpQ

In the previous section we mentioned that rather than doing the Prog -> Q Exp transformation by hand we use the generic function dataToExpQ to do it for us. However, now we want to override the behaviour of this function for the specific case of unbound Imp variables, which we want to translate to Haskell variables.

Recall that dataToExpQ has type

dataToExpQ :: Data a
           => (forall b. Data b => b -> Maybe (Q Exp))
           -> a -> Q Exp

This is a rank-2 type: the first argument to dataToExpQ must itself be polymorphic in b: it must work on any type b that derives Data. So far we have been passing in

const Nothing

which is obviously polymorphic in b since it completely ignores its argument. But how do we do something more interesting? Data and its associated combinators come from a generic programming library called Scrap Your Boilerplate (Data.Generics). A full discussion of SYB is beyond the scope of this blog post; the SYB papers are a good starting point if you would like to know more (I would recommend reading them in chronological order, the first published paper first). For the sake of what we are trying to do it suffices to know about the existence of the following combinator:

extQ :: (Typeable a, Typeable b) => (a -> q) -> (b -> q) -> a -> q

Given a polymorphic query (forall a)—in our case this is const NothingextQ allows to extend the query with a type specific case (for a specific type b). We will use this to give a specific case for Expr: when we see a free variable in an expression we translate it to an application of toExpr to a Haskell variable with the same name:

metaExp :: Set VarName -> Expr -> Maybe ExpQ
metaExp fvs (Var x) | x `Set.member` fvs =
  Just [| toExpr $(varE (mkName x)) |]
metaExp _ _ =
  Nothing

The improved quasi-quoter

With this in hand we can define our improved quasi-quoter:

prog :: QuasiQuoter
prog = QuasiQuoter {
      quoteExp = \str -> do
        l <- location'
        c <- runIO $ parseIO (setPosition l *> topLevel parseProg) str
        dataToExpQ (const Nothing `extQ` metaExp (fvProg c)) c
    , quotePat  = undefined
    , quoteType = undefined
    , quoteDec  = undefined
    }

Note that we are extending the query for Expr, not for Prog; dataToExpQ (or, more accurately, SYB) makes sure that this extension is applied at all the right places. Running (prog2 "x" 2) now has the expected behaviour:

*Main> intIO $ intProg (prog2 "x" 2)
6
14

Indeed, when we have a variable in our code that is unbound both in Imp and in Haskell, we now get a Haskell type error:

prog2 :: VarName -> Integer -> Prog
prog2 y n = [prog|
    var x ;
    x := read ;
    write (x + z + n)
  |]

gives

TestQQAST.hs:15:19: Not in scope: ‘z’

Parenthetical remark: it is a design decision whether or not we want to allow local binding sites in a splice to “capture” meta-variables. Put another way, when we pass in "x" to prog2, do we mean the x that is bound locally in prog2, or do we mean a different x? Certainly a case can be made that we should not be able to refer to the locally bound x at all—after all, it’s not bound outside of the snippet! This is an orthogonal concern however and we will not discuss it any further in this blog post.

Quasi-quoting patterns

We can also use quasi-quoting to support patterns. This enables us to write something like

optimize :: Expr -> Expr
optimize [expr| a + n - m |] | n == m = optimize a
optimize other = other

As before, the occurrence of a in this pattern is free, and we intend it to correspond to a Haskell variable, not an Imp variable; the above code should correspond to

optimize (Sub (Add a n) m) | n == m = optimize a

(note that this is comparing Exprs for equality, hence the need for Expr to derive Eq). We did not mean the pattern

optimize (Sub (Add (Var "a") (Var "n")) (Var "m"))

To achieve this, we can define a quasi-quoter for Expr that supports patterns (as well as expressions):

expr :: QuasiQuoter
expr = QuasiQuoter {
      quoteExp  = \str -> do
        l <- location'
        e <- runIO $ parseIO (setPosition l *> topLevel parseExpr) str
        dataToExpQ (const Nothing `extQ` metaExp (fvExpr e)) e
    , quotePat  = \str -> do
        l <- location'
        e <- runIO $ parseIO (setPosition l *> topLevel parseExpr) str
        dataToPatQ (const Nothing `extQ` metaPat (fvExpr e)) e
    , quoteType = undefined
    , quoteDec  = undefined
    }

The implementation of quotePat is very similar to the definition of quoteExp. The only difference is that we use dataToPatQ instead of dataToExpQ to generate a Haskell pattern rather than a Haskell expression, and we use metaPat to give a type specific case which translates free Imp variables to Haskell pattern variables:

metaPat :: Set VarName -> Expr -> Maybe PatQ
metaPat fvs (Var x) | x `Set.member` fvs = Just (varP (mkName x))
metaPat _ _ = Nothing

Note that there is no need to lift now; the Haskell variable will be bound to whatever matches in the expression.

Limitations

We might be tempted to also add support for Prog patterns. While that is certainly possible, it’s of limited use if we follow the same strategy that we followed for expressions. For instance, we would not be able to write something like

opt [prog| var x ; c |] | x `Set.notMember` fvProg c = opt c

The intention here is that we can remove unused variables. Unfortunately, this will not work because this will cause a parse error: the syntax for Imp does not allow for variables for commands, and hence we also don’t allow for meta-variables at this point. This is important to remember:

By using object-level variables as stand-ins for meta-level variables, we only allow for meta-level variables where the syntax for the object-level language allows variables.

If this is too restrictive, we need to add special support in the ADT and in the corresponding parsers for meta-variables. This is a trade-off in increased expressiveness of the quasi-quotes against additional complexity in their implementation (new parsers, new ADTs).

Conclusions

By reusing object-level variables as stand-ins for meta variables you can reuse existing parsers and ADTs to define quasi-quoters. Using the approach described in this blog we were able to add support for quasi-quoting to a real compiler for a domain specific programming language with a minimum of effort. The implementation is very similar to what we have shown above, except that we also dealt with renaming (so that meta variables cannot be captured by binding sites in the quasi quotes) and type checking (reusing the existing renamer and type checker, of course).