# Simple SMT solver for use in an optimizing compiler

## Edsko de Vries – Wednesday, 31 December 2014

coding

This is a second blog post in a series about engineering optimizing compilers; the previous was Quasi-quoting DSLs for free. In this blog we will see how we might write a very simple SMT solver that will allow us to write an optimization pass that can turn something like

``````if a == 0 then
if !(a == 0) && b == 1 then
write 1
else
write 2
else
write 3``````

into

``````if a == 0 then
write 2
else
write 3``````

without much effort at all.

## Preliminaries

For the sake of this blog post we will consider a very simple imperative object language, defined as

``````data Expr =
-- Arithmetic expressions
ExprInt Integer           -- ^ Integer constants
-- Boolean expressions
| ExprBool Bool             -- ^ Boolean constants
| ExprEq Expr Expr          -- ^ Equality
| ExprNot Expr              -- ^ Negation
| ExprAnd Expr Expr         -- ^ Logical conjunction
| ExprOr Expr Expr          -- ^ Logical disjunction
-- Variables
| ExprVar VarName           -- ^ Read from a variable
| ExprAssign VarName Expr   -- ^ Write to a variable
-- Control flow
| ExprSeq Expr Expr         -- ^ Sequential composition
| ExprIf Expr Expr Expr     -- ^ Conditional
-- Input/output
| ExprWrite Expr            -- ^ Write an integer to the console``````

We will assume the existence of a quasi-quoter for this language so that we can write Haskell fragments such as

``[expr| if a == 0 then read else write b |]``

``````ExprIf (ExprEq (ExprVar a) (ExprInt 0))
(ExprWrite (ExprVar b))``````

How you can write such a quasi-quoter was the topic of the previous blog post. You should however be able to read this blog post without having read the previous post; hopefully the mapping from the concrete syntax to the constructors of `Expr` is pretty obvious.

## Simplifying assumption

Our goal will be to write a function

``provable :: Expr -> Bool``

Let’s consider some examples:

• The expression `a || True` should be provable: no matter what value variable `a` has, `a || True` is always `True`.

• Likewise, the expression `a || !a` should be provable, for the same reason.

• However, expression `a` should not be provable (`a` might be `False`)

• Likewise, expression `!a` should also not be provable (`a` might be `True`)

Note that this means that it’s perfectly possible for both an expression and the negation of that expression to be unprovable.

What about an expression `!(n == 0 && n == 1)`? Morally speaking, this expression should be provable. However, we will be making the following very important simplifying assumption:

`provable` is allowed to give up: when `provable` returns `False`, this should be interpreted as “failed to prove”, rather than “there exist a counterexample”.

From a compiler perspective, if something is not statically provable, that simply means that an optimization may not be applied even though it could: that is, we miss an opportunity to make the program go faster, but we will not break the code.

## An evaluator

We don’t want (or need) to embed a full blown theorem prover into our compiler: ideally we write something simple that will still catch a lot of the common cases. Moreover, we would prefer to reuse as much of our existing compiler infrastructure as we can. Our optimizer is likely to contain an interpreter for the object language, so that we can attempt to statically evaluate expressions. We are going to adapt this interpreter so that we can also use it in our solver. In fact, as we shall see, the solver will be a one-liner.

But we’re getting ahead of ourselves. Let’s consider how to write the interpreter first. The interpreter will be running in an `Eval` monad; for our first attempt, let’s define this monad as a a simple wrapper around the list monad:

``````newtype Eval a = Eval { unEval :: [a] }
deriving ( Functor
, Applicative
, Alternative
)

runEval :: Eval a -> [a]
runEval act = unEval act``````

We will use the monad for failure:

``````throwError :: Eval a
throwError = Eval []``````

We can provide error recovery through

``````onError :: Eval a -> Eval a -> Eval a
onError (Eval act) (Eval handler) = Eval \$
case act of
[] -> handler
rs -> rs``````

We will see why we need the list monad when we discuss the evaluator for boolean expressions; but let’s consider the evaluator for integer expressions first:

``````evalInt :: Expr -> Eval Integer
evalInt = go
where
go (ExprInt i)    = return i
go (ExprAdd a b)  = (+) <\$> evalInt a <*> evalInt b
go (ExprIf c a b) = do cond <- evalBool c
evalInt (if cond then a else b)
go _              = throwError ``````

Hopefully this should be pretty self explanatory; our toy language only has a few integer-valued expressions, so there isn’t much to do. The interpreter for boolean expressions is more interesting:

``````evalBool :: Expr -> Eval Bool
evalBool = \e -> go e `onError` guess e
where
go (ExprBool a)   = return a
go (ExprEq   a b) = (==) <\$> evalInt a <*> evalInt b
go (ExprNot  a)   = not  <\$> evalBool a
go (ExprAnd  a b) = (&&) <\$> evalBool a <*> evalBool b
go (ExprOr   a b) = (||) <\$> evalBool a <*> evalBool b
go (ExprIf c a b) = do cond <- evalBool c
evalBool (if cond then a else b)
go _              = throwError

guess _e = return True <|> return False``````

The definition of `go` contains no surprises, and follows the definition of `go` in `evalInt` very closely. However, the top-level definition

``evalBool = \e -> eval e `onError` guess e``

is more interesting. If for some reason we fail to evaluate a boolean expression (for example, because it contains a variable) then `guess` returns both `True` and `False`. Let’s consider some examples:

``runEval \$ evalBool [expr| True |]``

evalutes to `[True]`;

``runEval \$ evalBool [expr| a |]``

evaluates to `[True, False]` because we don’t know what value `a` has, but

``runEval \$ evalBool [expr| a || True |]``

evaluates to `[True, True]`: we still have two guesses for `a`, but no matter what we guess `a || True` always evaluates to `True`.

## Satisfiability

We can now write our SMT solver; as promised, it will be a single line:

``````satisfiable :: Expr -> Bool
satisfiable = or . runEval . evalBool``````

Function `satisfiable` (the “S” in SMT) checks if the expression is true for some values of the variables in the expression. How do we check this? Well, we just run our evaluator which, when it encounters a variable, will return both values for the variable. Hence, if any of the values returned by the evaluator is `True`, then the expression is true at least for one value for each variable.

Once we have an implementation of satisfiability, we can implement `provable` very easily: an expression is provable if its negation is not satisfiable:

``````provable :: Expr -> Bool
provable = not . satisfiable . ExprNot``````

If we consider the three examples from the previous section, we will find that both `True` and `a || True` are provable, but `a` by itself is not, as expected.

## Inconsistent guesses

The careful reader might at this point find his brow furrowed, because something is amiss:

``runEval \$ evalBool [expr| a || !a |]``

evaluates to

``[True, True, False, True]``

This happens because the evaluator will make two separate independent guesses about the value of `a`. As a consequence, `a || !a` will be considered not provable.

Is this a bug? No, it’s not. Recall that we made the simplifying assumption that `provable` is allowed to give up: it’s allowed to say that an expression is not provable even when it morally is. The corresponding (dual) simplifying assumption for satisfability, and hence the interpreter, is:

The interpreter, and hence satisfiability, is allowed to make inconsistent guesses.

Making an inconsistent guess is equivalent to assuming `False`: anything follows from `False` and hence any expression will be considered satisfiable once we make an inconsistent guess. As a consequence, this means that once we make inconsistent guesses, we will consider the expression as not provable.

## More precision

We can however do better: whenever we make a guess that a particular expression evaluates to `True` or `False`, then if we see that same expression again we can safely make the same guess, rather than making an independent guess. To implement this, we extend our `Eval` monad with some state to remember the guesses we made so far:

``````newtype Eval a = Eval { unEval :: StateT [(Expr, Bool)] [] a }
deriving ( Functor
, Applicative
, Alternative
)

runEval :: Eval a -> [a]
runEval act = evalStateT (unEval act) []

throwError :: Eval a
throwError = Eval \$ StateT \$ \_st -> []

onError :: Eval a -> Eval a -> Eval a
onError (Eval act) (Eval handler) = Eval \$ StateT \$ \st ->
case runStateT act st of
[] -> runStateT handler st
rs -> rs``````

The implementation of `evalInt` does not change at all. The implementation of `evalBool` also stays almost the same; the only change is the definition of `guess`:

``````guess e = do
st <- get
case lookup e st of
Just b  -> return b
Nothing -> (do put ((e, True)  : st) ; return True)
<|> (do put ((e, False) : st) ; return False)``````

This implements the logic we just described: when we guess the value for an expression `e`, we first look up if we already made a guess for this expression. If so, we return the previous guess; otherwise, we make a guess and record that guess.

Once we make this change

``runEval \$ evalBool [expr| a || !a |]``

will evaluate to `[True,True]` and consequently `a || !a` will be considered provable.

## Example: folding nested conditionals

As an example of how one might use this infrastructure, we will consider a simple pass that simplifies nested conditionals (if-statements). We can use `provable` to check if one expression implies the other:

``````(~>) :: Expr -> Expr -> Bool
(~>) a b = provable [expr| ! \$a || \$b |]``````

The simplifier is now very easy to write:

``````simplifyNestedIf :: Expr -> Expr
simplifyNestedIf = everywhere (mkT go)
where
go [expr| if \$c1 then
if \$c2 then
\$e1
else
\$e2
else
\$e3 |]
| c1 ~> c2              = [expr| if \$c1 then \$e1 else \$e3 |]
| c1 ~> [expr| ! \$c2 |] = [expr| if \$c1 then \$e2 else \$e3 |]
go e = e``````

The auxiliary function `go` pattern matches against any if-statement that has another if-statement as its “then” branch. If we can prove that the condition of the outer if-statement implies the condition of the inner if-statement, we can collapse the inner if-statement to just its “then” branch. Similarly, if we can prove that the condition of the outer if-statement implies the negation of the condition of the inner if-statement, we can collapse the inner if-statement to just its “else” branch. In all other cases, we return the expression unchanged. Finally, we use the Scrap Your Boilerplate operators `everywhere` and `mkT` to apply this transformation everywhere in an expression (rather than just applying it top-level). For example,

``````simplifyNestedIf [expr|
if a == 0 then
if !(a == 0) && b == 1 then
write 1
else
write 2
else
write 3
|]``````

evaluates to

``````if a == 0 then
write 2
else
write 3``````

as promised. Incidentally, perhaps you are wondering why such an optimization would be useful; after all, which self respecting programmer writes such code? However, code like the above may result from other compiler optimizations such as inlining: imagine that the inner if-statement came from a inlined function. A lot of compiler optimizations are designed to clean up other compiler optimizations.

## Conclusion

We can implement a very simple but useful SMT solver for use in an optimizing compiler by making a small change to the interpreter, which we need anyway.

Note that the increase in precision gained from recording previous guesses is a gradual increase only; satisfiability may still make some inconsistent guesses. For example

``runEval \$ evalBool [expr| !(n == 0 && n == 1) |]``

will evaluate to

``[False,True,True,True]``

because it is making independent guesses about `n == 0` and `n == 1`; consequently `!(n == 0 && n == 1)` will not be considered provable. We can increase the precision of our solver by making `guess` smarter (the “MT” or “modulo theories” part of SMT). For example, we could record some information about the guesses about integer valued variables.

At the extreme end of the scale we would be implementing a full decision procedure for first order arithmetic, which is probably optimization gone too far. However, the approach we took above where we start from the basis that we are allowed to make inconsistent guesses gives us a way to implement a simple solver that addresses the most important cases, even if it misses a few—without requiring a lot of complicated infrastructure in the compiler. And as long as we make sure that our evaluator never returns too few values (even if it may return too many) we don’t have to worry that we might generate invalid code: the worst that can happen is that we miss an optimization.