Simple SMT solver for use in an optimizing compiler

Wednesday, 31 December 2014, by Edsko de Vries.
Filed under coding.

This is a second blog post in a series about engineering optimizing compilers; the previous was Quasi-quoting DSLs for free. In this blog we will see how we might write a very simple SMT solver that will allow us to write an optimization pass that can turn something like

if a == 0 then
  if !(a == 0) && b == 1 then
    write 1
  else
    write 2
else
  write 3

into

if a == 0 then
  write 2
else
  write 3

without much effort at all.

Preliminaries

For the sake of this blog post we will consider a very simple imperative object language, defined as

data Expr =
    -- Arithmetic expressions
    ExprInt Integer           -- ^ Integer constants
  | ExprAdd Expr Expr         -- ^ Addition
    -- Boolean expressions
  | ExprBool Bool             -- ^ Boolean constants
  | ExprEq Expr Expr          -- ^ Equality
  | ExprNot Expr              -- ^ Negation
  | ExprAnd Expr Expr         -- ^ Logical conjunction
  | ExprOr Expr Expr          -- ^ Logical disjunction
    -- Variables
  | ExprVar VarName           -- ^ Read from a variable
  | ExprAssign VarName Expr   -- ^ Write to a variable
    -- Control flow
  | ExprSeq Expr Expr         -- ^ Sequential composition
  | ExprIf Expr Expr Expr     -- ^ Conditional
    -- Input/output
  | ExprRead                  -- ^ Read an integer from the console
  | ExprWrite Expr            -- ^ Write an integer to the console

We will assume the existence of a quasi-quoter for this language so that we can write Haskell fragments such as

[expr| if a == 0 then read else write b |]

instead of

ExprIf (ExprEq (ExprVar a) (ExprInt 0)) 
       ExprRead 
       (ExprWrite (ExprVar b))

How you can write such a quasi-quoter was the topic of the previous blog post. You should however be able to read this blog post without having read the previous post; hopefully the mapping from the concrete syntax to the constructors of Expr is pretty obvious.

Simplifying assumption

Our goal will be to write a function

provable :: Expr -> Bool

Let’s consider some examples:

Note that this means that it’s perfectly possible for both an expression and the negation of that expression to be unprovable.

What about an expression !(n == 0 && n == 1)? Morally speaking, this expression should be provable. However, we will be making the following very important simplifying assumption:

provable is allowed to give up: when provable returns False, this should be interpreted as “failed to prove”, rather than “there exist a counterexample”.

From a compiler perspective, if something is not statically provable, that simply means that an optimization may not be applied even though it could: that is, we miss an opportunity to make the program go faster, but we will not break the code.

An evaluator

We don’t want (or need) to embed a full blown theorem prover into our compiler: ideally we write something simple that will still catch a lot of the common cases. Moreover, we would prefer to reuse as much of our existing compiler infrastructure as we can. Our optimizer is likely to contain an interpreter for the object language, so that we can attempt to statically evaluate expressions. We are going to adapt this interpreter so that we can also use it in our solver. In fact, as we shall see, the solver will be a one-liner.

But we’re getting ahead of ourselves. Let’s consider how to write the interpreter first. The interpreter will be running in an Eval monad; for our first attempt, let’s define this monad as a a simple wrapper around the list monad:

newtype Eval a = Eval { unEval :: [a] }
  deriving ( Functor
           , Applicative
           , Alternative
           , Monad
           , MonadPlus
           )

runEval :: Eval a -> [a]
runEval act = unEval act

We will use the monad for failure:

throwError :: Eval a
throwError = Eval []

We can provide error recovery through

onError :: Eval a -> Eval a -> Eval a
onError (Eval act) (Eval handler) = Eval $
    case act of
      [] -> handler
      rs -> rs

We will see why we need the list monad when we discuss the evaluator for boolean expressions; but let’s consider the evaluator for integer expressions first:

evalInt :: Expr -> Eval Integer
evalInt = go
  where
    go (ExprInt i)    = return i
    go (ExprAdd a b)  = (+) <$> evalInt a <*> evalInt b
    go (ExprIf c a b) = do cond <- evalBool c
                           evalInt (if cond then a else b)
    go _              = throwError 

Hopefully this should be pretty self explanatory; our toy language only has a few integer-valued expressions, so there isn’t much to do. The interpreter for boolean expressions is more interesting:

evalBool :: Expr -> Eval Bool
evalBool = \e -> go e `onError` guess e
  where
    go (ExprBool a)   = return a
    go (ExprEq   a b) = (==) <$> evalInt a <*> evalInt b
    go (ExprNot  a)   = not  <$> evalBool a
    go (ExprAnd  a b) = (&&) <$> evalBool a <*> evalBool b
    go (ExprOr   a b) = (||) <$> evalBool a <*> evalBool b
    go (ExprIf c a b) = do cond <- evalBool c
                           evalBool (if cond then a else b)
    go _              = throwError 

    guess _e = return True <|> return False

The definition of go contains no surprises, and follows the definition of go in evalInt very closely. However, the top-level definition

evalBool = \e -> eval e `onError` guess e

is more interesting. If for some reason we fail to evaluate a boolean expression (for example, because it contains a variable) then guess returns both True and False. Let’s consider some examples:

runEval $ evalBool [expr| True |]

evalutes to [True];

runEval $ evalBool [expr| a |]

evaluates to [True, False] because we don’t know what value a has, but

runEval $ evalBool [expr| a || True |]

evaluates to [True, True]: we still have two guesses for a, but no matter what we guess a || True always evaluates to True.

Satisfiability

We can now write our SMT solver; as promised, it will be a single line:

satisfiable :: Expr -> Bool
satisfiable = or . runEval . evalBool

Function satisfiable (the “S” in SMT) checks if the expression is true for some values of the variables in the expression. How do we check this? Well, we just run our evaluator which, when it encounters a variable, will return both values for the variable. Hence, if any of the values returned by the evaluator is True, then the expression is true at least for one value for each variable.

Once we have an implementation of satisfiability, we can implement provable very easily: an expression is provable if its negation is not satisfiable:

provable :: Expr -> Bool
provable = not . satisfiable . ExprNot

If we consider the three examples from the previous section, we will find that both True and a || True are provable, but a by itself is not, as expected.

Inconsistent guesses

The careful reader might at this point find his brow furrowed, because something is amiss:

runEval $ evalBool [expr| a || !a |]

evaluates to

[True, True, False, True]

This happens because the evaluator will make two separate independent guesses about the value of a. As a consequence, a || !a will be considered not provable.

Is this a bug? No, it’s not. Recall that we made the simplifying assumption that provable is allowed to give up: it’s allowed to say that an expression is not provable even when it morally is. The corresponding (dual) simplifying assumption for satisfability, and hence the interpreter, is:

The interpreter, and hence satisfiability, is allowed to make inconsistent guesses.

Making an inconsistent guess is equivalent to assuming False: anything follows from False and hence any expression will be considered satisfiable once we make an inconsistent guess. As a consequence, this means that once we make inconsistent guesses, we will consider the expression as not provable.

More precision

We can however do better: whenever we make a guess that a particular expression evaluates to True or False, then if we see that same expression again we can safely make the same guess, rather than making an independent guess. To implement this, we extend our Eval monad with some state to remember the guesses we made so far:

newtype Eval a = Eval { unEval :: StateT [(Expr, Bool)] [] a }
  deriving ( Functor
           , Applicative
           , Alternative
           , Monad
           , MonadPlus
           , MonadState [(Expr, Bool)]
           )
           
runEval :: Eval a -> [a]
runEval act = evalStateT (unEval act) []

throwError :: Eval a
throwError = Eval $ StateT $ \_st -> []

onError :: Eval a -> Eval a -> Eval a
onError (Eval act) (Eval handler) = Eval $ StateT $ \st ->
    case runStateT act st of
      [] -> runStateT handler st
      rs -> rs

The implementation of evalInt does not change at all. The implementation of evalBool also stays almost the same; the only change is the definition of guess:

guess e = do
  st <- get
  case lookup e st of
    Just b  -> return b
    Nothing -> (do put ((e, True)  : st) ; return True)
           <|> (do put ((e, False) : st) ; return False)

This implements the logic we just described: when we guess the value for an expression e, we first look up if we already made a guess for this expression. If so, we return the previous guess; otherwise, we make a guess and record that guess.

Once we make this change

runEval $ evalBool [expr| a || !a |]

will evaluate to [True,True] and consequently a || !a will be considered provable.

Example: folding nested conditionals

As an example of how one might use this infrastructure, we will consider a simple pass that simplifies nested conditionals (if-statements). We can use provable to check if one expression implies the other:

(~>) :: Expr -> Expr -> Bool
(~>) a b = provable [expr| ! $a || $b |]

The simplifier is now very easy to write:

simplifyNestedIf :: Expr -> Expr
simplifyNestedIf = everywhere (mkT go)
  where
    go [expr| if $c1 then
                 if $c2 then
                   $e1
                 else
                   $e2
               else
                 $e3 |]
      | c1 ~> c2              = [expr| if $c1 then $e1 else $e3 |]
      | c1 ~> [expr| ! $c2 |] = [expr| if $c1 then $e2 else $e3 |]
    go e = e

The auxiliary function go pattern matches against any if-statement that has another if-statement as its “then” branch. If we can prove that the condition of the outer if-statement implies the condition of the inner if-statement, we can collapse the inner if-statement to just its “then” branch. Similarly, if we can prove that the condition of the outer if-statement implies the negation of the condition of the inner if-statement, we can collapse the inner if-statement to just its “else” branch. In all other cases, we return the expression unchanged. Finally, we use the Scrap Your Boilerplate operators everywhere and mkT to apply this transformation everywhere in an expression (rather than just applying it top-level). For example,

simplifyNestedIf [expr| 
    if a == 0 then 
      if !(a == 0) && b == 1 then 
        write 1 
      else 
        write 2 
    else 
      write 3 
  |]

evaluates to

if a == 0 then
  write 2
else
  write 3

as promised. Incidentally, perhaps you are wondering why such an optimization would be useful; after all, which self respecting programmer writes such code? However, code like the above may result from other compiler optimizations such as inlining: imagine that the inner if-statement came from a inlined function. A lot of compiler optimizations are designed to clean up other compiler optimizations.

Conclusion

We can implement a very simple but useful SMT solver for use in an optimizing compiler by making a small change to the interpreter, which we need anyway.

Note that the increase in precision gained from recording previous guesses is a gradual increase only; satisfiability may still make some inconsistent guesses. For example

runEval $ evalBool [expr| !(n == 0 && n == 1) |]

will evaluate to

[False,True,True,True]

because it is making independent guesses about n == 0 and n == 1; consequently !(n == 0 && n == 1) will not be considered provable. We can increase the precision of our solver by making guess smarter (the “MT” or “modulo theories” part of SMT). For example, we could record some information about the guesses about integer valued variables.

At the extreme end of the scale we would be implementing a full decision procedure for first order arithmetic, which is probably optimization gone too far. However, the approach we took above where we start from the basis that we are allowed to make inconsistent guesses gives us a way to implement a simple solver that addresses the most important cases, even if it misses a few—without requiring a lot of complicated infrastructure in the compiler. And as long as we make sure that our evaluator never returns too few values (even if it may return too many) we don’t have to worry that we might generate invalid code: the worst that can happen is that we miss an optimization.