Integrated versus Manual Shrinking

Monday, 13 May 2019, by Edsko de Vries.
Filed under coding.

TL;DR: Even with integrated shrinking, you still have to think about shrinking. There is no free lunch. Also, important new Hedgehog release!

Property-based testing is an approach to software testing where instead of writing tests we generate tests, based on properties that the software should have. To make this work, we need to be able to generate test data and, when we find a counter example, we need to shrink that test data to attempt to construct a minimal test case.

In Haskell, the library QuickCheck has long been the library of choice for property based testing, but recently another library called Hedgehog has been gaining popularity. One of the key differences between these two libraries is that in QuickCheck one writes explicit generation and shrinking functions, whereas in Hedgehog shrinking is integrated in generation. In this blog post we will explain what that means by developing a mini-QuickCheck and mini-Hedgehog and compare the two. We will consider some examples where integrated shrinking gives us benefits over manual shrinking, but we we will also see that the belief that integrated shrinking basically means that we can forget about shrinking altogether is not justified. There is no such thing as a free shrinker.

The release of this blog post coincides with release 1.0 of Hedgehog. This is an important update which, amongst lots of other goodies, includes many bug fixes and improvements to shrinking based on earlier drafts of this blog post. Upgrading is strongly recommended.

This blog post is not intended as an introduction to property-based testing. We will assume the reader has at least a superficial familiarity with setting up property based tests (in QuickCheck, Hedgehog, or otherwise). If you want to follow along, the code we present here is available from GitHub.

Mini-QuickCheck

In this section we will develop a mini-QuickCheck interface which will enable us to study how shrinking works in QuickCheck's manual approach. Although many readers will be more familiar with this than they might be with the integrated shrinking approach, understanding how shrinking works exactly can be quite subtle and so we will spend a bit of time here to set up our running examples. We will then come back to these examples when we look at integrated shrinking in the next section.

Generation

When we want to test a property requiring input data of type a, we have to write a generator that produces random elements of type a. In order to be able to do that we need access to some kind of pseudo-random number generator, and so we define the type of generators for type a as

where R is some module providing PRNGs; in this blog post will we use System.Random for simplicity’s sake1. Gen forms a monad; in return we simply ignore the PRNG, and in (>>=) we split the PRNG into two:

(the Applicative instance is then the implied one). Technically speaking this breaks the monad laws since

but we can argue that this satisfies the monad laws “up to choice of PRNG”, which is modelling randomness anyway and should not be observable2.

Shrinking

Generating random test data is not sufficient. For example, consider testing the property that “for any pair (x, y), the sum x + y must be zero”. Clearly this property does not hold, and a good generator will easily find a counter-example. However, the counter-example we find might not be minimal; for instance, we might find the counter-example (28,89). It is therefore important that we can shrink counter-examples to construct minimal test cases, just like one might do when testing something by hand.3 In this example, a minimal test case might be (0, 1) or (1, 0).

In QuickCheck’s manual approach to shrinking, shrinking is modelled by a function that produces possible smaller values from a given value; we package up the generator and the shrinking together4

Primitive generators

As a very simple first example, consider generating boolean values, shrinking True to False:

It is important that values don’t shrink to themselves; when we are trying to find a counter-example, we will shrink the test case until we can’t shrink any more; if a value would shrink to itself, this process would loop indefinitely.

As a slightly more involved example, consider writing a generator for a positive integer in the range (0, hi):

In the generator we simply pick a random value5, and in the shrinker we return half the value and one less than the value. Consider testing the property that “all numbers are less than 12”. If we start with the counter example 72, this will quickly shrink to 38, then 18, and then shrink more slowly to 17, 16, 15, 14, 13 and finally 12, which is indeed the minimal counter-example. (Note that a more realistic version of shrinkWord will try numbers in a different order for improved efficiency6.)

Generating pairs

Although Gen is a monad, Manual is not (indeed, it’s not even a functor). When we compose Manual instances together we must manually compose the generator (easy, since we have a Monad interface available) and the shrinker (harder). For example, here is a generator for pairs:

First attempting to shrink the left element introduces a slight bias. For example, consider again the example “for all pairs (x, y), the sum x + y is zero”. Starting from a counter-example (9, 11), due to this bias shrinking will shrink the first component

(9,11) ⇝ (4,11) ⇝ (2, 11) ⇝ (1, 11) ⇝ (0, 11)

and then the second component

(0, 11) ⇝ (0, 5) ⇝ (0, 2) ⇝ (0, 1)

Thus, no matter what counter-example we start with, we will always reduce that counter-example to (0, 1), not (1, 0) (unless the original counter-example happens to have a zero in the second component, of course).

In practice however this bias is not usually a concern, however, since we can shrink either the left or the right at every step in the shrinking process.7 For example, consider the property “for all pairs (x, y), x < y”. Starting with the counter example (8, 6), we will first shrink the first component

(8, 6) ⇝ (7, 6) ⇝ (6, 6)

At this point we cannot shrink the left component any further, and so we shrink the right component instead

(6, 6) ⇝ (6, 3)

Now we can shrink the left component again, and shrinking continues in this “interleaved” fashion

(6, 3) ⇝ (3, 3) ⇝ (3, 1) ⇝ (1, 1) ⇝ (1, 0) ⇝ (0, 0)

We’re putting so much emphasis on this ordering because this will become a concern once we start looking at integrated shrinking.

Recursive data types

As a simple example of generating recursive data types, we will consider how to generate lists of an arbitrary length:

The generator is straight-forward: we generate an arbitrary length n, and then use the standard monadic replicateM combinator to generate n elements.

The shrinker is more interesting: not only can we shrink any of the elements of the list, like we did for pairs, but now we can also drop elements from the list altogether. At every step it eithers drops an element or shrinks an element, using the function pickOne to choose an element:

Consider how this shrinker works for the property “all elements of a list are greater than or equal to the length of the list”. Suppose the original counter-example we find is [5,2,65]; this list will shrink as follows:

[5,2,65] ⇝ [2,2,65] ⇝ [1,2,65] ⇝ [1,65] ⇝ [0,65] ⇝ [0]

The length of this list is 3, and so the element that violates the property is 2. However, if we were to drop any element from this list, the length would be become 2, and so no matter which element we would drop, we would not have a counter-example to the properly anymore. We must therefore shrink one of the elements first; mList tries them in order, and so we shrink the first one to 2 and then to 1. At this point we can drop the 2 from the list because the resulting list [1, 65] has length 2 and so the element 1 still violates the property. This process repeats one more time, interleaving dropping elements with shrinking elements, until we reach the minimal counter example [0].

Filtering

The final example we will consider is how to generate elements satisfying a given predicate. We will first define a simple helper function that runs a monadic action as often as needed8 to generate a value satisfying a predicate:

This in hand, we can write a filter combinator as follows:

For the generator we repeat the generator until we hit on an element that satisfies the predicate, and for the shrinker we filter out any shrunk elements that don’t satisfy the predicate.

Although this combinator is not wrong, and occasionally useful, it is not always optimal. Consider using this filter to generate even numbers:

Suppose we are testing the property that “all even numbers are less than 5”, and we start with the counter example 88; this will now shrink as follows:

88 ⇝ 44 ⇝ 22

and then shrink no further. The problem is that 22 can only shrink to either 11 or 21, neither of which are even, and so mSuchThat_ filters both of them out, leaving us with no further shrink steps.

There are two solutions to this problem. One is to define a variant on mSuchThat_ that instead of removing a shrunk value that doesn’t satisfy the predicate, instead shrinks it again, in the hope of finding even smaller values that do satisfy the predicate:

If we use this combinator instead, the same counter example now shrinks

88 ⇝ 44 ⇝ 22 ⇝ 10 ⇝ 8 ⇝ 6

because 22 shrinks to 11 (which is not even) which in turn shrink to 5 (not even) and 10 (even), and we end up with 6, which is indeed the smallest even number which is not less than 5.

The alternative solution is not to use filter at all. Instead of generate-then-test, we can write a generator that produces even numbers by construction by generating any number and then multiplying it by two:

While the generator is simple, the shrinker is not, and we have logic for “evenness” both in the generator and in the shrinker. As we will see later, this is an example where integrated shrinking has clear benefits.

Integrated shrinking

It is now time to turn our attention to integrated shrinking. The key idea is straight-forward enough: instead of having the generator producing a single value, it will instead produce a tree of values. The root of the tree will correspond to the original value produced, the immediate children of the root correspond to the immediate shrink steps from the root, and so on.

where Tree here means “rose tree”: trees with an arbitrary number of children at every step:

For example, here is the tree that corresponds to the shrinker that we defined in mWord, shrinking a value x to half x or x - 1:

5
├─ 2
│  └─ 1
│     └─ 0
└─ 4
   ├─ 2
   │  └─ 1
   │     └─ 0
   └─ 3
      ├─ 1
      │  └─ 0
      └─ 2
         └─ 1
            └─ 0

Primitive examples

The easiest way to write primitive generators (generators not defined in terms of other generators) is to translate from a manual generator to an integrated one, constructing the tree by repeatedly applying the shrink function:

where unfoldTree builds a tree from a root and a function to construct the immediate children of that root:

For example, we can write integrated shrinkers for Bool and Word using

Applicative

For primitive generators integrated shrinking provides little benefit, but once we start composing generators things get more interesting. We can equip Integrated with an Applicative instance:

For pure we just return a singleton tree, but the case for (<*>) is more complicated. After we split the PRNG into two and use it, we end up with a Tree (a -> b) of functions and a Tree a of arguments, and need to construct a tree Tree b of results.

How might we combine these two trees? Remember that these trees are shrink trees: the roots are the unshrunk values, and the subtrees are different ways in which we can shrink those values. Thus, to combine the “left” tree of functions and the “right” tree of arguments, the root of the new tree will combine the unshrunk root of both trees, and then shrink either the function in the left tree or an argument from the right tree, much like we did for pairs above in mPair:

Just like in mPair this has a slight bias because it shrinks the left argument first but, like in mPair, in practice this bias does not matter too much.

Laws. We should verify that this definition of interleave is correct; that is, satisfies the laws for Applicative. This boils down to showing that

In the repository there is a Coq file that verifies these laws. Note that these laws are true whether we shrink the left tree first or the right one.

Example: generating pairs

The above description of the Applicative instance is rather abstract, so let’s consider a concrete example. We can write a combinator for generating pairs using

(Indeed, such combinators are so simple that there is no need to provide them explicitly; the Applicative interface suffices.) Let’s consider what happens when we use this to generate a pair of a boolean and a number. Suppose the boolean we pick is True, which has shrink tree

True
└─ False

and the number we pick is 2, with shrink tree

2
└─ 1
   └─ 0

We first fmap the function (,) over that first tree to end up with the tree

(True,_)
└─ (False,_)

When we then interleave these two trees, the final result is

(True,2)
├─ (False,2)
│  └─ (False,1)
│     └─ (False,0)
└─ (True,1)
   ├─ (False,1)
   │  └─ (False,0)
   └─ (True,0)
      └─ (False,0)

Note how this tree matches our intuition precisely: we start with the unshrunk value (True, 2); this has two immediate children, one first shrinking the bool (False, 2) and one first shrinking the number (True, 1). If we do first shrink the number, we again have the choice to shrink the bool or the number first.

The advantage of the applicative interface is that this is not restricted to pairs, but can be used for any number of elements. For example, we can write a generator for triples using

If we start with the shrink trees

True
└─ False

1
└─ 0

'b'
└─ 'a'

then the final interleaved tree will be

(True,1,'b')
├─ (False,1,'b')
│  ├─ (False,0,'b')
│  │  └─ (False,0,'a')
│  └─ (False,1,'a')
│     └─ (False,0,'a')
├─ (True,0,'b')
│  ├─ (False,0,'b')
│  │  └─ (False,0,'a')
│  └─ (True,0,'a')
│     └─ (False,0,'a')
└─ (True,1,'a')
   ├─ (False,1,'a')
   │  └─ (False,0,'a')
   └─ (True,0,'a')
      └─ (False,0,'a')

Notice how this models that we can pick any element in the triple to reduce at any given moment.

Monad

Using the applicative interface to generate lists of a fixed length is easy. Indeed, we can use a standard combinator on Applicative:9

to define

However, there is no way to use the Applicative interface to write a generator for lists of an arbitrary size. The problem is that in order to do that, we first need to generate the length, and then depending on the value n of the length that we pick, run the generator for the elements n times. This kind of dependency between generators is impossible using only an Applicative interface; instead, we need a Monad interface. This is however where trouble starts.

Joining trees

Suppose we have shrink tree corresponding to the length of the list

and a function

for some a that produces a shrink tree for the list itself given a length. The natural thing to try is to apply function f at every length

This gives us a tree of trees: for every value n in len, the corresponding shrink tree for lists of length n. The only thing left to do is to collapse this tree-of-trees into a tree. This is a standard combinator on monads called join. For trees, we can implement it as follows:

Laws. As for the Applicative interface, join should satisfy a number of laws:

The Coq file in the repo contains proofs of these properties.

However, we will not equip Integrated with a monad instance. In order to understand why, let’s suppose that it did have a monad instance. We could then write this alternative definition of iPair:

This looks deceptively simple, and almost identical to iPair, but iPair and iPairWRONG have very different behaviour. Starting from the tree ((,) <$> genA)

we get the tree

(True,2) ⇝ (True,1) ⇝ (True,0)
└─ (False,2) ⇝ (False,1) ⇝ (False,0)

which after join looks like

Monad                  Applicative
-----------------------------------------
(True,2)               (True,2)
├─ (False,2)           ├─ (False,2)
│  └─ (False,1)        │  └─ (False,1)
│     └─ (False,0)     │     └─ (False,0)
└─ (True,1)            └─ (True,1)
   └─ (True,0)            ├─ (False,1)
                          │  └─ (False,0)
                          └─ (True,0)
                             └─ (False,0)

where for comparison we have reproduced the shrink tree we got using the Applicative interface on the right. Notice the difference between the two trees: after we shrink the number, we do not go back to shrink the boolean anymore. This is important to remember: in a generator of the form

x >>= f

as soon as we start shrinking f we will not go back to shrink x. We can’t; since f is a function, we must first decide on a value of x before we can do anything with f x.

This has real consequences for testing. For example, consider the property that “for all pairs (x, y), x < y”. If we start with a counter example (80, 57), this will shrink as follows:

(80, 57) ⇝ (79, 57) ⇝  .. ⇝  (57, 57) ⇝ (57, 28) ⇝ .. ⇝  (57, 0)

When we reach (57, 57), we cannot shrink the first component anymore, since (56, 57) isn’t a counter-example to the property. However, as soon as we start strinking the second component, we will not go back to the first component anymore, and so we end up with (57, 0) as our rather poor “minimal” counter-example.

We have a choice in join in the order of the subtrees; we could have defined it like

For our example tree from above, this results in the tree

(True,2)
├─ (True,1)
│  └─ (True,0)
└─ (False,2)
   └─ (False,1)
      └─ (False,0)
Although this version of join still satisfies the monad laws, it is strictly worse. As before, when shrinking x >>= f, as soon as we start shrinking f, we will not go back anymore to shrink x. But worse, rather than trying to shrink x first, we will now first try to shrink f! This means that for the same property above, if we started with the counter example (80, 57), we would end up with the counter-example (80, 0). Even less “minimal” than before.

Dependent generators

Let’s go back to writing a generator for lists of an arbitrary length. Let’s suppose we did have a Monad instance available for Integrated. Writing a generator for lists is then easy:

If the belief that integrated shrinking means that we can mostly just forget about shrinking, this definition should be fine. However, it isn’t. Just like for iPairWRONG above, this shrinker shrinks very poorly. Part of the problem is actually very similar as for iPairWRONG. Consider checking the property that “all lists are sorted”, and suppose the initial counter-example we find is [81,27]; this will shrink as follows:

When we reach [28,27] we cannot shrink the first element any further, and so we start shrinking the next, never returning to the first element anymore. This problem is easily fixed though; the elements of the list are clearly independent from each other, and so we can use our iListOfSize function from above instead; this uses the Applicative interface and does not introduce this unwanted ordering between shrinking the elements of the list:

The dependency on the length however is a real dependency, and so cannot be removed. Although iListWRONG' shrinks a bit better than iListWRONG, it can still result in non-minimal counter-examples. For example, if for the same property (all lists are sorted) the initial counter example we find is [28,66,13], we cannot first shrink the length because the shorter list [28,66] is sorted. However, after we then start shrinking the elements of the list we never go back to try to shrink the length again, ending up with the non-minimal counter example [0,1,0].

Freezing the tree

So how do we fix it? If the implied shrinker for dependent generators is not good, we need a way to override it and define our own. In order to do that, we need to be able to manipulate the shrink trees explicitly. We therefore introduce a useful function called freeze:

This changes an integrated shrinker into a simple (“manual”) shrinker for trees; this is the key step that makes it possible to manipulate shrink trees explicitly. We will also find it useful to define a variant on freeze which just throws away any subtrees, leaving only the unshrunk root:

We can use these two combinators to define a explicit generator for lists of trees:

This is almost identical to our naive first attempt iListWRONG, but produces a list of shrink trees instead of a list of elements. In order to turn this into a proper shrink tree for lists of elements, we need to turn a list of trees into a tree of lists, and this operation corresponds precisely to the shrinker for lists that we defined back when we considered generating lists in the manual approach (mList, above):

All that’s left now is to turn a dependent generator that explicitly manipulates shrink trees back into a regular integrated generator:

leaving us with the following generator for lists:

Derived generators

At this point you might be wondering whether all of this is worth it; the combination of Integrated and freeze seems like a round-about way to introduce manual shrinking; all that just to get back to where we were in mini-QuickCheck. That is however overly pessimistic.

Suppose we have some application-specific datatype that counts elements:

and suppose we want to write a generator for this. One approach is to apply the same set of steps that we did in the previous section. We can write a function

and specialized shrinking function for Count

and finally

This is however a fair bit of work, interleaveCount in particular is non-trivial (see repo). However, if we have a function

we can avoid all this work and piggy-back on the generator for lists:

and we’re done with a one-line generator. This is much more difficult to do in QuickCheck. Although it is easy to generate a list and then generate the Count from that, when it comes to shrinking we don’t have the list available anymore and so we can’t piggy-back on the shrinker for lists there. We can introduce what’s known as a “shrink wrapper”

which pairs a Count value with the list that generated it; if we do that, then we can piggy-back on the generator and shrinker for lists:

However, this approach does not compose: if we have some other datatype that expects a Count as a subterm, we cannot use a WrapCount term there. This limits the applicability of this pattern rather severely. In integrated shrinking, however, this is really easy to do; a clear win.

Caution. Defining the generator for Count in terms of the one for lists is really only a valid approach if all Count values can be generated by some list. If this is not the case, your tests don’t cover all cases. This should be stated and tested separately.

Monad instance

What if we don’t care about shrinking, or feel that the implied shrinker is okay, even for dependent generators? Just to support this use case we can introduce Dependent alias which does have the Monad instance available10:

where the corresponding Applicative instance is the implied one.

When we define dependent generators we often want to use integrated ones, and so it is useful to “lift” an integrated shrinker to a dependent one:

Going in the other direction however is unsafe, as we have seen; unless we take special precautions, the implied shrinker behaviour of dependent shrinkers is very poor:

Filtering

As our final example, we will reconsider filtering in the context of integrated shrinking. Like generating lists, filtering requires a Monad interface. After all, the effects we need (how often we generate a value) depends on the value that we generated previously. If we did have a Monad instance for Integrated available, we could simply define

As for iListWRONG, this function looks simple, but as for iListWRONG, it is wrong; and in fact, this one is unuseably wrong. Remember what the monad interface does: it applies a function at every level in the shrink tree. In iSuchThatWRONG, the function we apply is a function that checks the predicate and if it fails, reruns the generator. This means that as soon as we shrink a value to something that does not satisfy the predicate anymore, we start over from scratch, pick an entirely new value (possibly even larger than what we started with), and repeat ad nauseam.

What we want to do, of course, is first generate the shrink tree, and then filter out the elements from that tree that don’t satisfy the predicate. When we discussed this in the context of manual shrinking, we mentioned that we had two possibilities: either stop as soon as we find an element that doesn’t satisfy the predicate, or else recursively apply shrinking in the hope of finding a even smaller element that does satisfy the predicate. Translated to trees, the former corresponds to

and the latter to

Defining filtering is now easy; since we want to explicitly manipulate the shrink tree, freeze comes in handy again:11

As an example use case, consider once more generating even numbers. As for manual shrinking, we have two options: generate-then-test or generate-even-by-construction. For the former, we can do

where we must use iSuchThat instead of iSuchThat_ (for the same reason that mEvenWRONG was wrong). For the latter, we can do

If we compare this to mEven', we can see that integrated shrinking here again gives us a clear advantage. In the manual case we had to reason about “evenness” in both the shrinker and the generator; no such duplication of logic happens here.

Conclusions

In this blog post we have compared the manual approach to shrinking from QuickCheck with the integrated approach from Hedgehog. There are many other differences between these two libraries that we have completely ignored here, and I can strongly recommend watching Jacob Stanley’s excellent Lambda Jam talk Gens N’ Roses: Appetite for Reduction. Even if you have no intention of switching from QuickCheck to Hedgehog, many of the gotchas of QuickCheck that Jacob mentions in that talk are well worth thinking about.

One of the problems that Jacob mentions in his talk is a social one: most QuickCheck users simply don’t write shrinkers. Indeed, this is true. Part of the problem comes from the fact that the QuickCheck type class Arbitary has a default implementation of shrink that returns the empty list. This means that by default the values you generate don’t shrink at all. This is clearly not good.

Unfortunately, it is not obvious that integrated shrinking solves this social problem. As we have seen, the implicitly defined shrinkers for dependent generators (generators that require the monad interface) are very poor (iListWRONG), and in some cases even unuseable (iSuchThatWRONG). It simply isn’t the case that we don’t have to think about shrinking, no matter which approach we use. Perhaps it can be argued that a default shrinker that shrinks a bit is better than one that doesn’t shrink at all; but not by much. Admittedly for generators that depend on the Applicative interface only the implied shrinker is fine, but this case is easy in QuickCheck also (it corresponds precisely with the behaviour of genericShrink).

Perhaps we can construct integrated generators with good shrinkers by writing them in clever ways. It is however not obvious how to do this, even for the relatively simple case of lists. This is an interesting topic of future work.


  1. In reality System.Random is a poor choice, and we should choose something different such as splitmix.

  2. A more obvious type for the generator might have been

    which would allow us to thread the PRNG through. We don’t do this because we would lose laziness; for example, when generating a pair of values; we would not be able to generate the second value until we finished generating the first. This can make testing much slower, and makes it impossible to generate infinite values.

  3. To some degree we can reduce the need for shrinking by trying small counter examples first; both QuickCheck and Hedgehog do this, though Hedgehog’s approach using first-class “ranges” is arguably nicer here. However, this is not sufficient. It is often the case that the probability that a larger test case hits a given bug is disproportionally larger than the probability that a small test case does, and we are therefore more likely to find bugs in larger test cases than smaller ones.

  4. QuickCheck uses type-classes instead of explicit records; for simplicity and to keep the comparison with Hedgehog as focussed as possible, we will not do that in this blog post.

  5. Picking a random value uniformly in the range (0, hi) might not be the best choice; we may wish to generate “edge cases” such as 0 with a higher probability. Moreover, if we want to generate smaller test cases first, we’d also do that here.

  6. This shrinker is sub-optimal; it will use binary search if the minimum test case happens to be near zero, but linear search if the value happens to be be near the upper end of the range. The shrinkers in this blog post are intended to illustrate how QuickCheck and Hedgehog work under the hood, not as examples of how to write good shrinkers.

  7. Sometimes the bias is a problem. For example, consider the property “for all pairs (x, y), x /= y”. If we start with a counter-example, say, (46, 46), we could only shrink this if we shrink both components at the same time. We can write shrinkers like this, but in general there are O(2^n) possible combinations of values to choose to shrink together when given n values, which makes shrinking much too costly.

  8. In reality we will want to impose a maximum number of iterations here and give up if we cannot find an element satisfying the predicate within that bound.

  9. In recent versions of base the function replicateM has this signature; we define this custom combinator for the sake of this blog post because the difference between the Monad and Applicative interface to the integrated shrinkers is crucial.

  10. In Hegdehog no distinction is made at the type level between generators satisfying the Applicative instance and the Monad instance, and so it is up to the programmer to make sure not to use the Monad instance where the Applicative instance would suffice, or overwrite the shrinker when the Monad instance is required. This can result in poor shrinkers, a problem which might materialize only much later if a bug is found and suddenly a test case does not shrink properly.

  11. The use of head and fromJust in these definitions is justified by the fact that we know that the very root of the tree must satisfy the predicate.