Consider this `falsify`

property test that tries to verify the (obviously false) property
that all elements of all lists of up to 10 binary digits are the same (we will explain the details below;
hopefully the intent is clear):

```
prop_list :: Property ()
= do
prop_list <- gen $ Gen.integral $ Range.between (0, 10)
n <- gen $ replicateM n $ Gen.int $ Range.between (0, 1)
xs $ P.pairwise P.eq .$ ("xs", xs) assert
```

we might get a counter-example such as this:

```
failed after 9 shrinks
(xs !! 0) /= (xs !! 1)
xs : [0,1]
xs !! 0: 0
xs !! 1: 1
```

More interesting than the counter-example itself is how `falsify`

*arrived*
at that counter-example; if we look at the shrink history (`--falsify-verbose`

),
we see that the list shrunk as follows:

```
[1,1,0,1,0,1]
~> [1,1,0] -- shrink the list length
~> [0,1,0] -- shrink an element of the list
~> [0,1] -- shrink the list length again
```

The test runner is able to go back and forth between shrinking the length and the list, and
shrinking elements *in* the list. That is, we have integrated shrinking (like in
`hedgehog`

: we do not specify a separate generator and shrinker), which is
*internal*: works across monadic bind. The Python
Hypothesis library showed the world how to achieve
this. In this blog post we will introduce falsify, a new
library that provides property based testing in Haskell and has an approach to
shrinking that is inspired by `Hypothesis`

. As we shall see, however, the details
are quite different.

## Background

In this first section we will discuss some of the background behind `falsify`

;
the next section will be a more tutorial-style introduction on how to use
it. This section is not meant to an exhaustive discussion of the theory behind
`falsify`

, or how the theory differs from that of `Hypothesis`

; both of those
topics will be covered in a paper, currently under review. However, a basic
understanding of these principles will help to use the library more effectively,
and so that will be our goal in this first section.

### Unit testing versus property based testing

In unit testing (for example using tasty-hunit), a test
for a function `f`

might look something like this:

```
test :: Assertion
=
test == expected) $
unless (f input "not equal" assertFailure
```

That is, we apply `f`

to specific `input`

, and then verify that we get an
`expected`

result. By contrast, in property based testing, we do not specify a
specific input, but instead *generate* a random input using some generator
`genInput`

, and then verify that the input and the output are related by some
*property* `prop`

:

```
test_property :: Property ()
= do
test_property <- gen $ genInput
input $
unless (prop input (f input)) "property not satisfied" testFailed
```

This blog post is not intended as an introduction to property-based testing;
merely observe that *generation* of input values is a critical ingredient in
property based testing. However, if you are not familiar with the topic, or not
yet convinced that you should be using it, I can highly recommend watching
Testing the Hard Stuff and Staying Sane, and then reading
How to Specify It!: A Guide to Writing Properties of Pure
Functions, both by the world’s foremost property-based
testing guru John Hughes.

### The importance of shrinking

Suppose we want to test the (false) property that for all numbers `x`

and `y`

,
`x - y == y - x`

:

```
prop_shrinking :: Property ()
= do
prop_shrinking <- gen $ Gen.int $ Range.between (0, 99)
x <- gen $ Gen.int $ Range.between (0, 99)
y - y == y - x) $
unless (x "property not satisfied" testFailed
```

Since the property is false, we will get a counter-example. Without shrinking, one such a counter-example might be

```
x = 38
y = 23
```

However, although that is indeed a counter-example to the property, it’s not a great counter-example. Why these specific numbers? Is there something special about them? To quote John Hughes in Experiences with QuickCheck: Testing the Hard Stuff and Staying Sane:

Random tests contain a great deal of junk—that is their purpose! Junk provokes unexpected behaviour and tests scenarios that the developer would never think of. But tests usually fail because of just a few features of the test case. Debugging a test that is 90% irrelevant is a nightmare; presenting the developer with a test whereevery part is known to be relevantto the failure, simplifies the debugging task enormously.

(emphasis mine). For our example, the numbers 38 and 23 are not particularly relevant to the failure; with shrinking, however, the counter-example we will get is

```
x = 0
y = 1
```

Indeed, this is the *only* counter-example we will ever get: `0`

is the smallest
number possible (“least amount of detail”), and the only thing that is relevant
about the second number is that it’s not equal to the first.

### Parsing versus generation

Generation of inputs relies on pseudo-random number generators (PRNGs). The typical interface to a PRNGs is something like this:

`nextSample :: PRNG -> (Word, PRNG)`

Given such an interface, we might define the type of generators as

`newtype Gen a = Gen (PRNG -> (a, PRNG))`

This covers generation, but not shrinking. The traditional approach in QuickCheck to shrinking is to pair a generator with a shrinking function, a function of type

`shrink :: a -> [a]`

This works, but it’s not without its problems; see my blog post Integrated
versus Manual Shrinking for an in-depth discussion.
The key insight of the `Hypothesis`

library is that instead of shrinking generated
*values*, we instead shrink the *samples produced by the PRNG*. Suppose we
unfold a PRNG to a stream of random samples:

```
unfoldLinear :: PRNG -> [Word]
=
unfoldLinear prng let (s, prng') = next prng
in s : unfoldLinear prng'
```

Then we can shift our perspective: rather than thinking of *generating* random
values from a PRNG we instead *parse* this stream of random samples:

`newtype Parser a = Parser ([Word] -> (a, [Word])`

Instead of having a separate shrinking function, we now simply shrink the list
of samples, and then re-run the parser. This is the `Hypothesis`

approach in a
nutshell; parsers of course need to ensure that the produced value shrinks as
the samples are shrunk. For example, here is a very simple (proof of concept)
generator for `Bool`

:

```
parseBool :: Parser Bool
= Parser $ \(s:ss) -> (
parseBool if s >= maxBound `div` 2 then True else False
, ss )
```

Assuming that the sample is chosen uniformly in the full `Word`

range, this
parser will choose uniformly between `True`

and `False`

; and as the sample
is shrunk towards zero, the boolean will shrink towards `False`

.

### Streams versus trees

If you look at the definition of `Gen`

in `QuickCheck`

,
you will see it’s actually different to the definition we showed above:

`newtype Gen a = Gen (PRNG -> a)`

Like our definition above, this generator takes a PRNG as input, *but it does
not return an updated PRNG*. This might seem confusing: suppose we are
generating two numbers, as in our example above; how do we ensure those two
numbers are generated from different PRNGs?

To solve this problem, we will need a PRNG that in addition to `next`

, also
provides a way to *split* the PRNG into two new PRNGs:

```
next :: PRNG -> (Word, PRNG) -- as before
split :: PRNG -> (PRNG, PRNG) -- new
```

Then to run two generators, we first split the PRNG:

```
both :: QcGen a -> QcGen b -> QcGen (a, b)
QcGen g1) (QcGen g2) = QcGen $ \prng ->
both (let (l, r) = split prng
in (g1 l, g2 r)
```

The advantage of this approach is *laziness*: we can produce the second value of
type `b`

without generating the value of type `a`

first. Indeed, if we never
demand the value of `a`

, we will not generate it at all! This is of critical
importance if we have generators for infinite values; for example, it is what
enables us to Generate Functions.

### The `falsify`

definition of `Gen`

If we apply the insight from `Hypothesis`

(that is, parse samples rather than
generate using PRNGs) to this new setting where *splitting* PRNGs is a
fundamental operation, we arrive at the definition of `Gen`

in `falsify`

. First,
unfolding a PRNG does not give us an infinite *stream* of samples, but rather an
infinite *tree* of samples:

```
data STree = STree Word STree STree
unfold :: PRNG -> STree
=
unfold prng let (s, _) = next prng
= split prng
(l, r) in STree s (unfold l) (unfold r)
```

A generator is then a function that takes a part of a sample tree, parses it, and produces a value and an updated sample tree:

`newtype Gen a = Gen (STree -> (a, [STree]))`

This does not reintroduce dependencies between generators: each generator will be run against a different subtree, and update only that subtree. For example, here is how we might run two generators:

```
both :: Gen a -> Gen b -> Gen (a, b)
Gen g1) (Gen g2) = Gen $ \(STree s l r) ->
both (let (a, ls) = g1 l
= g2 r
(b, rs) in ( (a, b)
STree s l' r | l' <- ls]
, [++ [STree s l r' | r' <- rs]
)
```

Note that we are focussing on the core concepts here, and are glossing over
various details. In particular, the actual definition
in `falsify`

has an additional constructor `Minimal`

, which is a finite
representation of the infinite tree that is zero everywhere. This is a key
component in making this work with infinite data structures; see upcoming paper
for an in-depth discussion. Users of the library however generally do not need
to be aware of this (indeed, the sample tree abstraction is not part of the
public API).

### Consequences of using sample trees

Arguably all of the key differences between `Hypothesis`

and `falsify`

stem from
the difference in representation of samples: a linear *stream* in `Hypothesis`

and an infinite *tree* in `falsify`

. In this section we will discuss two
consequences of this choice.

#### Shrinking the sample tree

First, we need to decide how to shrink a sample tree. In `Hypothesis`

, the sample
stream (known as a “choice sequence”) is subjected all kinds of passes (15 and
counting, according to Test-Case Reduction via Test-Case Generation: Insights
from the Hypothesis Reducer), which shrink the sample stream
according to lexicographical ordering; for example:

```
..¸ x, .. < .., x', .. -- shrink an element (x' < x)
.., x, y, z, .. < .., x, z, .. -- drop an element from the stream
.., x, y, z, .. < .., y, z, x, .. -- sort part of the stream (y < z < x)
```

When we are dealing with infinite sample trees, such a total ordering does not exist. For example, consider the following two trees:

```
= STree .. tree2 = STree ..
tree1 STree 1 ..) (STree 2 ..)
(STree 4 ..) (STree 3 ..) (
```

Sample `1`

in `tree1`

is *less* than the corresponding sample `2`

in `tree2`

,
but sample `4`

in `tree1`

is *greater* than the corresponding sample `3`

in
`tree2`

. Hence, we have neither `tree1 < tree2`

nor `tree2 < tree1`

: these two
trees are incomparable. Instead, `falsify`

works with a partial ordering;
instead of the multitude of shrinking passes of `Hypothesis`

, `falsify`

has
precisely *one* pass^{1}: shrink an individual sample in the tree.

#### Distributing samples to parsers

When we have a stream of values that we need to use for multiple parsers, we
need to decide which samples go to which parser. In `Hypothesis`

, this
essentially happens on a first-come-first-served basis: any samples left unused
by the first parser will be used by the next. As discussed, `falsify`

parsers do
not return “samples left unused”. Instead, the sample tree is split each time we
compose parsers, like we did in `both`

, shown above. In practice, this happens
primarily when using applicative `<*>`

or monadic `>>=`

.

#### Predictability

These two differences are rather technical in nature; how do they affect users? Suppose we have a generator that produces a list and then a number:

```
listThenNum :: Gen ([Bool], Int)
= do
listThenNum <- Gen.list ..
xs <- Gen.int ..
n return (xs, n)
```

If we are using a stream of samples, `Hypothesis`

style, and then drop a random
sample from that stream, the generator for `int`

might suddenly be run against
an entirely *different* sample; it might *increase* in value! Similarly, if we
run that `int`

generator against the first sample left over by the `list`

generator, and if that list generator uses fewer samples as it shrinks, we might
also run `int`

against an unrelated sample, and its value might again increase.

This is not *necessarily* a problem; after all, we can then start to decrease
that new `int`

value again. However, that is *only* possible if the generated
value with the larger `int`

is still a counter-example to whatever property is
being tested. If that is not the case, then we might not be able to shrink the
list, and we might end up with a non-minimal counter-example. That can make
debugging more difficult (we haven’t gotten rid of all the “junk”), and it can
be difficult for users to understand *why* this might not shrink any further;
even if the library offers facilities for showing why shrinking stopped (for
example, showing which shrunk examples were rejected; verbose mode in
`falsify`

), it can still be quite puzzling why the library is trying to
*increase* a value during shrinking.

Neither of these problems can arise in `falsify`

: it never drops samples at all
(instead, only shrinking individual samples), and since monadic bind splits the
sample tree, we are guaranteed that the behaviour of `int`

is entirely
unaffected by the behaviour of `list`

. This makes the shrinking behaviour in
`falsify`

more predictable and easier to understand.^{2}

### Monadic bind

We mentioned above that `QuickCheck`

’s approach to shrinking has its problems,
without going into detail about what those problems are. Instead, we referred to
the blogpost Integrated versus Manual Shrinking; this
blog post discusses not only the problems in `QuickCheck`

, but also shows one
alternative approach, known as *integrated* shrinking, used by QuviQ
QuickCheck and made popular in the Haskell
world by the library hedgehog.

The problem with integrated shrinking is that it does not work across monadic bind. The linked blogpost explains this in great detail, but the essence of the problem is not hard to see. Consider the type of monadic bind:

`(>>=) :: Gen a -> (a -> Gen b) -> Gen b`

We *cannot* shrink the right hand side of `(>>=)`

independent from the left hand
side, because the right hand side *is not a generator*. We only have a generator
once we apply the supplied function to the result of the first generator. This
means that we cannot shrink these two generators independently: if, after
shrinking the right hand side, we go back and then shrink the left hand side, we
get an entirely different generator, and the shrinking we did previously is just
wasted.

In practice what this means is that once we start shrinking the right hand side, we will never go back anymore and shrink the left hand side. In the example from the introduction we first generated a list length, and then the elements of the list:

```
prop_list :: Property ()
= do
prop_list <- gen $ Gen.integral $ Range.between (0, 10)
n <- gen $ replicateM n $ Gen.int $ Range.between (0, 1)
xs $ P.pairwise P.eq .$ ("xs", xs) assert
```

With integrated shrinking, once we start shrinking elements from the list, we
will never go back anymore and shrink the list length. With internal shrinking,
however, we *can* go back and forth across monadic bind. This is the *raison
d’être* of internal shrinking: it doesn’t matter that we cannot shrink the two
generators independently, *because we are not shrinking generators*! Instead, we
just shrink the samples that feed *into* those generators.

### Selective functors

It is important to understand the limitations of internal shrinking: it is
certaintly not a silver bullet. For example, consider this combinator that takes
two generators, flips a coin (generates a boolean, shrinking towards `True`

),
and then executes one of the two generators:

```
choose :: Gen a -> Gen a -> Gen a -- Suboptimal definition
= do
choose g g' <- Gen.bool True
b if b then g else g'
```

This combinator works, but it’s not optimal. Suppose the initial value of `b`

is
`False`

, and so we use `g'`

; and let’s suppose furthermore that we spend some
time shrinking the sample tree using `g'`

. Consider what happens if `b`

now
shrinks to `True`

. When this happens we will now run `g`

*against the sample
tree as it was left after shrinking with g'*. Although we

*can*do that, it very much depends on the specific details of

`g`

and `g'`

whether it’s *useful*to do it, and we will certainly lose the predictability we discussed above.

We could try to make the two generators shrink independent from each other by
simply running *both* of them, and using the boolean only to choose which
result we want. After all, Haskell is lazy, and so this should be fine:

```
choose :: Gen a -> Gen a -> Gen a -- Bad definition!
= do
choose g g' <- g
x <- g'
y <- Gen.bool True
b return $ if b then x else y
```

While is is true that *generation* using this definition of `choose`

will work
just fine (and laziness ensures that we will in fact only run whatever generator
is used), this combinator *shrinks* very poorly. The problem is that if we
generate a value but they don’t use it, the (part of) the sample tree that we
used to produce that value is irrelevant, and so by definition we can always
replace it by the sample tree that is zero *everywhere*. This means that if we
later want to switch to that generator, we will *only* be able to do so if the
absolute *minimum* value that the generator can produce happens to work for
whatever property we’re testing. This is an important lesson to remember:

Do not generate values and then discard them: such values will always shrink to their minimum. (Instead, don’t generate the value at all.)

To solve this problem, we need to make it visible to the library when we need a generator and when we do not, so that we it can avoid shrinking that part of the sample tree while the generator is not in use. Selectively omitting effects is precisely what selective applicative functors give us. A detailed discussion of this topic would take us well outside the scope of this blog post; in the remainder of this section we will discuss the basics only.

`Gen`

is a selective functor, which means that it is an instance of
Selective, which has a single method called `select`

:

`select :: Gen (Either a b) -> Gen (a -> b) -> Gen b`

The intuition is that we run the first generator; if that produces `Left a`

, we
run the second generator to get a `b`

; if the first generator produces `Right b`

, we skip the second generator completely. Like for applicative `<*>`

and
monadic `(>>=)`

, the two generators are run against different subtrees of the
sample tree, but the critical difference is that we will not try to shrink the
right subtree for the second generator unless that generator is used.

If that all sounds a bit abstract, perhaps suffices to say that any selective functor supports

`ifS :: Selective f => f Bool -> f a -> f a -> f a`

which we can use to implement `choose`

in a way that avoids reusing the sample
tree of the first generator for the second:

```
choose :: Gen a -> Gen a -> Gen a
= ifS (bool True) choose
```

Indeed, this is precisely the definition in the `falsify`

library
itself.

## Tutorial

With the background out of the way, let’s now consider how to actually use the
library. Probably the easiest way to get started is to use the `tasty`

integration. Here is a minimal template to get you started:

```
module Main (main) where
import Test.Tasty
import Test.Tasty.Falsify
main :: IO ()
= defaultMain $ testGroup "MyTestSuite" [
main "myFirstProperty" prop_myFirstProperty
testProperty
]
prop_myFirstProperty :: Property ()
= return () prop_myFirstProperty
```

This depends on tasty package, as well as
falsify of course. If you want, you can also use the
Test.Falsify.Interactive module to experiment with
`falsify`

in `ghci`

.

### Getting started

Suppose we want to test that if we multiply a number by two, the result must be even. Here’s how we could do it:

```
prop_multiply2_even :: Property ()
= do
prop_multiply2_even <- gen $ Gen.int $ Range.withOrigin (-100, 100) 0
x even (x * 2)) $ testFailed "not even" unless (
```

Some observations:

`Property`

is a monad, so the usual combinators (such as unless) for monads are available- gen runs a generator, and adds the output of the generator to the test log. (The test log is only shown when the property fails.)
- Gen.int is an alias for
Gen.integral, which can produce values for any
Integral type. There is no analogue of
`QuickCheck`

’s Arbitrary class in`falsify`

: like in`hedgehog`

and in`Hypothesis`

, every generator must be explicitly specified. For a justification of this choice, see Jacob Stanley’s excellent Lambda Jam 2017 presentation Gens N’ Roses: Appetite for Reduction (Jacob is the author of`hedgehog`

). - The specified Range tells the generator two things: in which range to produce a value, and how to shrink that value. In our example, withOrigin takes an “origin” as explicit value (here, 0), and the generator will shrink towards that origin.
- testFailed is the primitive way to make a test fail, but we shall see a better way momentarily.

### Predicates

Suppose we mistakingly think we need to multiply a number by *three* to get
a even number:

```
prop_multiply3_even :: Property ()
= do
prop_multiply3_even <- gen $ Gen.int $ Range.withOrigin (-100, 100) 0
x even (x * 3)) $ testFailed "not even" unless (
```

If we run this test, we will get a counter-example:

```
multiply3_even: FAIL
failed after 14 shrinks
not even
Logs for failed test run:
generated 1 at CallStack (from HasCallStack):
gen, called at demo/Demo/Blogpost.hs:217:10 in main:Demo.Blogpost
```

This counter-example is not awful: it gives us the counter-example (1), and that
counter-example is minimal. We can however do much better; the idiomatic way in
`falsify`

to test properties of values is to use a
Predicate. A predicate of type

`Predicate '[a, b, c, ..]`

is essentially a function

`-> b -> c -> .. -> Bool a `

but in such a way that it can produce a meaningful message if the predicate is not satisfied. Here’s how we might use it for our example:

```
prop_multiply3_even_pred :: Property ()
= do
prop_multiply3_even_pred <- gen $ Gen.int $ Range.withOrigin (-100, 100) 0
x $ P.even `P.dot` P.fn ("multiply3", (* 3)) .$ ("x", x) assert
```

Some comments:

- P.even, like
`even`

from the prelude. is a predicate that checks its argument is even - P.dot, like
`(.)`

from the prelude, composes a predicate with a function. In addition to the function itself, you also specify the*name*of the function, so that that name can be used in error messages. - [(.
)][docs:falsify:dollar], like `( )` from the prelude, applies a predicate to a named argument.

The use of predicates is not required, but can be very helpful indeed. For our running example, this will produce this test failure message instead:

```
multiply3_even_pred: FAIL
failed after 2 successful tests and 13 shrinks
not (even (multiply3 x))
x : 1
multiply3 x: 3
```

### Ranges, Labelling

We saw the use of `withOrigin`

already, and earlier in this blog post we
used between; a generator such as

`$ Range.between (10, 100) Gen.integral `

will produce a value between 10 and 100 (inclusive), shrinking towards 10; it is also possibly to flip the two bounds to shrink towards 100 instead.

The other very useful `Range`

constructor is skewedBy.
A generator such as

`$ Range.skewedBy 5 (0, 100) Gen.integral `

will produce values between (0, 100), like `between`

does, but skewed towards
zero; a negative skew value will instead skew towards 100 (but still shrink
towards zero). As an example use case, suppose that for a certain property we
need a list of `Int`

and an `Int`

, and sometimes that separate `Int`

should be a
member of the list, sometimes not:

```
prop_skew :: Double -> Property ()
= do
prop_skew skew <- gen $ Gen.list rangeListLen $ Gen.integral rangeValues
xs <- gen $ Gen.integral rangeValues
x "elem" [x `elem` xs]
collect where
rangeValues :: Range Word
rangeListLen,= Range.between (0, 10)
rangeListLen = Range.skewedBy skew (0, 100) rangeValues
```

This example is a property that always passes, but we use
collect to collect some statistics; specifically, in
what percentage of the tests `x`

is an element of `xs`

. If we run this with
a skew of 0, we might see something like:

```
100000 successful tests
Label "elem":
94.6320% False
5.3680% True
```

In only 5% of cases the element appears in the list. There are various ways in which we could change that distribution of test data, but the simplest way is simply to generate more values towards the lower end of the range; if we run the test with a skew of 5 we get

```
Label "elem":
41.8710% False
58.1290% True
```

### Generators

Nearly all generators are built using prim as their basic
building block, which returns the next sample from the sample tree. Higher-level
generators split into two categories: “simple” (non-compound) generators that produce a value given some arguments, and generator *combinators* that take generators as input and produce new generators. Some important examples in the first category are:

- integral, which we discussed already
- bool, which produces a
`Bool`

, shrinking towards a choice of`True`

or`False`

- elem, which picks a random element from a list, and shuffle, which shuffles a list
- etc.

The library also offers a number of *generator combinators*; here we list the
most important ones:

choose we saw when we discussed Selective functors, and chooses (uniformly) between two generators, shrinking towards the first.

list takes a range for the list length and a generator and produces a list of values. Unlike the simple “pick a length and then call

`replicateM`

” approach from the example from the introduction, this generator can drop elements anywhere in the list (it does this by using the combinator mark to mark elements in the list; as the marks shrink towards “drop”, the element is removed, up to the specified`Range`

).frequency, similar to the like-named function in

`QuickCheck`

, takes a list of generators and frequencies, and chooses a generator according to the specified frequency. This is another way in which we can tweak the distribution of test data.The implementation of

`frequency`

ensures that the generators can shrink indepedently from each other. This*could*be defined just using`Selective`

, but for improved performance it makes use of a low-level combinator called perturb; see also bindIntegral, which generalizes`Selective`

bindS, and has significantly better performance than`bindS`

.

### Generating functions

One of the most impressive aspects of `QuickCheck`

is that it can generate, show
and shrink functions. This is due to a functional pearl by Koen Claessen called
Shrinking and showing functions; the presentation is
available on YouTube and is well worth a watch. We
have adapted the `QuickCheck`

approach (and simplified it slightly) to `falsify`

;
the generator is called fun. Here is an example
(Fn is a pattern synonym; you will essentially always need it
when generating functions):

```
prop_fn1 :: Property ()
= do
prop_fn1 Fn (f :: [Int] -> Bool) <- gen $ Gen.fun $ Gen.bool False
$
assert
P.eq`P.on` P.fn ("f", f)
.$ ("x", [1, 2, 3])
.$ ("y", [4, 5, 6])
```

This property says that for any function `f :: [Int] -> Bool`

, if we apply that
function to the list `[1, 2, 3]`

we must get the same result as when we apply
it to the list `[4, 5, 6]`

. Of course, that is not true, and when we run this
test, `falsify`

will give us a counter-example:

```
failed after 53 shrinks
(f x) /= (f y)
x : [1,2,3]
y : [4,5,6]
f x: True
f y: False
Logs for failed test run:
generated {[1,2,3]->True, _->False} at CallStack (from HasCallStack):
gen, called at demo/Demo/Blogpost.hs:244:32 in main:Demo.Blogpost
```

Notice the counter-example we get: a function that returns `True`

for the list
`[1, 2, 3]`

, and `False`

everywhere else.^{3} It truly is quite astonishing that this
works: *we* can see that the list `[1, 2, 3]`

is special by inspecting the
source code, but of course `falsify`

(or indeed, `QuickCheck`

; this is not
unique to `falsify`

) cannot! Instead, `falsify`

will generate a random
infinitely large description of functions from `[Int] -> Bool`

, that covers
*all* possible input lists, and then start shrinking this description, throwing
away values for inputs it doesn’t need, until a minimal test case remains. Truly
a testament to the power of laziness in Haskell.

For a more realistic example, let’s port an example from Koen Claessen’s
presentation to `falsify`

. This example is testing the (wrong) property that for
all functions `f`

and predicates `p`

,

`map f . filter p == filter p . map f`

In `falsify`

, we might express this as:

```
prop_mapFilter :: Property ()
= do
prop_mapFilter Fn (f :: Int -> Int) <- gen $ Gen.fun genInt
Fn (p :: Int -> Bool) <- gen $ Gen.fun genBool
xs :: [Int] <- gen $ Gen.list (Range.between (0, 100)) genInt
$
assert
P.eq`P.split` (P.fn ("map f", map f), P.fn ("filter p", filter p))
`P.split` (P.fn ("filter p", filter p), P.fn ("map f", map f))
.$ ("xs", xs)
.$ ("xs", xs)
where
genInt :: Gen Int
= Gen.int $ Range.between (0, 100)
genInt
genBool :: Gen Bool
= Gen.bool False genBool
```

We generate a *random* function `f`

, a *random* predicate `p`

, a random list
`xs`

, and then assert the property; and of course, `falsify`

will happily give
us a counter-example:

```
failed after 25 shrinks
(map f (filter p xs)) /= (filter p (map f xs))
xs : [96]
xs : [96]
filter p xs : [96]
map f xs : [0]
map f (filter p xs): [0]
filter p (map f xs): []
Logs for failed test run:
generated {_->0} at CallStack (from HasCallStack):
gen, called at demo/Demo/Blogpost.hs:254:30 in main:Demo.Blogpost
generated {96->True, _->False} at CallStack (from HasCallStack):
gen, called at demo/Demo/Blogpost.hs:255:30 in main:Demo.Blogpost
generated [96] at CallStack (from HasCallStack):
gen, called at demo/Demo/Blogpost.hs:256:20 in main:Demo.Blogpost
```

It generated a function that maps anything to `0`

, a predicate that is
`True`

for `96`

and `False`

for everything else, and a list containing only the
value `96`

; this is indeed a nice counter-example, as the output from the
`assert`

explains.

Side note: in an ideal world that value `96`

would be shrunk too. However, this
would require shrinking *both* the `96`

in the generated list *and* the `96`

in
the generated predicate *at the same time*. Like `QuickCheck`

, `falsify`

never
takes more than one shrink step at once, to ensure that shrinking is `O(n)`

and
avoid exponential explosion. Section Dependencies between commands
of my blog post “An in-depth look at `quickcheck-state-machine`

” discusses this
kind of problem in the context of `quickcheck-state-machine`

.

### Testing shrinking

When we use internal (or indeed, integrated) shrinking, we don’t write a
separate shrinking function, but that doesn’t mean we cannot get shrinking
wrong. Shrinking *never* truly comes for free! As a simple example, consider
writing a generator that produces any value below a given maximum
(essentially, a more limited form of `integral`

). A first attempt might be:

```
below :: Word64 -> Gen Word64
= (`mod` n) <$> Gen.prim below n
```

While this generator does in fact produce values in the range `0 <= x < n`

, it
does not shrink very well! As the value produced `prim`

shrinks, the value
produced by `below`

will cycle. We can discover this by writing a property that
tests the shrinking behaviour of `below`

, using
testShrinkingOfGen:

```
prop_below_shrinking :: Property ()
= do
prop_below_shrinking <- gen $ Gen.integral $ Range.between (1, 1_000)
n $ below n testShrinkingOfGen P.ge
```

This property will fail:

```
failed after 4 successful tests and 14 shrinks
original < shrunk
original: 0
shrunk : 1
Logs for failed test run:
generated 2 at CallStack (from HasCallStack):
gen, called at demo/Demo/Blogpost.hs:281:10 in main:Demo.Blogpost
```

In addition to testing individual shrinking steps, we can also test that for
particular property and generator, we can generate a particular *minimum* using
testMinimum. Let’s consider the naive list generator
from the introduction one more time:

```
naiveList :: Range Int -> Gen a -> Gen [a]
= do
naiveList r g <- Gen.integral r
n replicateM n g
```

Suppose we want to verify that if we use this generator to test the (false)
property that “all elements of all lists are always equal” we should *always*
get either `[0, 1]`

or `[1, 0]`

as a counter-example; after all, those are
the two minimal counter-examples. We could test this as follows:

```
prop_naiveList_minimum :: Property ()
=
prop_naiveList_minimum .$ ("expected", [[0,1], [1,0]])) $ do
testMinimum (P.elem <- gen $ naiveList
xs 0, 10))
(Range.between (0, 1)))
(Gen.int (Range.between (case P.eval $ P.pairwise P.eq .$ ("xs", xs) of
Left _ -> testFailed xs
Right () -> return ()
```

The counter-example reported by `falsify`

is (somewhat shortened):

```
naiveList_minimum: FAIL
failed after 0 shrinks
minimum `notElem` expected
minimum : [0,0,1]
expected: [[0,1],[1,0]]
Logs for failed test run:
generated [0,0,1] at CallStack (from HasCallStack):
gen, called at demo/Demo/Blogpost.hs:294:13 in main:Demo.Blogpost
Logs for rejected potential next shrinks:
** Rejected run 0
generated [] at CallStack (from HasCallStack):
** Rejected run 3
generated [0] at CallStack (from HasCallStack):
** Rejected run 4
generated [0,0] at CallStack (from HasCallStack):
** Rejected run 8
generated [0,0,0] at CallStack (from HasCallStack):
```

This is telling us that the minimum value it produced was `[0, 0, 1]`

, instead
of one of the two lists that we expected. It also tells us what shrink steps
were rejected (because they weren’t counter-examples). This is informative,
because _none of those shrink steps is `[0, 1]`

: the naive list generator,
unlike the real one (which does pass this property) cannot drop elements from
the start of the list.

The `falsify`

test suite uses
testShrinkingOfGen (and its generalization
testShrinking) as well as
testMinimum extensively to test `falsify`

’s own
generators.

### Compatibility

Finally, `falsify`

offers two combinators shrinkWith
and fromShrinkTree which provide compatibility
with `QuickCheck`

style shrinking

`shrinkWith :: (a -> [a]) -> Gen a -> Gen a`

and `hedgehog`

style shrinking

`fromShrinkTree :: Tree a -> Gen a`

respectively. The implementation of these combinators depends on a minor generalization of the sample tree representation; the details are discussed in the paper.

## Conclusions

Shrinking is an essential component of any approach to property based testing.
In the Haskell world, two libraries offered two competing approaches to
shrinking: *manual* shrinking offered by `QuickCheck`

, where users are entirely
responsible for writing shrinkers for their generators, and *integrated*
shrinking, offered by `hedgehog`

. Integrated shrinking is nice, but does not work
well with monadic bind. The Python `Hypothesis`

library taught us how we can have
“internal” shrinking: like in integrated shrinking, we do not write a separate
generator and shrinker, but unlike in integrated shrinking, this approach *does*
work across monadic bind.

The Haskell `falsify`

library takes the core ideas of `Hypothesis`

and applies
them in the context of Haskell. As we have seen, however, the actual details of
how these two libraries work differ quite significantly. The `falsify`

approach is
more suitable to Haskell where we might deal with infinite data structures,
provides the user with more predictable shrinking, and provides the user with
tools for controlling generator independence (through the use of selective
functors).

##### Footnotes

Admittedly, it depends a bit on how you count: we can also replace any tree in a single step by the tree that is zero everywhere, which could be considered a separat “pass”.↩︎

Hypothesis does try to avoid redistributing samples during shrinking as part of it’s “hierarchical delta debugging”, essentially recovering some kind of tree structure, but this is not under the control of the user, and does not provide any guarantees.↩︎

It can also produce

`{[4,5,6]->True, _->False}`

, depending on the seed.↩︎