Getting the most out of ScalaCheck

We’ve been using ScalaCheck property checks at Sharethrough for about a year. For those not familiar, property checks test a condition multiple times, using freshly-generated data on each iteration. While a unit test exercises a very specific scenario, a property check’s ability to generate data broadens the scope of what’s being tested and increases the chance of catching edge cases which a developer never considered.

As a simple example:

describe("addition") {
  it("is commutative") {
    forAll { (x: Int, y: Int) =>
      (x + y) should equal (y + x)
    }
  }
}
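For comparison, the same property can be checked with ScalaCheck alone, outside of any test framework. This is a minimal sketch: the object name AdditionProps is illustrative, and the property returns a plain Boolean instead of using a matcher.

```scala
import org.scalacheck.{Prop, Test}
import org.scalacheck.Prop.forAll

object AdditionProps {
  // Int addition wraps on overflow, but it wraps identically in either
  // order, so commutativity holds for every pair of Ints.
  val commutative: Prop = forAll { (x: Int, y: Int) =>
    x + y == y + x
  }

  // Runs the property with ScalaCheck's default parameters.
  lazy val result: Test.Result = Test.check(Test.Parameters.default, commutative)
}
```

Test.check returns a Test.Result whose passed flag reports whether the property held for every generated pair.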

Notes on conventions

  • Class#member indicates a member on an instance of Class; Class.member indicates a member on the companion object of Class
  • Code examples use the FunSpec style of ScalaTest (describe and it blocks)

Recommendations

1. Use ScalaCheck’s data generators in unit tests as well

So what’s the problem?

Let’s assume we have a simple case class like:

case class Foo(i: Int, d: Double, s: String)

Let’s further assume that we have some set of unit tests which only need to vary the Foo#i field. The simplest setup of a fixture for that would be to just create the instances directly:

val f1 = Foo(1, 0.0, "")
val f2 = Foo(2, 0.0, "")

This works but has a couple of disadvantages:

  • If the definition of Foo is ever updated to, say, add a new field, then the fixture here must be updated even if the specs themselves are unconcerned with that field.
  • It’s not clear from looking at the fixture that the only portion that is of consequence is the first field.

Generators can help

Luckily we can tap into the functionality of ScalaCheck’s Gen[A] to get around these shortcomings.

First off, what is the Gen[A] type? In order for property checks to work, ScalaCheck needs to be able to generate data of the correct type. To that end, it provides a Gen[A] type which is responsible for generating instances of A. A number of basic generators are provided out of the box, and other generators can be built on top of them. For example, we could create a Gen[Foo] as follows:

import org.scalacheck.Gen
import org.scalacheck.Arbitrary.arbitrary

val genFoo: Gen[Foo] = for {
  i <- arbitrary[Int]
  d <- arbitrary[Double]
  s <- arbitrary[String]
} yield Foo(i, d, s)

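When forAll is not given a generator explicitly, it looks one up through an implicit Arbitrary[A], so genFoo can be wrapped in an implicit Arbitrary[Foo] to make it available to property checks. Values can also be acquired directly from the generator using Gen[A]#sample, which returns an Option[A].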
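A minimal sketch of wiring genFoo up for forAll, assuming ScalaCheck 1.13 or later; the object name FooGenerators and the property copyKeepsOtherFields are hypothetical. The Double field is compared with java.lang.Double.compare because arbitrary[Double] may produce NaN, which is never == to itself.

```scala
import org.scalacheck.{Arbitrary, Gen, Prop, Test}
import org.scalacheck.Arbitrary.arbitrary
import org.scalacheck.Prop.forAll

object FooGenerators {
  case class Foo(i: Int, d: Double, s: String)

  val genFoo: Gen[Foo] = for {
    i <- arbitrary[Int]
    d <- arbitrary[Double]
    s <- arbitrary[String]
  } yield Foo(i, d, s)

  // Wrapping the generator in an implicit Arbitrary lets forAll find it
  // by type, so a property can simply declare a Foo parameter.
  implicit val arbFoo: Arbitrary[Foo] = Arbitrary(genFoo)

  // copy(i = ...) should change only the field we care about.
  val copyKeepsOtherFields: Prop = forAll { (foo: Foo) =>
    val updated = foo.copy(i = 42)
    updated.i == 42 &&
      updated.s == foo.s &&
      // compare rather than ==, so a NaN field still counts as unchanged
      java.lang.Double.compare(updated.d, foo.d) == 0
  }
}
```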

We can redefine our previous fixtures with this:

val f1 = genFoo.sample.get.copy(i = 1)
val f2 = genFoo.sample.get.copy(i = 2)

Now, if we update the definition of Foo, we only need to update genFoo, and the tests are clearer: they depend only on the field being set.

As a foundation for Object Mothers

An Object Mother is a class responsible for creating test data according to some stereotype. For example, different instances of a User object may have different abilities: one has admin capabilities, another has read and write capabilities. But some attributes of the User can still be arbitrary, such as user id or password.

Generators can help here as well, since there’s nothing preventing us from having multiple instances of type Gen[A] in the code base. Going back to our Foo case class, let’s say we’re interested in two stereotypes – one where the Int attribute is always positive and one where it is always negative. There are different ways to approach this, but here’s one:

val genPositiveFoo: Gen[Foo] = for {
  base <- genFoo
  pos  <- Gen.posNum[Int]
} yield base.copy(i = pos)

val genNegativeFoo: Gen[Foo] = for {
  base <- genFoo
  neg  <- Gen.negNum[Int]
} yield base.copy(i = neg)

Note that in this approach we use the fully arbitrary generator as a base and then create a copy, replacing the field we care about. This is a little wasteful, since we pay the cost of generating the original value of i, but in doing so genFoo remains the single place to change when we add a new field.
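The User example from the Object Mother description can be sketched the same way. User, Role, Admin and ReadWrite below are hypothetical types for illustration; since each stereotype pins its field to a constant, map with copy is enough and no extra generator is needed.

```scala
import org.scalacheck.Gen

object UserMother {
  // Hypothetical domain types standing in for the User example.
  sealed trait Role
  case object Admin extends Role
  case object ReadWrite extends Role

  case class User(id: Long, password: String, role: Role)

  // Fully arbitrary base: every field is generated.
  val genUser: Gen[User] = for {
    id       <- Gen.posNum[Long]
    password <- Gen.alphaNumStr
    role     <- Gen.oneOf[Role](Admin, ReadWrite)
  } yield User(id, password, role)

  // Stereotypes: pin the role, leave id and password arbitrary.
  val genAdmin: Gen[User]     = genUser.map(_.copy(role = Admin))
  val genReadWrite: Gen[User] = genUser.map(_.copy(role = ReadWrite))
}
```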

A note about Gen[A]#suchThat and unchecked Gen[A]#sample.get

We created our stereotype generators by explicitly generating the value we wanted and copying it into the base value. Another way to do this is by calling Gen[A]#suchThat, which takes a predicate A => Boolean and rejects any sample for which the predicate returns false.

Doing so, however, is strongly discouraged: a filtered generator may cause calls to Gen[A]#sample to return None. This not only requires special handling when using the generator directly outside a property check; even within a property check, rejected samples count as discarded, and the check will give up if too many are discarded.
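To make the difference concrete, here is a sketch contrasting a filtered generator with a constructed one, plus a hypothetical sampleOrFail helper (not part of ScalaCheck) for fixtures that cannot avoid filtering. The helper retries sample a bounded number of times and fails with a descriptive message instead of a bare None.get.

```scala
import org.scalacheck.Gen
import org.scalacheck.Arbitrary.arbitrary

object SuchThatDemo {
  // Filtering: samples failing the predicate are rejected,
  // so sample can return None.
  val filteredPositive: Gen[Int] = arbitrary[Int].suchThat(_ > 0)

  // Constructing: the value is built to satisfy the condition,
  // so nothing is rejected.
  val constructedPositive: Gen[Int] = Gen.posNum[Int]

  // Hypothetical helper: tries sample up to maxTries times and fails
  // with a clear message if the generator never produces a value.
  def sampleOrFail[A](gen: Gen[A], maxTries: Int = 100): A =
    Iterator.continually(gen.sample)
      .take(maxTries)
      .collectFirst { case Some(a) => a }
      .getOrElse(sys.error(s"generator produced no value in $maxTries tries"))
}
```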

2. Check invariants, laws, and contracts – but not scenarios

When first hearing about property checks and how they can discover issues through repeated testing, there might be an inclination to use them everywhere. While they can be powerful, not everything which can be tested should be tested using property checks. Specific scenarios or specifications are best served by traditional testing approaches. Property checks work best when verifying broader assumptions, API contracts, or implications not captured in the code itself.

To make this less abstract, consider testing a codec which can translate an object to and from a serializable form. We could come up with specific scenarios in this case to test out the functionality:

// Code under test
import org.joda.time.DateTime

object DateTimeCodec {
  def encode(dateTime: DateTime): Long = dateTime.getMillis
  def decode(millis: Long): DateTime = new DateTime(millis)
}
import org.joda.time.DateTime

describe("""DateTimeCodec""") {
  // shared fixture
  val date = new DateTime("2016-05-03T12:34:56Z")
  val millisSinceEpoch = 1462278896000L

  it ("encodes the DateTime as the milliseconds from epoch") {
    DateTimeCodec.encode(date) should be (millisSinceEpoch)
  }

  it ("decodes a number representing the milliseconds from epoch into a DateTime") {
    DateTimeCodec.decode(millisSinceEpoch) should be (date)
  }
}

For something like this, a property check verifying that encode or decode produces the exact expected output for any particular input would most likely just repeat the code under test, and would not be that valuable. However, considering the two functions in combination reveals a more interesting property: each should be the inverse of the other.

import org.joda.time.DateTime

describe("""DateTimeCodec""") {
  it ("an encoded DateTime decodes to the same value") {
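    // Assumes an implicit Arbitrary[DateTime] is in scope; its values should use
    // the JVM's default time zone, since decode always returns one in that zone
    forAll { original: DateTime =>
      val encoded = DateTimeCodec.encode(original)
      val decoded = DateTimeCodec.decode(encoded)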

      decoded should be (original)
    }
  }

  it ("a Long decoded to a DateTime encodes to the same value") {
    forAll { original: Long =>
      val decoded = DateTimeCodec.decode(original)
      val encoded = DateTimeCodec.encode(decoded)

      encoded should be (original)
    }
  }
}

Note that the property check in this case is not a replacement for the scenario spec – just because the functions are inverses of each other doesn’t mean that intermediate results are correct. It’s the combination that gives confidence that the implementation behaves as expected.
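The round-trip check assumes a DateTime generator is in scope. One way to provide it is sketched below; the bounds (the epoch up to 2100-01-01T00:00:00Z) are an arbitrary choice for illustration, and the codec is repeated so the block compiles on its own. Building values with new DateTime(millis) keeps them in the default time zone, so DateTime#equals can compare them with the output of decode.

```scala
import org.joda.time.DateTime
import org.scalacheck.{Arbitrary, Gen, Prop, Test}
import org.scalacheck.Prop.forAll

object DateTimeGenerators {
  object DateTimeCodec {
    def encode(dateTime: DateTime): Long = dateTime.getMillis
    def decode(millis: Long): DateTime = new DateTime(millis)
  }

  // Illustrative upper bound: 2100-01-01T00:00:00Z in epoch milliseconds.
  private val maxMillis = 4102444800000L

  // Same construction as decode, so generated values share its time zone.
  implicit val arbDateTime: Arbitrary[DateTime] =
    Arbitrary(Gen.choose(0L, maxMillis).map(millis => new DateTime(millis)))

  val roundTrip: Prop = forAll { (original: DateTime) =>
    DateTimeCodec.decode(DateTimeCodec.encode(original)) == original
  }
}
```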

Beyond inverse functions in a codec, some other good use cases are:

  • Implicit contracts between methods, such as #equals and #hashCode (as described in the JavaDoc for Object) or the stability of Comparators
  • Algebraic properties, particularly those implementing structures like Monoid or Functor with their associated laws
  • Numeric operations which may run into limitations from running on a machine, such as numeric overflow
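As a sketch of the first two kinds of checks: the Monoid trait below is a minimal hypothetical type class (libraries such as Cats define their own), instantiated for string concatenation, with its associativity and identity laws written as properties, followed by the equals/hashCode contract for a hypothetical Point case class.

```scala
import org.scalacheck.{Prop, Test}
import org.scalacheck.Prop.forAll

object LawChecks {
  // Minimal hypothetical Monoid type class.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  val stringConcat: Monoid[String] = new Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x + y
  }

  // Associativity: combine(combine(x, y), z) == combine(x, combine(y, z))
  val associative: Prop = forAll { (x: String, y: String, z: String) =>
    stringConcat.combine(stringConcat.combine(x, y), z) ==
      stringConcat.combine(x, stringConcat.combine(y, z))
  }

  // Identity: empty is neutral on both sides.
  val leftRightIdentity: Prop = forAll { (x: String) =>
    stringConcat.combine(stringConcat.empty, x) == x &&
      stringConcat.combine(x, stringConcat.empty) == x
  }

  // equals/hashCode contract: equal values must have equal hash codes.
  case class Point(x: Int, y: Int)

  val hashConsistent: Prop = forAll { (x: Int, y: Int) =>
    val a = Point(x, y)
    val b = Point(x, y)
    a == b && a.hashCode == b.hashCode
  }
}
```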

3. Plan on accommodating slower builds

By default, ScalaCheck requires 100 successful evaluations of each property (the count is configurable), so we shouldn’t be surprised when a property check takes roughly 100 times as long as an equivalent unit test. On top of that comes the cost of generating data for each run, which builds up quickly, especially for larger structures with nested lists. In our case, we found one group of checks in our code where each check ran in the 5 to 10 second range, a slowdown of 1,000 to 10,000 times compared with our vanilla specs.

So when that’s the case, what do you do? There are a couple of options:

Put reasonable bounds on the data being generated

This offers the biggest savings, but beware of being too conservative; after all, part of the point of property checks is to explore the full problem domain.

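Remember that with large data structures, data-generation time can dominate the test time. Consider a simple structure like List[List[List[Int]]]: it quickly explodes in size if each List is effectively unbounded. With ScalaCheck’s default maximum size parameter of 100, each level can hold up to 100 elements, so a single value can contain up to a million Ints. If it’s not reasonable for each level to be that large, generating such values can be overkill without any practical benefit. Bounding the sizes avoids that, though it comes with the downside of needing to audit the generators to ensure that the bounds remain reasonable.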
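A minimal sketch of bounding List[List[List[Int]]], assuming a hypothetical limit of 5 elements per level. The first version fixes the bound explicitly with Gen.choose and Gen.listOfN; the second uses Gen.resize to cap the size parameter that the default Gen.listOf reads.

```scala
import org.scalacheck.Gen

object BoundedGenerators {
  // Hypothetical bound chosen for illustration.
  val maxWidth = 5

  // A list whose length is drawn explicitly from 0 to maxWidth.
  def boundedList[A](elem: Gen[A]): Gen[List[A]] =
    Gen.choose(0, maxWidth).flatMap(n => Gen.listOfN(n, elem))

  // At most 5 * 5 * 5 = 125 Ints per value.
  val nested: Gen[List[List[List[Int]]]] =
    boundedList(boundedList(boundedList(Gen.choose(-1000, 1000))))

  // Alternative: cap the size parameter seen by every nested listOf.
  val viaResize: Gen[List[List[List[Int]]]] =
    Gen.resize(maxWidth, Gen.listOf(Gen.listOf(Gen.listOf(Gen.choose(-1000, 1000)))))
}
```

The explicit version keeps the bound visible where the generator is defined, which may make the audit easier; Gen.resize is shorter but applies one size to every level.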

Create a secondary parallel build just for property checks

It may make sense to break property checks out into their own test run, in the same way one might set up a separate run for integration tests. In some cases it makes sense to run the checks much more frequently, even without a code change to trigger them, to maximize the chances of catching an edge case.
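One way to let a dedicated build do more work is to read the iteration count from the environment. The sketch below assumes ScalaCheck 1.13 or later and a hypothetical PROPERTY_CHECK_RUNS variable set only by the property-check job; ordinary builds fall back to 100. The reverse-twice property is just a stand-in.

```scala
import org.scalacheck.{Prop, Test}
import org.scalacheck.Prop.forAll
import scala.util.Try

object DedicatedPropertyRun {
  // Hypothetical variable; non-numeric or non-positive values are ignored.
  val runs: Int = sys.env.get("PROPERTY_CHECK_RUNS")
    .flatMap(s => Try(s.toInt).toOption)
    .filter(_ > 0)
    .getOrElse(100)

  val params: Test.Parameters =
    Test.Parameters.default.withMinSuccessfulTests(runs)

  // Stand-in property.
  val reverseTwice: Prop = forAll { (xs: List[Int]) => xs.reverse.reverse == xs }

  def run(): Test.Result = Test.check(params, reverseTwice)
}
```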

4. Plan around test flakiness

Property checks are at their best when they hit those weird edge cases, when the generators just happen to produce a combination of data not covered by direct tests. This makes them interesting and useful, but also means that, as a rule, they will be flaky. As with any flaky test, extra care should be taken to deal with them.

Create a unit test for property check failure

Unlike flaky integration tests that fail due to external dependencies, a property check failure should be completely internal and reproducible from the failing input. Once you find the failure, recreate it as a unit test so that the case is checked on every run instead of only when the generator happens to hit it.
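A sketch of that workflow with a hypothetical nonNegativeAbs function: a property such as nonNegative can surface Int.MinValue, because math.abs(Int.MinValue) overflows and stays negative. After the fix, the property stays in place to keep exploring, and the specific input is pinned as a deterministic check; in the FunSpec style used here, that check would become its own it block.

```scala
import org.scalacheck.{Prop, Test}
import org.scalacheck.Prop.forAll

object AbsRegression {
  // Hypothetical function under test, fixed for Int.MinValue, whose
  // absolute value does not fit in an Int.
  def nonNegativeAbs(x: Int): Int =
    if (x == Int.MinValue) Int.MaxValue else math.abs(x)

  // The property keeps exploring the whole Int range...
  val nonNegative: Prop = forAll { (x: Int) => nonNegativeAbs(x) >= 0 }

  // ...while the input that once failed is checked deterministically.
  def minValueRegression(): Boolean =
    nonNegativeAbs(Int.MinValue) == Int.MaxValue
}
```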

Run in a test framework with matchers

One of the hardest parts of a flaky test is recreating the error scenario. ScalaCheck helps by printing out the data that caused the failure, but that uses the toString representation of that data, which can be incomplete or misleading.

It would be more useful to know the specific assertion that failed, but ScalaCheck is less than helpful here. By itself, ScalaCheck considers a property to hold when the test code returns true, which discards the context of a failure when it occurs. ScalaCheck does support recording evidence, such as labels, before a check is verified, but this requires developers to predict in advance which data they will need, undermining ScalaCheck’s ability to find edge cases that weren’t considered.

As a simple example, consider a function that returns an Either[String, Double]. A check could fail because we got a Left[String] when expecting a Right[Double] (or vice versa), or because we got the right container but an unexpected contained value. The Boolean result of the test function does not indicate which failure was at fault.
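The sketch below illustrates the Either case with a hypothetical safeSqrt function. The bare property collapses both failure modes into false, while labeledFor attaches a message to each with ScalaCheck’s :| operator. brokenLabeled uses a deliberately wrong implementation to show that the labels end up in the failed Test.Result; note that the messages still had to be written in advance.

```scala
import org.scalacheck.{Gen, Prop, Test}
import org.scalacheck.Prop.{forAll, propBoolean}

object EitherLabels {
  // Hypothetical function under test: Left for negative input, Right(sqrt) otherwise.
  def safeSqrt(x: Double): Either[String, Double] =
    if (x < 0) Left(s"negative input: $x") else Right(math.sqrt(x))

  val nonNegative: Gen[Double] = Gen.choose(0.0, 1e6)

  // Relative tolerance for comparing r * r with x.
  def closeTo(actual: Double, expected: Double): Boolean =
    math.abs(actual - expected) <= 1e-6 * math.max(1.0, expected)

  // Boolean only: a failure reports the input, not which case went wrong.
  val bare: Prop = forAll(nonNegative) { (x: Double) =>
    safeSqrt(x) match {
      case Right(r) => closeTo(r * r, x)
      case Left(_)  => false
    }
  }

  // Labeled: each failure mode carries its own message.
  def labeledFor(f: Double => Either[String, Double]): Prop =
    forAll(nonNegative) { (x: Double) =>
      f(x) match {
        case Right(r)  => closeTo(r * r, x) :| s"expected $r * $r to be close to $x"
        case Left(err) => false :| s"expected Right for $x, got Left($err)"
      }
    }

  val labeled: Prop = labeledFor(safeSqrt)

  // Deliberately broken implementation, used to show labels on failure.
  val brokenLabeled: Prop = labeledFor(x => Left(s"not implemented for $x"))
}
```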

Instead, it’s better to run ScalaCheck in the context of another framework which exposes more of the failure context. The two heaviest hitters in the Scala ecosystem are probably ScalaTest and Specs2, both of which can run ScalaCheck property checks while validating results with matchers that automatically capture the details of a failure.
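For comparison, here is a safeSqrt check as a ScalaTest spec, where matchers supply the failure details. This sketch assumes ScalaTest 3.1 or later together with the scalatestplus ScalaCheck integration (ScalaCheckPropertyChecks); older releases used org.scalatest.FunSpec and org.scalatest.prop.GeneratorDrivenPropertyChecks instead. safeSqrt is the same hypothetical function as above.

```scala
import org.scalacheck.Gen
import org.scalatest.funspec.AnyFunSpec
import org.scalatest.matchers.should.Matchers
import org.scalatestplus.scalacheck.ScalaCheckPropertyChecks

class SafeSqrtSpec extends AnyFunSpec with Matchers with ScalaCheckPropertyChecks {
  // Hypothetical function under test.
  def safeSqrt(x: Double): Either[String, Double] =
    if (x < 0) Left(s"negative input: $x") else Right(math.sqrt(x))

  describe("safeSqrt") {
    it("returns a Right whose square is the input, for non-negative input") {
      forAll(Gen.choose(0.0, 1e6)) { (x: Double) =>
        safeSqrt(x) match {
          // On failure the matcher reports both the actual and expected values.
          case Right(r) => (r * r) shouldBe (x +- 1e-6 * math.max(1.0, x))
          case other    => fail(s"expected Right for $x, got $other")
        }
      }
    }

    it("returns a Left for negative input") {
      forAll(Gen.choose(-1e6, -1e-9)) { (x: Double) =>
        safeSqrt(x) shouldBe a[Left[_, _]]
      }
    }
  }
}
```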