Across the bridge: why three trucks is not enough

Imagine you build a bridge and test it by driving three trucks across: a light one, a medium one, and a heavy one. All three make it. You declare the bridge safe. But you never tried two trucks at once. You never tried a truck in freezing rain. You never tried a truck with an uneven load that puts all the weight on one side. The bridge collapses six months later under conditions you did not think to test.

That is how most software testing works. You pick a few inputs, check that the output looks right, and move on. The inputs you did not pick are where the bugs live.

Property-based testing takes a different approach: instead of choosing specific examples, you describe what should always be true, and the computer generates a hundred or more random inputs to try to prove you wrong.

From bridges to code

Hand-picked test inputs catch the bugs you anticipate. They miss the rest. Property-based testing runs assertions against many generated values instead, and it turns up failures at boundaries you would not have thought to check.

Haskell's QuickCheck showed this was practical back in 1999, and it directly inspired across. You declare a property ("reverse applied twice gives back the original list") and the framework generates random inputs to try to break it. The idea spread to Scala (ScalaCheck), Python (Hypothesis), Erlang (PropEr), and many other ecosystems. But in every case, property-based testing is a library you add to a project. You learn the generator API, wire it into your test runner, and maintain the integration. Many teams try it once and go back to writing examples by hand.

Vary puts the QuickCheck idea in the compiler. One keyword, no dependencies. The compiler generates values based on declared types.

The problem with example-based tests

A typical test for an add function might look like this:

test "add works" {
    observe add(1, 2) == 3
    observe add(0, 0) == 0
    observe add(-1, 1) == 0
}

Three examples. Three data points out of a space of 2^128 possible input pairs (two 64-bit integers). These tests pass, but they only cover what the author thought to check. If add has an overflow bug at large values, or mishandles specific bit patterns, these three lines will not find it.
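To make that concrete, here is a small Python sketch (Python rather than Vary, so it can be run anywhere). The function buggy_add and its planted 32-bit truncation bug are hypothetical, invented only for illustration. The three examples above pass, yet one large input breaks it.

```python
def buggy_add(a: int, b: int) -> int:
    """Hypothetical add with a planted bug: the result is truncated to 32 bits."""
    total = (a + b) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the low 32 bits as a signed value.
    return total - 0x1_0000_0000 if total >= 0x8000_0000 else total


# The three hand-picked examples all pass.
examples_pass = (
    buggy_add(1, 2) == 3
    and buggy_add(0, 0) == 0
    and buggy_add(-1, 1) == 0
)

# One large input outside the examples exposes the bug.
large_input_ok = buggy_add(2**31, 1) == 2**31 + 1

print(examples_pass, large_input_ok)
```

The examples report success while the large input does not, which is exactly the gap a generated-input test is meant to close.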

What across does

across is a test-only construct that generates random typed inputs and runs your assertions against each one:

test "addition is commutative" {
    across(a: Int, b: Int) {
        observe a + b == b + a
    }
}

If you have used QuickCheck, this should look familiar. The difference is that there is no Arbitrary instance to define and no forAll combinator to import. You write across(a: Int, b: Int) and the compiler generates values based on the declared types.

This runs 100 iterations by default. Each iteration generates fresh values for a and b, then checks the observe expression. If any iteration fails, the test fails and reports which input caused the problem.
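The loop described above can be pictured roughly as follows. This is a Python sketch under stated assumptions, not Vary's implementation: the name run_property, the uniform 64-bit range, and the shape of the failure report are all made up for illustration.

```python
import random
from typing import Callable, Optional, Tuple


def run_property(
    prop: Callable[[int, int], bool],
    cases: int = 100,
    seed: int = 0,
) -> Optional[Tuple[int, int, int]]:
    """Return (iteration, a, b) for the first failing case, or None if all pass."""
    rng = random.Random(seed)
    for iteration in range(cases):
        # Fresh values for every iteration (assumed 64-bit range).
        a = rng.randint(-2**63, 2**63 - 1)
        b = rng.randint(-2**63, 2**63 - 1)
        if not prop(a, b):
            return (iteration, a, b)
    return None


# A true property: no counterexample is found.
commutative = run_property(lambda a, b: a + b == b + a)

# A deliberately false property: the runner reports the offending input.
subtraction = run_property(lambda a, b: a - b == b - a)

print(commutative)
print(subtraction)
```

The second call returns the iteration number and the pair that broke the property, which is the information you need to start debugging.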

You state the property that should hold for all inputs, not the specific values you happened to pick.

No generator code required

In QuickCheck and its descendants, you eventually need custom Arbitrary instances or generator combinators for your domain types. That is where most people hit a wall and go back to example-based tests.

In Vary, generation is type-directed. The compiler looks at the declared type and generates values automatically:

Type      Generation strategy
Int       Edge values (0, 1, -1, bounds), then scaled random
Float     Special values (0.0, 1.0, -1.0), then bounded random
Bool      True and False, then random
Str       Empty string, short strings, then random ASCII up to 32 chars
List[T]   Empty list, single element, then random length with generated elements
T?        None or a generated value of T
Tuples    Element-wise generation
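The "edge values first, then random" pattern in the Int and List[T] rows might look something like this in Python. It is only a sketch: the 64-bit bounds, the maximum list length of 8, and the plain uniform distribution (standing in for whatever "scaled random" means in Vary) are assumptions, and gen_ints and gen_int_lists are hypothetical names.

```python
import random
from typing import Iterator, List

INT_MIN, INT_MAX = -2**63, 2**63 - 1  # assumed 64-bit bounds


def gen_ints(rng: random.Random, count: int) -> Iterator[int]:
    """Yield `count` ints: fixed edge values first, then uniform random ones."""
    edges = [0, 1, -1, INT_MIN, INT_MAX]
    for i in range(count):
        if i < len(edges):
            yield edges[i]
        else:
            yield rng.randint(INT_MIN, INT_MAX)


def gen_int_lists(
    rng: random.Random, count: int, max_len: int = 8
) -> Iterator[List[int]]:
    """Yield lists: the empty list, a single-element list, then random lengths."""
    for i in range(count):
        if i == 0:
            length = 0
        elif i == 1:
            length = 1
        else:
            length = rng.randint(0, max_len)
        yield [rng.randint(INT_MIN, INT_MAX) for _ in range(length)]


rng = random.Random(42)
first_ints = list(gen_ints(rng, 8))
first_lists = list(gen_int_lists(rng, 5))
print(first_ints[:5])
print([len(xs) for xs in first_lists])
```

Putting the edge values at the front means the boundary cases are always exercised, even when the iteration count is small.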

Data types get field-wise generation for free. Enums cycle through variants. So a property test needs no setup at all:

test "string concatenation preserves length" {
    across(a: Str, b: Str) {
        observe len(a + b) == len(a) + len(b)
    }
}

No imports, no generator registration. The compiler does the rest.

Multiple bindings, multiple assertions

You can bind several variables and check several properties in a single across block:

test "integer arithmetic laws" {
    across(a: Int, b: Int, c: Int) {
        observe (a + b) + c == a + (b + c)     # associativity
        observe a * (b + c) == a * b + a * c    # distributivity
        observe a + b == b + a                  # commutativity
    }
}

Every generated triple is checked against all three properties in order. If an assertion fails, the test stops there and reports the failure.

Works with the rest of the test DSL

across composes with regular observe statements. You can mix property assertions with concrete examples in the same test:

test "sort properties" {
    # Concrete check
    observe sort([3, 1, 2]) == [1, 2, 3]

    # Property check
    across(xs: List[Int]) {
        observe len(sort(xs)) == len(xs)
    }
}

You can also put multiple across blocks in one test, each with different bindings.
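Length preservation on its own is a fairly weak property: a function that returned a list of zeros of the right length would satisfy it. As a sketch of what additional sort properties could look like, here is a Python version using the built-in sorted as a stand-in for sort. The helper check_sort_properties and the input ranges are assumptions for illustration.

```python
import random
from collections import Counter
from typing import List


def check_sort_properties(xs: List[int]) -> bool:
    """Check three properties of sorting for a single input list."""
    ys = sorted(xs)  # Python's built-in, standing in for sort
    same_length = len(ys) == len(xs)
    ordered = all(ys[i] <= ys[i + 1] for i in range(len(ys) - 1))
    same_elements = Counter(ys) == Counter(xs)  # output is a permutation of input
    return same_length and ordered and same_elements


rng = random.Random(7)
all_hold = all(
    check_sort_properties(
        [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
    )
    for _ in range(100)
)
print(all_hold)
```

Together, "ordered" and "same elements" pin down sorting far more tightly than length alone.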

Deterministic by default

Every across run is seeded. If a test fails, you get the seed in the output:

FAIL: addition is commutative

Seed: 12345
Iteration: 37

Pass --across-seed 12345 to reproduce the exact failure. This matters for CI: a failing property test should produce the same failure on every rerun until the bug is fixed. QuickCheck had this too, of course, but in Vary it is wired into the test runner with no extra configuration.
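Why a seed is enough to reproduce a failure can be shown with a few lines of Python. This is a sketch of the general mechanism using Python's random module, not a description of Vary's generator; generate_cases is a hypothetical helper.

```python
import random
from typing import List, Tuple


def generate_cases(seed: int, cases: int) -> List[Tuple[int, int]]:
    """Generate `cases` input pairs from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [
        (rng.randint(-2**63, 2**63 - 1), rng.randint(-2**63, 2**63 - 1))
        for _ in range(cases)
    ]


first_run = generate_cases(seed=12345, cases=100)
rerun = generate_cases(seed=12345, cases=100)
other_seed = generate_cases(seed=54321, cases=100)

# Same seed: every case, including the one that failed, comes back identical.
print(first_run == rerun)
# Different seed: a different sequence of inputs.
print(first_run == other_seed)
```

Because the whole input sequence is a function of the seed, rerunning with the reported seed replays the failing iteration exactly.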

Connection to mutation testing

Vary has mutation testing built into the compiler. When vary mutate runs your test suite, each mutant faces not just your hand-picked examples but 100 generated inputs per across block. A mutant that slips past three examples is far less likely to survive 100 random ones.

QuickCheck never had this connection because Haskell has no built-in mutation engine. In Vary, across and mutate work together: across explores the input space, and mutate measures whether your assertions actually catch bugs when the code changes.
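Here is a Python sketch of a mutant that survives hand-picked examples but is killed by generated inputs. Everything in it is hypothetical: saturating_add is an invented function under test, the mutant deletes its clamping, and the generator (edge-value pairs followed by uniform random pairs over an assumed 64-bit range) only approximates the strategy described earlier.

```python
import itertools
import random
from typing import Callable

INT_MIN, INT_MAX = -2**63, 2**63 - 1  # assumed 64-bit bounds

AddFn = Callable[[int, int], int]


def saturating_add(a: int, b: int) -> int:
    """Hypothetical function under test: add, clamped to the 64-bit range."""
    return max(INT_MIN, min(INT_MAX, a + b))


def mutant_add(a: int, b: int) -> int:
    """Hypothetical mutant: the clamping has been deleted."""
    return a + b


def example_tests(add: AddFn) -> bool:
    """The three hand-picked examples."""
    return add(1, 2) == 3 and add(0, 0) == 0 and add(-1, 1) == 0


def property_test(add: AddFn, cases: int = 100, seed: int = 0) -> bool:
    """Property: the result always stays inside the 64-bit range."""
    edges = [0, 1, -1, INT_MIN, INT_MAX]
    inputs = list(itertools.product(edges, repeat=2))  # edge pairs first
    rng = random.Random(seed)
    while len(inputs) < cases:  # then random pairs
        inputs.append(
            (rng.randint(INT_MIN, INT_MAX), rng.randint(INT_MIN, INT_MAX))
        )
    return all(INT_MIN <= add(a, b) <= INT_MAX for a, b in inputs)


print(example_tests(saturating_add), example_tests(mutant_add))
print(property_test(saturating_add), property_test(mutant_add))
```

The examples cannot tell the original from the mutant, while the property fails for the mutant as soon as an input pair such as (1, INT_MAX) overflows the range.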

CLI configuration

vary test .                        # default: 100 iterations per across
vary test . --across-cases 500     # more iterations
vary test . --across-seed 42       # reproducible seed

Try it

Write a property test:

test "negation is self-inverse" {
    across(x: Int) {
        observe -(-x) == x
    }
}

Run it with vary test . and the runner will check your assertion against 100 generated inputs. Start with algebraic properties (identity, commutativity, inverses) and work toward domain-specific invariants.

Supported types today: Int, Float, Bool, Str, List[T], optionals (T?), tuples, data types, and enums. Shrinking of primitive values has now shipped for failing across cases; broader domain-specific and stateful property-testing features are still evolving.
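For readers new to shrinking: the idea is to take a failing input and search for a smaller one that still fails, so the report is easier to read. The Python sketch below shows one simple greedy strategy for integers. It is an assumption for illustration only and says nothing about how Vary's shrinker actually works; shrink_int is a hypothetical name.

```python
from typing import Callable


def shrink_int(value: int, fails: Callable[[int], bool]) -> int:
    """Greedily shrink a failing int toward zero while the property still fails."""
    current = value
    while current != 0:
        if current > 0:
            candidates = [0, current // 2, current - 1]
        else:
            candidates = [0, -((-current) // 2), current + 1]
        # Take the first candidate that is closer to zero and still fails.
        smaller = next(
            (c for c in candidates if c != current and fails(c)), None
        )
        if smaller is None:
            break  # no smaller failing value among the candidates
        current = smaller
    return current


# Hypothetical property "x < 1000", which fails for every x >= 1000.
minimal = shrink_int(987654, lambda x: x >= 1000)
print(minimal)  # 1000
```

A report that says "fails for 1000" points at the boundary much more directly than one that says "fails for 987654".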