Testing

Vary has a built-in test DSL. Tests are top-level test blocks with a name and a body. No imports, no test framework, no configuration. This is the only way to write tests in Vary.

Test blocks

def add(a: Int, b: Int) -> Int {
    return a + b
}

test "add returns the sum" {
    observe add(2, 3) == 5
}

test "add handles zero" {
    observe add(0, 5) == 5
    observe add(5, 0) == 5
}

Run tests with vary test:

vary test math.vary

Or run everything in a directory:

vary test src/

Observe

observe is the primary way to assert in test blocks. It takes a boolean expression and fails the test if it evaluates to False:

test "add returns the sum" {
    observe add(2, 3) == 5
    observe add(0, 5) == 5
    observe add(-1, 1) == 0
}

observe is a keyword, not a function call. It marks the oracle boundary: the point where your test defines what "correct" means. The compiler records each observe with its source location and expression text, so the mutation runner knows exactly what each test checks.

Throws expression

throws { } evaluates to True if the body raises an exception, False otherwise. Use it with observe to test error paths:

def divide(a: Int, b: Int) -> Float {
    return a / b
}

test "division by zero throws" {
    observe throws { divide(1, 0) }
}
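
If you know exception handling from another language, it can help to picture throws { } as a helper that runs its body and reports whether anything was raised. The Python sketch below is only an analogy: the throws helper and Python's division behaviour are assumptions of the sketch, not Vary's runtime.

```python
def throws(body):
    """Return True if calling body() raises any exception, False otherwise."""
    try:
        body()
    except Exception:
        return True
    return False


def divide(a, b):
    return a / b


# Analogue of: observe throws { divide(1, 0) }
assert throws(lambda: divide(1, 0))
# A body that completes normally yields False, so the observe would fail.
assert not throws(lambda: divide(1, 2))
```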

Property-based testing with across

across generates random typed inputs and runs your assertions against each one. Instead of picking specific examples, you describe what should always be true and let the compiler find counterexamples:

test "addition is commutative" {
    across(a: Int, b: Int) {
        observe a + b == b + a
    }
}

This test runs 100 iterations by default. Each iteration generates fresh values for a and b, then checks every observe in the block. If any iteration fails, the test reports which input caused the failure.

For a side-by-side comparison with Python's property-testing ecosystem, see Hypothesis comparison.

across can only appear inside test blocks. It requires at least one binding.

Bindings and supported types

Each binding declares a name and a type. The compiler generates values based on the type:

| Type | Generation strategy |
| --- | --- |
| Int | Edge values (0, 1, -1, min/max bounds) then scaled random |
| Float | Special values (0.0, 1.0, -1.0) then bounded random |
| Bool | True and False, then random |
| Str | Empty string, short strings, then random ASCII up to 32 chars |
| List[T] | Empty list, single element, then random length with generated elements |
| T? | None or a generated value of T |
| Data types | Field-wise generation (all fields must be generatable types) |
| Enums | Cycles through variants |
| Tuples | Element-wise generation |

No generator registration, no imports. The compiler handles it.
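
The "edge values first, then random" pattern in the Int row can be pictured with a small Python model. This is a sketch under stated assumptions, not Vary's generator: the names gen_ints and across are hypothetical, the 64-bit bounds are a guess (this page does not state the width of Int), and plain uniform random stands in for "scaled random".

```python
import random

# Assumed bounds for illustration only; the page does not state Int's width.
INT_MIN, INT_MAX = -(2**63), 2**63 - 1
EDGE_INTS = [0, 1, -1, INT_MIN, INT_MAX]


def gen_ints(rng, cases):
    """Yield `cases` integers: the edge values first, then random ones."""
    for i in range(cases):
        if i < len(EDGE_INTS):
            yield EDGE_INTS[i]
        else:
            yield rng.randint(INT_MIN, INT_MAX)


def across(prop, cases=100, seed=0):
    """Check prop(a, b) on generated pairs; return the first failing pair or None."""
    rng = random.Random(seed)
    for a, b in zip(gen_ints(rng, cases), gen_ints(rng, cases)):
        if not prop(a, b):
            return (a, b)
    return None


# Commutativity holds for every generated pair.
assert across(lambda a, b: a + b == b + a) is None
# A false property is caught by the very first edge pair, (0, 0).
assert across(lambda a, b: a + b > a) == (0, 0)
```

Putting edge values at the front means that common off-by-one and zero-handling mistakes tend to fail in the first few iterations, before any random value is drawn.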

You can bind multiple variables and check several properties in one block:

test "integer arithmetic laws" {
    across(a: Int, b: Int, c: Int) {
        observe (a + b) + c == a + (b + c)
        observe a * (b + c) == a * b + a * c
        observe a + b == b + a
    }
}

Mixing across with concrete examples

across composes with regular observe statements. You can mix property assertions with hand-picked examples in the same test:

test "sort properties" {
    observe sort([3, 1, 2]) == [1, 2, 3]

    across(xs: List[Int]) {
        observe len(sort(xs)) == len(xs)
    }
}

Multiple across blocks in a single test are allowed, each with different bindings.

Determinism and reproducibility

Every across run is seeded. When a test fails, the output includes the seed and iteration number:

FAIL: addition is commutative

Seed: 12345
Iteration: 37

Pass --across-seed to reproduce the exact failure:

vary test math.vary --across-seed 12345
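
A seed is enough to replay a failure because a seeded pseudo-random generator always produces the same sequence of values. The Python sketch below illustrates that idea only; run_property, the value range, and the returned (iteration, value) pair are assumptions of the sketch, not Vary's report format.

```python
import random


def run_property(prop, seed, cases=100):
    """Run prop on seeded random ints; return (iteration, value) of the first failure."""
    rng = random.Random(seed)
    for iteration in range(cases):
        x = rng.randint(-1000, 1000)
        if not prop(x):
            return (iteration, x)
    return None


# A deliberately false property, so a failure is very likely to show up.
def is_even_or_small(x):
    return x % 2 == 0 or abs(x) < 10


first = run_property(is_even_or_small, seed=12345)
second = run_property(is_even_or_small, seed=12345)
# Same seed, same inputs: the same iteration fails with the same value.
assert first == second
```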

Shrinking

When an across property fails, Vary can try to minimize the failing input before reporting it.

Shrinking ships today, but it is not yet a full Hypothesis-style shrinking system that covers every generated value.

Current shrinking behaviour:

| Value kind | Current behaviour |
| --- | --- |
| Int | Shrinks toward 0 |
| Float | Shrinks toward 0.0 |
| Str | Shrinks toward shorter strings |
| Bool | No-op |
| Composite values | Limited today; not a general first-class shrink surface yet |
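
To make "shrinks toward 0" and "shrinks toward shorter strings" concrete, here is a minimal Python sketch of greedy shrinking. It is not Vary's algorithm: the candidate order (zero, half, one step) and the function names are assumptions, and max_attempts only loosely mirrors the idea of capping shrink attempts.

```python
def shrink_int(value, fails, max_attempts=200):
    """Greedily move a failing int toward 0 while `fails` stays true."""
    attempts = 0
    while value != 0 and attempts < max_attempts:
        sign = 1 if value > 0 else -1
        # Candidates from most to least aggressive: zero, half, one step.
        candidates = [0, sign * (abs(value) // 2), value - sign]
        for cand in candidates:
            attempts += 1
            if cand != value and fails(cand):
                value = cand
                break
        else:
            break  # nothing smaller still fails, so stop here
    return value


def shrink_str(value, fails):
    """Drop one character at a time while `fails` stays true."""
    changed = True
    while changed:
        changed = False
        for i in range(len(value)):
            cand = value[:i] + value[i + 1:]
            if fails(cand):
                value, changed = cand, True
                break
    return value


# The property "x < 10" fails for 1000; the smallest failing input is 10.
assert shrink_int(1000, lambda x: x >= 10) == 10
assert shrink_int(-1000, lambda x: x <= -10) == -10
# Only the "x" matters for this failure, so everything else is dropped.
assert shrink_str("abxcd", lambda s: "x" in s) == "x"
```

A smaller counterexample is usually easier to read: 10 points at the boundary directly, while 1000 only says that something is wrong.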

CLI controls:

vary test . --across-max-shrink 200  # allow more shrink attempts
vary test . --no-across-shrink       # disable shrinking

CLI options

vary test .                        # default: 100 iterations per across
vary test . --across-cases 500     # more iterations
vary test . --across-seed 42       # fixed seed for reproducibility
vary test . --across-max-shrink 200
vary test . --no-across-shrink

across and mutation testing

When vary mutate runs your test suite, each mutant faces not just your hand-picked examples but also the generated inputs of every across block (100 per block by default). A mutant that slips past three examples is far less likely to survive 100 random ones.
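
The Python sketch below models why generated inputs tend to kill mutants that a handful of examples miss. Everything in it is hypothetical: the mutant, the three examples, and the passes_property helper are illustrations, not output from vary mutate.

```python
import random


def add(a, b):
    return a + b


def add_mutant(a, b):
    # Mutant: "+" swapped for "-".
    return a - b


# Three hand-picked examples that all happen to use b == 0.
examples = [(5, 0, 5), (0, 0, 0), (-3, 0, -3)]


def passes_examples(fn):
    return all(fn(a, b) == expected for a, b, expected in examples)


def passes_property(fn, cases=100, seed=42):
    """Commutativity check on edge pairs, then seeded random pairs."""
    rng = random.Random(seed)
    pairs = [(0, 0), (0, 1), (1, -1)]
    while len(pairs) < cases:
        pairs.append((rng.randint(-100, 100), rng.randint(-100, 100)))
    return all(fn(a, b) == fn(b, a) for a, b in pairs)


assert passes_examples(add) and passes_property(add)
assert passes_examples(add_mutant)      # the mutant survives the examples
assert not passes_property(add_mutant)  # the pair (0, 1) kills it
```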

Because observe is a compiler built-in, the mutation runner knows which across block caught each mutant and can report the specific generated input that triggered the failure.

How assertions connect to mutation testing

vary mutate makes small changes to your code (swapping + for -, changing 60 to 61, flipping < to <=) and re-runs your tests against each changed version. If a test still passes after a change, that change (a "mutant") survived, and your tests have a gap.

The assertions are what catch these changes. How much they catch depends on how specific the assertion is. Compare these two tests for the same function:

def add(a: Int, b: Int) -> Int {
    return a + b
}

# Weak: passes even if + is changed to * (2*3=6, still > 0)
test "add weak" {
    observe add(2, 3) > 0
}

# Pins the exact value: catches + → * (6 != 5)
test "add precise" {
    observe add(2, 3) == 5
    observe add(0, 5) == 5
    observe add(-1, 1) == 0
}

Combining assertions matters. Suppose is_passing accepts scores between the boundary values 60 and 100. Testing it with only observe is_passing(75) misses mutations to those boundaries. Adding observe not is_passing(59) and observe not is_passing(101) catches boundary mutations that the first test misses.
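
The effect of each extra check can be worked through in a small Python model. It assumes is_passing means 60 <= score <= 100 (the definition is not shown on this page), and the four mutants are hypothetical examples of constant changes.

```python
def is_passing(score):
    return 60 <= score <= 100


# Hypothetical mutants of the two boundary constants.
mutants = {
    "60 -> 59": lambda s: 59 <= s <= 100,
    "60 -> 61": lambda s: 61 <= s <= 100,
    "100 -> 101": lambda s: 60 <= s <= 101,
    "100 -> 99": lambda s: 60 <= s <= 99,
}


def survivors(checks):
    """Names of mutants for which every (input, expected) check still passes."""
    return [name for name, mutant in mutants.items()
            if all(mutant(score) == expected for score, expected in checks)]


weak = [(75, True)]
outside = weak + [(59, False), (101, False)]
on_boundary = outside + [(60, True), (100, True)]

assert survivors(weak) == ["60 -> 59", "60 -> 61", "100 -> 101", "100 -> 99"]
assert survivors(outside) == ["60 -> 61", "100 -> 99"]
assert survivors(on_boundary) == []
```

In this model the checks just outside the range (59 and 101) kill the mutants that widen it, while checks on the boundary values themselves (60 and 100) kill the ones that narrow it. Whether the same holds for your function depends on how its comparisons are written.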

How assertions are wired into the compiler

Every observe is a compiler built-in, not a library function. The compiler generates bytecode that records telemetry at runtime: the source expression text, the source file, and the line number. The mutation runner reads this telemetry after each test run.

When you write observe add(2, 3) == 5, the compiler emits a call to the runtime's checkObservation method with the expression text, the source file path, and the line number. The runner collects all of these after each mutant runs.
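
As a mental model, the recorded telemetry can be pictured as one record per observe. The Python sketch below is an illustration only: the Observation record, the LOG list, the check_observation signature, and the line number 14 are assumptions; only the method name checkObservation and the three recorded items (expression text, file, line) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    expr: str     # source text of the observed expression
    file: str     # source file path
    line: int     # line number of the observe statement
    passed: bool  # whether the expression evaluated to true


class ObservationFailed(AssertionError):
    pass


LOG = []


def check_observation(passed, expr, file, line):
    """Record the observation, then fail the test if it did not hold."""
    LOG.append(Observation(expr, file, line, passed))
    if not passed:
        raise ObservationFailed(f"{file}:{line}: observe {expr}")


def add(a, b):
    return a + b


# Roughly what the text describes for: observe add(2, 3) == 5
check_observation(add(2, 3) == 5, "add(2, 3) == 5", "math.vary", 14)
assert LOG[-1].expr == "add(2, 3) == 5" and LOG[-1].passed
```

Because the expression text is kept alongside the result, a tool that reads the log can tell an exact check such as == 5 from a loose one such as > 0, which a bare pass/fail signal cannot do.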

In most languages, assertions are library functions. JUnit's assertEquals, Python's unittest assertEqual, and Go's testify assert.Equal all live outside the compiler. The mutation testing tool (PIT, mutmut, etc.) only sees pass or fail. It cannot tell whether a test pinned an exact value or just checked that something was truthy, so its diagnostics are limited to "this mutant survived" with no guidance on what kind of assertion to add.

Because Vary's assertions are compiler built-ins, the mutation runner has structured information about every assertion in every test:

- The runner tells the difference between observe x == 5 (pins an exact value) and observe x > 0 (loose property). This feeds into the "Why Survivors Exist" breakdown and the leverage fix suggestions.
- A test block with zero assertions is flagged as a placebo. A test that only uses loose comparisons where exact checks would be more precise shows up as "weak assertions only" in the survivor diagnostics.
- When --why explains a surviving mutant, it can suggest "add observe with the exact expected value" instead of a generic "add an assertion."

See Mutation testing golden path for a worked example showing how vary mutate identifies which assertions to add.

Contracts as test oracles

Functions can declare contracts (preconditions with in {} and postconditions with out (r) {}). When a mutant breaks a function's behaviour, a postcondition violation throws ContractViolation, which the mutation runner counts as a kill. Contracts can therefore strengthen your mutation score without any additional test code.

Contracts and tests complement each other: a test checks specific input/output pairs, while a contract checks a property across all calls.

| Technique | What it checks | When it runs |
| --- | --- | --- |
| observe in test | Specific input/output pair | During vary test |
| across in test | Property over many generated inputs | During vary test |
| in {} precondition | Caller responsibilities | Every call at runtime |
| out {} postcondition | Implementation promises | Every call at runtime |
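
The way a postcondition acts as an oracle can be modelled in a few lines of Python. This is a sketch, not Vary's contract syntax or runtime: the ensures decorator, post_abs, and the killed helper are hypothetical, and only the ContractViolation name is taken from the text above.

```python
class ContractViolation(Exception):
    pass


def ensures(post):
    """Wrap a function so post(result, *args) is checked on every call."""
    def wrap(fn):
        def checked(*args):
            result = fn(*args)
            if not post(result, *args):
                raise ContractViolation(f"{fn.__name__}{args} returned {result!r}")
            return result
        return checked
    return wrap


def post_abs(r, x):
    # Postcondition: the result is non-negative and equals x or -x.
    return r >= 0 and (r == x or r == -x)


@ensures(post_abs)
def absolute(x):
    return x if x >= 0 else -x


@ensures(post_abs)
def absolute_mutant(x):
    # Mutant: ">=" flipped to "<=".
    return x if x <= 0 else -x


def killed(fn, inputs):
    """A mutant counts as killed if any call violates the contract."""
    try:
        for x in inputs:
            fn(x)
    except ContractViolation:
        return True
    return False


assert not killed(absolute, [3, -3, 0])
assert killed(absolute_mutant, [3])
```

Note that killed never compares a result with an expected value. Simply calling the mutant is enough, because the postcondition checks every call.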

Examples

Testing equality:

test "string concatenation" {
    let greeting = "hello" + " " + "world"
    observe greeting == "hello world"
}

Testing boolean conditions:

test "list contains element" {
    let items = [1, 2, 3]
    observe items.index(2) >= 0
    observe items.index(99) < 0
}

Testing exceptions:

def divide(a: Int, b: Int) -> Float {
    return a / b
}

test "division by zero throws" {
    observe throws { divide(1, 0) }
}

Where tests live

Test blocks are top-level only. They can go in the same file as your code or in separate test files. vary test discovers all test blocks in the files you point it at.

Tests next to the code they exercise:

# math.vary

def factorial(n: Int) -> Int {
    if n <= 1 {
        return 1
    }
    return n * factorial(n - 1)
}

test "factorial of 0 is 1" {
    observe factorial(0) == 1
}

test "factorial of 5 is 120" {
    observe factorial(5) == 120
}

For larger projects, a separate test file that imports the module under test:

# test_math.vary
from math import factorial

test "factorial of 0" {
    observe factorial(0) == 1
}

test "factorial of 10" {
    observe factorial(10) == 3628800
}

vary mutate auto-discovers test files named test_<name>.vary or <name>_test.vary.
