Vary has a built-in test DSL. Tests are top-level `test` blocks with a name and a body. No imports, no test framework, no configuration. This is the only way to write tests in Vary.

## Test blocks

```vary
def add(a: Int, b: Int) -> Int {
    return a + b
}

test "add returns the sum" {
    observe add(2, 3) == 5
}

test "add handles zero" {
    observe add(0, 5) == 5
    observe add(5, 0) == 5
}
```

Run tests with `vary test`:

```bash
vary test math.vary
```

Or run everything in a directory:

```bash
vary test src/
```

## Observe

`observe` is the primary way to assert in test blocks. It takes a boolean expression and fails the test if it evaluates to `False`:

```vary-snippet
test "add returns the sum" {
    observe add(2, 3) == 5
    observe add(0, 5) == 5
    observe add(-1, 1) == 0
}
```

`observe` is a keyword, not a function call. It marks the oracle boundary: the point where your test defines what "correct" means. The compiler records each `observe` with its source location and expression text, so the mutation runner knows exactly what each test checks.

## Throws expression

`throws { }` evaluates to `True` if the body raises an exception, `False` otherwise. Use it with `observe` to test error paths:

```vary
def divide(a: Int, b: Int) -> Float {
    return a / b
}

test "division by zero throws" {
    observe throws { divide(1, 0) }
}
```

## Property-based testing with across

`across` generates random typed inputs and runs your assertions against each one. Instead of picking specific examples, you describe what should always be true and let the compiler find counterexamples:

```vary
test "addition is commutative" {
    across(a: Int, b: Int) {
        observe a + b == b + a
    }
}
```

If you want a side-by-side comparison with Python's property-testing ecosystem, see [Hypothesis comparison](/docs/hypothesis-comparison/).

This runs 100 iterations by default. Each iteration generates fresh values for `a` and `b`, then checks every `observe` in the block. If any iteration fails, the test reports which input caused the failure.

`across` can only appear inside `test` blocks. It requires at least one binding.

### Bindings and supported types

Each binding declares a name and a type. The compiler generates values based on the type:

| Type | Generation strategy |
|------|-------------------|
| `Int` | Edge values (0, 1, -1, min/max bounds) then scaled random |
| `Float` | Special values (0.0, 1.0, -1.0) then bounded random |
| `Bool` | `True` and `False`, then random |
| `Str` | Empty string, short strings, then random ASCII up to 32 chars |
| `List[T]` | Empty list, single element, then random length with generated elements |
| `T?` | `None` or a generated value of `T` |
| Data types | Field-wise generation (all fields must be generatable types) |
| Enums | Cycles through variants |
| Tuples | Element-wise generation |

No generator registration, no imports. The compiler handles it.

You can bind multiple variables and check several properties in one block:

```vary
test "integer arithmetic laws" {
    across(a: Int, b: Int, c: Int) {
        observe (a + b) + c == a + (b + c)
        observe a * (b + c) == a * b + a * c
        observe a + b == b + a
    }
}
```

### Mixing across with concrete examples

`across` composes with regular `observe` statements. You can mix property assertions with hand-picked examples in the same test:

```vary-snippet
test "sort properties" {
    observe sort([3, 1, 2]) == [1, 2, 3]

    across(xs: List[Int]) {
        observe len(sort(xs)) == len(xs)
    }
}
```

Multiple `across` blocks in a single test are allowed, each with different bindings.

### Determinism and reproducibility

Every `across` run is seeded. When a test fails, the output includes the seed and iteration number:

```text
FAIL: addition is commutative

Seed: 12345
Iteration: 37
```

Pass `--across-seed` to reproduce the exact failure:

```bash
vary test math.vary --across-seed 12345
```

### Shrinking

When an `across` property fails, Vary can try to minimize the failing input before reporting it.

This is shipped today, but it is not yet a full Hypothesis-style shrinking system for all generated values.

Current shrinking behaviour:

| Value kind | Current behaviour |
|-----------|-------------------|
| `Int` | Shrinks toward `0` |
| `Float` | Shrinks toward `0.0` |
| `Str` | Shrinks toward shorter strings |
| `Bool` | No-op |
| Composite values | Limited today; not a general first-class shrink surface yet |

CLI controls:

```bash
vary test . --across-max-shrink 200  # allow more shrink attempts
vary test . --no-across-shrink       # disable shrinking
```

### CLI options

```bash
vary test .                        # default: 100 iterations per across
vary test . --across-cases 500     # more iterations
vary test . --across-seed 42       # fixed seed for reproducibility
vary test . --across-max-shrink 200
vary test . --no-across-shrink
```

### across and mutation testing

When `vary mutate` runs your test suite, each mutant faces not just your hand-picked examples but 100 generated inputs per `across` block. A mutant that slips past three examples is far less likely to survive 100 random ones.

Because `observe` is a compiler built-in, the mutation runner knows which `across` block caught each mutant and can report the specific generated input that triggered the failure.

## How assertions connect to mutation testing

`vary mutate` makes small changes to your code (swapping `+` for `-`, changing `60` to `61`, flipping `<` to `<=`) and re-runs your tests against each changed version. If a test still passes after a change, that change (a "mutant") survived, and your tests have a gap.

The assertions are what catch these changes. How much they catch depends on how specific the assertion is. Compare these two tests for the same function:

```vary
def add(a: Int, b: Int) -> Int {
    return a + b
}

# Weak: passes even if + is changed to * (2*3=6, still > 0)
test "add weak" {
    observe add(2, 3) > 0
}

# Pins the exact value: catches + → * (6 != 5)
test "add precise" {
    observe add(2, 3) == 5
    observe add(0, 5) == 5
    observe add(-1, 1) == 0
}
```

Combining assertions matters. Testing `is_passing` with only `observe is_passing(75)` misses mutations to the boundary values 60 and 100. Adding `observe not is_passing(59)` and `observe not is_passing(101)` catches boundary mutations that the first test misses.

### How assertions are wired into the compiler

Every `observe` is a compiler built-in, not a library function. The compiler generates bytecode that records telemetry at runtime: the source expression text, the source file, and the line number. The mutation runner reads this telemetry after each test run.

When you write `observe add(2, 3) == 5`, the compiler emits a call to the runtime's `checkObservation` method with the expression text, the source file path, and the line number. The runner collects all of these after each mutant runs.

In most languages, assertions are library functions. JUnit's `assertEquals`, pytest's `assert`, Go's `t.Equal` all live outside the compiler. The mutation testing tool (PIT, mutmut, etc.) only sees pass or fail. It cannot tell whether a test pinned an exact value or just checked that something was truthy, so its diagnostics are limited to "this mutant survived" with no guidance on what kind of assertion to add.

Because Vary's assertions are compiler built-ins, the mutation runner has structured information about every assertion in every test:

The runner tells the difference between `observe x == 5` (pins an exact value) and `observe x > 0` (loose property). This feeds into the "Why Survivors Exist" breakdown and the leverage fix suggestions.

A test block with zero assertions is flagged as a placebo. A test that only uses loose comparisons where exact checks would be more precise shows up as "weak assertions only" in the survivor diagnostics.

When `--why` explains a surviving mutant, it can suggest "add `observe` with the exact expected value" instead of a generic "add an assertion."

See [Mutation testing golden path](/docs/mutation/workflow/) for a worked example showing how `vary mutate` identifies which assertions to add.

## Contracts as test oracles

Functions can declare [contracts](/docs/contracts/) (preconditions with `in {}` and postconditions with `out (r) {}`). When a mutant breaks a function's behaviour, a postcondition violation throws `ContractViolation`, which the mutation runner counts as a kill. This means contracts strengthen your mutation score without writing additional test code.

Contracts and tests complement each other: a test checks specific input/output pairs, while a contract checks a property across all calls.

| Technique | What it checks | When it runs |
|-----------|---------------|-------------|
| `observe` in test | Specific input/output pair | During `vary test` |
| `across` in test | Property over many generated inputs | During `vary test` |
| `in {}` precondition | Caller responsibilities | Every call at runtime |
| `out {}` postcondition | Implementation promises | Every call at runtime |

## Examples

Testing equality:

```vary
test "string concatenation" {
    let greeting = "hello" + " " + "world"
    observe greeting == "hello world"
}
```

Testing boolean conditions:

```vary
test "list contains element" {
    let items = [1, 2, 3]
    observe items.index(2) >= 0
    observe items.index(99) < 0
}
```

Testing exceptions:

```vary
def divide(a: Int, b: Int) -> Float {
    return a / b
}

test "division by zero throws" {
    observe throws { divide(1, 0) }
}
```

## Where tests live

Test blocks are top-level only. They can go in the same file as your code or in separate test files. `vary test` discovers all `test` blocks in the files you point it at.

Tests next to the code they exercise:

```vary
# math.vary

def factorial(n: Int) -> Int {
    if n <= 1 {
        return 1
    }
    return n * factorial(n - 1)
}

test "factorial of 0 is 1" {
    observe factorial(0) == 1
}

test "factorial of 5 is 120" {
    observe factorial(5) == 120
}
```

For larger projects, a separate test file that imports the module under test:

```vary
# test_math.vary
from math import factorial

test "factorial of 0" {
    observe factorial(0) == 1
}

test "factorial of 10" {
    observe factorial(10) == 3628800
}
```

`vary mutate` auto-discovers test files named `test_<name>.vary` or `<name>_test.vary`.