
> VAST generates random programs, runs them through multiple independent execution paths, and checks that all paths agree. If they disagree, the compiler has a bug.

## What is VAST?

VAST (Vary Automated Semantic Testing) is Vary's compiler verification program. It generates random valid Vary programs, runs them through multiple independent execution paths (three by default, four with `--opt-check`), and fails when they disagree.

We call it a program because it is not a one-shot tool. VAST is something we run, extend, and invest in as the language evolves. When Vary adds a new language feature, VAST gets a new generator for it. When a new class of bug appears, VAST gets a new detection strategy.

Most compilers are tested against hand-written example programs with expected output. The weakness is obvious: humans only write the examples they think of, and they tend to test what they expect to work.

VAST sidesteps that. It generates programs on its own, runs them through multiple independent paths, and flags any disagreement. Instead of relying only on examples a developer remembered to write, VAST tests the compiler against a stream of programs nobody wrote.
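The core loop is easy to picture. The sketch below is a minimal illustration, not the real VAST code: `generate_program`, `run_ast`, `run_ir`, and `run_jvm` are hypothetical stand-ins for the generator and the three execution paths, and the trivial `eval`-based runners stand in for real interpreters.

```python
# Illustrative sketch of VAST's differential-testing loop.
# All names here are hypothetical stand-ins, not the real VAST API.

def generate_program(seed):
    # Real VAST generates a random valid Vary program from the seed;
    # here we just produce a trivial arithmetic expression.
    return f"return ({seed % 10} - 3) + 3"

# Three independent execution paths. In real VAST these are an AST
# interpreter, an IR interpreter, and the compiled JVM program; here
# they are trivially identical evaluators.
def run_ast(src): return eval(src.removeprefix("return "))
def run_ir(src):  return eval(src.removeprefix("return "))
def run_jvm(src): return eval(src.removeprefix("return "))

def vast_check(seed):
    src = generate_program(seed)
    results = {"ast": run_ast(src), "ir": run_ir(src), "jvm": run_jvm(src)}
    # All paths must agree; any disagreement is a compiler-bug candidate.
    agreed = len(set(results.values())) == 1
    return agreed, results

ok, results = vast_check(seed=42)
assert ok, f"compiler bug candidate: {results}"
```

The point of the structure is that no path needs to know the expected answer; correctness is defined as agreement between independent implementations.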

## What VAST covers

VAST is organized around five areas, each addressing a different aspect of compiler verification.

| Area | Focus | What it covers |
|------|-------|---------------|
| Semantic agreement | Multi-path correctness | Multiple execution paths (AST, IR, JVM, JVM-unoptimized) run the same program and must produce the same result. Blame localization, IR translation checks, and per-pass optimizer verification all live here. If paths disagree, the compiler has a bug. |
| Input-space exploration | Program diversity | Typed program generation, mutation expansion, metamorphic transforms, symbolic guidance, stress generation, and large program generation all expand the set of programs VAST tests. |
| Stateful and complex semantics | Hard subsystems | Specialized generators for stateful programs, heap aliasing, exceptions, deterministic concurrency, floating-point, generics, and collection/pattern/nullable interactions. These target compiler subsystems where bugs tend to concentrate. |
| Trust and confidence | Measurement | Feature coverage, semantic coverage, interaction coverage, confidence scoring, path health monitoring, sabotage validation, and the regression corpus. These tell you how thoroughly the compiler has been tested and whether VAST itself is working correctly. |
| Operational integration | CI and workflow | Fast/deep/continuous CI modes, RC gates, regression generation, artifact collection, JSONL metrics, and seed rotation. See the [testing playbook](/docs/vast/testing-playbook/) for detailed guidance. |

## The execution paths

Every generated program runs through three paths by default; `--opt-check` adds a fourth:

```text
Generated Program
        |
        +-- AST interpreter (reference semantics)
        |
        +-- IR interpreter (intermediate representation)
        |
        +-- JVM compiled program (real compiler backend)
        |
        +-- JVM unoptimized (with --opt-check)
```

The **AST interpreter** walks the syntax tree directly. It uses sealed value types (`VInt`, `VBool`, `VStr`, `VEnum`, `VData`, `VList`, `VNone`) and produces a deterministic result for every program. This is the reference oracle.

The **IR interpreter** lowers the AST to a flat intermediate representation and interprets it. This provides a middle layer for blame localization.

The **JVM compiler** takes the same AST through the real compiler pipeline: constant folding, dead code elimination, type checking, bytecode generation, and JVM execution via classloader.

With `--opt-check`, a fourth path runs the JVM compiler without optimization passes, isolating optimizer bugs. Deep CI mode enables this automatically.

If any two paths produce different results, something is wrong. When a majority agree and one path differs, the fault can be localized to the compiler stage that path exercises.
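That majority-vote localization can be sketched as follows. This is an assumed simplification, not the real blame-localization logic; the path names mirror the diagram above.

```python
from collections import Counter

def localize_blame(results):
    """Hypothetical sketch: results maps path name -> observed value.

    Returns None when all paths agree, otherwise the list of outlier
    paths whose producing stage is the prime suspect. (If no value has
    a clear majority, the outlier set is ambiguous.)
    """
    counts = Counter(results.values())
    if len(counts) == 1:
        return None  # all paths agree: no bug signal
    majority_value, _ = counts.most_common(1)[0]
    return [path for path, value in results.items() if value != majority_value]

# AST and IR agree; the compiled JVM path differs, so the suspect
# stage is somewhere in the real compiler backend.
suspects = localize_blame({"ast": 7, "ir": 7, "jvm": 9})
assert suspects == ["jvm"]
```

This is why the IR interpreter exists as a middle layer: with only two paths, a disagreement tells you something is wrong but not where.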

## Running it

```bash
vary vast --count 100 --seed 42
```

The output shows how many programs agreed and whether any mismatches were found:

```text
VAST: 100/100 passed (0.3s)
```

You can run different profiles that control program complexity:

```bash
vary vast --profile core --count 1000 --seed 1
vary vast --profile control --count 100 --seed 1
```

`core` generates straight-line programs: literals, variables, arithmetic, comparisons, if statements, and return values. `control` adds helper functions and bounded while loops. Feature profiles (`text` through `generics`) incrementally add strings, enums, data types, collections, nullable types, pattern matching, exceptions, and generics. Use `complete` for full language coverage.

## What it catches

VAST finds bugs that hand-written tests miss:

| Bug type | Example |
|----------|---------|
| Optimizer changes behaviour | Constant folder produces wrong result for edge-case arithmetic |
| Codegen produces wrong arithmetic | `a - b` compiles to `a + b` in specific nesting |
| Loops compile incorrectly | While loop off-by-one in bytecode jump targets |
| Function call frames are wrong | Arguments passed in wrong order for generated helpers |
| Variable scoping is broken | Variable from outer scope shadows incorrectly |
| Comparison semantics diverge | `<=` compiled as `<` in nested conditions |

These bugs live in unusual feature interactions. No developer sits down and writes a test for `(x - y) + y` inside a nested conditional with mutable loop variables. But that is exactly the kind of program VAST generates.

## Deterministic replay

Every run uses a seed. If seed `842193` triggers a compiler bug, you can replay it exactly:

```bash
vary vast --seed 842193 --count 1
```

The same seed always generates the same program, the same execution, and the same result. Without this, random testing would be impossible to debug.
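The property that makes replay exact is that every random choice the generator makes flows from the seed. A minimal sketch of the idea, with a hypothetical `generate_program` standing in for the real generator:

```python
import random

def generate_program(seed):
    # All randomness flows from a single seeded RNG, so the same seed
    # always yields the same sequence of choices and the same program.
    rng = random.Random(seed)
    x, y = rng.randint(0, 9), rng.randint(0, 9)
    op = rng.choice(["+", "-", "*"])
    return f"return {x} {op} {y}"

# Replaying a failing seed reproduces the exact same program.
assert generate_program(842193) == generate_program(842193)
```

Anything outside the seeded RNG (wall-clock time, hash ordering, thread scheduling) would break this guarantee, which is why deterministic generation is a design constraint rather than a convenience.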

## Mismatch reporting

When a mismatch is found, VAST shows the seed, the verdict, the outcome from each path, the generated source code, and a replay command:

```text
VAST mismatch [seed=41822917, profile=core]
  Verdict: MISMATCH_VALUE
  AST interpreter: success(Int(7))     [0.2ms]
  JVM bytecode:    success(Int(9))     [12ms]

  Source:
  def __vast_compute() -> Int {
      let x = 4
      let y = 3
      return (x - y) + y
  }

  Replay: vary vast --seed 41822917 --count 1 --profile core
```

## Failure classification

Not all disagreements are the same. VAST classifies every result into one of four categories:

| Category | Meaning |
|----------|---------|
| Agreement | All paths produced the same result |
| Mismatch | Paths ran but returned different values or error categories |
| Path failure | One path crashed during compile or execution (likely an infrastructure bug) |
| Invalid program | The generator violated profile rules (a VAST bug, not counted) |

Mismatches are the signal. A mismatch means the compiler produced different behaviour from the reference interpreters, and that is a bug candidate.
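The four-way split can be sketched as a small classifier. The category names follow the table above; the result encoding (`("ok", value)` / `("crash", message)` pairs and the `generator_ok` flag) is a hypothetical simplification.

```python
# Illustrative sketch of VAST's four-way result classification.
# The result encoding here is an assumption, not the real data model.

def classify(path_results, generator_ok=True):
    """path_results maps path name -> ("ok", value) or ("crash", message)."""
    if not generator_ok:
        return "invalid_program"   # generator broke profile rules: a VAST bug
    if any(kind == "crash" for kind, _ in path_results.values()):
        return "path_failure"      # likely an infrastructure bug
    values = {value for _, value in path_results.values()}
    return "agreement" if len(values) == 1 else "mismatch"

# Paths ran but returned different values: the bug-candidate signal.
assert classify({"ast": ("ok", 7), "jvm": ("ok", 9)}) == "mismatch"
```

Separating the categories keeps the signal clean: only mismatches indict the compiler, while path failures and invalid programs point back at the harness itself.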

## CI integration

VAST runs automatically in the nightly CI workflow and as part of release candidate validation. Deep mode (~9,000 programs across 12 profiles) runs nightly with 4-path validation, seed rotation, metamorphic testing, and mutation expansion. Fast mode (~100 programs across 4 profiles) is available for quick local smoke tests. Continuous mode enables long-running adaptive exploration. All modes track coverage and output JSONL metrics. See [CI integration](/docs/vast/ci-integration/) for details.

## How VAST fits with mutation testing

VAST and `vary mutate` both involve running programs multiple ways and comparing results, but they solve different problems. VAST validates the compiler. `vary mutate` validates your tests. For a full breakdown, see [VAST vs mutate](/docs/vast/vast-vs-mutate/).

## Where VAST fits in the trust stack

VAST is not the whole compiler testing story.

It is the semantic-verification layer inside a broader compiler trust stack:

| Level | What lives here | Main question |
|-------|------------------|---------------|
| Implementation correctness | Kotlin unit tests, detekt rules, PIT mutation testing on compiler code | Did we implement this compiler subsystem correctly, and are the implementation tests strong? |
| VAST semantic verification | Differential execution, metamorphic tests, mutation expansion, specialized generators, confidence and coverage | Does the compiler preserve the meaning of Vary programs? |
| Release / system trust | RC gates, validation scripts, corpus replay, integration tests, sabotage checks | Is this compiler safe to ship? |

The bottom layer tests the compiler implementation from the inside. These are tests of the Kotlin code that implements the parser, type checker, optimizer, code generator, CLI, and related subsystems.

The middle layer, VAST, tests the compiler from the outside. It generates Vary programs and checks that the compiler preserves their meaning across multiple execution paths.

The top layer combines everything into shipping trust. Release gates, validation scripts, sabotage checks, and regression corpus replay answer the practical question: is this compiler safe to ship?

So VAST should be read as one pillar of compiler quality, not as a replacement for unit tests or release validation.
