VAST generates random programs, runs them through multiple independent execution paths, and checks that all paths agree. If they disagree, the compiler has a bug.
VAST (Vary Automated Semantic Testing) is Vary's compiler verification program. It generates random valid Vary programs, runs them through multiple independent execution paths (three by default, four with --opt-check), and fails when they disagree.
We call it a program because it is not a one-shot tool. VAST is something we run, extend, and invest in as the language evolves. When Vary adds a new language feature, VAST gets a new generator for it. When a new class of bug appears, VAST gets a new detection strategy.
Most compilers are tested with hand-written example programs and expected outputs. The weakness is obvious: humans only write the examples they think of, and they tend to test what they expect to work.
VAST sidesteps that. It generates programs on its own, runs them through multiple independent paths, and flags any disagreement. Instead of relying only on examples a developer remembered to write, VAST tests the compiler against a stream of programs nobody wrote.
VAST is organized around five areas, each addressing a different aspect of compiler verification.
| Area | Focus | What it covers |
|---|---|---|
| Semantic agreement | Multi-path correctness | Multiple execution paths (AST, IR, JVM, JVM-unoptimized) run the same program and must produce the same result. Blame localization, IR translation checks, and per-pass optimizer verification all live here. If paths disagree, the compiler has a bug. |
| Input-space exploration | Program diversity | Typed program generation, mutation expansion, metamorphic transforms, symbolic guidance, stress generation, and large program generation all expand the set of programs VAST tests. |
| Stateful and complex semantics | Hard subsystems | Specialized generators for stateful programs, heap aliasing, exceptions, deterministic concurrency, floating-point, generics, and collection/pattern/nullable interactions. These target compiler subsystems where bugs tend to concentrate. |
| Trust and confidence | Measurement | Feature coverage, semantic coverage, interaction coverage, confidence scoring, path health monitoring, sabotage validation, and the regression corpus. These tell you how thoroughly the compiler has been tested and whether VAST itself is working correctly. |
| Operational integration | CI and workflow | Fast/deep/continuous CI modes, RC gates, regression generation, artifact collection, JSONL metrics, and seed rotation. See the testing playbook for detailed guidance. |
Every generated program runs through three paths by default, with an optional fourth:
```
Generated Program
  |
  +-- AST interpreter (reference semantics)
  |
  +-- IR interpreter (intermediate representation)
  |
  +-- JVM compiled program (real compiler backend)
  |
  +-- JVM unoptimized (with --opt-check)
```
The AST interpreter walks the syntax tree directly. It uses sealed value types (VInt, VBool, VStr, VEnum, VData, VList, VNone) and produces a deterministic result for every program. This is the reference oracle.
The IR interpreter lowers the AST to a flat intermediate representation and interprets it. This provides a middle layer for blame localization.
The JVM compiler takes the same AST through the real compiler pipeline: constant folding, dead code elimination, type checking, bytecode generation, and JVM execution via classloader.
With --opt-check, a fourth path runs the JVM compiler without optimization passes, isolating optimizer bugs. Deep CI mode enables this automatically.
If any paths produce different results, something is wrong. When a majority agree and one path differs, the fault can be localized to the compiler stage that only the differing path exercises.
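The agreement check and majority-vote blame localization can be sketched in a few lines. This is a minimal illustration, not VAST's actual implementation; the path names and the check_paths helper are invented for the example:

```python
from collections import Counter

def check_paths(results):
    """Compare results from independent execution paths.

    results: dict mapping path name -> observed value.
    Returns ("agreement", None) when all paths match, otherwise
    ("mismatch", outliers), where outliers are the paths that
    disagree with the majority -- the likely buggy stages.
    Majority ties are resolved arbitrarily in this sketch.
    """
    counts = Counter(results.values())
    if len(counts) == 1:
        return ("agreement", None)
    majority_value, _ = counts.most_common(1)[0]
    return ("mismatch",
            sorted(p for p, v in results.items() if v != majority_value))

# Three paths agree on 4; the optimized JVM backend is blamed.
verdict, blamed = check_paths({"ast": 4, "ir": 4, "jvm": 9, "jvm_unopt": 4})
```

Running an unoptimized fourth path strengthens this vote: if it sides with the interpreters, the optimizer is the prime suspect.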
```
vary vast --count 100 --seed 42
```
The output shows how many programs agreed and whether any mismatches were found:
```
VAST: 100/100 passed (0.3s)
```
You can run different profiles that control program complexity:
```
vary vast --profile core --count 1000 --seed 1
vary vast --profile control --count 100 --seed 1
```
core generates straight-line programs: literals, variables, arithmetic, comparisons, if statements, and return values. control adds helper functions and bounded while loops. Feature profiles (text through generics) incrementally add strings, enums, data types, collections, nullable types, pattern matching, exceptions, and generics. Use complete for full language coverage.
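A profile can be thought of as a whitelist of constructs the generator is allowed to emit. The sketch below illustrates the idea only; the construct lists are invented and far smaller than the real profiles:

```python
import random

# Hypothetical construct sets per profile; the real profiles are richer.
PROFILES = {
    "core": ["literal", "variable", "arithmetic", "comparison", "if", "return"],
    "control": ["literal", "variable", "arithmetic", "comparison", "if",
                "return", "helper_function", "while"],
}

def pick_construct(profile, seed):
    """Deterministically choose the next construct allowed by a profile."""
    rng = random.Random(seed)
    return rng.choice(PROFILES[profile])
```

Because each profile only widens the whitelist, a bug found under core is guaranteed to involve nothing beyond straight-line code, which keeps reduced test cases small.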
VAST finds bugs that hand-written tests miss:
| Bug type | Example |
|---|---|
| Optimizer changes behaviour | Constant folder produces wrong result for edge-case arithmetic |
| Codegen produces wrong arithmetic | a - b compiles to a + b in specific nesting |
| Loops compile incorrectly | While loop off-by-one in bytecode jump targets |
| Function call frames are wrong | Arguments passed in wrong order for generated helpers |
| Variable scoping is broken | Variable from outer scope shadows incorrectly |
| Comparison semantics diverge | <= compiled as < in nested conditions |
These bugs live in unusual feature interactions. No developer sits down and writes a test for (x - y) + y inside a nested conditional with mutable loop variables. But that is exactly the kind of program VAST generates.
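The effect of such a bug can be reproduced in miniature with a reference evaluator and a deliberately "miscompiled" one that loses subtraction during lowering. Both evaluators here are invented for illustration:

```python
def eval_ref(node):
    """Reference evaluator for tiny ('+'/'-', left, right) expression trees."""
    if isinstance(node, int):
        return node
    op, left, right = node
    l, r = eval_ref(left), eval_ref(right)
    return l + r if op == "+" else l - r

def eval_buggy(node):
    """'Miscompiled' evaluator: subtraction wrongly lowered to addition."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return eval_buggy(left) + eval_buggy(right)  # '-' lost during lowering

# (x - y) + y with x=4, y=3: the reference gives 4, the buggy path gives 10.
expr = ("+", ("-", 4, 3), 3)
```

A hand-written test for subtraction alone might still pass here; it takes the nested interaction to expose the divergence, which is exactly what random generation explores.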
Every run uses a seed. If seed 842193 triggers a compiler bug, you can replay it exactly:
```
vary vast --seed 842193 --count 1
```
The same seed always generates the same program, the same execution, and the same result. Without this, random testing would be impossible to debug.
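Seeded generation is what makes replay possible: every decision the generator takes is drawn from a PRNG initialized with the seed, so the same seed always yields the same program. A minimal sketch (the generate_expr helper is invented for illustration):

```python
import random

def generate_expr(seed, depth=3):
    """Generate a small arithmetic expression; same seed, same program."""
    rng = random.Random(seed)
    def gen(d):
        # Every branch decision comes from the seeded PRNG, never from
        # global state, so replays are bit-for-bit identical.
        if d == 0 or rng.random() < 0.3:
            return str(rng.randint(0, 9))
        op = rng.choice(["+", "-", "*"])
        return f"({gen(d - 1)} {op} {gen(d - 1)})"
    return gen(depth)
```

The key design constraint is that the generator must draw from no source of nondeterminism other than the seed: no wall-clock time, no hash ordering, no shared global RNG.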
When a mismatch is found, VAST shows the seed, the verdict, the outcome from each path, the generated source code, and a replay command:
```
VAST mismatch [seed=41822917, profile=core]
Verdict: MISMATCH_VALUE
  AST interpreter: success(Int(4)) [0.2ms]
  JVM bytecode:    success(Int(9)) [12ms]

Source:
  def __vast_compute() -> Int {
      let x = 4
      let y = 3
      return (x - y) + y
  }

Replay: vary vast --seed 41822917 --count 1 --profile core
```
Not all disagreements are the same. VAST classifies every result into one of four categories:
| Category | Meaning |
|---|---|
| Agreement | All paths produced the same result |
| Mismatch | Paths ran but returned different values or error categories |
| Path failure | One path crashed during compilation or execution (likely an infrastructure bug) |
| Invalid program | The generator violated profile rules (a VAST bug, not counted) |
Mismatches are the signal. A mismatch means the compiler produced different behaviour from the reference interpreters, and that is a bug candidate.
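The classification step can be sketched as a small function over per-path outcomes. The classify helper and its result encoding are invented for this sketch, and invalid-program detection is omitted:

```python
def classify(path_results):
    """Classify one VAST run into a verdict category.

    path_results: dict mapping path name -> ("success", value) or
    ("crash", reason). A crash in any path is an infrastructure-style
    path failure; otherwise results either agree or mismatch.
    """
    if any(kind == "crash" for kind, _ in path_results.values()):
        return "path_failure"
    values = {value for _, value in path_results.values()}
    return "agreement" if len(values) == 1 else "mismatch"
```

Separating path failures from mismatches matters for triage: a crash points at the harness or a backend blowing up, while a clean-but-different value points at a semantic compiler bug.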
VAST runs automatically in the nightly CI workflow and as part of release candidate validation. Deep mode (~9,000 programs across 12 profiles) runs nightly with 4-path validation, seed rotation, metamorphic testing, and mutation expansion. Fast mode (~100 programs across 4 profiles) is available for quick local smoke tests. Continuous mode enables long-running adaptive exploration. All modes track coverage and output JSONL metrics. See CI integration for details.
VAST and vary mutate both involve running programs multiple ways and comparing results, but they solve different problems. VAST validates the compiler. vary mutate validates your tests. For a full breakdown, see VAST vs mutate.
VAST is not the whole compiler testing story.
It is the semantic-verification layer inside a broader compiler trust stack:
| Level | What lives here | Main question |
|---|---|---|
| Implementation correctness | Kotlin unit tests, detekt rules, PIT mutation testing on compiler code | Did we implement this compiler subsystem correctly, and are the implementation tests strong? |
| VAST semantic verification | Differential execution, metamorphic tests, mutation expansion, specialized generators, confidence and coverage | Does the compiler preserve the meaning of Vary programs? |
| Release / system trust | RC gates, validation scripts, corpus replay, integration tests, sabotage checks | Is this compiler safe to ship? |
The bottom layer tests the compiler implementation from the inside. These are tests of the Kotlin code that implements the parser, type checker, optimizer, code generator, CLI, and related subsystems.
The middle layer, VAST, tests the compiler from the outside. It generates Vary programs and checks that the compiler preserves their meaning across multiple execution paths.
The top layer combines everything into shipping trust. Release gates, validation scripts, sabotage checks, and regression corpus replay answer the practical question: is this compiler safe to ship?
So VAST should be read as one pillar of compiler quality, not as a replacement for unit tests or release validation.