VAST generates random programs, runs them through multiple independent execution paths, and checks that all paths agree. If they disagree, the compiler has a bug.
VAST (Vary Automated Semantic Testing) is Vary's compiler verification program. It generates random valid Vary programs, runs them through multiple independent execution paths (three by default, four with --opt-check), and fails when they disagree.
We call it a program because it is not a one-shot tool. VAST is something we run, extend, and invest in as the language evolves. When Vary adds a new language feature, VAST gets a new generator for it. When a new class of bug appears, VAST gets a new detection strategy.
Most compilers are tested with hand-written example programs and expected outputs. The weakness is obvious: humans only write the examples they think of, and they tend to test what they expect to work.
VAST sidesteps that. It generates programs on its own, runs them through multiple independent paths, and flags any disagreement. Instead of relying only on examples a developer remembered to write, VAST tests the compiler against a stream of programs nobody wrote.
VAST is organized around five areas, each addressing a different aspect of compiler verification.
| Area | Focus | What it covers |
|---|---|---|
| Semantic agreement | Multi-path correctness | Multiple execution paths (AST, IR, JVM, JVM-unoptimized) run the same program and must produce the same result. Blame localization, IR translation checks, and per-pass optimizer verification all live here. If paths disagree, the compiler has a bug. |
| Input-space exploration | Program diversity | Typed program generation, mutation expansion, metamorphic transforms, symbolic guidance, stress generation, and large program generation all expand the set of programs VAST tests. |
| Stateful and complex semantics | Hard subsystems | Specialized generators for stateful programs, heap aliasing, exceptions, deterministic concurrency, floating-point, generics, and collection/pattern/nullable interactions. These target compiler subsystems where bugs tend to concentrate. |
| Trust and confidence | Measurement | Feature coverage, semantic coverage, interaction coverage, confidence scoring, path health monitoring, sabotage validation, and the regression corpus. These tell you how thoroughly the compiler has been tested and whether VAST itself is working correctly. |
| Operational integration | CI and workflow | Fast/deep/continuous CI modes, RC gates, regression generation, artifact collection, JSONL metrics, and seed rotation. See the testing playbook for detailed guidance. |
Every generated program runs through three paths by default, with an optional fourth:
```
Generated Program
  |
  +-- AST interpreter (reference semantics)
  |
  +-- IR interpreter (intermediate representation)
  |
  +-- JVM compiled program (real compiler backend)
  |
  +-- JVM unoptimized (with --opt-check)
```
The AST interpreter walks the syntax tree directly. It uses sealed value types (VInt, VBool, VStr, VEnum, VData, VList, VNone) and produces a deterministic result for every program. This is the reference oracle.
The IR interpreter lowers the AST to a flat intermediate representation and interprets it. This provides a middle layer for blame localization.
The JVM compiler takes the same AST through the real compiler pipeline: constant folding, dead code elimination, type checking, bytecode generation, and JVM execution via classloader.
With --opt-check, a fourth path runs the JVM compiler without optimization passes, isolating optimizer bugs. Deep CI mode enables this automatically.
If any paths produce different results, something is wrong. When a majority agree and one path differs, the fault can be localized to the compiler stage that only the differing path exercises.
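The agreement check and majority-vote blame localization can be sketched in a few lines. This is a minimal illustration, not VAST's actual implementation; the path names and the check_paths helper are invented for the example:

```python
from collections import Counter

def check_paths(results):
    """Compare results from independent execution paths.

    results: dict mapping path name -> observed value.
    Returns ("agreement", None) when all paths match, otherwise
    ("mismatch", outliers), where outliers are the paths that
    disagree with the majority -- the likely buggy stages.
    Majority ties are resolved arbitrarily in this sketch.
    """
    counts = Counter(results.values())
    if len(counts) == 1:
        return ("agreement", None)
    majority_value, _ = counts.most_common(1)[0]
    return ("mismatch",
            sorted(p for p, v in results.items() if v != majority_value))

# Three paths agree on 4; the optimized JVM backend is blamed.
verdict, blamed = check_paths({"ast": 4, "ir": 4, "jvm": 9, "jvm_unopt": 4})
```

Running an unoptimized fourth path strengthens this vote: if it sides with the interpreters, the optimizer is the prime suspect.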
```
vary vast --count 100 --seed 42
```
The output shows how many programs agreed and whether any mismatches were found:
```
VAST: 100/100 passed (0.3s)
```
You can run different profiles that control program complexity:
```
vary vast --profile core --count 1000 --seed 1
vary vast --profile control --count 100 --seed 1
```
core generates straight-line programs: literals, variables, arithmetic, comparisons, if statements, and return values. control adds helper functions and bounded while loops. Feature profiles (text through generics) incrementally add strings, enums, data types, collections, nullable types, pattern matching, exceptions, and generics. Use complete for full language coverage.
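A profile can be thought of as a whitelist of constructs the generator is allowed to emit. The sketch below illustrates the idea only; the construct lists are invented and far smaller than the real profiles:

```python
import random

# Hypothetical construct sets per profile; the real profiles are richer.
PROFILES = {
    "core": ["literal", "variable", "arithmetic", "comparison", "if", "return"],
    "control": ["literal", "variable", "arithmetic", "comparison", "if",
                "return", "helper_function", "while"],
}

def pick_construct(profile, seed):
    """Deterministically choose the next construct allowed by a profile."""
    rng = random.Random(seed)
    return rng.choice(PROFILES[profile])
```

Because each profile only widens the whitelist, a bug found under core is guaranteed to involve nothing beyond straight-line code, which keeps reduced test cases small.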
VAST finds bugs that hand-written tests miss:
| Bug type | Example |
|---|---|
| Optimizer changes behaviour | Constant folder produces wrong result for edge-case arithmetic |
| Codegen produces wrong arithmetic | a - b compiles to a + b in specific nesting |
| Loops compile incorrectly | While loop off-by-one in bytecode jump targets |
| Function call frames are wrong | Arguments passed in wrong order for generated helpers |
| Variable scoping is broken | Variable from outer scope shadows incorrectly |
| Comparison semantics diverge | <= compiled as < in nested conditions |
These bugs live in unusual feature interactions. No developer sits down and writes a test for (x - y) + y inside a nested conditional with mutable loop variables. But that is exactly the kind of program VAST generates.
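The effect of such a bug can be reproduced in miniature with a reference evaluator and a deliberately "miscompiled" one that loses subtraction during lowering. Both evaluators here are invented for illustration:

```python
def eval_ref(node):
    """Reference evaluator for tiny ('+'/'-', left, right) expression trees."""
    if isinstance(node, int):
        return node
    op, left, right = node
    l, r = eval_ref(left), eval_ref(right)
    return l + r if op == "+" else l - r

def eval_buggy(node):
    """'Miscompiled' evaluator: subtraction wrongly lowered to addition."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return eval_buggy(left) + eval_buggy(right)  # '-' lost during lowering

# (x - y) + y with x=4, y=3: the reference gives 4, the buggy path gives 10.
expr = ("+", ("-", 4, 3), 3)
```

A hand-written test for subtraction alone might still pass here; it takes the nested interaction to expose the divergence, which is exactly what random generation explores.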
Every run uses a seed. If seed 842193 triggers a compiler bug, you can replay it exactly:
```
vary vast --seed 842193 --count 1
```
The same seed always generates the same program, the same execution, and the same result. Without this, random testing would be impossible to debug.
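Seeded generation is what makes replay possible: every decision the generator takes is drawn from a PRNG initialized with the seed, so the same seed always yields the same program. A minimal sketch (the generate_expr helper is invented for illustration):

```python
import random

def generate_expr(seed, depth=3):
    """Generate a small arithmetic expression; same seed, same program."""
    rng = random.Random(seed)
    def gen(d):
        # Every branch decision comes from the seeded PRNG, never from
        # global state, so replays are bit-for-bit identical.
        if d == 0 or rng.random() < 0.3:
            return str(rng.randint(0, 9))
        op = rng.choice(["+", "-", "*"])
        return f"({gen(d - 1)} {op} {gen(d - 1)})"
    return gen(depth)
```

The key design constraint is that the generator must draw from no source of nondeterminism other than the seed: no wall-clock time, no hash ordering, no shared global RNG.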
When a mismatch is found, VAST shows the seed, the verdict, the outcome from each path, the generated source code, and a replay command:
```
VAST mismatch [seed=41822917, profile=core]
Verdict: MISMATCH_VALUE
  AST interpreter: success(Int(4)) [0.2ms]
  JVM bytecode:    success(Int(9)) [12ms]

Source:
  def __vast_compute() -> Int {
      let x = 4
      let y = 3
      return (x - y) + y
  }

Replay: vary vast --seed 41822917 --count 1 --profile core
```
Not all disagreements are the same. VAST classifies every result into one of four categories:
| Category | Meaning |
|---|---|
| Agreement | All paths produced the same result |
| Mismatch | Paths ran but returned different values or error categories |
| Path failure | One path crashed during compilation or execution (likely an infrastructure bug) |
| Invalid program | The generator violated profile rules (a VAST bug, not counted) |
Mismatches are the signal. A mismatch means the compiler produced different behaviour from the reference interpreters, and that is a bug candidate.
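The classification step can be sketched as a small function over per-path outcomes. The classify helper and its result encoding are invented for this sketch, and invalid-program detection is omitted:

```python
def classify(path_results):
    """Classify one VAST run into a verdict category.

    path_results: dict mapping path name -> ("success", value) or
    ("crash", reason). A crash in any path is an infrastructure-style
    path failure; otherwise results either agree or mismatch.
    """
    if any(kind == "crash" for kind, _ in path_results.values()):
        return "path_failure"
    values = {value for _, value in path_results.values()}
    return "agreement" if len(values) == 1 else "mismatch"
```

Separating path failures from mismatches matters for triage: a crash points at the harness or a backend blowing up, while a clean-but-different value points at a semantic compiler bug.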
VAST runs automatically in the nightly CI workflow and as part of release candidate validation. Deep mode (~9,000 programs across 12 profiles) runs nightly with 4-path validation, seed rotation, metamorphic testing, and mutation expansion. Fast mode (~100 programs across 4 profiles) is available for quick local smoke tests. Continuous mode enables long-running adaptive exploration. All modes track coverage and output JSONL metrics. See CI integration for details.
VAST and vary mutate both involve running programs multiple ways and comparing results, but they solve different problems. VAST validates the compiler. vary mutate validates your tests. For a full breakdown, see VAST vs mutate.
VAST is not the whole compiler testing story.
It is the semantic-verification layer inside a broader compiler trust stack:
| Level | What lives here | Main question |
|---|---|---|
| Implementation correctness | Kotlin unit tests, detekt rules, PIT mutation testing on compiler code | Did we implement this compiler subsystem correctly, and are the implementation tests strong? |
| VAST semantic verification | Differential execution, metamorphic tests, mutation expansion, specialized generators, confidence and coverage | Does the compiler preserve the meaning of Vary programs? |
| Release / system trust | RC gates, validation scripts, corpus replay, integration tests, sabotage checks | Is this compiler safe to ship? |
The bottom layer tests the compiler implementation from the inside. These are tests of the Kotlin code that implements the parser, type checker, optimizer, code generator, CLI, and related subsystems.
The middle layer, VAST, tests the compiler from the outside. It generates Vary programs and checks that the compiler preserves their meaning across multiple execution paths.
The top layer combines everything into shipping trust. Release gates, validation scripts, sabotage checks, and regression corpus replay answer the practical question: is this compiler safe to ship?
So VAST should be read as one pillar of compiler quality, not as a replacement for unit tests or release validation.