VAST

Overview

VAST checks that the Vary compiler does what it is supposed to do. It does this by writing thousands of small programs, running each one in multiple ways, and making sure the answers always match.

Audience: Vary project internals. VAST tests the compiler, not your application code. If you want to validate your own project, see VAR instead.

The problem

A compiler translates code you write into code a computer runs. If the translation is wrong, your program does the wrong thing, and the error is invisible. Your code looks correct. The compiler accepted it without complaint. But the computer does something different from what you wrote.

These bugs are rare, but when they happen, they are hard to find. The program works most of the time. It fails only with certain inputs, or certain combinations of features, or only after the compiler applies certain optimizations. You can stare at your code for hours and never see the problem, because the problem is not in your code.

What VAST does about it

VAST writes programs automatically: thousands of them, each one different. It runs every program through three separate paths inside the compiler (the AST interpreter, the IR interpreter, and the JVM compiler), each of which should produce the same answer. When two paths disagree, VAST has found a bug.

The programs are random but valid. VAST does not throw garbage at the compiler and hope something breaks. It generates well-typed, grammatically correct programs that use real language features: variables, functions, loops, enums, pattern matching, error handling, generics. The programs are small and self-contained, which makes failures easy to reproduce and investigate.

Why three paths?

If you only have one path, you have no way to tell whether the answer is right. If you have two, you know something is wrong when they disagree, but you do not know which one is broken. With three paths, you can usually tell: if two agree and one disagrees, the odd one out is the suspect.
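The "odd one out" reasoning can be sketched in a few lines of Python. This is only an illustration, not VAST's actual code: the function name and the path labels ("ast", "ir", "jvm") are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Optional


def odd_one_out(outputs: Dict[str, str]) -> Optional[str]:
    """Return the single path that disagrees with all the others, if any.

    `outputs` maps a (hypothetical) path label to the output it produced.
    Returns None when all paths agree or when no single suspect stands out.
    """
    counts = Counter(outputs.values())
    if len(counts) != 2:
        # One distinct answer: everyone agrees.
        # Three or more distinct answers: no clear majority.
        return None
    (_, majority_count), (minority_value, minority_count) = counts.most_common(2)
    if majority_count <= 1 or minority_count != 1:
        # Two paths that disagree, or an even split: cannot name a suspect.
        return None
    return next(name for name, out in outputs.items() if out == minority_value)
```

With only two paths the function returns None on a disagreement, which matches the point above: two paths can reveal that something is wrong, but not which side is broken.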

VAST can also run a fourth path that skips the compiler's optimization passes. When the optimized and unoptimized paths disagree but everything else matches, the optimizer is the problem. This is how VAST catches bugs that only appear after code is optimized.
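The optimizer check is a special case of the same comparison. A minimal sketch, assuming four hypothetical path labels ("ast", "ir", "jvm", "jvm_unopt") that are not taken from VAST itself:

```python
from typing import Dict


def optimizer_suspected(outputs: Dict[str, str]) -> bool:
    """True when only the optimized path differs from the other three.

    The labels "ast", "ir", "jvm", and "jvm_unopt" are illustrative names
    for the three main paths plus the unoptimized fourth path.
    """
    reference = outputs["jvm_unopt"]
    others_match = outputs["ast"] == reference and outputs["ir"] == reference
    return others_match and outputs["jvm"] != reference
```

If the unoptimized path also disagrees with the interpreters, the function returns False, because the fault is then more likely in the shared backend than in the optimization passes.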

What happens when VAST finds something

When paths disagree, VAST reports which program caused it, which paths disagreed, and what each path produced. It also tries to shrink the failing program down to the smallest version that still triggers the bug, so you are not debugging a 50-line generated program when a 5-line one would do.
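To give a feel for what shrinking involves, here is a deliberately simple greedy reducer in Python. It is a sketch of the general idea, not VAST's reduction algorithm: it treats a program as a list of lines and relies on a caller-supplied `still_fails` predicate, both of which are assumptions made for the example.

```python
from typing import Callable, List


def shrink(lines: List[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop lines while the reduced program still triggers the bug."""
    current = list(lines)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            # Keep the removal only if the smaller program still fails.
            if candidate and still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the smaller program
    return current
```

For example, `shrink(["a", "b", "BUG", "c"], lambda p: "BUG" in p)` reduces the four-line input to the single line that matters. A real reducer would also have to keep the program well-typed after each removal, which this sketch ignores.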

When VAST finds a mismatch and reduction is enabled, the minimized reproducer can be saved in a corpus. Nightly automation is set up to grow and maintain this corpus over time. When corpus entries are present, release candidate testing replays them to make sure old bugs have not come back.

How thorough is it?

VAST tracks what it has tested: which language features appeared, which behaviours were exercised, which feature combinations were tried. It produces a confidence score that says, roughly, how well-tested the compiler is for a given set of features.

When coverage is low in some area (say, exceptions inside loops have never been tested), VAST can focus its effort there. Continuous exploration mode runs for a set amount of time and steers itself toward under-tested areas automatically.
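One simple way to steer toward gaps is to count how often each pair of features has appeared together and aim the next program at the least-covered pair. The Python sketch below shows that idea only; the function name and feature labels are hypothetical, and VAST's actual steering logic may differ.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def least_covered_pair(
    features: Iterable[str], seen_programs: List[List[str]]
) -> Tuple[str, str]:
    """Return the feature pair that has appeared together least often."""
    counts: Counter = Counter()
    for program_features in seen_programs:
        # Record every pair of features used together in one program.
        for pair in combinations(sorted(set(program_features)), 2):
            counts[pair] += 1
    all_pairs = combinations(sorted(features), 2)
    # Counter returns 0 for pairs that were never seen.
    return min(all_pairs, key=lambda pair: counts[pair])
```

Given the features "loop", "exception", and "generic", and earlier programs that combined loops with generics and exceptions with generics, the function returns ("exception", "loop"), the one pair that has never been tested together.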

Where it runs

VAST runs every night as part of an automated workflow. It also runs as part of the release candidate pipeline that every release must pass before it ships. Developers can run a quick version locally in a few seconds to check their work before submitting changes.

The nightly run tests roughly 9,000 programs across 12 language profiles. It validates the optimizer by comparing optimized and unoptimized output. It applies meaning-preserving changes to programs and checks that answers stay the same. It deliberately breaks programs to make sure the test infrastructure notices. It also disables parts of VAST itself to verify those parts are doing something.
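The self-check idea can be illustrated with a toy example: corrupt the output of one path on purpose and confirm that the comparison flags it. This Python sketch is a simplification made for illustration. The function names and path labels are assumptions, and VAST's real sabotage modes are not described here.

```python
from typing import Callable, Dict


def paths_agree(outputs: Dict[str, str]) -> bool:
    """True when every path produced the same output."""
    return len(set(outputs.values())) == 1


def sabotage_is_detected(
    outputs: Dict[str, str], path: str, corrupt: Callable[[str], str]
) -> bool:
    """Corrupt one path's output and report whether the comparison notices."""
    broken = dict(outputs)  # copy so the caller's data is untouched
    broken[path] = corrupt(broken[path])
    return not paths_agree(broken)
```

If `sabotage_is_detected` returns False for a corruption that really changes the output, the checking infrastructure has a gap.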

Why the results hold up

VAST is not a formal proof. It is an empirical testing system. Here is what makes the results useful.

| Property | Why it matters |
| --- | --- |
| Independent execution paths | The AST interpreter, IR interpreter, and JVM compiler are written independently. A bug would have to manifest identically in all implementations to escape detection. |
| Optimizer isolation | The fourth path (JVM unoptimized) runs the same backend without optimization passes. This isolates optimizer bugs, the most common source of real compiler defects. |
| Sabotage self-checking | VAST disables its own detection in four different ways and verifies it notices each time. If sabotage passes silently, VAST has a gap. This runs every night. |
| Minimization and replay | Every failure is automatically shrunk to the smallest program that triggers it, then stored with its seed for exact replay. Failures are concrete and reproducible. |
| Regression corpus | Previously found bugs can be stored as minimized programs and replayed on RC runs. Once an entry exists, a fixed bug cannot silently regress. |
| Coverage tracking | VAST measures feature coverage (22 constructs), semantic coverage (27 behaviours), and interaction coverage (pairwise feature combinations). It knows what it has tested and what it has not. |
| Continuous exploration | Coverage-guided exploration steers generation toward under-tested areas. VAST does not just repeat the same patterns; it actively seeks gaps. |
| Historical trend stability | Confidence scores, coverage metrics, and mismatch counts are exported as JSONL metrics nightly. Trend regressions are visible over time. |
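JSONL means one JSON object per line, which makes nightly records easy to append and scan. The Python sketch below shows the format only. The field names ("run", "mismatches", "confidence") are made up for the example and are not VAST's actual metric schema.

```python
import json
from typing import Any, Dict, List


def to_jsonl_line(record: Dict[str, Any]) -> str:
    """Serialize one nightly record as a single JSON line."""
    return json.dumps(record, sort_keys=True)


def mismatch_trend(lines: List[str]) -> List[int]:
    """Read the (hypothetical) "mismatches" field from each JSONL line."""
    return [json.loads(line)["mismatches"] for line in lines if line.strip()]
```

Appending one such line per night gives a file that can be read back in order to see whether mismatch counts are rising or falling.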

In practice, a VAST report of "no mismatches, HIGH confidence" means the compiler has been tested across thousands of generated programs, multiple execution paths, and specialized generators, and that the testing infrastructure itself has been validated through sabotage checks.

What it does not do

VAST tests the compiler, not your code. It cannot tell you whether your application is correct. It tells you whether the compiler correctly translates what you wrote into what the computer runs. For testing your own code, Vary has a separate mutation testing tool (vary mutate) that checks whether your tests catch bugs in your code.

VAST also does not test anything nondeterministic. No file I/O, no network calls. Every generated program is self-contained and produces the same answer every time, which is what makes the multi-path comparison possible. VAST does test deterministic concurrency patterns (fork-join, map-reduce, pipelines) where the result is independent of thread scheduling order, but it does not test programs whose outcome depends on nondeterministic thread interleavings.
