Why it matters

Hand-written tests only check what developers think to test. VAST generates thousands of programs nobody wrote and verifies the compiler handles them all correctly.

A compiler that tests itself

Most compilers are tested the obvious way: developers write example programs, compile them, and check expected output. The problem is that developers only test what they think of. The test suite is biased toward expected behaviour.

The VAST program changes that. The compiler generates its own test programs, runs them through two independent paths, and checks whether semantics are preserved. The set of tested programs grows with every seed, and VAST itself grows as the language does.

A semantic oracle

The AST interpreter is an independent semantic reference. It shares no code with the bytecode generator. It does not use the type checker's output. It walks the AST directly and produces a value.

When the interpreter and the compiler agree on thousands of generated programs, that is evidence that the compiler preserves semantics. When they disagree, something is wrong, and you know which seed to replay.

Oracle: In testing, an oracle is any mechanism that tells you what the correct answer should be. For most test suites, the oracle is a human who wrote an expected value. In differential testing, the oracle is a second implementation of the same semantics. If the two implementations agree, the answer is probably correct. If they disagree, at least one has a bug.

The AST interpreter is Vary's oracle: an independent implementation that defines what the correct answer should be. Most languages do not have one built into the compiler.
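The comparison described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Vary's implementation: the tuple-based mini-language, the stack machine, and every function name here (`generate`, `interpret`, `compile_expr`, `run`, `mismatching_seeds`) are hypothetical. The point it shows is the shape of the loop: one seed yields one program, two independent paths each produce a value, and any disagreement is reported by seed.

```python
import operator
import random

# Hypothetical mini-language, not Vary: expressions are nested tuples.
#   ("num", 3)  or  ("add" | "sub" | "mul", left, right)

def generate(rng, depth=3):
    """Build a random expression tree from a seeded generator."""
    if depth == 0 or rng.random() < 0.3:
        return ("num", rng.randint(-9, 9))
    op = rng.choice(["add", "sub", "mul"])
    return (op, generate(rng, depth - 1), generate(rng, depth - 1))

def interpret(expr):
    """Oracle path: walk the tree directly and produce a value."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    left, right = interpret(expr[1]), interpret(expr[2])
    if kind == "add":
        return left + right
    if kind == "sub":
        return left - right
    return left * right

def compile_expr(expr, code):
    """Compiler path, step 1: emit postfix instructions for a stack machine."""
    if expr[0] == "num":
        code.append(("push", expr[1]))
    else:
        compile_expr(expr[1], code)
        compile_expr(expr[2], code)
        code.append((expr[0], None))
    return code

# The virtual machine keeps its own operator table, separate from interpret().
VM_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run(code):
    """Compiler path, step 2: execute the instructions."""
    stack = []
    for name, arg in code:
        if name == "push":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(VM_OPS[name](left, right))
    return stack.pop()

def mismatching_seeds(seeds):
    """Return every seed whose two execution paths disagree."""
    bad = []
    for seed in seeds:
        expr = generate(random.Random(seed))
        if interpret(expr) != run(compile_expr(expr, [])):
            bad.append(seed)
    return bad
```

Calling `mismatching_seeds(range(500))` on this toy returns an empty list because both paths are correct. In a real compiler the interesting runs are the ones where the list is not empty, and each entry is a seed that can be replayed.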

Exploring program space

Unit tests check specific points in program space. You pick inputs, call a function, check the output. Each test covers one point.

VAST covers random regions. The generator produces combinations of nested arithmetic with different operator mixes, comparisons inside if-expressions inside assignments, function calls whose arguments are themselves expressions, while loops with variable mutations and early returns, and boolean logic chained with integer comparisons.

Compilers tend to fail in exactly these combinations. A constant folder that works for 3 + 4 might break for (x - y) + y when x and y come from a conditional branch. Nobody writes that combination on purpose. Random generation finds it naturally.
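To make the constant-folder scenario concrete, here is a small Python sketch with a deliberately planted bug. Everything in it is hypothetical and unrelated to Vary's actual optimizer: the simplifier `buggy_fold` rewrites (a - b) + b to a, but it only checks that both b positions hold some variable, not the same one. The obvious hand-written case passes, while a random search over seeds finds an input such as (x - y) + x where the rewrite changes the result.

```python
import random

def evaluate(expr, env):
    """Reference semantics: evaluate the tree with no rewriting."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    return left + right if kind == "add" else left - right

def buggy_fold(expr):
    """Hypothetical simplifier that rewrites (a - b) + b to a.

    Deliberate bug: it checks that both b positions are variables,
    but never checks that they are the same variable.
    """
    if expr[0] in ("num", "var"):
        return expr
    left, right = buggy_fold(expr[1]), buggy_fold(expr[2])
    if (expr[0] == "add" and left[0] == "sub"
            and left[2][0] == "var" and right[0] == "var"):
        return left[1]
    return (expr[0], left, right)

def generate(rng, depth):
    """Random expression over the variables x and y and small literals."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("var", rng.choice("xy"))
        return ("num", rng.randint(0, 9))
    op = rng.choice(["add", "sub"])
    return (op, generate(rng, depth - 1), generate(rng, depth - 1))

def first_failing_seed(limit=10_000):
    """Search seeds in order; return (seed, expr) for the first mismatch."""
    env = {"x": 7, "y": 3}
    for seed in range(limit):
        expr = generate(random.Random(seed), depth=3)
        if evaluate(expr, env) != evaluate(buggy_fold(expr), env):
            return seed, expr
    return None
```

The hand-written check `(x - y) + y` cannot expose this bug, because the case the developer had in mind is the one case the rewrite gets right. The random search has no such bias.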

Differential testing in practice

Randomized and differential testing have a long track record in compiler and runtime work:

Tool	Target	Kinds of bugs found
Csmith	GCC and Clang	Hundreds of C compiler bugs
QuickCheck	Haskell libraries and GHC	Property violations in pure functions
Go fuzzing	Go runtime	Memory safety and runtime bugs
rustc fuzzing	Rust compiler	Type system and codegen errors

Tools like these have found large numbers of bugs that hand-written test suites missed. The VAST program applies the same idea inside Vary. Instead of an external fuzz tool maintained separately, differential testing ships with the compiler.

Protection during rapid development

Vary's compiler changes frequently. Any new feature, optimizer tweak, or codegen patch could introduce a subtle miscompilation.

The VAST program runs alongside that development. When a new feature modifies how expressions are compiled, VAST generates programs that use those expressions in combinations the developer did not anticipate. If the change broke semantics, the mismatch shows up before it reaches users.

This matters even more when compiler code is written with LLM assistance. Such code can look correct on the surface while handling edge cases incompletely. The generator produces programs the LLM did not think of. The interpreter checks semantics. Mismatches show up fast.

A permanent regression detector

Once VAST runs in CI, every compiler change faces:

vary vast --profile core --count 200 --seed 12345
vary vast --profile control --count 100 --seed 12345

If a change breaks semantics for any of the generated programs, the run reports a mismatch. Without such a check, compiler regressions often surface months later, when a new feature interacts with old code. With VAST in CI, they are flagged at the commit that introduced them.
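Running with a fixed seed only works as a regression check if generation is deterministic. The Python sketch below shows that property in isolation. It is an assumption-laden illustration, not a description of how `vary vast` interprets its flags: the names `generate_program` and `batch` are hypothetical, and so is the choice to seed one generator once and draw every program in the batch from it.

```python
import random

def generate_program(rng, depth=3):
    """Hypothetical generator: returns a small arithmetic expression as text."""
    if depth == 0 or rng.random() < 0.3:
        return str(rng.randint(0, 9))
    op = rng.choice(["+", "-", "*"])
    left = generate_program(rng, depth - 1)
    right = generate_program(rng, depth - 1)
    return f"({left} {op} {right})"

def batch(seed, count):
    """Produce `count` programs from one base seed.

    Assumption: a single generator is seeded once and drawn from repeatedly,
    so the same (seed, count) pair always yields the same list.
    """
    rng = random.Random(seed)
    return [generate_program(rng) for _ in range(count)]
```

Under that assumption, two runs with the same seed exercise the same programs, so a mismatch that appears after a commit points at the commit rather than at a change in the test inputs.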

What most languages do not have

Most languages rely on hand-written tests, external fuzzers maintained separately, or one-off research tools. Building a verification program into the compiler project itself is unusual.

Vary ships with typed program generation, an independent semantic executor, differential comparison, deterministic replay, and a CLI to run it all. Because VAST is a program we maintain, not a one-off experiment, it keeps pace with the language. New constructs get generators. New bug classes get detection strategies.

The question the VAST program answers

Before VAST, testing asked: did the compiler produce correct output for the programs we wrote?

The VAST program asks a different question: does the compiler preserve semantics across generated programs it has never seen before? The answer gets stronger with every seed.