VAST

Comparison with other systems

VAST is not the first tool to test compilers with randomly generated programs. This page describes how it relates to Csmith, YARPGen, LangFuzz, Alive2, CompCert, and other systems.

Compiler testing is a solved problem (sort of)

The idea of testing compilers with randomly generated programs is decades old. Several tools have demonstrated that random testing finds bugs that hand-written test suites miss. VAST builds on these ideas and adapts them for a specific context: a new language whose compiler is developed with LLM assistance.

Existing tools

| System | Language | Strategy | Notable results |
|---|---|---|---|
| Csmith | C | Random C program generation, differential testing across GCC and Clang | Hundreds of compiler bugs found in production compilers |
| YARPGen | C/C++ | Random program generation targeting specific optimization patterns | Bugs in GCC, LLVM, and Intel compilers |
| LangFuzz | Multiple | Grammar-based fuzzing for language runtimes | Bugs in JavaScript engines, C compilers, and other runtimes |
| Alive2 | LLVM IR | Translation validation using SMT solvers | Proves or disproves that LLVM optimizations preserve semantics |
| CompCert | C | Formally verified compiler with machine-checked proofs | Correct compilation guaranteed within the verified subset |
| QuickCheck | Haskell | Property-based testing with shrinking | Widely used to discover bugs in Haskell libraries and occasionally GHC itself |
| Go fuzzing | Go | Built-in fuzzing support with coverage guidance | Panics, crashes, and edge-case runtime bugs in the Go standard library |
| VAST | Vary | Differential + metamorphic + mutation, multi-path comparison (3 or 4 paths) | Integrated into compiler CI, combines compiler testing with mutation-driven program generation |

How VAST compares

Csmith and YARPGen

Csmith and YARPGen generate random programs in C and C++ and test them across existing production compilers (GCC, Clang, ICC). Their strength is the diversity of compilers available for comparison.

VAST generates programs in Vary and compares across multiple execution paths within the same project. It cannot compare against an independent Vary compiler (there is only one), so it builds its own reference implementations: the AST interpreter and the IR interpreter.
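The multi-path comparison can be sketched in a few lines. Everything below is illustrative: the path names and the `eval`-based stand-in runners are hypothetical placeholders, not VAST's actual interface.

```python
def run_all_paths(program, paths):
    """Run one generated program through every execution path.

    The path functions here are stand-ins; in VAST the paths are
    the AST interpreter, the IR interpreter, and the compiled binary.
    """
    return {name: run(program) for name, run in paths.items()}

def differential_oracle(results):
    """All paths must produce identical output; any disagreement
    is a candidate compiler bug."""
    return len(set(results.values())) <= 1

# Toy demonstration with arithmetic strings as "programs".
paths = {
    "ast_interp": lambda src: eval(src),
    "ir_interp":  lambda src: eval(src),
    "compiled":   lambda src: eval(src),
}
assert differential_oracle(run_all_paths("2 + 3 * 4", paths))
```

The design point is that the oracle needs no expected output: agreement among independent implementations is the specification.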

| | Csmith/YARPGen | VAST |
|---|---|---|
| Comparison targets | Multiple production compilers | 3-4 internal execution paths |
| Language maturity | Established languages (C, C++) | New language under active development |
| External dependencies | Requires multiple compiler installations | Self-contained, ships with the compiler |
| Integration | Separate tool, run independently | Built into the compiler project and CI |

LangFuzz

LangFuzz uses the target language's grammar to generate and mutate test inputs, then feeds them to the language runtime looking for crashes and assertion failures. It has found bugs in SpiderMonkey, V8, and other JavaScript engines, as well as C compilers.

VAST also generates programs from the language grammar, but instead of looking for crashes, it compares output across three independent execution paths. VAST also integrates mutation testing into its pipeline, using mutation operators to verify that its own comparison infrastructure catches injected faults.
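The mutation-validation idea can be sketched as follows (hypothetical names, not VAST's API): deliberately break one execution path and confirm that the differential comparison notices.

```python
def oracle_catches_fault(program, paths, path_name, fault):
    """Inject a fault into one execution path and verify that the
    differential oracle flags the resulting disagreement. If it does
    not, the detection infrastructure itself is suspect."""
    baseline = {name: run(program) for name, run in paths.items()}
    assert len(set(baseline.values())) == 1, "paths must agree before injection"

    broken = dict(paths)
    broken[path_name] = fault                 # swap in the faulty path
    mutated = {name: run(program) for name, run in broken.items()}
    return len(set(mutated.values())) > 1     # True => oracle caught it

# Toy check: three agreeing paths, one injected off-by-one fault.
paths = {name: (lambda src: eval(src)) for name in ("ast", "ir", "bin")}
assert oracle_catches_fault("6 * 7", paths, "bin", lambda src: eval(src) - 1)
```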

| | LangFuzz | VAST |
|---|---|---|
| Generation | Grammar-based fuzzing | Type-directed AST generation |
| Oracle | Crash detection, assertion failures | Three-path differential comparison |
| Mutation role | Mutates inputs to explore edge cases | Validates detection infrastructure, expands program space |
| Integration | External tool | Built into the compiler |

Alive2

Alive2 uses formal methods (SMT solvers) to prove that LLVM optimization passes preserve program semantics. It provides mathematical guarantees, not statistical confidence.

VAST uses testing, not proofs. It cannot guarantee the absence of bugs, only demonstrate their presence. The tradeoff is practical: Alive2 requires formal modeling of each optimization, while VAST tests the full pipeline end-to-end by generating programs and comparing outcomes.

| | Alive2 | VAST |
|---|---|---|
| Technique | SMT-based verification | Differential testing |
| Guarantee | Mathematical proof of correctness | Statistical confidence from many programs |
| Scope | Individual optimization passes | Full compilation pipeline |
| Effort to extend | Model each optimization formally | Add generators for new language features |

CompCert

CompCert is a formally verified C compiler. Every compilation step has a machine-checked proof that it preserves semantics. This is the gold standard for compiler correctness.

VAST does not provide formal verification. It provides continuous empirical testing. The practical difference: CompCert proves correctness once and maintains the proofs as the compiler changes. VAST tests correctness continuously and finds bugs as they are introduced. Both are valuable. VAST is feasible for a project that cannot invest in formal verification.

| | CompCert | VAST |
|---|---|---|
| Approach | Formal verification (Coq proofs) | Empirical testing (random programs) |
| Coverage | Verified subset of C | Growing subset of Vary as generators are added |
| Bugs | Rare in verified passes (prevented by construction) | Detected at test time |
| Development cost | Very high (proofs for every pass) | Moderate (generators and interpreters) |

QuickCheck and property-based testing

QuickCheck pioneered property-based testing with automatic shrinking. VAST borrows the idea of shrinking failing inputs to minimal reproducers, adapted for AST-level programs instead of data values.

VAST's reducer works on the same principle: when a randomly generated program triggers a bug, shrink it until removing any further piece makes the bug disappear.
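A greedy reduction loop over a statement list captures the principle. This is a sketch only: VAST's reducer operates on AST nodes, and `triggers_bug` stands in for rerunning the program through all paths.

```python
def shrink(program, triggers_bug):
    """Repeatedly try deleting one statement; keep the smaller program
    whenever the bug still reproduces. Stops at a 1-minimal program:
    removing any single remaining statement makes the bug disappear."""
    changed = True
    while changed:
        changed = False
        for i in range(len(program)):
            candidate = program[:i] + program[i + 1:]
            if triggers_bug(candidate):
                program = candidate
                changed = True
                break
    return program

# Toy bug: reproduces whenever the offending statement is present.
buggy = lambda prog: "x = 1 / 0" in prog
assert shrink(["a = 1", "x = 1 / 0", "b = a"], buggy) == ["x = 1 / 0"]
```

Real reducers also simplify expressions in place rather than only deleting statements, but the fixpoint structure is the same.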

| | QuickCheck | VAST |
|---|---|---|
| Input domain | Data values (integers, lists, etc.) | Complete programs (AST nodes) |
| Properties | User-defined function properties | Three-path agreement |
| Shrinking | Value-level (remove list elements, etc.) | AST-level (remove statements, simplify expressions) |
| Integration | Library, used per-project | Built into the compiler |

Go fuzzing

Go's built-in fuzzing generates random inputs for functions and tracks code coverage to guide generation toward untested paths. VAST does not currently use coverage guidance, though it is under consideration for future phases.

VAST differs in scope: it generates complete programs rather than function inputs, and it uses differential comparison as its oracle instead of crash detection.
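For contrast, coverage guidance in the style of Go's fuzzer can be sketched generically (the helpers are hypothetical, and again, VAST does not currently do this): a mutated input joins the corpus only if it reaches code no earlier input reached.

```python
import random

def coverage_guided_fuzz(seeds, run_with_coverage, mutate, rounds=1000):
    """Keep mutating corpus entries; retain a mutant only when it
    exercises a coverage edge not seen before."""
    corpus = list(seeds)
    seen = set()
    for inp in corpus:
        seen |= run_with_coverage(inp)
    for _ in range(rounds):
        candidate = mutate(random.choice(corpus))
        new_edges = run_with_coverage(candidate) - seen
        if new_edges:                      # reached untested code
            corpus.append(candidate)
            seen |= new_edges
    return corpus, seen

# Toy target whose "coverage" depends only on input length (capped at 5).
corpus, seen = coverage_guided_fuzz(
    ["a"],
    run_with_coverage=lambda s: {min(len(s), 5)},
    mutate=lambda s: s + "a",
)
```

The attraction for a future VAST phase would be steering generation toward compiler passes the current corpus never exercises.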

What makes VAST unusual

Most compiler testing tools are external to the compiler project. Csmith is a separate project from GCC. Alive2 is separate from LLVM. LangFuzz is a standalone research tool. VAST ships with the Vary compiler.

This integration has practical benefits. VAST runs alongside unit tests and mutation tests in the same nightly build. The generator builds real AST nodes (the same types the parser produces), so there is no serialization gap between generation and execution. When a new language feature is added, the generator and interpreters are extended in the same change.
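Type-directed generation can be sketched with plain tuples standing in for AST nodes (in VAST the generator constructs the compiler's actual node types, so no such translation layer exists):

```python
import random

def gen_expr(ty, depth=3):
    """Generate a random expression of type `ty` ("int" or "bool").
    The type parameter makes the result well-typed by construction,
    so every generated program passes the type checker."""
    if ty == "int":
        if depth == 0:
            return random.randint(0, 9)    # leaf literal
        op = random.choice(["+", "*"])
        return (op, gen_expr("int", depth - 1), gen_expr("int", depth - 1))
    # bool: a comparison between two int subexpressions
    d = max(depth - 1, 0)
    return ("<", gen_expr("int", d), gen_expr("int", d))

def eval_expr(e):
    """Reference evaluator for generated expressions."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    a, b = eval_expr(a), eval_expr(b)
    return {"+": a + b, "*": a * b, "<": a < b}[op]
```

Because the generator only produces well-typed trees, every generated program exercises the backends rather than dying in the front end.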

Most of the systems listed here focus on one technique. VAST combines differential testing, metamorphic testing, and mutation expansion in the same pipeline. Because Vary is a mutation-native language (mutation testing is built into the compiler, not bolted on), VAST can use mutation operators both to expand the program search space and to validate its own detection infrastructure. That combination is uncommon in compiler testing tools.
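The metamorphic side needs no reference output at all: apply a rewrite that should preserve semantics and check that observable behavior is unchanged. A toy instance, with an illustrative transform and an `eval`-based stand-in runner:

```python
def metamorphic_check(program, transform, run):
    """A semantics-preserving rewrite must not change behavior;
    a difference implicates the compiler (or the transform)."""
    return run(program) == run(transform(program))

def swap_addition(src):
    """Commute the operands of a single top-level addition."""
    left, right = src.split("+", 1)
    return right.strip() + " + " + left.strip()

assert metamorphic_check("2 + 5 * 3", swap_addition, eval)
```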

Limitations

VAST is not a formal verification system. It cannot prove the absence of bugs. It finds bugs by testing many programs, and its effectiveness depends on the quality and diversity of the generator.

VAST also cannot compare against an external Vary compiler. The execution paths share the same language specification but use independent implementations, so a specification-level bug (as opposed to an implementation bug) would not be caught because all paths would implement the same incorrect specification.
