VAST

Comparison with other systems

VAST is not the first tool to test compilers with randomly generated programs. This page describes how it relates to Csmith, YARPGen, LangFuzz, Alive2, CompCert, and other systems.

Compiler testing is a solved problem (sort of)

The idea of testing compilers with randomly generated programs is decades old. Several tools have demonstrated that random testing finds bugs that hand-written test suites miss. VAST builds on these ideas and adapts them for a specific context: a new language whose compiler is developed with LLM assistance.

Existing tools

| System | Language | Strategy | Notable results |
|---|---|---|---|
| Csmith | C | Random C program generation, differential testing across GCC and Clang | Hundreds of compiler bugs found in production compilers |
| YARPGen | C/C++ | Random program generation targeting specific optimization patterns | Bugs in GCC, LLVM, and Intel compilers |
| LangFuzz | Multiple | Grammar-based fuzzing for language runtimes | Bugs in JavaScript engines, C compilers, and other runtimes |
| Alive2 | LLVM IR | Translation validation using SMT solvers | Proves or disproves that LLVM optimizations preserve semantics |
| CompCert | C | Formally verified compiler with machine-checked proofs | Correct compilation guaranteed within the verified subset |
| QuickCheck | Haskell | Property-based testing with shrinking | Widely used to discover bugs in Haskell libraries and occasionally GHC itself |
| Go fuzzing | Go | Built-in fuzzing support with coverage guidance | Panics, crashes, and edge-case runtime bugs in the Go standard library |
| VAST | Vary | Differential + metamorphic + mutation, multi-path comparison (3 or 4 paths) | Integrated into compiler CI, combines compiler testing with mutation-driven program generation |

How VAST compares

Csmith and YARPGen

Csmith and YARPGen generate random programs in C and C++ and test them across existing production compilers (GCC, Clang, ICC). Their strength is the diversity of compilers available for comparison.

VAST generates programs in Vary and compares across multiple execution paths within the same project. It cannot compare against an independent Vary compiler (there is only one), so it builds its own reference implementations: the AST interpreter and the IR interpreter.
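The multi-path comparison can be sketched in a few lines. Everything below is illustrative: the path names and the `eval`-based stand-in runners are hypothetical placeholders, not VAST's actual interface.

```python
def run_all_paths(program, paths):
    """Run one generated program through every execution path.

    The path functions here are stand-ins; in VAST the paths are
    the AST interpreter, the IR interpreter, and the compiled binary.
    """
    return {name: run(program) for name, run in paths.items()}

def differential_oracle(results):
    """All paths must produce identical output; any disagreement
    is a candidate compiler bug."""
    return len(set(results.values())) <= 1

# Toy demonstration with arithmetic strings as "programs".
paths = {
    "ast_interp": lambda src: eval(src),
    "ir_interp":  lambda src: eval(src),
    "compiled":   lambda src: eval(src),
}
assert differential_oracle(run_all_paths("2 + 3 * 4", paths))
```

The design point is that the oracle needs no expected output: agreement among independent implementations is the specification.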

| | Csmith/YARPGen | VAST |
|---|---|---|
| Comparison targets | Multiple production compilers | 3-4 internal execution paths |
| Language maturity | Established languages (C, C++) | New language under active development |
| External dependencies | Requires multiple compiler installations | Self-contained, ships with the compiler |
| Integration | Separate tool, run independently | Built into the compiler project and CI |

LangFuzz

LangFuzz uses the target language's grammar to generate and mutate test inputs, then feeds them to the language runtime looking for crashes and assertion failures. It has found bugs in SpiderMonkey, V8, and other JavaScript engines, as well as C compilers.

VAST also generates programs from the language grammar, but instead of looking for crashes, it compares output across three independent execution paths. VAST also integrates mutation testing into its pipeline, using mutation operators to verify that its own comparison infrastructure catches injected faults.
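The mutation-validation idea can be sketched as follows (hypothetical names, not VAST's API): deliberately break one execution path and confirm that the differential comparison notices.

```python
def oracle_catches_fault(program, paths, path_name, fault):
    """Inject a fault into one execution path and verify that the
    differential oracle flags the resulting disagreement. If it does
    not, the detection infrastructure itself is suspect."""
    baseline = {name: run(program) for name, run in paths.items()}
    assert len(set(baseline.values())) == 1, "paths must agree before injection"

    broken = dict(paths)
    broken[path_name] = fault                 # swap in the faulty path
    mutated = {name: run(program) for name, run in broken.items()}
    return len(set(mutated.values())) > 1     # True => oracle caught it

# Toy check: three agreeing paths, one injected off-by-one fault.
paths = {name: (lambda src: eval(src)) for name in ("ast", "ir", "bin")}
assert oracle_catches_fault("6 * 7", paths, "bin", lambda src: eval(src) - 1)
```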

| | LangFuzz | VAST |
|---|---|---|
| Generation | Grammar-based fuzzing | Type-directed AST generation |
| Oracle | Crash detection, assertion failures | Three-path differential comparison |
| Mutation role | Mutates inputs to explore edge cases | Validates detection infrastructure, expands program space |
| Integration | External tool | Built into the compiler |

Alive2

Alive2 uses formal methods (SMT solvers) to prove that LLVM optimization passes preserve program semantics. It provides mathematical guarantees, not statistical confidence.

VAST uses testing, not proofs. It cannot guarantee the absence of bugs, only demonstrate their presence. The tradeoff is practical: Alive2 requires formal modeling of each optimization, while VAST tests the full pipeline end-to-end by generating programs and comparing outcomes.

| | Alive2 | VAST |
|---|---|---|
| Technique | SMT-based verification | Differential testing |
| Guarantee | Mathematical proof of correctness | Statistical confidence from many programs |
| Scope | Individual optimization passes | Full compilation pipeline |
| Effort to extend | Model each optimization formally | Add generators for new language features |

CompCert

CompCert is a formally verified C compiler. Every compilation step has a machine-checked proof that it preserves semantics. This is the gold standard for compiler correctness.

VAST does not provide formal verification. It provides continuous empirical testing. The practical difference: CompCert proves correctness once and maintains the proofs as the compiler changes. VAST tests correctness continuously and finds bugs as they are introduced. Both are valuable. VAST is feasible for a project that cannot invest in formal verification.

| | CompCert | VAST |
|---|---|---|
| Approach | Formal verification (Coq proofs) | Empirical testing (random programs) |
| Coverage | Verified subset of C | Growing subset of Vary as generators are added |
| Bugs | Rare in verified passes (prevented by construction) | Detected at test time |
| Development cost | Very high (proofs for every pass) | Moderate (generators and interpreters) |

QuickCheck and property-based testing

QuickCheck pioneered property-based testing with automatic shrinking. VAST borrows the idea of shrinking failing inputs to minimal reproducers, adapted for AST-level programs instead of data values.

VAST's reducer works on the same principle: when a randomly generated program triggers a bug, shrink it until removing any further piece makes the bug disappear.
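A greedy reduction loop over a statement list captures the principle. This is a sketch only: VAST's reducer operates on AST nodes, and `triggers_bug` stands in for rerunning the program through all paths.

```python
def shrink(program, triggers_bug):
    """Repeatedly try deleting one statement; keep the smaller program
    whenever the bug still reproduces. Stops at a 1-minimal program:
    removing any single remaining statement makes the bug disappear."""
    changed = True
    while changed:
        changed = False
        for i in range(len(program)):
            candidate = program[:i] + program[i + 1:]
            if triggers_bug(candidate):
                program = candidate
                changed = True
                break
    return program

# Toy bug: reproduces whenever the offending statement is present.
buggy = lambda prog: "x = 1 / 0" in prog
assert shrink(["a = 1", "x = 1 / 0", "b = a"], buggy) == ["x = 1 / 0"]
```

Real reducers also simplify expressions in place rather than only deleting statements, but the fixpoint structure is the same.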

| | QuickCheck | VAST |
|---|---|---|
| Input domain | Data values (integers, lists, etc.) | Complete programs (AST nodes) |
| Properties | User-defined function properties | Three-path agreement |
| Shrinking | Value-level (remove list elements, etc.) | AST-level (remove statements, simplify expressions) |
| Integration | Library, used per-project | Built into the compiler |

Go fuzzing

Go's built-in fuzzing generates random inputs for functions and tracks code coverage to guide generation toward untested paths. VAST does not currently use coverage guidance, though it is under consideration for future phases.

VAST differs in scope: it generates complete programs rather than function inputs, and it uses differential comparison as its oracle instead of crash detection.
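For contrast, coverage guidance in the style of Go's fuzzer can be sketched generically (the helpers are hypothetical, and again, VAST does not currently do this): a mutated input joins the corpus only if it reaches code no earlier input reached.

```python
import random

def coverage_guided_fuzz(seeds, run_with_coverage, mutate, rounds=1000):
    """Keep mutating corpus entries; retain a mutant only when it
    exercises a coverage edge not seen before."""
    corpus = list(seeds)
    seen = set()
    for inp in corpus:
        seen |= run_with_coverage(inp)
    for _ in range(rounds):
        candidate = mutate(random.choice(corpus))
        new_edges = run_with_coverage(candidate) - seen
        if new_edges:                      # reached untested code
            corpus.append(candidate)
            seen |= new_edges
    return corpus, seen

# Toy target whose "coverage" depends only on input length (capped at 5).
corpus, seen = coverage_guided_fuzz(
    ["a"],
    run_with_coverage=lambda s: {min(len(s), 5)},
    mutate=lambda s: s + "a",
)
```

The attraction for a future VAST phase would be steering generation toward compiler passes the current corpus never exercises.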

What makes VAST unusual

Most compiler testing tools are external to the compiler project. Csmith is a separate project from GCC. Alive2 is separate from LLVM. LangFuzz is a standalone research tool. VAST ships with the Vary compiler.

This integration has practical benefits. VAST runs alongside unit tests and mutation tests in the same nightly build. The generator builds real AST nodes (the same types the parser produces), so there is no serialization gap between generation and execution. When a new language feature is added, the generator and interpreters are extended in the same change.
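Type-directed generation can be sketched with plain tuples standing in for AST nodes (in VAST the generator constructs the compiler's actual node types, so no such translation layer exists):

```python
import random

def gen_expr(ty, depth=3):
    """Generate a random expression of type `ty` ("int" or "bool").
    The type parameter makes the result well-typed by construction,
    so every generated program passes the type checker."""
    if ty == "int":
        if depth == 0:
            return random.randint(0, 9)    # leaf literal
        op = random.choice(["+", "*"])
        return (op, gen_expr("int", depth - 1), gen_expr("int", depth - 1))
    # bool: a comparison between two int subexpressions
    d = max(depth - 1, 0)
    return ("<", gen_expr("int", d), gen_expr("int", d))

def eval_expr(e):
    """Reference evaluator for generated expressions."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    a, b = eval_expr(a), eval_expr(b)
    return {"+": a + b, "*": a * b, "<": a < b}[op]
```

Because the generator only produces well-typed trees, every generated program exercises the backends rather than dying in the front end.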

Most of the systems listed here focus on one technique. VAST combines differential testing, metamorphic testing, and mutation expansion in the same pipeline. Because Vary is a mutation-native language (mutation testing is built into the compiler, not bolted on), VAST can use mutation operators both to expand the program search space and to validate its own detection infrastructure. That combination is uncommon in compiler testing tools.
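The metamorphic side needs no reference output at all: apply a rewrite that should preserve semantics and check that observable behavior is unchanged. A toy instance, with an illustrative transform and an `eval`-based stand-in runner:

```python
def metamorphic_check(program, transform, run):
    """A semantics-preserving rewrite must not change behavior;
    a difference implicates the compiler (or the transform)."""
    return run(program) == run(transform(program))

def swap_addition(src):
    """Commute the operands of a single top-level addition."""
    left, right = src.split("+", 1)
    return right.strip() + " + " + left.strip()

assert metamorphic_check("2 + 5 * 3", swap_addition, eval)
```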

Limitations

VAST is not a formal verification system. It cannot prove the absence of bugs. It finds bugs by testing many programs, and its effectiveness depends on the quality and diversity of the generator.

VAST also cannot compare against an external Vary compiler. The execution paths share the same language specification but use independent implementations, so a specification-level bug (as opposed to an implementation bug) would not be caught because all paths would implement the same incorrect specification.
