Other projects test compilers with random programs too. Here is how VAST relates to Csmith, Alive2, CompCert, LangFuzz, and others.
The idea of testing compilers with randomly generated programs is decades old. Several tools have demonstrated that random testing finds bugs that hand-written test suites miss. VAST builds on these ideas and adapts them for a specific context: a new language whose compiler is developed with LLM assistance.
| System | Lang | Strategy | Notable results |
|---|---|---|---|
| Csmith | C | Random C program generation, differential testing across GCC and Clang | Hundreds of compiler bugs found in production compilers |
| YARPGen | C/C++ | Random program generation targeting specific optimization patterns | Bugs in GCC, LLVM, and Intel compilers |
| LangFuzz | Multiple | Grammar-based fuzzing for language runtimes | Bugs in JavaScript engines, C compilers, and other runtimes |
| Alive2 | LLVM IR | Translation validation using SMT solvers | Proves or disproves that LLVM optimizations preserve semantics |
| CompCert | C | Formally verified compiler with machine-checked proofs | Correct compilation guaranteed within the verified subset |
| QuickCheck | Haskell | Property-based testing with shrinking | Widely used to discover bugs in Haskell libraries and occasionally GHC itself |
| Go fuzzing | Go | Built-in fuzzing support with coverage guidance | Panics, crashes, and edge-case runtime bugs in the Go standard library |
| VAST | Vary | Differential + metamorphic + mutation, multi-path comparison (3 or 4 paths) | Integrated into compiler CI, combines compiler testing with mutation-driven program generation |
Csmith and YARPGen generate random programs in C and C++ and test them across existing production compilers (GCC, Clang, ICC). Their strength is the diversity of compilers available for comparison.
VAST generates programs in Vary and compares across multiple execution paths within the same project. It cannot compare against an independent Vary compiler (there is only one), so it builds its own reference implementations: the AST interpreter and the IR interpreter.
| | Csmith/YARPGen | VAST |
|---|---|---|
| Comparison targets | Multiple production compilers | 3-4 internal execution paths |
| Language maturity | Established languages (C, C++) | New language under active development |
| External dependencies | Requires multiple compiler installations | Self-contained, ships with the compiler |
| Integration | Separate tool, run independently | Built into the compiler project and CI |
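The multi-path comparison can be sketched in a few lines. This is a minimal illustration of the oracle, not VAST's real API: the path names and the toy `eval`-based runners are hypothetical stand-ins for the AST interpreter, IR interpreter, and compiled execution.

```python
# Minimal sketch of a multi-path differential oracle. Path names and
# runners are illustrative stand-ins, not VAST's actual interfaces.

def run_paths(program, paths):
    """Run one generated program through every execution path."""
    return {name: run(program) for name, run in paths.items()}

def check_agreement(results):
    """All paths must produce the same output; any disagreement is a bug."""
    if len(set(results.values())) > 1:
        return f"MISMATCH: {results}"
    return "OK"

# Toy stand-in paths that all just evaluate a Python expression string.
paths = {
    "ast_interp": lambda p: eval(p),
    "ir_interp":  lambda p: eval(p),
    "compiled":   lambda p: eval(p),
}

print(check_agreement(run_paths("1 + 2 * 3", paths)))  # prints OK
```

The oracle needs no expected output from the generator: agreement among independent implementations stands in for a ground-truth result.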
LangFuzz uses the target language's grammar to generate and mutate test inputs, then feeds them to the language runtime looking for crashes and assertion failures. It has found bugs in SpiderMonkey, V8, and other JavaScript engines, as well as C compilers.
VAST likewise generates programs from the language grammar, but instead of looking for crashes it compares output across three independent execution paths. It also integrates mutation testing into the pipeline, using mutation operators to verify that its own comparison infrastructure catches injected faults.
| | LangFuzz | VAST |
|---|---|---|
| Generation | Grammar-based fuzzing | Type-directed AST generation |
| Oracle | Crash detection, assertion failures | Three-path differential comparison |
| Mutation role | Mutates inputs to explore edge cases | Validates detection infrastructure, expands program space |
| Integration | External tool | Built into the compiler |
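The fault-injection idea can be sketched as follows: seed a deliberate bug into one execution path and check that the differential oracle flags it. If the injected fault survives, the detection infrastructure itself is suspect. Everything here (the `reference` and `mutant` runners, the operator-swap fault) is a hypothetical illustration, not VAST's real mutation operators.

```python
# Sketch of validating detection infrastructure with an injected fault.
# All names are illustrative, not VAST's real API.

def reference(program):
    return eval(program)  # stands in for the trusted AST interpreter

def mutant(program):
    # Seeded fault: swap the meaning of '+' and '-' before evaluating.
    swapped = program.replace("+", "\0").replace("-", "+").replace("\0", "-")
    return eval(swapped)

def oracle_catches(program):
    """True if the differential comparison flags the injected fault."""
    return reference(program) != mutant(program)

programs = ["1 + 2", "10 - 4", "3 * 3"]
caught = [p for p in programs if oracle_catches(p)]
print(caught)  # prints ['1 + 2', '10 - 4']: the '*'-only program survives
```

The surviving mutant is informative too: it shows which generated programs cannot distinguish the fault, pointing at gaps in the generator's coverage.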
Alive2 uses formal methods (SMT solvers) to prove that LLVM optimization passes preserve program semantics. It provides mathematical guarantees, not statistical confidence.
VAST uses testing, not proofs. It cannot guarantee the absence of bugs, only demonstrate their presence. The tradeoff is practical: Alive2 requires formal modeling of each optimization, while VAST tests the full pipeline end-to-end by generating programs and comparing outcomes.
| | Alive2 | VAST |
|---|---|---|
| Technique | SMT-based verification | Differential testing |
| Guarantee | Mathematical proof of correctness | Statistical confidence from many programs |
| Scope | Individual optimization passes | Full compilation pipeline |
| Effort to extend | Model each optimization formally | Add generators for new language features |
CompCert is a formally verified C compiler. Every compilation step has a machine-checked proof that it preserves semantics. This is the gold standard for compiler correctness.
VAST does not provide formal verification. It provides continuous empirical testing. The practical difference: CompCert proves correctness once and maintains the proofs as the compiler changes. VAST tests correctness continuously and finds bugs as they are introduced. Both are valuable. VAST is feasible for a project that cannot invest in formal verification.
| | CompCert | VAST |
|---|---|---|
| Approach | Formal verification (Coq proofs) | Empirical testing (random programs) |
| Coverage | Verified subset of C | Growing subset of Vary as generators are added |
| Bugs | Rare in verified passes (prevented by construction) | Detected at test time |
| Development cost | Very high (proofs for every pass) | Moderate (generators and interpreters) |
QuickCheck pioneered property-based testing with automatic shrinking. VAST borrows the idea of shrinking failing inputs to minimal reproducers, adapted for AST-level programs instead of data values.
VAST's reducer works on the same principle: when a randomly generated program triggers a bug, shrink it until removing any further piece makes the bug disappear.
| | QuickCheck | VAST |
|---|---|---|
| Input domain | Data values (integers, lists, etc.) | Complete programs (AST nodes) |
| Properties | User-defined function properties | Three-path agreement |
| Shrinking | Value-level (remove list elements, etc.) | AST-level (remove statements, simplify expressions) |
| Integration | Library, used per-project | Built into the compiler |
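The reduction loop can be sketched as a greedy delta-debugging pass over a statement list: repeatedly drop one element and keep the smaller program whenever the failure still reproduces. The statement strings and the `triggers_bug` oracle below are hypothetical; VAST's reducer works on AST nodes rather than strings.

```python
# Greedy shrinking sketch in the spirit of VAST's reducer:
# drop one statement at a time, keeping any smaller program that
# still triggers the bug, until no single removal reproduces it.

def shrink(stmts, triggers_bug):
    """Return a locally minimal statement list that still triggers the bug."""
    changed = True
    while changed:
        changed = False
        for i in range(len(stmts)):
            candidate = stmts[:i] + stmts[i + 1:]
            if triggers_bug(candidate):
                stmts = candidate
                changed = True
                break
    return stmts

# Toy bug: fails whenever both "x = big()" and "use(x)" are present.
def triggers_bug(stmts):
    return "x = big()" in stmts and "use(x)" in stmts

program = ["a = 1", "x = big()", "b = a + 1", "use(x)", "print(b)"]
print(shrink(program, triggers_bug))  # prints ['x = big()', 'use(x)']
```

The result is minimal in the one-removal sense described above: deleting any remaining statement makes the bug disappear.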
Go's built-in fuzzing generates random inputs for functions and tracks code coverage to guide generation toward untested paths. VAST does not currently use coverage guidance, though it is under consideration for future phases.
VAST differs in scope: it generates complete programs rather than function inputs, and it uses differential comparison instead of crash detection.
Most compiler testing tools are external to the compiler project. Csmith is a separate project from GCC. Alive2 is separate from LLVM. LangFuzz is a standalone research tool. VAST ships with the Vary compiler.
This integration has practical benefits. VAST runs alongside unit tests and mutation tests in the same nightly build. The generator builds real AST nodes (the same types the parser produces), so there is no serialization gap between generation and execution. When a new language feature is added, the generator and interpreters are extended in the same change.
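Type-directed generation, the technique named in the LangFuzz table above, can be sketched briefly: each production is chosen so the result has the requested type, making every generated program well-typed by construction. The node shapes and type names below are illustrative stand-ins, not Vary's actual AST.

```python
# Sketch of type-directed expression generation: the generator picks
# productions that yield the requested type, so the output is
# well-typed by construction. Types and node shapes are illustrative.
import random

def gen_expr(ty, depth=3, rng=random):
    if depth == 0:
        # Leaves: a literal of the requested type.
        return repr(rng.randint(0, 9)) if ty == "int" else rng.choice(["True", "False"])
    a = gen_expr("int", depth - 1, rng)
    b = gen_expr("int", depth - 1, rng)
    if ty == "int":
        return f"({a} {rng.choice(['+', '-', '*'])} {b})"
    # bool: a comparison of two ints, so the operand types still line up.
    return f"({a} {rng.choice(['<', '=='])} {b})"

rng = random.Random(0)
expr = gen_expr("bool", depth=2, rng=rng)
print(expr)        # a random but always well-typed boolean expression
print(eval(expr))  # evaluates without type errors
```

Because generation is seeded, a failing program can be reproduced from its seed before the reducer ever runs.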
Most of the systems listed here focus on one technique. VAST combines differential testing, metamorphic testing, and mutation expansion in the same pipeline. Because Vary is a mutation-native language (mutation testing is built into the compiler, not bolted on), VAST can use mutation operators both to expand the program search space and to validate its own detection infrastructure. That combination is uncommon in compiler testing tools.
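The metamorphic side of that combination works from relations rather than expected outputs: a semantics-preserving transformation applied to a generated program must not change what it computes. The `add_zero` transform below is a deliberately trivial hypothetical example; real transformations would operate on Vary ASTs.

```python
# Sketch of a metamorphic check: a semantics-preserving rewrite
# (here, wrapping an expression in '+ 0') must leave output unchanged.
# The transform and runner are illustrative, not VAST's real ones.

def add_zero(expr):
    """Metamorphic transform: wrap the expression without changing its value."""
    return f"(({expr}) + 0)"

def metamorphic_ok(expr, run=eval):
    """The original and transformed programs must produce equal output."""
    return run(expr) == run(add_zero(expr))

print(metamorphic_ok("2 * (3 + 4)"))  # prints True: the relation holds
```

A violated relation implicates the compiler even when no reference output exists, which complements the path-agreement oracle.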
VAST is not a formal verification system. It cannot prove the absence of bugs. It finds bugs by testing many programs, and its effectiveness depends on the quality and diversity of the generator.
VAST also cannot compare against an external Vary compiler. The execution paths are implemented independently but share the same language specification, so an implementation bug in one path surfaces as a disagreement, while a specification-level bug would go undetected: every path would faithfully implement the same incorrect specification.