tl;dr: Vary's VAST system generates random programs, executes them through an AST interpreter, an IR interpreter, and the JVM, and compares results. A disagreement between the IR and JVM paths revealed that 'let x: Int? = None' was silently corrupting the bytecode stack. No hand-written test covered this case.


You can test a compiler by writing programs and checking that they produce the right output. That works until it does not. The programs you think to write are the ones that exercise the paths you already thought about. The bugs that ship are in the paths nobody considered.

Fuzz testing flips the approach: instead of a human writing programs to test the compiler, the compiler generates programs to test itself.

This week, Vary's internal fuzzer found a bug that had been hiding in the bytecode generator for weeks. Two tests failed in the nightly build. Here is how the system works, what it found, and why the bug was invisible to every hand-written test in the suite.

What VAST does

VAST (Vary Automated Semantic Testing) is a differential testing system built into the Vary compiler's test suite. It has three parts:

A program generator that produces random but valid Vary programs from a seed number. Given seed 42, it always produces the same program. Given seed 43, a different one. The generator respects Vary's type system, so every program it creates is well-typed and should compile without errors.
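To make the seed-to-program property concrete, here is a minimal Python sketch of a seeded generator. It is not VAST's generator: the name generate_program, the depth parameter, and the tiny Int-only expression language are assumptions made for illustration. The point it shows is that a private random source seeded once makes the output a pure function of the seed, and that building only from type-correct pieces keeps every program well-typed.

```python
import random


def generate_program(seed: int, depth: int = 3) -> str:
    """Return a small Int-typed expression that depends only on the seed.

    Hypothetical stand-in for a real program generator.
    """
    # A private Random instance: no global state, so the same seed
    # always replays the same sequence of choices.
    rng = random.Random(seed)

    def int_expr(d: int) -> str:
        # Emit a literal at the depth limit, or occasionally earlier.
        if d == 0 or rng.random() < 0.3:
            return str(rng.randint(0, 9))
        # Only Int -> Int -> Int operators are offered, so every
        # expression built here is well-typed by construction.
        op = rng.choice(["+", "-", "*"])
        left = int_expr(d - 1)
        right = int_expr(d - 1)
        return f"({left} {op} {right})"

    return int_expr(depth)


if __name__ == "__main__":
    print(generate_program(42))
    print(generate_program(42))  # identical to the line above
```

Because the seed is the only input, a failing program can be reported as a single number and regenerated on any machine.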

Three execution paths that each run the generated program independently.

Path            | What it does
AST interpreter | Walks the syntax tree directly
IR interpreter  | Lowers the program to a flat register-based intermediate representation, then executes that
JVM pipeline    | Type-checks, optimizes, generates bytecode, loads the class, and runs it

A checker that compares the results. If all three paths return the same value, the program passes. If any path disagrees, something is wrong.

The idea is simple: the AST interpreter is easy to get right because it is a straightforward tree walk. The IR interpreter validates the lowering step. The JVM path validates the entire compilation pipeline. When the JVM path disagrees with the other two, the bug is in that pipeline, and code generation is the usual suspect.
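The three-path comparison can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not VAST's code: the tuple-based AST, the instruction format, and the function names (run_ast, run_ir, run_compiled, check) are all invented here, and Python's built-in compile/eval stands in for the JVM pipeline.

```python
# A program is an int literal or a tuple (op, left, right).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def run_ast(node):
    """Path 1: walk the tree directly."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](run_ast(left), run_ast(right))


def lower(node, code):
    """Append flat register instructions for node; return its result register."""
    if isinstance(node, int):
        code.append(("const", len(code), node))
        return len(code) - 1
    op, left, right = node
    a = lower(left, code)
    b = lower(right, code)
    code.append((op, len(code), a, b))
    return len(code) - 1


def run_ir(node):
    """Path 2: lower to flat IR, then execute the instruction list."""
    code = []
    result = lower(node, code)
    regs = {}
    for instr in code:
        if instr[0] == "const":
            _, dst, value = instr
            regs[dst] = value
        else:
            op, dst, a, b = instr
            regs[dst] = OPS[op](regs[a], regs[b])
    return regs[result]


def to_source(node):
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    return f"({to_source(left)} {op} {to_source(right)})"


def run_compiled(node):
    """Path 3: compile to Python bytecode and run it (stand-in for the JVM path)."""
    return eval(compile(to_source(node), "<generated>", "eval"))


def check(node, paths=(run_ast, run_ir, run_compiled)):
    """Return None if all paths agree, else a dict of path name -> outcome."""
    results = {}
    for path in paths:
        try:
            results[path.__name__] = path(node)
        except Exception as exc:  # a crash in one path also counts as a disagreement
            results[path.__name__] = f"crash: {type(exc).__name__}"
    return None if len(set(results.values())) == 1 else results


if __name__ == "__main__":
    program = ("+", 2, ("*", 3, 4))
    print(check(program))  # None: all three paths return 14
```

The checker needs no oracle for the right answer. It only needs the paths to be independent enough that they are unlikely to share the same bug.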

What the nightly found

Two VAST tests failed in the nightly build:

FAIL: VastIrTranslationCheckerTest.complete 50 seeds all stages match()
FAIL: VastIrTranslationCheckerTest.types 100 seeds all stages match()

The first test generates 50 random programs using the "complete" profile (enums, data types, match expressions, try/catch, generics, nullable types, loops, the works). The second generates 100 programs using the "types" profile (similar but without match and exceptions). Each program is run through all three paths and the results are compared.

Out of 150 generated programs, 24 triggered a disagreement between the IR interpreter and the JVM path. The IR interpreter returned correct values. The JVM path either crashed in the compiler during bytecode generation or produced bytecode that the JVM verifier rejected.
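A seed sweep of this kind can be sketched as a short Python loop. Everything in it is a hypothetical stand-in: sweep, generate, reference, and buggy are invented names, the "programs" are just lists of digits, and the planted bug (a crash whenever a program contains a 0) only mimics a defect on a rarely exercised path.

```python
import random


def sweep(seeds, generate, paths):
    """Run every seed through every path; return the seeds whose outcomes differ."""
    failing = []
    for seed in seeds:
        program = generate(seed)
        outcomes = []
        for path in paths:
            try:
                outcomes.append(("ok", path(program)))
            except Exception as exc:
                # A crash is recorded as an outcome, so it shows up as a mismatch.
                outcomes.append(("crash", type(exc).__name__))
        if len(set(outcomes)) > 1:
            failing.append(seed)
    return failing


def generate(seed):
    """Toy generator: four digits determined by the seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 9) for _ in range(4)]


def reference(program):
    return sum(program)


def buggy(program):
    # Planted defect: one input shape is mishandled.
    if 0 in program:
        raise ValueError("unhandled case")
    return sum(program)


if __name__ == "__main__":
    failing = sweep(range(150), generate, [reference, buggy])
    print(f"{len(failing)} of 150 seeds disagree; first few: {failing[:5]}")
```

Reporting failures as seeds rather than as program text is what makes the results reproducible: rerunning one failing seed regenerates exactly the program that broke.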

The programs that broke

The generator uses seed numbers, so the failures are reproducible. Seed 42 on the complete profile produced this program (simplified):

def __vast_compute() -> Int {
    let v0: Int? = None
    return if v0 == None { 0 } else { v0 }
}

The IR interpreter returned 0. The JVM rejected the bytecode with a verification error: the generated code tried to read a local variable that was never initialized.

Seed 45 on the types profile produced a function containing this pattern inside a conditional block:

def foo() -> Str {
    mut x: Bool = False
    if x {
        let a: Int? = None
    }
    return "b"
}

The compiler crashed during bytecode generation itself, before the program ever ran.

Both failures share the same pattern: a nullable variable initialized to None.

What was wrong

When the Vary compiler generates bytecode for a variable declaration like let v0: Int? = None, two things need to happen. First, push null onto the JVM operand stack (the value None becomes Java's null). Then store that null into the local variable slot allocated for v0.

The compiler was doing the push but skipping the store. The relevant code looked like this:

generate the expression         -- pushes null onto the stack
if expression type is NoneType:
    skip the store              -- "nothing to store"

The skip was there for a legitimate reason: in a different code path, calling .unwrap() on an empty Result type already consumes the value from the stack internally. In that case, there really is nothing left to store. But the condition was too broad. It matched every expression whose type is NoneType, including a plain None literal being assigned to an optional variable.

The result was two kinds of corruption:

An extra value on the stack. The null was pushed but never consumed by a store instruction. At branch merge points (after an if/else), the JVM expects both branches to leave the stack at the same height. The extra null on one branch made the heights disagree, and the bytecode framework crashed trying to reconcile them.

An uninitialized local variable slot. The variable was allocated a slot in the local variable table, but nothing was ever written to it. When later code tried to read the variable (like if v0 == None), the JVM verifier saw a read from an uninitialized slot and rejected the class.
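Both symptoms can be reproduced with a toy stack machine. The Python sketch below is an illustration only: the instruction names (push_null, store, load), the emit_decl helper, and the verify function are invented here and are far simpler than real JVM bytecode or the real verifier. It tracks two things for straight-line code: operand-stack height and which local slots have been written.

```python
class VerificationError(Exception):
    """Raised by the toy verifier when the instruction stream is invalid."""


def emit_decl(slot, skip_store):
    """Emit toy instructions for `let v: Int? = None` stored in the given slot."""
    code = [("push_null",)]
    if not skip_store:
        code.append(("store", slot))
    return code


def verify(instructions):
    """Check straight-line code; return the operand-stack height at the end."""
    height = 0
    initialized = set()
    for op, *args in instructions:
        if op == "push_null":
            height += 1
        elif op == "store":
            if height == 0:
                raise VerificationError("store with an empty stack")
            height -= 1
            initialized.add(args[0])
        elif op == "load":
            if args[0] not in initialized:
                raise VerificationError(f"read of uninitialized local {args[0]}")
            height += 1
        else:
            raise VerificationError(f"unknown instruction {op}")
    return height


if __name__ == "__main__":
    # Symptom 1: a branch containing the declaration vs. an empty else branch.
    then_buggy = verify(emit_decl(0, skip_store=True))   # leftover null: height 1
    then_fixed = verify(emit_decl(0, skip_store=False))  # null consumed: height 0
    else_branch = verify([])                             # height 0
    print("buggy heights match:", then_buggy == else_branch)
    print("fixed heights match:", then_fixed == else_branch)

    # Symptom 2: declaring the variable and then reading it.
    try:
        verify(emit_decl(0, skip_store=True) + [("load", 0)])
    except VerificationError as exc:
        print("buggy:", exc)
    print("fixed: ok, height", verify(emit_decl(0, skip_store=False) + [("load", 0)]))
```

The two checks correspond to the two failures above: mismatched heights at a merge point match the crash in the conditional-block program, and the rejected load matches the verification error from the seed 42 program.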

The fix

One line:

before: if (exprType is NoneType || varType is NoneType)
after:  if ((exprType is NoneType || varType is NoneType) && varType !is OptionalType)

When the variable's type is optional (Int?, Str?, etc.), the null is a legitimate value that needs to be stored. The skip now applies only when the variable's type is not optional, which still covers the unwrap edge case that motivated the original code.
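The effect of the extra clause is easiest to see as a truth table. The Python sketch below mirrors the two conditions as predicates. The type classes are minimal stand-ins that borrow the names from the snippet above, not the compiler's real type representation, and modelling the unwrap case as a NoneType expression stored into a NoneType variable is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    pass


@dataclass(frozen=True)
class NoneType:  # stand-in for the compiler's NoneType, not Python's own
    pass


@dataclass(frozen=True)
class OptionalType:
    inner: object


def skip_store_before(expr_type, var_type):
    """Original condition: skip whenever either side is NoneType."""
    return isinstance(expr_type, NoneType) or isinstance(var_type, NoneType)


def skip_store_after(expr_type, var_type):
    """Fixed condition: same check, but never skip for an optional variable."""
    return skip_store_before(expr_type, var_type) and not isinstance(var_type, OptionalType)


if __name__ == "__main__":
    cases = {
        "let v: Int? = None": (NoneType(), OptionalType(IntType())),
        "unwrap edge case (assumed shape)": (NoneType(), NoneType()),
        "let n: Int = 1": (IntType(), IntType()),
    }
    for label, (expr_type, var_type) in cases.items():
        before = skip_store_before(expr_type, var_type)
        after = skip_store_after(expr_type, var_type)
        print(f"{label}: skip before={before}, skip after={after}")
```

Only the first row changes: the store for an optional variable is no longer skipped, while the other two cases behave exactly as before.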

Why no hand-written test caught it

let x: Int? = None is an odd thing to write. Programmers initialize optional variables to None when they plan to assign a real value later, usually with mut. Writing let (immutable) with None and then immediately checking it is something only a random generator would produce. But the compiler has to handle it correctly regardless, because valid programs can reach this state through generated code or through patterns that look less strange in context.

Fuzz testing is good at finding exactly this kind of bug: code that is technically valid, rarely written by hand, and exercises an interaction between features (nullable types, conditional blocks, the JVM verifier) that nobody thought to combine in a test.

The bug was not in a dark corner of the language. It was in variable declaration, one of the most common operations in any program. It just happened to require a specific combination of nullable type, None initializer, and conditional scope that nobody wrote by hand. The fuzzer wrote it in under a second.
