Why Mutation Testing Belongs in the Compiler

tl;dr: Most mutation testing tools bolt on after the fact. Vary builds it into the compilation pipeline.

Code coverage tells you which lines ran. It does not tell you whether your tests would catch a bug on those lines. Mutation testing asks a different question: if this code changed, would any test notice?

Existing tools are good

PIT for Java, mutmut for Python, Stryker for JavaScript: these are solid tools that have proven mutation testing works. They parse source code, apply transformations, and re-run your test suite for each mutant.

But they all share the same constraint: they bolt on after the language is built. That means they parse your code a second time, maintain their own understanding of your project structure, and run outside the normal compile/test loop. The friction adds up:

Friction point	What happens
Slow feedback	Each mutant requires a full test run. 500 mutants against a 30-second suite take over four hours (500 × 30 s = 250 minutes).
No semantic awareness	External tools operate on syntax patterns. They cannot tell a pure computation from a function with side effects, so they waste time on mutants no test could detect.
Configuration overhead	You configure which files to mutate, which tests to run, and how to read results, all outside your normal build.
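
The arithmetic behind the first row is worth spelling out. Here is a quick sketch in Python using the numbers from the table; the function name is only illustrative, and it assumes mutants run one after another with no test selection:

```python
def sequential_mutation_time(mutants: int, suite_seconds: int) -> int:
    """Total wall-clock seconds if every mutant triggers one full suite run."""
    return mutants * suite_seconds


total = sequential_mutation_time(500, 30)   # 15,000 seconds
hours, remainder = divmod(total, 3600)
print(f"{hours} h {remainder // 60} min")   # 4 h 10 min
```

The cost scales linearly in both numbers, so a slower suite or a larger codebase pushes this from "overnight" to "never".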

Building a new language is easier now

Here is the thing that changed: creating a new programming language is more practical than it used to be. Targeting the JVM gives you a garbage collector, a JIT compiler, a mature ecosystem of libraries, and deployment on any platform that runs Java. You skip years of runtime engineering and start at the language design level.

That is what Vary does. Instead of writing another mutation testing plugin for an existing language, we built a compiler where mutation testing is part of the pipeline from day one. The compiler already has the type system, already has the test runner, already has the build cache. Mutation testing just uses what is already there.

In Vary, running mutation tests is one command:

vary mutate .

The compiler generates mutants, runs the relevant tests for each, and reports which mutants survived. No plugin to install, no separate configuration file, no second parse of your source tree.

What this means in practice

Consider a function that computes a discount:

def apply_discount(price: Float, rate: Float) -> Float {
    return price * (1.0 - rate)
}

A mutation operator might change * to + or replace 1.0 - rate with 1.0. If your tests still pass after these changes, those tests are not verifying the discount calculation. They are only checking that the function returns something.

This is the gap between coverage and confidence. Mutation testing closes it.
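
Here is the same idea as a small Python sketch. The function is a direct translation of the Vary example; the two mutants are written out by hand, and `weak_test` and `strong_test` are made-up names standing in for two styles of test.

```python
def apply_discount(price: float, rate: float) -> float:
    return price * (1.0 - rate)


def mutant_plus(price: float, rate: float) -> float:
    # Mutation: * replaced with +
    return price + (1.0 - rate)


def mutant_no_rate(price: float, rate: float) -> float:
    # Mutation: (1.0 - rate) replaced with 1.0
    return price * 1.0


def weak_test(fn) -> bool:
    # Executes every line, but only checks that a float comes back.
    return isinstance(fn(100.0, 0.2), float)


def strong_test(fn) -> bool:
    # Checks the actual value: 20% off 100.0 should be 80.0.
    return abs(fn(100.0, 0.2) - 80.0) < 1e-9


for fn in (apply_discount, mutant_plus, mutant_no_rate):
    print(fn.__name__, weak_test(fn), strong_test(fn))
# apply_discount True True
# mutant_plus True False
# mutant_no_rate True False
```

Both tests give the function full line coverage. Only the second one fails on the mutants, which is the difference a mutation score makes visible.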

Bytecode mutation is what makes it fast

Most mutation tools work at the source level: parse the code, change the AST, recompile, run tests. Vary can do AST-level mutation too, but each AST mutant needs a full recompile.

Bytecode mutation skips that entirely. Vary compiles your code once, then patches the JVM bytecode directly for each mutant. Swapping an IADD instruction for ISUB takes microseconds. There is no re-parsing, no re-type-checking, no re-generating bytecode from scratch. The JVM loads the patched class, runs the relevant tests, and moves on.

This is where targeting the JVM pays off a second time. The bytecode format is well-specified and stable. Patching a single instruction in a .class file is a small, predictable operation. The JVM's classloader handles the rest.
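
A toy illustration of that instruction swap, sketched in Python. The opcode values are the real ones from the JVM specification, but everything else is an assumption made for the example: the four-byte method body, the `patch` helper, and the tiny `run` interpreter are not how Vary works internally. A real tool has to parse the class file to find instruction boundaries, because the same byte value can also appear as an operand.

```python
# Opcode values as defined in the JVM specification.
ILOAD_0, ILOAD_1 = 0x1A, 0x1B
IADD, ISUB = 0x60, 0x64
IRETURN = 0xAC

# Code bytes for a static method like: int f(int a, int b) { return a + b; }
original = bytes([ILOAD_0, ILOAD_1, IADD, IRETURN])


def patch(code: bytes, offset: int, expected: int, replacement: int) -> bytes:
    """Return a copy of `code` with one opcode swapped at a known offset."""
    if code[offset] != expected:
        raise ValueError(f"unexpected opcode {code[offset]:#x} at offset {offset}")
    mutated = bytearray(code)
    mutated[offset] = replacement
    return bytes(mutated)


def run(code: bytes, a: int, b: int) -> int:
    """Toy interpreter for the five opcodes above (ignores 32-bit overflow)."""
    stack = []
    for op in code:
        if op == ILOAD_0:
            stack.append(a)
        elif op == ILOAD_1:
            stack.append(b)
        elif op in (IADD, ISUB):
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if op == IADD else lhs - rhs)
        elif op == IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
    raise ValueError("method body has no return instruction")


mutant = patch(original, offset=2, expected=IADD, replacement=ISUB)

print(run(original, 5, 3))  # 8
print(run(mutant, 5, 3))    # 2
```

The mutant differs from the original by exactly one byte, which is why producing it costs so little compared with a recompile.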

The result: per-mutant overhead drops from a full recompile to a single bytecode patch. In a controlled benchmark on a ported parser library, Vary processed 1,844 mutants in about four minutes, which works out to roughly 130 ms per mutant. That is fast enough to run during normal development rather than overnight in CI.

Why it matters that the compiler owns it

When the compiler owns mutation testing, a few things fall out naturally. The type system already knows which expressions are pure, so the mutation engine skips mutants that no test could observe. The test runner already knows your test structure, so there is nothing to configure. The build cache already tracks file changes, so unchanged code is not re-mutated.
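
As a rough picture of the last point, here is a sketch in Python of change tracking by content hash. This is an assumption about how such a cache could work, not a description of Vary's build cache; `digest`, `files_to_mutate`, and the module names are invented for the example.

```python
import hashlib
from typing import Dict, List


def digest(source: str) -> str:
    """Content hash used as the cache key for a source file."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def files_to_mutate(sources: Dict[str, str], cache: Dict[str, str]) -> List[str]:
    """Return the files whose content changed since the last run, updating the cache."""
    changed = []
    for path, text in sorted(sources.items()):
        current = digest(text)
        if cache.get(path) != current:
            changed.append(path)
            cache[path] = current
    return changed


cache: Dict[str, str] = {}
sources = {"pricing": "def apply_discount ...", "cart": "def total ..."}

first_run = files_to_mutate(sources, cache)
sources["pricing"] += "\n# edited"
second_run = files_to_mutate(sources, cache)

print(first_run)   # ['cart', 'pricing']
print(second_run)  # ['pricing']
```

A real cache would also need to invalidate results when the tests covering a file change, since a new test can kill a mutant that previously survived.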

None of this requires a plugin author to reverse-engineer the language. It is all the same code path.

The mutation testing workflow guide covers how to get started. The operator reference lists every mutation Vary applies. The advanced guide goes deeper on bytecode-level mutation and caching.