tl;dr: Vary exists as a JVM language because bytecode-level mutation eliminates per-mutant recompilation. Whether that translates to a measurable speedup depends on the workload. See the measured results in The Bytecode Mutation Thesis.


Most mutation testing tools prove an important idea and then hit the same wall: speed.

The idea is correct. If you want to know whether tests actually protect behavior, you change the code in small, realistic ways and see whether the tests notice. That is a better signal than coverage: coverage shows which lines ran, not whether any assertion would fail if those lines behaved differently.

The wall is operational. If each mutant means reparsing source, rebuilding the program, and rerunning too much of the suite, mutation testing becomes a thing teams admire and rarely run.

Vary exists because that tradeoff is not good enough anymore.

Source-level mutation is where the time goes

A traditional mutation workflow usually looks like this:

1. Parse source
2. Build an AST
3. Apply one mutation
4. Type-check again
5. Recompile again
6. Run tests
7. Repeat hundreds or thousands of times
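The loop above can be sketched in a few lines of Python, using the standard `ast` module as a stand-in for a compiler front end. This is a minimal illustration, not how any particular tool is built. Python has no separate type-check pass, so step 4 is absent, and the two-function program, its tests, and the operator swaps are made up for the example. What matters is the counters: every mutant pays for a fresh parse and a fresh compile.

```python
import ast

# A made-up program under test, kept as source text.
SOURCE = """
def add(a, b):
    return a + b

def is_adult(age):
    return age >= 18
"""

# Illustrative operator swaps; real tools ship much larger catalogs.
SWAPS = {ast.Add: ast.Sub, ast.GtE: ast.Gt}


def run_tests(ns):
    """A deliberately small test suite. True means every check passed."""
    return ns["add"](2, 3) == 5 and ns["is_adult"](30) and not ns["is_adult"](10)


def mutation_sites(tree):
    """Operator locations we know how to swap, in ast.walk order."""
    sites = []
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and type(node.op) in SWAPS:
            sites.append((node, None))
        elif isinstance(node, ast.Compare):
            sites.extend((node, i) for i, op in enumerate(node.ops) if type(op) in SWAPS)
    return sites


def source_level_run(source):
    stats = {"parses": 0, "compiles": 0, "killed": 0, "survived": 0}
    # One parse up front just to count mutation sites (not included in the stats).
    mutant_count = len(mutation_sites(ast.parse(source)))
    for k in range(mutant_count):
        tree = ast.parse(source)                  # steps 1-2: parse and build the AST again
        stats["parses"] += 1
        node, i = mutation_sites(tree)[k]         # step 3: apply exactly one mutation
        if i is None:
            node.op = SWAPS[type(node.op)]()
        else:
            node.ops[i] = SWAPS[type(node.ops[i])]()
        code = compile(tree, "<mutant>", "exec")  # step 5: recompile again
        stats["compiles"] += 1
        ns = {}
        exec(code, ns)                            # load the mutated module
        stats["survived" if run_tests(ns) else "killed"] += 1  # step 6: run tests
    return stats


print(source_level_run(SOURCE))  # → {'parses': 2, 'compiles': 2, 'killed': 1, 'survived': 1}
```

The surviving mutant is the `>=` to `>` swap: no test checks an age of exactly 18, which is the kind of gap mutation testing exists to expose.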

That loop is expensive even when the compiler is fast. Most of the work is repeated for every mutant even though the program changed in only one tiny place.

This is why many mutation tools are respected but not habitual. Teams run them in CI, on a schedule, or against a subset of the code, but not as a normal part of development, because the latency is too high.

Bytecode mutation changes the cost model

By default, Vary compiles the program to JVM bytecode once and then applies each mutation directly to that compiled bytecode.

That shrinks the per-mutant loop considerably.

You are not reparsing source for each mutant. You are not rerunning the full semantic pipeline for each mutant. You are not regenerating all the same class files over and over. You are changing a very small compiled artifact and asking one question: would the tests notice this behavior change?

On the JVM, that is a practical thing to do. The bytecode format is stable, well understood, and built for machine processing. Patching one method is small and predictable work.
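For contrast, here is a toy model of the bytecode path, again in Python and again an illustration rather than Vary's implementation. Method bodies are lists of JVM-style mnemonics run by a tiny stack interpreter; the "compiled" program is built once, and each mutant is a copy of one method with a single instruction swapped. The branch layout mirrors the usual javac pattern for `return age >= 18`, which tests the inverted condition with `if_icmplt`. The two patches (IADD to ISUB, and IF_ICMPLT to IF_ICMPLE as a conditional-boundary change) are assumptions chosen to resemble common operators such as PIT's math and conditionals-boundary mutators.

```python
import operator

# Toy "compiled classes": method name -> list of (opcode, operand) instructions.
# Mnemonics mirror JVM instructions for readability; this is not a real class file.
COMPILED = {
    "add": [("ILOAD", 0), ("ILOAD", 1), ("IADD", None), ("IRETURN", None)],
    "is_adult": [                          # return age >= 18 ? 1 : 0
        ("ILOAD", 0), ("BIPUSH", 18),
        ("IF_ICMPLT", 5),                  # jump to "return 0" when age < 18
        ("ICONST", 1), ("IRETURN", None),
        ("ICONST", 0), ("IRETURN", None),
    ],
}

ARITH = {"IADD": operator.add, "ISUB": operator.sub}
BRANCHES = {"IF_ICMPLT": operator.lt, "IF_ICMPLE": operator.le}

# Illustrative one-instruction patches: arithmetic swap and conditional boundary.
PATCHES = {"IADD": "ISUB", "IF_ICMPLT": "IF_ICMPLE"}


def execute(code, args):
    """Interpret one method body on an operand stack."""
    stack, pc = [], 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == "ILOAD":
            stack.append(args[arg])
        elif op in ("BIPUSH", "ICONST"):
            stack.append(arg)
        elif op in ARITH:
            b, a = stack.pop(), stack.pop()
            stack.append(ARITH[op](a, b))
        elif op in BRANCHES:
            b, a = stack.pop(), stack.pop()
            if BRANCHES[op](a, b):
                pc = arg
        elif op == "IRETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


def run_tests(program):
    """A tiny test suite; True means every check passed."""
    def call(name, *args):
        return execute(program[name], args)
    return call("add", 2, 3) == 5 and call("is_adult", 30) == 1 and call("is_adult", 10) == 0


def bytecode_level_run(compiled):
    """Patch one instruction per mutant; nothing is parsed or compiled in the loop."""
    stats = {"patches": 0, "killed": 0, "survived": 0}
    for method, code in compiled.items():
        for pc, (op, arg) in enumerate(code):
            if op not in PATCHES:
                continue
            patched = list(code)
            patched[pc] = (PATCHES[op], arg)        # the whole mutation
            stats["patches"] += 1
            mutant = {**compiled, method: patched}  # other methods reused untouched
            stats["survived" if run_tests(mutant) else "killed"] += 1
    return stats


print(bytecode_level_run(COMPILED))  # → {'patches': 2, 'killed': 1, 'survived': 1}
```

No parse or compile step runs inside the loop; the cost per mutant is a list copy plus a test run. A real engine would also have to load the patched class into the JVM before running tests, which this toy skips.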

The difference is easier to see in table form:

| Per-mutant work | Source / AST mutation | Bytecode mutation |
|---|---|---|
| Parse source again | Yes | No |
| Type-check again | Yes | No |
| Generate bytecode again | Yes | No |
| Patch compiled method | No | Yes |
| Load mutated class | Sometimes | Yes |
| Typical feedback shape | Recompile-heavy | Patch-and-run |

A controlled benchmark on the Frugal port compared bytecode mode with AST mode on a real parser library (1,844 mutants across 3,402 LOC). The headline result was only a 1.01x speedup, because test execution, not compilation, dominated per-mutant time. The architectural argument still holds: bytecode mode eliminates recompilation by construction, so its advantage grows with compilation cost. On workloads where per-mutant compilation is the bottleneck (deep type inference, large module graphs), the gap should widen.
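A simplified per-mutant cost model shows why the measured gap was so small. Assume, for illustration, that AST mode pays a recompilation cost c plus a test cost t for each mutant, and bytecode mode pays a patch-and-load cost p plus the same t; one-time setup and scheduling effects are ignored. The speedup S and its excess over 1 are then:

```latex
S = \frac{c + t}{p + t},
\qquad
S - 1 = \frac{c - p}{p + t}
```

Under this model, S = 1.01 means c - p = 0.01(p + t): the compilation saved per mutant was about one percent of bytecode-mode per-mutant time, so the tests set the pace. A larger c (heavier type inference, bigger module graphs) or a smaller t (faster, better-targeted tests) both push S upward.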

Fast mutation testing is the reason, not a side benefit

It is easy to tell this story backwards and say Vary targets the JVM because the JVM is mature, portable, and has a good runtime. That is true, but it is incomplete.

The stronger reason is that Vary wanted built-in mutation testing that is fast enough to matter.

If mutation testing stays slow, it becomes ceremonial:

| If mutation is slow | What teams do |
|---|---|
| Hours per run | Avoid it except before releases |
| Too much recompilation | Restrict it to tiny scopes |
| Long feedback loop | Treat survivors as backlog, not immediate feedback |
| Expensive to repeat | Stop using it during normal development |

If mutation is fast, the workflow changes:

| If mutation is fast | What teams can do |
|---|---|
| Seconds or minutes, not hours | Run it while code is still fresh |
| Changed-method caching | Focus on what actually moved |
| Cheap reruns | Tighten tests and verify immediately |
| Normal developer latency | Make it part of the standard loop |

That is the gap Vary is trying to close.

Why not just add this to an existing language?

Because a bolt-on tool is always fighting for information the compiler already has.

The compiler already knows:

| Compiler knowledge | Why it matters for mutation |
|---|---|
| Type information | Helps avoid nonsense mutations |
| Purity and effect information | Helps skip mutants no test could observe cleanly |
| Project structure | Helps discover relevant tests |
| Build artifacts | Makes incremental mutation practical |
| Test DSL semantics | Enables richer reporting than pass/fail alone |

When mutation testing lives outside the compiler, a tool has to rediscover or approximate all of that. Sometimes it does that well. But it is still reconstructing facts the language already knew.

Vary takes the opposite position: if mutation testing is important enough to trust, it is important enough to live inside the compiler and runtime model.
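To make the compiler-knowledge table concrete, here is a hypothetical planning step in Python that uses two of those facts: inferred operand types to rule out ill-typed swaps, and reachability to skip unreached methods and pick which tests to run per mutant. `Site`, `REPLACEMENTS`, and `plan_mutants` are invented for this illustration; they are not Vary's compiler API, and a real engine would draw these facts from its own type checker and project model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    method: str
    operator: str
    operand_type: str  # type the compiler inferred for the operands (assumed available)


# Hypothetical catalog: valid replacements per (operator, operand type).
REPLACEMENTS = {
    ("+", "Int"): ["-"],
    ("+", "Str"): [],          # concatenation has no well-typed "-" counterpart
    (">=", "Int"): [">", "<"],
}


def plan_mutants(sites, tests_reaching):
    """Keep only well-typed mutants in methods some test reaches,
    and pair each mutant with just the tests that reach its method."""
    plan = []
    for site in sites:
        tests = tests_reaching.get(site.method, [])
        if not tests:
            continue  # unreached: report as uncovered rather than run
        for replacement in REPLACEMENTS.get((site.operator, site.operand_type), []):
            plan.append((site.method, f"{site.operator} -> {replacement}", tests))
    return plan


sites = [
    Site("add", "+", "Int"),
    Site("greet", "+", "Str"),
    Site("is_adult", ">=", "Int"),
    Site("legacy_helper", "+", "Int"),
]
tests_reaching = {"add": ["test_add"], "greet": ["test_greet"], "is_adult": ["test_is_adult"]}

# Three mutants survive planning: greet has no well-typed swap, legacy_helper is unreached.
for entry in plan_mutants(sites, tests_reaching):
    print(entry)
```

A bolt-on tool can approximate both filters, but it has to rebuild the type and reachability facts first; a compiler already has them when it emits code.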

Why the JVM specifically

The JVM gives Vary two things at the same time.

First, it gives the ordinary platform benefits: mature garbage collection, JIT compilation, stable deployment, and a large ecosystem.

Second, and more important for this story, it gives Vary a mutation substrate that is fast and precise. The compiler can emit class files once, mutate methods directly, and reuse the rest of the pipeline.

That is why Vary's mutation engine has a bytecode mode and why the CLI defaults to --level bytecode. It is also why the infrastructure has a bytecode method-level cache: once mutation is happening at the method level, caching at the method level makes sense too.
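A method-level cache can be sketched as a lookup keyed on the method's identity, a digest of its compiled bytes, and a fingerprint of the test suite. This is a minimal Python illustration under those assumptions; `MethodMutationCache` and `mutate_and_test` are hypothetical names, not Vary's infrastructure. A production cache would also need to invalidate when callees or other dependencies change, which this sketch does not track.

```python
import hashlib


class MethodMutationCache:
    """Hypothetical method-level cache: a method's mutation results are reused
    until its compiled bytes or the test-suite fingerprint change."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def _key(name, body, tests_fingerprint):
        # Method identity + digest of its compiled bytes + the tests it was judged by.
        return (name, hashlib.sha256(body).hexdigest(), tests_fingerprint)

    def run(self, methods, tests_fingerprint, mutate_and_test):
        """methods: name -> compiled bytes. Returns (results, names that were re-mutated)."""
        results, recomputed = {}, []
        for name, body in methods.items():
            key = self._key(name, body, tests_fingerprint)
            if key not in self._results:
                self._results[key] = mutate_and_test(name, body)
                recomputed.append(name)
            results[name] = self._results[key]
        return results, recomputed


# Usage with placeholder byte strings standing in for method bytecode.
def mutate_and_test(name, body):
    return {"method": name, "size": len(body)}  # stub for a real mutation run


cache = MethodMutationCache()
cache.run({"add": b"v1", "is_adult": b"v1"}, "tests@1", mutate_and_test)
_, redone = cache.run({"add": b"v1", "is_adult": b"v2"}, "tests@1", mutate_and_test)
print(redone)  # → ['is_adult']
```

Only the method whose bytes changed is mutated again, which is the "focus on what actually moved" behavior at method granularity.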

This is not a random implementation detail. It is the architecture.

Why this matters more now

AI coding changes the economics of software creation. It is much easier to produce plausible code and shallow tests at scale. That makes verification quality more important, but it also makes verification cost more dangerous. A slow verification method will get skipped exactly when it is most needed.

So the standard is higher now. It is not enough for mutation testing to be theoretically better than coverage. It has to be fast enough to survive contact with real developer behavior.

That is the real argument for Vary.

Vary is not "a new language that happens to have mutation testing." It is a language shaped around the idea that mutation testing should be a normal, repeatable part of the build loop. Bytecode mutation on the JVM is what makes that realistic.

| Page | Focus |
|---|---|
| Why Mutation Testing Belongs in the Compiler | Why Vary builds mutation into the language instead of treating it as a plugin |
| How We Mutation Test the Compiler | How Vary uses different mutation strategies for Vary code and Kotlin compiler code |
| Infrastructure | Bytecode mode, method-level cache, and CLI behavior |
