Bytecode Mutation Is Why Vary Uses the JVM

tl;dr: Vary exists as a JVM language because bytecode-level mutation eliminates per-mutant recompilation. Whether that translates to a measurable speedup depends on the workload. See the measured results in The Bytecode Mutation Thesis.

Most mutation testing tools prove an important idea and then hit the same wall: speed.

The idea is correct. If you want to know whether tests actually protect behavior, you change the code in small, realistic ways and see whether the tests notice. That is a better signal than coverage, which records that a line ran but not that any assertion checked its result.

The wall is operational. If each mutant means reparsing source, rebuilding the program, and rerunning too much of the suite, mutation testing becomes something teams admire but rarely run.

Vary exists because that tradeoff is not good enough anymore.

Source-level mutation is where the time goes

A traditional mutation workflow usually looks like this:

1. Parse source
2. Build an AST
3. Apply one mutation
4. Type-check again
5. Recompile again
6. Run tests
7. Repeat hundreds or thousands of times

That loop is expensive even when the compiler is fast. Most of the work is repeated for every mutant even though the program changed in only one tiny place.

This is why many mutation tools are respected but not habitual. Teams run them in CI, on a schedule, or on a subset of code. They do not treat them like a normal part of development because the latency is too high.

Bytecode mutation changes the cost model

Vary compiles to JVM bytecode first, then mutates at the bytecode level by default.

That changes the loop to something much smaller.

You are not reparsing source for each mutant. You are not rerunning the full semantic pipeline for each mutant. You are not regenerating all the same class files over and over. You are changing a very small compiled artifact and asking one question: would the tests notice this behavior change?

On the JVM, that is a practical thing to do. The bytecode format is stable, well understood, and built for machine processing. Patching one method is small and predictable work.

The difference is easier to see in table form:

Per-mutant work	Source / AST mutation	Bytecode mutation
Parse source again	Yes	No
Type-check again	Yes	No
Generate bytecode again	Yes	No
Patch compiled method	No	Yes
Load mutated class	Sometimes	Yes
Typical feedback shape	Recompile-heavy	Patch-and-run
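
The "patch compiled method" row can be made concrete with a toy. The sketch below is not Vary's mutation engine, and every name in it is hypothetical. It uses the real JVM opcode values for iload_0, iload_1, iadd, isub, and ireturn, but it works on a bare code array and runs it through a tiny hand-written interpreter instead of a real class file. Scanning raw bytes is only safe here because every instruction in the sample is one byte long; a real tool would decode instructions with a bytecode library.

```java
import java.util.Arrays;

// Toy illustration only: names are hypothetical, and the byte scan is valid
// solely because the sample method contains no instructions with operands.
final class OpcodePatchSketch {
    // Opcode values as defined by the JVM specification.
    static final int ILOAD_0 = 0x1a;
    static final int ILOAD_1 = 0x1b;
    static final int IADD = 0x60;
    static final int ISUB = 0x64;
    static final int IRETURN = 0xac;

    // Returns a copy of the code with the first IADD replaced by ISUB.
    static byte[] mutateFirstAdd(byte[] code) {
        byte[] mutant = Arrays.copyOf(code, code.length);
        for (int pc = 0; pc < mutant.length; pc++) {
            if ((mutant[pc] & 0xff) == IADD) {
                mutant[pc] = (byte) ISUB;
                return mutant;
            }
        }
        throw new IllegalArgumentException("no IADD instruction found");
    }

    // Minimal interpreter for the five opcodes above, standing in for
    // "load the mutated class and run the tests".
    static int run(byte[] code, int arg0, int arg1) {
        int[] stack = new int[4];
        int sp = 0;
        for (byte b : code) {
            switch (b & 0xff) {
                case ILOAD_0: stack[sp++] = arg0; break;
                case ILOAD_1: stack[sp++] = arg1; break;
                case IADD: { int r = stack[--sp]; int l = stack[--sp]; stack[sp++] = l + r; break; }
                case ISUB: { int r = stack[--sp]; int l = stack[--sp]; stack[sp++] = l - r; break; }
                case IRETURN: return stack[--sp];
                default: throw new IllegalStateException("unsupported opcode");
            }
        }
        throw new IllegalStateException("fell off the end of the method");
    }

    public static void main(String[] args) {
        // Compiled body of: static int add(int a, int b) { return a + b; }
        byte[] original = {ILOAD_0, ILOAD_1, IADD, (byte) IRETURN};
        byte[] mutant = mutateFirstAdd(original);
        // prints: original=5 mutant=-1
        System.out.println("original=" + run(original, 2, 3) + " mutant=" + run(mutant, 2, 3));
    }
}
```

A test that checks add(2, 3) == 5 kills this mutant. A test that only checks add(2, 0) == 2 lets it survive, because 2 - 0 is also 2. The mutant itself cost one byte write to produce.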

A controlled benchmark on the Frugal port compared bytecode mode with AST mode on a real parser library (1,844 mutants across 3,402 LOC). The headline result was a 1.01x speedup, which is effectively parity: test execution, not compilation, dominated the run time. The architectural argument still holds, because bytecode mode eliminates recompilation by construction, and that advantage grows with compilation cost. On workloads where per-mutant compilation is the bottleneck (deep type inference, large module graphs), the gap is expected to widen.
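
A simple per-mutant cost model shows how a 1.01x result and the architectural argument can both be true. The sketch below is illustrative arithmetic, not Vary code. The class name, the method names, and every timing are invented for the example; none of them are measurements from the benchmark.

```java
import java.util.Locale;

// Hypothetical cost model: all timings are made-up inputs, not measurements.
final class MutationCostSketch {
    // Source-level mutation pays for recompilation and then for the tests.
    static double sourceLevelMs(double recompileMs, double testMs) {
        return recompileMs + testMs;
    }

    // Bytecode-level mutation pays for a small patch and then for the tests.
    static double bytecodeLevelMs(double patchMs, double testMs) {
        return patchMs + testMs;
    }

    // Ratio of per-mutant times; values above 1.0 favor bytecode mode.
    static double speedup(double recompileMs, double patchMs, double testMs) {
        return sourceLevelMs(recompileMs, testMs) / bytecodeLevelMs(patchMs, testMs);
    }

    public static void main(String[] args) {
        // Test-dominated workload: removing a 10 ms recompile barely matters.
        System.out.println(String.format(Locale.ROOT, "%.2f", speedup(10, 0, 1000))); // prints 1.01
        // Compile-dominated workload: the same change is a large win.
        System.out.println(String.format(Locale.ROOT, "%.2f", speedup(500, 1, 100))); // prints 5.94
    }
}
```

Under this model the speedup is bounded by the share of per-mutant time that recompilation used to take. When tests account for nearly all of that time, eliminating recompilation cannot move the total much.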

Fast mutation testing is the reason, not a side benefit

It is easy to tell this story backwards and say Vary targets the JVM because the JVM is mature and portable and has a good runtime. That is true, but it is incomplete.

The stronger reason is that Vary wanted built-in mutation testing that is fast enough to matter.

If mutation testing stays slow, it becomes ceremonial:

If mutation is slow	What teams do
Hours per run	Avoid it except before releases
Too much recompilation	Restrict it to tiny scopes
Long feedback loop	Treat survivors as backlog, not immediate feedback
Expensive to repeat	Stop using it during normal development

If mutation is fast, the workflow changes:

If mutation is fast	What teams can do
Seconds or minutes, not hours	Run it while code is still fresh
Changed-method caching	Focus on what actually moved
Cheap reruns	Tighten tests and verify immediately
Normal developer latency	Make it part of the standard loop

That is the gap Vary is trying to close.

Why not just add this to an existing language?

Because a bolt-on tool is always fighting for information the compiler already has.

The compiler already knows:

Compiler knowledge	Why it matters for mutation
Type information	Helps avoid nonsense mutations
Purity and effect information	Helps skip mutants no test could observe cleanly
Project structure	Helps discover relevant tests
Build artifacts	Makes incremental mutation practical
Test DSL semantics	Makes stronger reporting possible than pass/fail alone

When mutation testing lives outside the compiler, a tool has to rediscover or approximate all of that. Sometimes it does that well. But it is still reconstructing facts the language already knew.
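
As a rough illustration of the "type information" row, the sketch below filters operator replacements by operand type. It is a hypothetical example, not Vary's implementation: the type names, the operator table, and the assumption that `+` concatenates strings (as it does in Java) are all chosen for the example and may not match Vary.

```java
import java.util.List;

// Hypothetical sketch: the types, operators, and rules are illustrative only.
final class TypeAwareMutationSketch {
    enum Type { INT, STRING, BOOL }

    // Returns the replacement operators that would still type-check for the
    // given operator and operand type. An empty list means "skip this site".
    static List<String> replacementsFor(String operator, Type operandType) {
        if (operator.equals("+") && operandType == Type.INT) {
            return List.of("-", "*");
        }
        if (operator.equals("&&") && operandType == Type.BOOL) {
            return List.of("||");
        }
        // Example of a nonsense mutation avoided: "+" on strings has no
        // "-" counterpart, so no mutant is generated at all.
        return List.of();
    }

    public static void main(String[] args) {
        System.out.println(replacementsFor("+", Type.INT));    // prints [-, *]
        System.out.println(replacementsFor("+", Type.STRING)); // prints []
    }
}
```

A tool without type information would have to generate the string mutant, fail to compile it, and discard it. A compiler-integrated tool can decline to create it in the first place.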

Vary takes the opposite position: if mutation testing is important enough to trust, it is important enough to live inside the compiler and runtime model.

Why the JVM specifically

The JVM gives Vary two things at the same time.

First, it gives the ordinary platform benefits: mature garbage collection, JIT compilation, stable deployment, and a large ecosystem.

Second, and more important for this story, it gives Vary a mutation substrate that is fast and precise. The compiler can emit class files once, mutate methods directly, and reuse the rest of the pipeline.

That is why Vary's mutation engine has a bytecode mode and why the CLI defaults to --level bytecode. It is also why the infrastructure has a bytecode method-level cache: once mutation is happening at the method level, caching at the method level makes sense too.
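
To show why method-level caching follows from method-level mutation, here is a minimal sketch of one possible design. It is not Vary's actual cache: the class, the key format, and the mutation identifier string are all assumptions. The idea is only that a result can be keyed by a hash of the method's bytecode plus the mutation applied, so an unchanged method never reruns its mutants.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a method-level mutation cache; not Vary's design.
final class MethodCacheSketch {
    enum Outcome { KILLED, SURVIVED }

    private final Map<String, Outcome> results = new HashMap<>();
    int testRuns = 0; // counts how often the tests actually had to run

    // Cache key: SHA-256 over the method's bytecode and the mutation id.
    static String key(byte[] methodBytecode, String mutationId) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(methodBytecode);
            sha.update((byte) 0); // separator between the two inputs
            sha.update(mutationId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
    }

    // Returns the cached outcome, or runs the tests once and remembers the result.
    Outcome outcomeFor(byte[] methodBytecode, String mutationId, Supplier<Outcome> runTests) {
        return results.computeIfAbsent(key(methodBytecode, mutationId), k -> {
            testRuns++;
            return runTests.get();
        });
    }
}
```

A real cache would need more in its key than this sketch has. Changes to the tests themselves, or to methods the mutated method calls, can change an outcome without changing the method's own bytecode.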

This is not a random implementation detail. It is the architecture.

Why this matters more now

AI-assisted coding changes the economics of software creation. It is now much easier to produce plausible code and shallow tests at scale. That makes verification quality more important, but it also makes verification cost more dangerous. A slow verification method will get skipped exactly when it is most needed.

So the standard is higher now. It is not enough for mutation testing to be theoretically better than coverage. It has to be fast enough to survive contact with real developer behavior.

That is the real argument for Vary.

Vary is not "a new language that happens to have mutation testing." It is a language shaped around the idea that mutation testing should be a normal, repeatable part of the build loop. Bytecode mutation on the JVM is what makes that realistic.

Page	Focus
Why Mutation Testing Belongs in the Compiler	Why Vary builds mutation into the language instead of treating it as a plugin
How We Mutation Test the Compiler	How Vary uses different mutation strategies for Vary code and Kotlin compiler code
Infrastructure	Bytecode mode, method-level cache, and CLI behavior