Most mutation testing tools prove an important idea and then hit the same wall: speed.
The idea is correct. If you want to know whether tests are actually protecting behavior, you change the code in small realistic ways and see whether the tests notice. That is a better signal than coverage.
The wall is operational. If each mutant means reparsing source, rebuilding the program, and rerunning too much of the suite, mutation testing becomes a thing teams admire and rarely run.
Vary exists because that tradeoff is not good enough anymore.
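The core idea above, changing code in small realistic ways and checking whether the tests notice, can be shown in a few lines. This is a minimal sketch in Python rather than Vary, and every name in it (`can_withdraw`, `weak_suite`, `strong_suite`, `killed`) is hypothetical. It shows a suite that executes every line of a function yet still lets a boundary mutant survive.

```python
# Hypothetical example: none of these names come from Vary.

def can_withdraw(balance, amount):
    """Original behavior: withdrawing the full balance is allowed."""
    return amount <= balance

def can_withdraw_mutant(balance, amount):
    """Mutant: `<=` replaced with `<`, a small realistic change."""
    return amount < balance

def weak_suite(fn):
    """Executes every line of fn but never checks the boundary."""
    return fn(100, 50) is True and fn(100, 200) is False

def strong_suite(fn):
    """Adds the boundary case where amount == balance."""
    return weak_suite(fn) and fn(100, 100) is True

def killed(suite, mutant):
    """A mutant is killed when the suite fails against it."""
    return not suite(mutant)

print(killed(weak_suite, can_withdraw_mutant))    # False: the mutant survives
print(killed(strong_suite, can_withdraw_mutant))  # True: the mutant is killed
```

Both suites give the function full line coverage. Only the mutation result distinguishes them.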
A traditional mutation workflow usually looks like this:
1. Parse source
2. Build an AST
3. Apply one mutation
4. Type-check again
5. Recompile again
6. Run tests
7. Repeat hundreds or thousands of times
That loop is expensive even when the compiler is fast. Most of the work is repeated for every mutant even though the program changed in only one tiny place.
This is why many mutation tools are respected but not habitual. Teams run them in CI, on a schedule, or on a subset of code. They do not treat them like a normal part of development because the latency is too high.
Vary compiles to JVM bytecode first, then mutates at the bytecode level by default.
That shrinks the per-mutant loop to three steps: patch one compiled method, load the mutated class, and run the tests.
You are not reparsing source for each mutant. You are not rerunning the full semantic pipeline for each mutant. You are not regenerating all the same class files over and over. You are changing a very small compiled artifact and asking one question: would the tests notice this behavior change?
On the JVM, that is a practical thing to do. The bytecode format is stable, well understood, and built for machine processing. Patching one method is small and predictable work.
The difference is easier to see in table form:
| Per-mutant work | Source / AST mutation | Bytecode mutation |
|---|---|---|
| Parse source again | Yes | No |
| Type-check again | Yes | No |
| Generate bytecode again | Yes | No |
| Patch compiled method | No | Yes |
| Load mutated class | Sometimes | Yes |
| Typical feedback shape | Recompile-heavy | Patch-and-run |
A controlled benchmark on the Frugal port measured bytecode mode vs. AST mode on a real parser library (1,844 mutants across 3,402 LOC). The headline result was only a 1.01x speedup, because test execution, not compilation, dominated the run time. The architectural argument still holds: bytecode mode eliminates recompilation by construction, and that advantage grows with compilation cost. On workloads where per-mutant compilation is the bottleneck (deep type inference, large module graphs), the gap should widen.
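A simple cost model shows why the speedup depends on where the time goes. Assume each mutant costs a fixed overhead plus test time: rebuild plus tests in source or AST mode, patch plus tests in bytecode mode. The function and the timings below are hypothetical and are not taken from the benchmark.

```python
def speedup(rebuild_s, patch_s, test_s):
    """Ratio of per-mutant time: source/AST mode over bytecode mode."""
    return (rebuild_s + test_s) / (patch_s + test_s)

# Test-dominated workload: the rebuild is a sliver of each mutant's run,
# so removing it barely moves the total.
print(round(speedup(rebuild_s=0.1, patch_s=0.01, test_s=2.0), 2))

# Compile-dominated workload: the rebuild is most of each mutant's run,
# so removing it changes the total by an order of magnitude.
print(round(speedup(rebuild_s=5.0, patch_s=0.01, test_s=0.5), 2))
```

Under this model the ratio approaches 1 as test time grows and approaches `rebuild_s / patch_s` as test time shrinks, which is consistent with a small headline number on a test-heavy library.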
It is easy to tell this story backwards and say Vary targets the JVM because the JVM is mature, portable, and has a good runtime. That is true, but it is incomplete.
The stronger reason is that Vary wanted built-in mutation testing that is fast enough to matter.
If mutation testing stays slow, it becomes ceremonial:
| If mutation is slow | What teams do |
|---|---|
| Hours per run | Avoid it except before releases |
| Too much recompilation | Restrict it to tiny scopes |
| Long feedback loop | Treat survivors as backlog, not immediate feedback |
| Expensive to repeat | Stop using it during normal development |
If mutation is fast, the workflow changes:
| If mutation is fast | What teams can do |
|---|---|
| Seconds or minutes, not hours | Run it while code is still fresh |
| Changed-method caching | Focus on what actually moved |
| Cheap reruns | Tighten tests and verify immediately |
| Normal developer latency | Make it part of the standard loop |
That is the gap Vary is trying to close.
Mutation testing is built into Vary rather than bolted on because a bolt-on tool is always fighting for information the compiler already has.
The compiler already knows:
| Compiler knowledge | Why it matters for mutation |
|---|---|
| Type information | Helps avoid nonsense mutations |
| Purity and effect information | Helps skip mutants no test could observe cleanly |
| Project structure | Helps discover relevant tests |
| Build artifacts | Makes incremental mutation practical |
| Test DSL semantics | Makes stronger reporting possible than pass/fail alone |
When mutation testing lives outside the compiler, a tool has to rediscover or approximate all of that. Sometimes it does that well. But it is still reconstructing facts the language already knew.
Vary takes the opposite position: if mutation testing is important enough to trust, it is important enough to live inside the compiler and runtime model.
The JVM gives Vary two things at the same time.
First, it gives the ordinary platform benefits: mature garbage collection, JIT compilation, stable deployment, and a large ecosystem.
Second, and more important for this story, it gives Vary a mutation substrate that is fast and precise. The compiler can emit class files once, mutate methods directly, and reuse the rest of the pipeline.
That is why Vary's mutation engine has a bytecode mode and why the CLI defaults to `--level bytecode`. It is also why the infrastructure has a bytecode method-level cache: once mutation is happening at the method level, caching at the method level makes sense too.
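One way to picture a method-level cache is a map keyed on a hash of the method's bytecode plus an identifier for the mutation. The sketch below is an assumption about the general technique, not a description of Vary's cache. The class name, the key shape, and the mutation id strings are all hypothetical.

```python
import hashlib

class MethodMutationCache:
    """Hypothetical cache keyed on method bytecode plus mutation id."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def key(method_bytes, mutation_id):
        digest = hashlib.sha256(method_bytes).hexdigest()
        return (digest, mutation_id)

    def lookup_or_run(self, method_bytes, mutation_id, run_mutant):
        """Return (result, was_cached); run the mutant only on a miss."""
        k = self.key(method_bytes, mutation_id)
        if k in self._results:
            return self._results[k], True
        result = run_mutant()
        self._results[k] = result
        return result, False

cache = MethodMutationCache()
body_v1 = b"\x1a\x1b\x60\xac"  # iload_0, iload_1, iadd, ireturn
body_v2 = b"\x1a\x1b\x64\xac"  # iload_0, iload_1, isub, ireturn

# First run of a mutant on an unchanged method: executed, then stored.
print(cache.lookup_or_run(body_v1, "iadd->isub@2", lambda: "killed"))
# Same method bytes, same mutation: served from the cache.
print(cache.lookup_or_run(body_v1, "iadd->isub@2", lambda: "killed"))
# The method body changed, so its mutants have to run again.
print(cache.lookup_or_run(body_v2, "isub->iadd@2", lambda: "survived"))
```

A usable cache would also have to invalidate entries when the tests that cover a method change. This sketch ignores that and keys on the method body alone.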
This is not a random implementation detail. It is the architecture.
AI coding changes the economics of software creation. It is much easier to produce plausible code and shallow tests at scale. That makes verification quality more important, but it also makes verification cost more dangerous. A slow verification method will get skipped exactly when it is most needed.
So the standard is higher now. It is not enough for mutation testing to be theoretically better than coverage. It has to be fast enough to survive contact with real developer behavior.
That is the real argument for Vary.
Vary is not "a new language that happens to have mutation testing." It is a language shaped around the idea that mutation testing should be a normal, repeatable part of the build loop. Bytecode mutation on the JVM is what makes that realistic.
| Page | Focus |
|---|---|
| Why Mutation Testing Belongs in the Compiler | Why Vary builds mutation into the language instead of treating it as a plugin |
| How We Mutation Test the Compiler | How Vary uses different mutation strategies for Vary code and Kotlin compiler code |
| Infrastructure | Bytecode mode, method-level cache, and CLI behavior |