Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.

Introduction

What is mutation testing?

Code coverage tells you which lines ran. It does not tell you whether your tests checked anything. A test that calls a function and ignores its return value can reach 100% line coverage while catching no bugs in that function.

Mutation testing answers a different question: if something in the code changed, would any test notice?

The compiler makes small changes to your program, either in its compiled bytecode or in its syntax tree (flipping conditions, mutating field access, altering null checks, shifting loop boundaries), and runs your tests against each changed version. Each change is a mutant. If the tests still pass, the mutant survived, and your tests have a gap. If at least one test fails, the mutant was killed.
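The kill/survive loop can be illustrated with a minimal sketch. It is written in Python rather than Vary so that it runs anywhere. The function `price_with_tax`, the two hand-written mutants, and both tests are hypothetical. Vary generates its mutants automatically, whereas these variants are written by hand.

```python
# Toy illustration of mutation testing. Vary generates mutants automatically;
# here each "mutant" is a hand-written variant of the function under test.

def price_with_tax(price: int, rate_percent: int) -> int:
    """Original implementation under test."""
    return price + price * rate_percent // 100

def mutant_minus(price: int, rate_percent: int) -> int:
    return price - price * rate_percent // 100   # arithmetic: + flipped to -

def mutant_return_zero(price: int, rate_percent: int) -> int:
    return 0                                     # return value replaced with 0

MUTANTS = {"arithmetic": mutant_minus, "return-zero": mutant_return_zero}

def weak_test(fn) -> bool:
    """Calls the function but checks nothing: full coverage, no oracle."""
    fn(100, 20)
    return True

def strong_test(fn) -> bool:
    """Checks the return value against a known answer."""
    return fn(100, 20) == 120

def run(test) -> dict:
    """Map each mutant name to 'killed' or 'survived' under one test."""
    results = {}
    for name, mutant in MUTANTS.items():
        results[name] = "survived" if test(mutant) else "killed"
    return results

if __name__ == "__main__":
    assert strong_test(price_with_tax)  # the test passes on the original code
    print("weak:  ", run(weak_test))    # both mutants survive
    print("strong:", run(strong_test))  # both mutants are killed
```

The weak test executes every line of each mutant and still lets both survive. This is the gap between coverage and checking described above.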

If you want the smallest concrete walkthrough before the internals, read Smallest example.

Bytecode mutation

Vary compiles to JVM bytecode, and mutation happens at the bytecode level. This is the key design choice: compiled bytecode is a flat stream of instructions (IADD for addition, ISUB for subtraction, IF_ICMPGT for greater-than), and many mutations, such as turning IADD into ISUB, change a single opcode. No re-parsing, no re-compiling. The mutated class loads in an isolated classloader, the tests run against it, and the classloader is discarded. The whole cycle takes milliseconds per mutant, and mutants run in parallel.

Most mutation testing tools rewrite source text or modify a syntax tree. That means re-compiling for every mutant. For a project with hundreds of mutants, the overhead adds up fast. Bytecode mutation skips all of that. A file with 30 mutants finishes in seconds, which makes mutation testing practical during development rather than something you run overnight in CI.

Bytecode mutation is also more precise. Source-level rewriting has to deal with formatting, comments, and syntax ambiguity. Two source changes that look different can produce the same bytecode, or one source change can accidentally affect multiple operations. Bytecode has none of that. Each instruction has a fixed meaning, and swapping one is an unambiguous change.

AST mutation

Note: AST mutation is not a focus of Vary. Bytecode mutation is the primary approach and covers the vast majority of use cases. AST mutation exists as a secondary tool for specific situations where bytecode operators are not enough.

Vary also supports AST-level mutation (--level ast) for those cases. AST mutation modifies the parsed syntax tree before compilation. Each mutant goes through constant folding, type checking, and bytecode generation, so it is slower (a full compile pass per mutant). But it has access to higher-level program structures that bytecode cannot see: removing entire statements, swapping function arguments, dropping list elements, and skipping control flow blocks.
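The sketch below shows why AST mutation costs a full compile per mutant. It uses Python's standard `ast` module as an analogue of Vary's parser and applies a swap-arguments operator. The source snippet, `SwapArgs`, and `build_mutants` are hypothetical, and Vary's own AST pipeline (constant folding, type checking, bytecode generation) is not modelled.

```python
import ast

SOURCE = """
def ratio(a, b):
    return a / b

def compute():
    return ratio(10, 2)
"""

class SwapArgs(ast.NodeTransformer):
    """Swap the first two positional arguments of one chosen call site."""

    def __init__(self, target_index: int):
        self.target_index = target_index
        self.seen = 0

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        if len(node.args) >= 2:
            if self.seen == self.target_index:
                node.args[0], node.args[1] = node.args[1], node.args[0]
            self.seen += 1
        return node

def count_call_sites(tree: ast.AST) -> int:
    """Number of calls with at least two positional arguments."""
    return sum(isinstance(n, ast.Call) and len(n.args) >= 2 for n in ast.walk(tree))

def build_mutants(source: str):
    """Yield (mutated source text, namespace) for each call site."""
    sites = count_call_sites(ast.parse(source))
    for i in range(sites):
        tree = SwapArgs(i).visit(ast.parse(source))   # fresh tree per mutant
        ast.fix_missing_locations(tree)
        code = compile(tree, "<mutant>", "exec")      # full compile pass per mutant
        namespace = {}
        exec(code, namespace)
        yield ast.unparse(tree), namespace            # ast.unparse needs Python 3.9+

if __name__ == "__main__":
    for text, ns in build_mutants(SOURCE):
        print(text)
        print("compute() ->", ns["compute"]())
```

Because the swap works on whole call expressions, it expresses a change that no single-opcode edit captures.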

Use AST mutation when investigating specific survivor patterns or when you want the semantic operators (skip-effect, skip-block, drop-element, swap-args) that have no bytecode equivalent. You can also run both levels with --level both, which combines the results and removes duplicate mutants.
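This page does not describe how --level both decides that two mutants are duplicates. The sketch below assumes one plausible rule: two mutants count as duplicates when they make the same change at the same source location. The `Mutant` record, its fields, and the effect labels are hypothetical.

```python
from typing import NamedTuple

class Mutant(NamedTuple):
    level: str    # "bytecode" or "ast"
    file: str
    line: int
    effect: str   # normalized label for the behavioural change (assumed)
    status: str   # "killed" or "survived"

def combine(bytecode_results, ast_results):
    """Merge two result lists, keeping one entry per (file, line, effect)."""
    merged = {}
    for m in list(bytecode_results) + list(ast_results):
        key = (m.file, m.line, m.effect)
        merged.setdefault(key, m)   # first occurrence wins, so bytecode results take precedence
    return list(merged.values())
```

With a bytecode "add->sub" mutant and an AST mutant with the same effect on the same line, `combine` keeps only the bytecode entry. AST-only changes such as swap-args pass through unchanged.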

| Level | Speed | Operators | Parallelism |
|---|---|---|---|
| bytecode (default) | Fast (milliseconds per mutant) | 6 bytecode operators | Parallel |
| ast | Moderate (recompiles per mutant) | 27 AST operators (17 classic + 10 semantic) | Sequential |
| both | Slower (runs both) | All 33 operators, deduplicated | Mixed |

Running it

vary mutate calc.vary

The output shows a mutation score: the percentage of mutants your tests killed.

To run mutation testing across an entire directory, pass the directory instead:

vary mutate src/

What mutators do

The bytecode level has 6 operators:

| Mutator | Example |
|---|---|
| Arithmetic | IADD becomes ISUB (covers int, long, float, double) |
| Conditional | IF_ICMPLT becomes IF_ICMPLE or IF_ICMPGE |
| Return value | Functions return 0, 0L, 0.0, or null instead of computed results |
| Negation | INEG removed (negation has no effect) |
| Call skip | Method calls removed, replaced with default return values |
| Return poison | Functions return adversarial values like -1 or MAX_VALUE |
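The bytecode operators in the table above can be sketched as a table-driven rewrite over a list of JVM opcode names. This is an illustration, not Vary's implementation. Only the opcode pairs listed in the table are used. The return-value and poison mutations are modelled by popping the computed value and pushing a constant before the return instruction (long and double occupy two stack slots, hence POP2). Operands such as branch targets are omitted. Call skip is also left out, because rewriting an INVOKE requires the method descriptor.

```python
# Only pairs documented in the operator table; other pairs are not assumed.
ARITHMETIC = {"IADD": "ISUB", "LADD": "LSUB", "FADD": "FSUB", "DADD": "DSUB"}
CONDITIONAL = {"IF_ICMPLT": ["IF_ICMPLE", "IF_ICMPGE"]}

# Return value: discard the computed result and push a default instead.
# long and double take two operand-stack slots, so they need POP2.
RETURN_DEFAULT = {
    "IRETURN": ["POP", "ICONST_0"],
    "LRETURN": ["POP2", "LCONST_0"],
    "FRETURN": ["POP", "FCONST_0"],
    "DRETURN": ["POP2", "DCONST_0"],
    "ARETURN": ["POP", "ACONST_NULL"],
}

# Return poison for int methods: -1 or Integer.MAX_VALUE instead of the result.
RETURN_POISON = {"IRETURN": [["POP", "ICONST_M1"], ["POP", "LDC 2147483647"]]}

def mutants(code):
    """Yield (operator, index, mutated_code) for each applicable mutation."""
    for i, op in enumerate(code):
        before, after = code[:i], code[i + 1:]
        if op in ARITHMETIC:
            yield "arithmetic", i, before + [ARITHMETIC[op]] + after
        for alt in CONDITIONAL.get(op, []):
            yield "conditional", i, before + [alt] + after
        if op == "INEG":
            yield "negation", i, before + after              # instruction deleted
        if op in RETURN_DEFAULT:
            yield "return-value", i, before + RETURN_DEFAULT[op] + [op] + after
        for poison in RETURN_POISON.get(op, []):
            yield "return-poison", i, before + poison + [op] + after
```

For `["ILOAD_0", "ILOAD_1", "IADD", "INEG", "IRETURN"]` this yields five mutants: one arithmetic, one negation, one return-value, and two return-poison.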

The AST level has 27 operators: 17 classic operators, plus 10 semantic operators that understand program meaning:

| Mutator | Example |
|---|---|
| Arithmetic | + becomes -, * becomes / |
| Comparison | > becomes >=, == becomes != |
| Boolean | True becomes False; and becomes or |
| Literal | 60 becomes 61, "" becomes "mutant" |
| Statement removal | Statements replaced with pass |
| Boundary | < becomes <= (off-by-one errors) |
| Return default | return expr becomes return 0, return "", etc. |
| Skip effect | Side-effecting calls like validate(data) replaced with pass |
| Skip block | if cond { body } becomes if cond { pass } |
| Drop element | [a, b, c] becomes [b, c] or [a, c] |
| Swap arguments | f(a, b) becomes f(b, a) |
| Contract precondition | Mutates expressions inside in {} blocks |
| Contract postcondition | Mutates expressions inside out(r) {} / post {} blocks |
| Enum replace | Color.Red becomes Color.Green or Color.Blue |
| Contract remove | Entire in {} or out(r) {} block removed |
| Match swap | Match case bodies swapped with each other |
| Match pattern | Match guard removed or pattern replaced with wildcard |
| Boundary shift | Shifts loop bound and comparison together |
| Guard mismatch | Checks wrong field in a guard condition |
| Field swap | Reads a sibling field instead of the intended one |
| Omitted read | Removes a field from a calculation |
| Duplicate field | Uses one field twice instead of two distinct fields |
| Misbound constructor | Swaps constructor arguments with compatible types |
| Null weaken | Removes or weakens a null check |
| Null strengthen | Removes a null-safe fallback |
| Collection simplify | Weakens collection emptiness or membership check |
| Numeric boundary | Shifts a numeric boundary or division type |
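Field-level operators such as Duplicate field can survive when test data uses equal values for different fields. The minimal Python sketch below shows this. The `Rect` type, `area`, and both tests are hypothetical, and the mutant is written by hand to mimic what the operator does.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    width: int
    height: int

def area(r: Rect) -> int:
    return r.width * r.height

def area_duplicate_field(r: Rect) -> int:
    return r.height * r.height   # "duplicate field" mutant: height read twice

def weak_test(fn) -> bool:
    return fn(Rect(4, 4)) == 16  # a square: width == height hides the mutant

def strong_test(fn) -> bool:
    return fn(Rect(3, 5)) == 15  # distinct values expose it (5 * 5 != 15)
```

Both tests pass on the original. Only the test with distinct field values kills the mutant, which is why survivors from these operators usually point at fixture data rather than at missing assertions.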

Reading the output

After a mutation run, you see which mutants were killed (your tests caught them) and which survived (your tests missed them).

A surviving mutant means the compiler changed something and no test noticed. If a real bug made the same change, your tests would not catch it either.

The output breaks the result down into three metrics:

| Metric | Meaning |
|---|---|
| Kill rate | Fraction of mutants killed by tests |
| Observability | Fraction where the behavioural change reached an oracle boundary at all (killed mutants plus weak-oracle survivors) |
| Actionable survivors | Survivors worth investigating (excludes likely-equivalent mutants) |

A survivor breakdown shows the composition: weak-oracle (behaviour was seen but assertions were too weak), unobserved (behaviour never reached a test oracle), equivalent-likely (mutation probably has no observable effect), and other.
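The metrics above can be computed from per-mutant labels roughly as follows. This sketch assumes that both rates use the total mutant count as the denominator and are expressed on a 0-100 scale, matching the vary.toml thresholds. The labels come from the survivor breakdown, and the `metrics` function is hypothetical.

```python
from collections import Counter

def metrics(statuses):
    """statuses: one label per mutant: killed, weak-oracle, unobserved,
    equivalent-likely, or other. Rates are percentages (0-100)."""
    if not statuses:
        raise ValueError("no mutants")
    counts = Counter(statuses)
    total = len(statuses)
    killed = counts["killed"]
    survivors = total - killed
    return {
        "kill_rate": 100.0 * killed / total,
        # observed = killed, or seen by a test whose assertions were too weak
        "observability": 100.0 * (killed + counts["weak-oracle"]) / total,
        # every survivor except the ones probably equivalent to the original
        "actionable_survivors": survivors - counts["equivalent-likely"],
    }
```

For 20 mutants (14 killed, 3 weak-oracle, 2 unobserved, 1 equivalent-likely), this gives a kill rate of 70.0, observability of 85.0, and 5 actionable survivors.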

You can gate CI on these metrics in vary.toml:

[mutation]
min_observability = 70.0       # Minimum observability score (0-100)
max_unobserved_survivors = 5   # Maximum unobserved survivors allowed
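The check these two settings describe can be sketched as a small gate function. It assumes the thresholds are inclusive: an observability of exactly 70.0 passes, and exactly 5 unobserved survivors passes. The `gate` function and the `CONFIG` dict mirroring the [mutation] table are hypothetical. Vary applies the gate itself.

```python
# Mirrors the [mutation] table in vary.toml (values copied from the example).
CONFIG = {"min_observability": 70.0, "max_unobserved_survivors": 5}

def gate(observability: float, unobserved_survivors: int, config=CONFIG) -> list:
    """Return failure messages; an empty list means the gate passes."""
    failures = []
    if observability < config["min_observability"]:
        failures.append(
            f"observability {observability:.1f} < {config['min_observability']:.1f}")
    if unobserved_survivors > config["max_unobserved_survivors"]:
        failures.append(
            f"unobserved survivors {unobserved_survivors} > {config['max_unobserved_survivors']}")
    return failures

if __name__ == "__main__":
    problems = gate(observability=65.0, unobserved_survivors=2)
    print("gate failed:" if problems else "gate passed", *problems)
```

In a CI script, a non-empty failure list would typically become a non-zero exit code.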

Drilling into survivors

The output includes a survivor groups table. To see individual mutants in a group, use --expand:

vary mutate src/ --expand "math#add"

To understand why a specific mutant survived, use --why with its ID (shown in the --expand output):

vary mutate math.vary --why "add:LIT_CHANGE:abc123"

This shows what was changed, where in the code, why your tests missed it, and what assertion would catch it.

For the full step-by-step workflow, see Golden path.

For all operators, flags, and advanced features, see Advanced overview.

What score to aim for

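There is no universal target. 100% is often impractical, because some mutants are equivalent: they change the code without changing its observable behaviour, so no test can kill them. A score below 60%, though, usually means tests are not checking return values or are missing branches.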

The --why output and the leverage fixes tell you more than the score itself. They point to exactly where the gaps are and what to write next.