Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.

Introduction

What is mutation testing?

Code coverage tells you which lines ran. It does not tell you whether your tests checked anything. A test that calls a function and ignores the return value gets 100% coverage and catches zero bugs.

Mutation testing answers a different question: if something in the code changed, would any test notice?

The compiler makes small changes to compiled bytecode and program semantics (flipping conditions, mutating field access, altering null checks, shifting loop boundaries) and runs your tests against each changed version. Each change is a mutant. If the tests still pass, that mutant survived, and your tests have a gap. If a test fails, the mutant was killed.
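
The kill/survive loop can be sketched in a few lines of Python. This is a toy model, not how Vary is implemented: the mutants here are ready-made functions from the standard `operator` module, and `run_tests` is a hypothetical one-assertion test suite.

```python
import operator


def run_tests(add):
    """A tiny test suite for an `add` function; True means every check passed."""
    return add(2, 2) == 4


original = operator.add

# Each mutant is the original with one small change.
mutants = {
    "+ -> -": operator.sub,
    "+ -> *": operator.mul,
}

# The suite must be green on the original before mutating anything.
assert run_tests(original)

results = {
    name: ("survived" if run_tests(mutant) else "killed")
    for name, mutant in mutants.items()
}
print(results)  # {'+ -> -': 'killed', '+ -> *': 'survived'}
```

The multiplication mutant survives because `2 * 2` and `2 + 2` are both `4`. Adding a check such as `add(2, 3) == 5` would kill it.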

If you want the smallest concrete walkthrough before the internals, read Smallest example.

Bytecode mutation

Vary compiles to JVM bytecode, and mutation happens at the bytecode level. This is the whole point: the compiled bytecode is a flat stream of instructions (IADD for addition, ISUB for subtraction, IF_ICMPGT for greater-than), and mutating one instruction is a single byte change. No re-parsing, no re-compiling. The mutated class loads in an isolated classloader, tests run against it, and the classloader is discarded. The whole cycle takes milliseconds per mutant, and mutants run in parallel.
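
To make "a flat stream of instructions" concrete, here is a toy stack machine in Python. It is an illustration only, not the JVM and not Vary's implementation: the instruction names `LOAD_A` and `LOAD_B` are invented, and copying the list stands in for loading the mutated class in an isolated classloader.

```python
def run(program, a, b):
    """Execute a flat list of instructions on a small operand stack."""
    stack = []
    for op in program:
        if op == "LOAD_A":
            stack.append(a)
        elif op == "LOAD_B":
            stack.append(b)
        elif op == "IADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "ISUB":
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        elif op == "IRETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
    raise ValueError("program ended without IRETURN")


# Hypothetical swap table for the arithmetic mutator.
SWAPS = {"IADD": "ISUB", "ISUB": "IADD"}


def mutants_of(program):
    """Yield (index, mutated copy) for every instruction that has a swap."""
    for i, op in enumerate(program):
        if op in SWAPS:
            mutated = list(program)  # fresh copy: the original is never touched
            mutated[i] = SWAPS[op]
            yield i, mutated


add = ["LOAD_A", "LOAD_B", "IADD", "IRETURN"]
for index, mutant in mutants_of(add):
    killed = run(mutant, 5, 3) != 8  # the "test": add(5, 3) must be 8
    print(index, mutant[index], "killed" if killed else "survived")
```

Producing a mutant is one element replaced in a copied list. Nothing is parsed or compiled again, which is where the speed comes from.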

Most mutation testing tools rewrite source text or modify a syntax tree. That means re-compiling for every mutant. For a project with hundreds of mutations, the overhead adds up fast. Bytecode mutation skips all of that. A file with 30 mutants finishes in seconds, which makes mutation testing practical during development rather than something you run overnight in CI.

Bytecode mutation is also more precise. Source-level rewriting has to deal with formatting, comments, and syntax ambiguity. Two source changes that look different can produce the same bytecode, or one source change can accidentally affect multiple operations. Bytecode has none of that. Each instruction has a fixed meaning, and swapping one is an unambiguous change.

AST mutation

Some mutations cannot be expressed as a single-instruction swap. For those cases, Vary also supports AST-level mutation (--level ast). AST mutation modifies the parsed syntax tree before compilation. Each mutant goes through constant folding, type checking, and bytecode generation, so it is slower (a full compile pass per mutant). But it has access to higher-level program structures that bytecode cannot see: removing entire statements, swapping function arguments, dropping list elements, and skipping control flow blocks.

AST mutation is a secondary tool. Use it when investigating specific survivor patterns or when you want the semantic operators (skip-effect, skip-block, drop-element, swap-args) that have no bytecode equivalent. You can also run both with --level both, which combines results and deduplicates.
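
The sketch below shows the general shape of a skip-effect style mutation using Python's standard `ast` module. It is an analogy, not Vary's compiler: the `SkipEffect` class, the `build` helper, and the `register` function are hypothetical, and the tree being mutated is Python's, not Vary's.

```python
import ast

SOURCE = """
def register(user, log):
    log.append(user)
    return len(user)
"""


class SkipEffect(ast.NodeTransformer):
    """Replace bare call statements (calls whose result is unused) with `pass`."""

    def visit_Expr(self, node):
        if isinstance(node.value, ast.Call):
            return ast.copy_location(ast.Pass(), node)
        return node


def build(tree):
    """Compile a module tree and return its `register` function."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["register"]


original = build(ast.parse(SOURCE))

# Every mutant needs its own parse-transform-compile pass.
mutated_tree = ast.fix_missing_locations(SkipEffect().visit(ast.parse(SOURCE)))
mutant = build(mutated_tree)

log_a, log_b = [], []
print(original("ada", log_a), log_a)  # 3 ['ada']
print(mutant("ada", log_b), log_b)    # 3 []
```

A test that only checks the return value sees `3` in both cases and lets the mutant survive. A test that also inspects `log` kills it. The full compile per mutant is the cost the table below refers to.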

Level	Speed	Operators	Parallelism
`bytecode` (default)	Fast (milliseconds per mutant)	6 bytecode operators	Parallel
`ast`	Moderate (recompiles per mutant)	27 AST operators (17 classic + 10 semantic)	Sequential
`both`	Slower (runs both)	All 33 operators, deduplicated	Mixed

Running it

vary mutate calc.vary

The output shows a mutation score: the percentage of mutations your tests detected.

You can also run mutation testing across an entire directory:

vary mutate src/

What mutators do

The bytecode level has 6 operators:

Mutator	Example
Arithmetic	`IADD` becomes `ISUB` (covers int, long, float, double)
Conditional	`IF_ICMPLT` becomes `IF_ICMPLE` or `IF_ICMPGE`
Return value	Functions return `0`, `0L`, `0.0`, or `null` instead of computed results
Negation	`INEG` removed (negation has no effect)
Call skip	Method calls removed, replaced with default return values
Return poison	Functions return adversarial values like `-1` or `MAX_VALUE`
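
As an illustration of the conditional mutator, the Python sketch below models a less-than branch changed to less-than-or-equal at source level. The function and test names are hypothetical, and the mutant is written out by hand where Vary would swap the branch instruction.

```python
def is_below_limit(value, limit):
    return value < limit      # original comparison


def is_below_limit_mutant(value, limit):
    return value <= limit     # mutant: the boundary moved by one


def loose_tests(fn):
    # Inputs far from the boundary: both versions agree on them.
    return fn(1, 10) is True and fn(50, 10) is False


def boundary_tests(fn):
    # Adds the one input where `<` and `<=` disagree.
    return loose_tests(fn) and fn(10, 10) is False


for suite in (loose_tests, boundary_tests):
    verdict = "survived" if suite(is_below_limit_mutant) else "killed"
    print(suite.__name__, verdict)
```

The mutant survives `loose_tests` and is killed by `boundary_tests`. The only difference between the two suites is a single assertion at `value == limit`.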

The AST level has 27 operators: 17 classic operators, plus 10 semantic operators that understand program meaning:

Mutator	Example
Arithmetic	`+` becomes `-`, `*` becomes `/`
Comparison	`>` becomes `>=`, `==` becomes `!=`
Boolean	`True` becomes `False`, `and` becomes `or`
Literal	`60` becomes `61`, `""` becomes `"mutant"`
Statement removal	Statements replaced with `pass`
Boundary	`<` becomes `<=` (off-by-one errors)
Return default	`return expr` becomes `return 0`, `return ""`, etc.
Skip effect	Side-effecting calls like `validate(data)` replaced with `pass`
Skip block	`if cond { body }` becomes `if cond { pass }`
Drop element	`[a, b, c]` becomes `[b, c]` or `[a, c]`
Swap arguments	`f(a, b)` becomes `f(b, a)`
Contract precondition	Mutates expressions inside `in {}` blocks
Contract postcondition	Mutates expressions inside `out(r) {}` / `post {}` blocks
Enum replace	`Color.Red` becomes `Color.Green` or `Color.Blue`
Contract remove	Entire `in {}` or `out(r) {}` block removed
Match swap	Match case bodies swapped with each other
Match pattern	Match guard removed or pattern replaced with wildcard
Boundary shift	Shifts loop bound and comparison together
Guard mismatch	Checks wrong field in a guard condition
Field swap	Reads a sibling field instead of the intended one
Omitted read	Removes a field from a calculation
Duplicate field	Uses one field twice instead of two distinct fields
Misbound constructor	Swaps constructor arguments with compatible types
Null weaken	Removes or weakens a null check
Null strengthen	Removes a null-safe fallback
Collection simplify	Weakens collection emptiness or membership check
Numeric boundary	Shifts a numeric boundary or division type

Reading the output

After a mutation run, you see which mutants were killed (your tests caught them) and which survived (your tests missed them).

A surviving mutant means the compiler changed something and no test noticed. If a real bug made the same change, your tests would not catch it either.

The output includes three metrics beyond the raw score:

Metric	Meaning
Kill rate	Fraction of mutants killed by tests
Observability	Fraction where the behavioural change reached an oracle boundary at all (killed mutants plus weak-oracle survivors)
Actionable survivors	Survivors worth investigating (excludes likely-equivalent mutants)

A survivor breakdown shows the composition: weak-oracle (behaviour was seen but assertions were too weak), unobserved (behaviour never reached a test oracle), equivalent-likely (mutation probably has no observable effect), and other.
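
The Python sketch below shows how the three metrics relate to the survivor breakdown. It follows the definitions in the table above but makes two assumptions the table does not state: every fraction uses the total number of mutants as its denominator, and only equivalent-likely survivors are excluded from the actionable count. The `MutationRun` class and the counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class MutationRun:
    killed: int
    weak_oracle: int
    unobserved: int
    equivalent_likely: int
    other: int

    @property
    def survivors(self):
        return self.weak_oracle + self.unobserved + self.equivalent_likely + self.other

    @property
    def total(self):
        return self.killed + self.survivors

    @property
    def kill_rate(self):
        # Fraction of all mutants killed by tests.
        return self.killed / self.total if self.total else 0.0

    @property
    def observability(self):
        # Killed mutants plus weak-oracle survivors, over all mutants (assumed denominator).
        return (self.killed + self.weak_oracle) / self.total if self.total else 0.0

    @property
    def actionable_survivors(self):
        # Survivors minus the likely-equivalent ones.
        return self.survivors - self.equivalent_likely


run = MutationRun(killed=70, weak_oracle=10, unobserved=12, equivalent_likely=5, other=3)
print(run.kill_rate, run.observability, run.actionable_survivors)  # 0.7 0.8 25
```

In this made-up run the gap between kill rate (0.7) and observability (0.8) is the weak-oracle group: behaviour that tests saw but did not assert on strongly enough.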

You can gate CI on these metrics in vary.toml:

[mutation]
min_observability = 70.0       # Minimum observability score (0-100)
max_unobserved_survivors = 5   # Maximum unobserved survivors allowed
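
The logic of such a gate can be sketched in Python as follows. This is not Vary's implementation. The `check_gates` function is hypothetical, the config is a plain dict mirroring the two keys above, and treating both thresholds as inclusive is an assumption.

```python
def check_gates(config, observability_percent, unobserved_survivors):
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    min_obs = config.get("min_observability")
    max_unobs = config.get("max_unobserved_survivors")

    # Assumption: a score exactly at the minimum passes.
    if min_obs is not None and observability_percent < min_obs:
        failures.append(
            f"observability {observability_percent} is below minimum {min_obs}"
        )

    # Assumption: a count exactly at the maximum passes.
    if max_unobs is not None and unobserved_survivors > max_unobs:
        failures.append(
            f"unobserved survivors {unobserved_survivors} exceeds maximum {max_unobs}"
        )

    return failures


config = {"min_observability": 70.0, "max_unobserved_survivors": 5}
print(check_gates(config, 80.0, 12))
# ['unobserved survivors 12 exceeds maximum 5']
```

A run can clear the observability threshold and still fail the gate on the number of unobserved survivors, as in the example call.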

Drilling into survivors

The output includes a survivor groups table. To see individual mutants in a group, use --expand:

vary mutate src/ --expand "math#add"

To understand why a specific mutant survived, use --why with its ID (shown in the --expand output):

vary mutate math.vary --why "add:LIT_CHANGE:abc123"

This shows what was changed, where in the code, why your tests missed it, and what assertion would catch it.

For the full step-by-step workflow, see Golden path.

For all operators, flags, and advanced features, see Advanced overview.

What score to aim for

There is no universal target, and 100% is not always practical. A score below 60%, however, usually means the tests are not checking return values or are missing branches.

The --why output and suggested fixes tell you more than the score itself. They point to exactly where the gaps are and what to write next.
