Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.
Introduction
What is mutation testing?
Code coverage tells you which lines ran. It does not tell you whether your tests checked anything. A test that calls a function and ignores the return value gets 100% coverage and catches zero bugs.
Mutation testing answers a different question: if something in the code changed, would any test notice?
The compiler makes small changes to compiled bytecode and program semantics (flipping conditions, mutating field access, altering null checks, shifting loop boundaries) and runs your tests against each changed version. Each change is a mutant. If the tests still pass, that mutant survived, and your tests have a gap. If a test fails, the mutant was killed.
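To make the killed/survived distinction concrete, here is a minimal sketch in Python rather than Vary. The function, the hand-written mutant, and both tests are hypothetical. A real run generates mutants automatically instead of writing them by hand.

```python
def price_with_tax(net, tax):
    """Original code under test."""
    return net + tax


def price_with_tax_mutant(net, tax):
    """Hand-written mutant for illustration: + replaced with -."""
    return net - tax


def weak_test(fn):
    """Calls the function but ignores the result: full coverage, no check."""
    fn(100, 20)
    return True


def strong_test(fn):
    """Asserts on the return value."""
    return fn(100, 20) == 120


def run(test, fn):
    """A mutant is killed when the test fails against it, otherwise it survived."""
    return "survived" if test(fn) else "killed"


print(run(weak_test, price_with_tax_mutant))    # survived
print(run(strong_test, price_with_tax_mutant))  # killed
```

Both tests execute every line of the function, so coverage is identical. Only the test that asserts on the result notices the change.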
If you want the smallest concrete walkthrough before the internals, read Smallest example.
Bytecode mutation
Vary compiles to JVM bytecode, and mutation happens at the bytecode level. This is the whole point: the compiled bytecode is a flat stream of instructions (IADD for addition, ISUB for subtraction, IF_ICMPGT for greater-than), and mutating one instruction is a single byte change. No re-parsing, no re-compiling. The mutated class loads in an isolated classloader, tests run against it, and the classloader is discarded. The whole cycle takes milliseconds per mutant, and mutants run in parallel.
Most mutation testing tools rewrite source text or modify a syntax tree. That means re-compiling for every mutant. For a project with hundreds of mutations, the overhead adds up fast. Bytecode mutation skips all of that. A file with 30 mutants finishes in seconds, which makes mutation testing practical during development rather than something you run overnight in CI.
Bytecode mutation is also more precise. Source-level rewriting has to deal with formatting, comments, and syntax ambiguity. Two source changes that look different can produce the same bytecode, or one source change can accidentally affect multiple operations. Bytecode has none of that. Each instruction has a fixed meaning, and swapping one is an unambiguous change.
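The idea of swapping one instruction in a flat stream can be sketched with a toy stack machine. This is not JVM bytecode and not how Vary is implemented. The `LOAD` and `RETURN` opcodes, the tuple layout, and the `mutate` helper are assumptions for illustration, with `IADD` and `ISUB` borrowed as labels.

```python
def run_program(program, args):
    """Execute a flat list of (opcode, operand) pairs on a tiny stack machine."""
    stack = []
    for op, operand in program:
        if op == "LOAD":
            stack.append(args[operand])
        elif op == "IADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "ISUB":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise ValueError("program ended without RETURN")


# add(a, b) expressed as a flat instruction stream.
ADD = [("LOAD", 0), ("LOAD", 1), ("IADD", None), ("RETURN", None)]


def mutate(program, index, new_op):
    """Return a copy with one opcode swapped; nothing is re-parsed or re-compiled."""
    mutant = list(program)
    mutant[index] = (new_op, mutant[index][1])
    return mutant


mutant = mutate(ADD, 2, "ISUB")
print(run_program(ADD, [7, 3]))     # 10
print(run_program(mutant, [7, 3]))  # 4
```

The mutant differs from the original in exactly one slot, which is why producing it is cheap and why the change is unambiguous.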
AST mutation
Note: AST mutation is not a focus of Vary. Bytecode mutation is the primary approach and covers the vast majority of use cases. AST mutation exists as a secondary tool for specific situations where bytecode operators are not enough.
Vary also supports AST-level mutation (--level ast) for those cases. AST mutation modifies the parsed syntax tree before compilation. Each mutant goes through constant folding, type checking, and bytecode generation, so it is slower (a full compile pass per mutant). But it has access to higher-level program structures that bytecode cannot see: removing entire statements, swapping function arguments, dropping list elements, and skipping control flow blocks.
Use AST mutation when investigating specific survivor patterns or when you want the semantic operators (skip-effect, skip-block, drop-element, swap-args) that have no bytecode equivalent. You can also run both levels with --level both, which combines the results and deduplicates them.
| Level | Speed | Operators | Parallelism |
|---|---|---|---|
| bytecode (default) | Fast (milliseconds per mutant) | 6 bytecode operators | Parallel |
| ast | Moderate (recompiles per mutant) | 27 AST operators (17 classic + 10 semantic) | Sequential |
| both | Slower (runs both) | All 33 operators, deduplicated | Mixed |
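As a rough illustration of what a swap-arguments operator does at the syntax-tree level, the sketch below uses Python's standard `ast` module as a stand-in. Vary's own AST and operator implementation are not shown here, so the `SwapArgs` class and the `ratio` and `half_of` functions are hypothetical. Note that the mutated tree has to be compiled again before it can run, which is the per-mutant cost described above.

```python
import ast


class SwapArgs(ast.NodeTransformer):
    """Swap the arguments of every call that has exactly two positional arguments."""

    def visit_Call(self, node):
        self.generic_visit(node)
        if len(node.args) == 2 and not node.keywords:
            node.args = [node.args[1], node.args[0]]
        return node


SOURCE = """
def ratio(a, b):
    return a / b

def half_of(x):
    return ratio(x, 2)
"""


def load(tree):
    """Compile a syntax tree and return the namespace it defines."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace


original = load(ast.parse(SOURCE))
mutant_tree = ast.fix_missing_locations(SwapArgs().visit(ast.parse(SOURCE)))
mutant = load(mutant_tree)

print(original["half_of"](8))  # 4.0
print(mutant["half_of"](8))    # 0.25
```

A bytecode-level operator has no notion of "the arguments of this call", which is why this kind of change lives at the AST level.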
Running it
vary mutate calc.vary
The output shows a mutation score: the percentage of mutations your tests detected.
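The score itself is simple arithmetic. A minimal sketch in Python, where the function name and the handling of an empty run are assumptions and not Vary's documented behaviour:

```python
def mutation_score(killed, total):
    """Percentage of mutants that at least one test detected."""
    if total == 0:
        return 0.0  # assumption: report 0 when there is nothing to mutate
    return 100.0 * killed / total


print(mutation_score(24, 30))  # 80.0
```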
You can also run mutation testing across an entire directory:
vary mutate src/
What mutators do
The bytecode level has 6 operators:
| Mutator | Example |
|---|---|
| Arithmetic | IADD becomes ISUB (covers int, long, float, double) |
| Conditional | IF_ICMPLT becomes IF_ICMPLE or IF_ICMPGE |
| Return value | Functions return 0, 0L, 0.0, or null instead of computed results |
| Negation | INEG removed (negation has no effect) |
| Call skip | Method calls removed, replaced with default return values |
| Return poison | Functions return adversarial values like -1 or MAX_VALUE |
The AST level has 27 operators: 17 classic operators, plus 10 semantic operators that understand program meaning:
| Mutator | Example |
|---|---|
| Arithmetic | + becomes -, * becomes / |
| Comparison | > becomes >=, == becomes != |
| Boolean | True becomes False, and becomes or |
| Literal | 60 becomes 61, "" becomes "mutant" |
| Statement removal | Statements replaced with pass |
| Boundary | < becomes <= (off-by-one errors) |
| Return default | return expr becomes return 0, return "", etc. |
| Skip effect | Side-effecting calls like validate(data) replaced with pass |
| Skip block | if cond { body } becomes if cond { pass } |
| Drop element | [a, b, c] becomes [b, c] or [a, c] |
| Swap arguments | f(a, b) becomes f(b, a) |
| Contract precondition | Mutates expressions inside in {} blocks |
| Contract postcondition | Mutates expressions inside out(r) {} / post {} blocks |
| Enum replace | Color.Red becomes Color.Green or Color.Blue |
| Contract remove | Entire in {} or out(r) {} block removed |
| Match swap | Match case bodies swapped with each other |
| Match pattern | Match guard removed or pattern replaced with wildcard |
| Boundary shift | Shifts loop bound and comparison together |
| Guard mismatch | Checks wrong field in a guard condition |
| Field swap | Reads a sibling field instead of the intended one |
| Omitted read | Removes a field from a calculation |
| Duplicate field | Uses one field twice instead of two distinct fields |
| Misbound constructor | Swaps constructor arguments with compatible types |
| Null weaken | Removes or weakens a null check |
| Null strengthen | Removes a null-safe fallback |
| Collection simplify | Weakens collection emptiness or membership check |
| Numeric boundary | Shifts a numeric boundary or division type |
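The Boundary operator in the table above is a good example of a mutant that survives unless a test sits exactly on the edge. The sketch below is in Python with a hypothetical `is_minor` function and a hand-written mutant.

```python
def is_minor(age):
    """Original: strictly less than 18."""
    return age < 18


def is_minor_mutant(age):
    """Boundary mutant: < replaced with <=."""
    return age <= 18


def far_from_boundary_test(fn):
    """Checks values well away from 18 on both sides."""
    return fn(5) is True and fn(40) is False


def boundary_test(fn):
    """Checks the values on either side of the boundary."""
    return fn(17) is True and fn(18) is False


print(far_from_boundary_test(is_minor_mutant))  # True: test passes, mutant survives
print(boundary_test(is_minor_mutant))           # False: test fails, mutant killed
```

Both tests pass against the original. Only the one that checks 18 itself can tell the two versions apart.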
Reading the output
After a mutation run, you see which mutants were killed (your tests caught them) and which survived (your tests missed them).
A surviving mutant means the compiler changed something and no test noticed. If a real bug made the same change, your tests would not catch it either.
The output includes three metrics beyond the raw score:
| Metric | Meaning |
|---|---|
| Kill rate | Fraction of mutants killed by tests |
| Observability | Fraction where the behavioural change reached an oracle boundary at all (killed mutants plus weak-oracle survivors) |
| Actionable survivors | Survivors worth investigating (excludes likely-equivalent mutants) |
A survivor breakdown shows the composition: weak-oracle (behaviour was seen but assertions were too weak), unobserved (behaviour never reached a test oracle), equivalent-likely (mutation probably has no observable effect), and other.
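One way to read the three metrics is as simple counts over per-mutant outcomes. The sketch below is in Python and follows the definitions in the table. The outcome labels come from the survivor breakdown, while the `summarize` function, the dictionary keys, and the sample counts are assumptions for illustration, not Vary's actual output format.

```python
from collections import Counter


def summarize(outcomes):
    """Compute the three metrics from a list of per-mutant outcome labels."""
    if not outcomes:
        raise ValueError("no mutants to summarize")
    counts = Counter(outcomes)
    total = len(outcomes)
    killed = counts["killed"]
    # Observability counts killed mutants plus weak-oracle survivors.
    observed = killed + counts["weak-oracle"]
    survivors = total - killed
    return {
        "kill_rate": killed / total,
        "observability": observed / total,
        # Actionable survivors exclude likely-equivalent mutants.
        "actionable_survivors": survivors - counts["equivalent-likely"],
    }


outcomes = (
    ["killed"] * 14
    + ["weak-oracle"] * 2
    + ["unobserved"] * 3
    + ["equivalent-likely"] * 1
)
print(summarize(outcomes))
# {'kill_rate': 0.7, 'observability': 0.8, 'actionable_survivors': 5}
```

In this sample, observability is higher than the kill rate because two mutants changed behaviour that a test saw but did not assert on strongly enough.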
You can gate CI on these metrics in vary.toml:
[mutation]
min_observability = 70.0 # Minimum observability score (0-100)
max_unobserved_survivors = 5 # Maximum unobserved survivors allowed
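The effect of these two thresholds can be sketched as a pair of comparisons. This is a Python illustration only: the `check_gates` function is hypothetical, and whether Vary treats the limits as inclusive is an assumption made here.

```python
def check_gates(observability_pct, unobserved_survivors, config):
    """Return a list of gate failures; an empty list means the run passes."""
    failures = []
    # Assumption: a score exactly equal to the minimum passes.
    if observability_pct < config["min_observability"]:
        failures.append(
            f"observability {observability_pct} is below {config['min_observability']}"
        )
    # Assumption: a count exactly equal to the maximum passes.
    if unobserved_survivors > config["max_unobserved_survivors"]:
        failures.append(
            f"{unobserved_survivors} unobserved survivors exceed "
            f"{config['max_unobserved_survivors']}"
        )
    return failures


config = {"min_observability": 70.0, "max_unobserved_survivors": 5}
print(check_gates(80.0, 3, config))  # []
print(check_gates(65.0, 6, config))  # two failure messages
```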
Drilling into survivors
The output includes a survivor groups table. To see individual mutants in a group, use --expand:
vary mutate src/ --expand "math#add"
To understand why a specific mutant survived, use --why with its ID (shown in the --expand output):
vary mutate math.vary --why "add:LIT_CHANGE:abc123"
This shows what was changed, where in the code, why your tests missed it, and what assertion would catch it.
For the full step-by-step workflow, see Golden path.
For all operators, flags, and advanced features, see Advanced overview.
What score to aim for
There is no universal target, and 100% is not always practical. A score below 60%, however, usually means that tests are not checking return values or are missing branches.
The --why output and the leverage fixes tell you more than the score itself. They point to exactly where the gaps are and what to write next.