Vary's mutation engine goes beyond kill/survive counting. Six pillars work together to tell you why mutants survive, what to fix, and how strong your test suite actually is.
For the basics, see Introduction. For a worked example, see Golden path.
| Pillar | What it does | Deep dive |
|---|---|---|
| 1. Contracts | Runtime oracles that kill mutants automatically; contract clauses are also mutated to measure specification adequacy | Contracts |
| 2. Observability | Records reads, writes, assertions, and branches per test to classify why survivors escaped | Observability |
| 3. Oracle analysis | Builds a graph of observe statements → values → identifiers; validates oracle quality per test | Oracle analysis |
| 4. Effect classification | Tags functions with side-effect categories; skips nondeterministic mutants; seals randomness at test time | Effects |
| 5. Signatures & manifest | Gives every mutant a stable, refactor-proof identity; emits a machine-readable manifest for CI and tooling | Signatures |
| 6. Semantic mutation | Operators that understand program meaning (field access, nullability, loop boundaries, constructor bindings) and catch bugs that classic syntax-level mutation misses | Operators: Semantic |
When you run vary mutate, the engine performs these steps in order:
| Step | Phase | What happens |
|---|---|---|
| 1 | Discovery | Scans source for mutation sites, assigns expression IDs and stable signatures |
| 2 | Effect analysis | Classifies each function's effects, skips nondeterministic mutants (unless --unstable) |
| 3 | Mutation | Applies operators to generate mutants at AST and bytecode levels |
| 4 | Execution | Runs tests against each mutant; contract violations count as kills |
| 5 | Observation | If --observe is enabled, records trace data per test |
| 6 | Oracle validation | Classifies each test's oracle strength (STRONG / CONSTANT / NONE) |
| 7 | Scoring | Computes mutation score, contract adequacy, oracle coverage, effect stability |
| 8 | Integrity | Combines the four scores into a weighted composite grade (A to F) |
Contracts (in {}, out (r) {}, post {}) serve two roles in mutation testing:
| Role | How it works |
|---|---|
| Runtime oracle | A contract violation during a mutant run counts as a kill, without a specific test for it |
| Specification adequacy | Contract clauses are mutated with dedicated operators (contract_precondition, contract_postcondition, contract_remove) to measure whether tests exercise the boundaries contracts define |
A pure def function has no side effects. When a mutant survives in a pure function, the cause is a missing or weak assertion (or the mutant is equivalent-likely), never hidden state. The --why output skips side-effect explanations and points directly at what to test.
Classes with invariant {} blocks kill mutants that corrupt construction arguments, without any test asserting on fields directly.
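The runtime-oracle role can be illustrated outside Vary. The sketch below is a Python analogue, not Vary syntax: the `ContractViolation` exception, the `clamp` functions, and the `run_is_kill` helper are all hypothetical names introduced for illustration. It shows a test that asserts nothing about the result, yet the mutant is still killed because the postcondition fails during the run.

```python
class ContractViolation(Exception):
    """Hypothetical stand-in for a contract failure raised at runtime."""


def check_post(result: int, low: int, high: int) -> int:
    # Plays the role of an `out (r) {}` clause: the result must lie in [low, high].
    if not (low <= result <= high):
        raise ContractViolation(f"{result} not in [{low}, {high}]")
    return result


def clamp(value: int, low: int, high: int) -> int:
    # Original implementation.
    return check_post(max(low, min(value, high)), low, high)


def clamp_mutant(value: int, low: int, high: int) -> int:
    # Mutant: the inner `min` has been replaced by `max`.
    return check_post(max(low, max(value, high)), low, high)


def run_is_kill(fn) -> bool:
    """Run a test with no assertions; report whether the run counts as a kill."""
    try:
        fn(5, 0, 10)   # return value is never inspected
        fn(50, 0, 10)  # return value is never inspected
    except ContractViolation:
        return True    # contract violation counts as a kill
    return False


original_killed = run_is_kill(clamp)
mutant_killed = run_is_kill(clamp_mutant)
```

Here the original passes both calls, while the mutant returns 50 for the second call and trips the postcondition. The test itself contributes only the inputs; the contract supplies the oracle.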
| You notice... | Use this |
|---|---|
| Mutants survive but you don't know why | --why + --observe (Observability) |
| Tests pass but assertions are tautological | Lie detection |
| Contract exists but nothing tests its boundary | Contract adequacy score (Contracts) |
| Flaky mutation results | Effect classification (Effects) |
| Score changes after a refactor | Stable signatures (Signatures) |
| No observe in a test | Oracle validation (Oracle analysis) |
| Need a CI quality gate | Integrity score + min_integrity in vary.toml (Signatures) |
| Want to know which survivors are worth investigating | Observability score (see below) |
The integrity score combines four metrics into a single grade:
| Component | Weight | Source |
|---|---|---|
| Mutation score | 40% | Fraction of mutants killed |
| Contract adequacy | 20% | Fraction of contract obligations defended |
| Oracle coverage | 20% | Fraction of tests with strong oracles |
| Effect stability | 20% | Fraction of functions with stable effects |
Grades: A (90 to 100), B (75 to 89), C (60 to 74), D (40 to 59), F (0 to 39). Set a minimum with min_integrity in vary.toml.
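As a rough illustration of how the weights and grade bands above combine, here is a minimal Python sketch. The weights and band boundaries come from the tables above; the function names, the 0 to 100 input scale, and the treatment of fractional scores (for example, 89.5 falling into B) are assumptions, not Vary's documented behaviour.

```python
from typing import Mapping

# Weights taken from the component table; keys are illustrative names.
WEIGHTS = {
    "mutation_score": 0.40,
    "contract_adequacy": 0.20,
    "oracle_coverage": 0.20,
    "effect_stability": 0.20,
}

# Lower bound of each grade band, highest first.
GRADE_FLOORS = [(90, "A"), (75, "B"), (60, "C"), (40, "D"), (0, "F")]


def integrity_score(components: Mapping[str, float]) -> float:
    """Weighted sum of the four component scores, each on a 0-100 scale."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


def grade(score: float) -> str:
    """Map a composite score to a letter (assumes a score >= floor rule)."""
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    raise ValueError("score must be between 0 and 100")


example = {
    "mutation_score": 85.0,
    "contract_adequacy": 70.0,
    "oracle_coverage": 60.0,
    "effect_stability": 100.0,
}
score = integrity_score(example)  # 34 + 14 + 12 + 20, i.e. about 80
letter = grade(score)             # falls in the B band
```

Because mutation score carries twice the weight of any other component, a weak oracle-coverage number (60 here) costs fewer points than the same shortfall in kills would.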
The observability score tracks whether behavioural changes reach an oracle, not just whether the oracle catches them. It reports three metrics:
| Metric | Definition |
|---|---|
| Kill rate | killed / total |
| Observability | (killed + weak-oracle survivors) / total |
| Actionable survivor rate | (all survivors − equivalent-likely) / total |
Observability is always greater than or equal to kill rate. The gap between them counts survivors that a test observed but did not catch; these are the easiest to fix, usually by strengthening the assertion. Survivors classified as equivalent-likely reduce the actionable count but do not affect observability.
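The three definitions can be worked through with a small Python sketch. The function and parameter names are hypothetical, and it assumes that total equals killed plus survivors (that is, skipped mutants are excluded) and that results are reported as percentages to match the vary.toml thresholds.

```python
def observability_metrics(
    total: int,
    killed: int,
    weak_oracle_survivors: int,
    equivalent_likely: int,
) -> dict:
    """Compute the three metrics as percentages of all mutants.

    Assumes every mutant is either killed or a survivor.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    survivors = total - killed
    if weak_oracle_survivors > survivors or equivalent_likely > survivors:
        raise ValueError("survivor subsets cannot exceed total survivors")
    return {
        "kill_rate": 100.0 * killed / total,
        "observability": 100.0 * (killed + weak_oracle_survivors) / total,
        "actionable_survivor_rate": 100.0 * (survivors - equivalent_likely) / total,
    }


# 100 mutants: 70 killed, 30 survived; 15 survivors were observed by a weak
# oracle and 5 survivors are classified as equivalent-likely.
metrics = observability_metrics(
    total=100, killed=70, weak_oracle_survivors=15, equivalent_likely=5
)
gap = metrics["observability"] - metrics["kill_rate"]
```

In this example the kill rate is 70%, observability is 85%, and the actionable survivor rate is 25%. The 15-point gap corresponds to the survivors that reached a test but were not caught.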
Set CI gates in vary.toml:
[mutation]
min_observability = 70.0 # Minimum observability percentage
max_unobserved_survivors = 5 # Maximum unobserved survivors allowed
| Page | Topic |
|---|---|
| Contracts | How contracts kill mutants and how contract adequacy works |
| Observability | Runtime tracing, differential detection, assertion groups |
| Oracle analysis | Oracle graph structure, determinism tags, oracle validation |
| Effects | Effect tags, nondeterministic filtering, runtime sealing |
| Signatures | Stable identities, expression IDs, mutation manifest |
| Infrastructure | Caching, quarantine, policy gates, certificates, CLI reference |