Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.
Infrastructure
This page covers the operational side of vary mutate: caching, equivalent mutant quarantine, policy gates, certificates, and the full CLI reference. For mutation operators, see Operators. For contracts, observability, and scoring, see Advanced overview and its deep-dive pages.
Caching
Bytecode method-level cache
Stored in .vary-mutation-cache-v2. Each method is individually hashed. When a method's hash matches the cache, its results are reused. Only changed methods get re-tested. Use --all to skip the cache and test everything.
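The reuse rule above can be sketched roughly as follows. This is an illustration only: the function name, the dict-shaped cache, and the use of SHA-256 over raw method bytes are assumptions, since the on-disk format of .vary-mutation-cache-v2 and its hash algorithm are not described here.

```python
import hashlib


def methods_to_retest(method_bytecode, cached_hashes, test_all=False):
    """Return the sorted names of methods whose mutants must be re-tested.

    method_bytecode: dict of method name -> compiled bytes (hypothetical shape).
    cached_hashes:   dict of method name -> hex digest from the previous run.
    test_all:        mimics --all by ignoring the cache entirely.
    """
    stale = []
    for name, code in method_bytecode.items():
        digest = hashlib.sha256(code).hexdigest()
        # A missing or different cached digest means the method changed.
        if test_all or cached_hashes.get(name) != digest:
            stale.append(name)
    return sorted(stale)


previous = {
    "add": hashlib.sha256(b"iadd").hexdigest(),
    "sub": hashlib.sha256(b"isub").hexdigest(),
}
current = {"add": b"iadd", "sub": b"isub-changed", "mul": b"imul"}

print(methods_to_retest(current, previous))
```

In this sketch the unchanged method is skipped, while the edited method and the newly added one are re-tested.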
AST mutation cache
Stored in .vary-mutation-cache. Cache key is SHA-256 of source + test content + operator set. Enabled with --incremental.
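A cache key of this kind could be derived as in the sketch below. Only the three inputs (source, test content, operator set) come from the description above; the function name, the length-prefixed framing, and sorting the operator names are assumptions made so the example is unambiguous and order-independent.

```python
import hashlib


def ast_cache_key(source, tests, operators):
    """SHA-256 over source text, test text, and the operator set.

    The framing (8-byte length prefix per part) and the sorted operator
    order are illustrative assumptions, not the tool's documented format.
    """
    h = hashlib.sha256()
    parts = [source, tests, ",".join(sorted(operators))]
    for part in parts:
        data = part.encode("utf-8")
        # The length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


key = ast_cache_key("fn add(a, b) { a + b }", "test add { ... }", ["CLASSIC", "CONTRACT"])
print(key)
```

Any change to the source, the tests, or the selected operators produces a different key, so stale results are never reused.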
Equivalent mutant quarantine
Some mutants are semantically equivalent to the original (e.g., x + 0 == x) and can never be killed. Quarantine them:
vary mutate source.vary --quarantine "mutant-id" --quarantine-reason "equivalent: x+0 == x"
vary mutate source.vary --list-quarantined
vary mutate source.vary --unquarantine "mutant-id"
Quarantined mutants are excluded from the score and survivor lists.
Policy gates
Configure in vary.toml:
[mutation]
min_score = 80.0 # Minimum mutation score (0-100)
fail_on_decrease = true # Fail if score decreased from previous run
max_survivors = 5 # Maximum surviving mutants allowed
If any constraint is violated, the command exits with code 1.
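The gate logic can be pictured with the following sketch. The function name and the way the previous score is passed in are hypothetical; the three settings and the exit code of 1 on violation come from the configuration above.

```python
def check_policy(score, survivors, previous_score=None,
                 min_score=80.0, fail_on_decrease=True, max_survivors=5):
    """Return (exit_code, violations) for one mutation run.

    Defaults mirror the sample vary.toml; how the tool obtains the
    previous score is not modelled here.
    """
    violations = []
    if score < min_score:
        violations.append(f"score {score} is below min_score {min_score}")
    if fail_on_decrease and previous_score is not None and score < previous_score:
        violations.append(f"score decreased from {previous_score} to {score}")
    if survivors > max_survivors:
        violations.append(f"{survivors} survivors exceed max_survivors {max_survivors}")
    return (1 if violations else 0), violations


code, reasons = check_policy(score=78.5, survivors=7, previous_score=81.0)
print(code)
for reason in reasons:
    print(reason)
```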
Certificates
A mutation certificate is a tamper-evident record of a run:
vary mutate source.vary --tests test.vary --certify
The certificate is stored at .vary/runs/<run-id>/certificate.json and contains the source hash, test hash, compiler version, and score. A hash of all fields is included, so tampering with any value invalidates the certificate.
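One way to build such a self-checking record is sketched below. The field names (sourceHash, testHash, compilerVersion, score, certificateHash), the canonical JSON encoding, and the choice of SHA-256 are all assumptions for illustration; the real certificate schema may differ.

```python
import hashlib
import json


def seal(fields):
    """Return a copy of fields with a digest computed over all of them."""
    # Sorted keys and fixed separators give a stable byte representation.
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**fields, "certificateHash": hashlib.sha256(payload).hexdigest()}


def verify(certificate):
    """Recompute the digest and compare it with the stored one."""
    fields = {k: v for k, v in certificate.items() if k != "certificateHash"}
    return seal(fields)["certificateHash"] == certificate.get("certificateHash")


cert = seal({
    "sourceHash": "ab12",        # hypothetical values
    "testHash": "cd34",
    "compilerVersion": "0.0.0",
    "score": 87.5,
})
print(verify(cert))

tampered = {**cert, "score": 99.0}
print(verify(tampered))
```

Editing any field after sealing changes the recomputed digest, so verification of the tampered copy fails.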
Result storage
Each source directory contains a .vary/ directory for testing artifacts:
.vary/
  mutation-manifest.json   # Discovery + results manifest (emitted on every build)
  status.json              # Latest run summary
  history.jsonl            # Append-only ledger of all runs
  equivalent-mutants.json  # Quarantined mutant IDs
  runs/
    <run-id>/
      certificate.json     # Mutation certificate
Module mutation budgets
Mutation thresholds can be defined per module or path. The first matching pattern wins.
[mutation]
min_kill = 70
[[mutation.module]]
pattern = "core/**"
min_kill = 90
[[mutation.module]]
pattern = "legacy/**"
min_kill = 40
When vary mutate runs in project mode, each module's score is checked against its threshold. If any module falls below, the command exits with code 1. Set a low bar for legacy code, a high bar for critical paths, and ratchet up over time.
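The first-match rule can be illustrated with the sketch below, which reuses the patterns from the sample configuration. It uses Python's fnmatch as a stand-in for the tool's glob matching, and it assumes that paths matching no module pattern fall back to the top-level min_kill; both points are assumptions, as are the function names.

```python
from fnmatch import fnmatchcase

# Order matters: the first matching pattern wins.
MODULE_BUDGETS = [("core/**", 90), ("legacy/**", 40)]
DEFAULT_MIN_KILL = 70  # top-level [mutation] min_kill


def min_kill_for(path):
    """Return the kill threshold that applies to a module path."""
    for pattern, threshold in MODULE_BUDGETS:
        if fnmatchcase(path, pattern):
            return threshold
    return DEFAULT_MIN_KILL


def failing_modules(scores):
    """Return the sorted paths whose score is below their threshold."""
    return sorted(p for p, score in scores.items() if score < min_kill_for(p))


scores = {"core/math/ops.vary": 85, "legacy/old.vary": 55, "app/main.vary": 70}
failed = failing_modules(scores)
print(failed)
print(1 if failed else 0)  # exit code under this sketch
```

Here the core module fails its stricter 90% bar even though its score would pass the default, while the legacy module passes with a much lower score.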
Operator controls
Select which mutation operators run globally or per module:
[mutation]
operators = ["CLASSIC", "CONTRACT"]
[[mutation.module]]
pattern = "api/**"
disable = ["ENUM_REPLACE"]
Operator controls let teams tune mutation noise and isolate new operator families. The --operators CLI flag overrides vary.toml for a single run. Use --list-operators to see all available operators and group aliases.
Timeout-aware diagnosis
Mutants that exceed the per-mutant timeout are classified separately from killed and survived:
Timeout mutants: 4
math.vary:23 factorial() ARITHMETIC (likely infinite loop)
math.vary:45 power() BOUNDARY (likely infinite loop)
Timeouts often indicate infinite-loop mutants (e.g., a loop condition flipped from < to >=). These are not counted as survivors but are reported so you can verify they are not masking real bugs. Configure the timeout with --timeout <ms> (default: 5000ms) or set a run-level time budget with --budget (e.g., --budget 2m).
Trend snapshots
Save a named snapshot after a mutation run and compare it to previous snapshots:
vary mutate src/ --snapshot v1.0
vary mutate src/ --snapshot v1.1 --compare v1.0
Snapshots are stored in .vary/runs/ and include kill rate, timeout count, and classification breakdown per module. The comparison output shows regressions:
Mutation trend (v1.0 → v1.1)
Kill rate: 82% → 87% (+5%)
Timeout mutants: 6 → 2 (-4)
Weak-oracle survivors: 11 → 7 (-4)
Use --trend without --compare to compare against the most recent previous snapshot.
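The comparison itself amounts to a per-metric difference between two snapshots. The sketch below reproduces the sample output above from two in-memory dicts; the snapshot shape, metric labels, and function names are hypothetical and do not reflect the stored snapshot format.

```python
def trend_deltas(old, new):
    """Per-metric change between two snapshots (new minus old)."""
    return {metric: new[metric] - old[metric] for metric in old if metric in new}


def format_trend(old_tag, new_tag, old, new):
    """Render a trend report similar to the sample comparison output."""
    lines = [f"Mutation trend ({old_tag} → {new_tag})"]
    for metric, delta in trend_deltas(old, new).items():
        lines.append(f"{metric}: {old[metric]} → {new[metric]} ({delta:+d})")
    return "\n".join(lines)


v1_0 = {"Kill rate (%)": 82, "Timeout mutants": 6, "Weak-oracle survivors": 11}
v1_1 = {"Kill rate (%)": 87, "Timeout mutants": 2, "Weak-oracle survivors": 7}

print(format_trend("v1.0", "v1.1", v1_0, v1_1))
```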
Mutation heatmaps
The --heatmap flag generates a per-line and per-function mutation density report:
Heatmap: math.vary
Line Source Mutants Killed Survived
23 let total = price * qty 5 3 2 ██░░
45 if amount > threshold { 3 3 0 ████
67 return balance - fee 4 1 3 █░░░
Hotspot functions:
factorial() 60% kill rate (3/5)
withdraw() 85% kill rate (17/20)
Red-heavy lines (low kill rate) are the weakest regions in the file. Focus testing effort there first.
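The bar at the end of each heatmap row can be read as the kill rate scaled to four cells. The sketch below reproduces the three sample rows; the rounding rule and the function names are inferred from the sample output and are assumptions, not the tool's documented behavior.

```python
def kill_bar(killed, total, width=4):
    """Render the kill rate as filled and empty cells."""
    filled = round(width * killed / total) if total else 0
    return "█" * filled + "░" * (width - filled)


def kill_rate_percent(killed, total):
    """Kill rate as a whole percentage."""
    return round(100 * killed / total) if total else 0


rows = [(23, 5, 3), (45, 3, 3), (67, 4, 1)]  # (line, mutants, killed)
for line, mutants, killed in rows:
    print(line, mutants, killed, mutants - killed, kill_bar(killed, mutants))

print(kill_rate_percent(3, 5), kill_rate_percent(17, 20))
```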
Quick mode
--quick limits mutations to relational and literal operators and caps the run at 20 mutants per file. It is useful for getting a rough score without waiting for all operators.
Default file cap and --all
By default, mutation runs cap at 200 mutants per file to keep interactive runs fast on large files. Use --all to restore exhaustive behavior (every possible mutant), or --quick to drop to 20.
Output modes
--output <text|log|json|html> controls how results are rendered during and after the run.
| Mode | Behavior |
|---|---|
| text (default) | Live in-place spinner with elapsed time, tested/total mutant counters, and a refining ETA. Uses ANSI escape codes and installs a Ctrl+C shutdown hook to restore the terminal cleanly. |
| log | Plain lines, one per event, suitable for CI log scraping. |
| json | JSONL event stream for programmatic consumers. |
| html | Writes a report file. |
Before the run starts, the engine pre-scans all files to count mutants and uses historical avgMutantMs from the run history to print an up-front wall-clock estimate. The ETA refines as actual throughput comes in.
Performance
| Optimization | Effect |
|---|---|
| Shared module resolution | Imported modules are type-checked once per run rather than once per mutant. |
| Early exit on failure | Remaining tests are skipped when a mutant is already proven killed. |
| Adaptive per-test timeout | 10× the baseline execution time, clamped between 1s and 5s. Prevents infinite loops while giving legitimately slow tests room. |
| Warm in-process workers | Long-lived workers keep a compiled classloader resident across mutant batches instead of rebuilding per mutant. |
| Worker-local test invocation cache | Reflective discovery of __vtest_names__, __vtest_methods__, and the per-test timeout executor is paid once per worker instead of once per mutant. |
| redefine first | When an instrumentation agent is attached, eligible mutants take the in-process redefineClasses path before considering a fresh classloader. |
| Patch-plan reuse | Sibling mutants on the same compiled method share a cached ASM ClassNode so parsing and tree-building happen once per method, not once per sibling. |
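The adaptive per-test timeout in the table above is a clamped multiple of the baseline. A minimal sketch, assuming times are tracked in milliseconds and using a hypothetical function name:

```python
def per_test_timeout_ms(baseline_ms, factor=10, floor_ms=1000, ceiling_ms=5000):
    """10x the baseline execution time, clamped between 1s and 5s."""
    return min(ceiling_ms, max(floor_ms, factor * baseline_ms))


for baseline in (20, 250, 900):
    print(baseline, per_test_timeout_ms(baseline))
```

A 20 ms test is raised to the 1 s floor, a 250 ms test gets 2.5 s, and a 900 ms test is capped at 5 s.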
Warm in-process worker model
The compiled bytecode backends (hot-swap, redefine) execute every mutant inside a warm, in-process worker. The worker is created once per batch: it compiles the target module, loads the baseline class bytes into a MutationClassLoader, runs the baseline test pass, and then cycles through mutants without tearing the classloader down between them. Full-loader rebuilds are attributed to two explicit causes only: worker recreation after poison recovery (wall-clock escape hatch or state-leak detection) or an explicit fresh-loader fallback.
Every artifact produced by a compiled run surfaces initialRebuild, poisonRebuilds, baseLoaderRebuilds, and unaccountedRebuilds so the warm-worker contract is visible in telemetry. Outside poison recovery and documented fallbacks, unaccountedRebuilds is always 0 on the pinned short-loop corpus.
Worker-local test invocation cache
Reflective discovery of the generated test dispatcher (__vtest_names__, __vtest_methods__, and each individual test method handle) used to run once per mutant. The warm worker now opens a single invocation cache after baseline setup, reuses it across every mutant in the batch, and closes it in the worker's teardown path. The daemon executor used for per-test timeout dispatch is bound to the same cache so it is not rebuilt around every test run either.
The effect is visible as the testDispatchMs phase in phaseTimings: on the pinned short-loop fixture the accumulator drops by roughly 30% compared with the baseline measured before this cache was introduced. The structural counters (namesInvocationCount, methodsInvocationCount, executorReuseCount) show that the cache is wired in regardless of timing noise.
redefine-first policy
When --backend redefine is selected and an instrumentation agent is attached, eligible mutants take the Instrumentation.redefineClasses path as the normal execution mode. The fresh-loader path becomes a fallback rather than the primary loop. Every per-mutant result records one of redefine, hotswap, ineligible:<reason>, fallback:<detail>, or fresh-loader in swapOutcome so eligible versus ineligible versus runtime-triggered fallback are never aggregated together.
Ineligible mutants (for example structural changes, constructor edits, or <clinit> mutations) are pre-filtered and recorded as ineligible:<reason> so they do not inflate the runtime-fallback bucket. Worker recreation after poison does not force unnecessary fresh-loader use: when the worker is retired and recreated, the new warm loader continues on the redefine hot path. The redefine hot path also batches a revert-of-previous-class and the next mutation into a single redefineClasses varargs call when consecutive mutants target different classes; for same-class sibling mutants the new mutation implicitly overwrites the previous one and no revert is owed.
See Strict mode → Bytecode backends for backend selection and the full swap-outcome table, Parity gate for the cross-backend classification invariant, and Benchmark for the pinned benchmark matrix that measures redefine-vs-fresh-loader wall time.
Deduplication
After testing, killed mutants with identical kill signatures (same operator type, same set of killing tests) are deduplicated. Only one representative per signature is kept. Survivors and errors are never deduped. Disable with --no-dedup.
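The deduplication rule can be sketched as grouping killed mutants by signature and keeping the first of each group. The record fields (id, status, operator, killed_by) and the function name are hypothetical; which representative the tool actually keeps is not stated, so keeping the first one seen is an assumption.

```python
def dedup_killed(results):
    """Drop killed mutants whose (operator, killing tests) signature repeats.

    Survivors and errors are always kept.
    """
    seen = set()
    kept = []
    for result in results:
        if result["status"] != "killed":
            kept.append(result)
            continue
        signature = (result["operator"], frozenset(result["killed_by"]))
        if signature in seen:
            continue  # redundant: same operator, same set of killing tests
        seen.add(signature)
        kept.append(result)
    return kept


results = [
    {"id": "m1", "status": "killed", "operator": "ARITHMETIC", "killed_by": ["t_add", "t_sum"]},
    {"id": "m2", "status": "killed", "operator": "ARITHMETIC", "killed_by": ["t_sum", "t_add"]},
    {"id": "m3", "status": "killed", "operator": "BOUNDARY", "killed_by": ["t_add", "t_sum"]},
    {"id": "m4", "status": "survived", "operator": "ARITHMETIC", "killed_by": []},
    {"id": "m5", "status": "survived", "operator": "ARITHMETIC", "killed_by": []},
]

print([r["id"] for r in dedup_killed(results)])
```

In this example m2 is dropped because it shares m1's operator and killing tests, while both survivors remain.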
CLI reference
| Flag | Default | Description |
|---|---|---|
| --level <ast\|bytecode\|both> | bytecode | Mutation level |
| --operators <list> | all 27 | AST operators (comma-separated, or group aliases) |
| --bc-operators <list> | all 6 | Bytecode operators (comma-separated) |
| --output <text\|log\|json\|html> | text | Output mode (live spinner / CI log / JSONL / HTML report) |
| -v, --verbose | off | Verbose output with OP and HINT columns |
| -i, --incremental | off | Reuse cached AST results |
| --quick | off | Quick mode: relational + literal only, max 20 mutants/file |
| --all | off | Exhaustive mode: override the default 200 mutants/file cap |
| -j, --parallel <n> | CPU count | Parallel worker count (bytecode level) |
| --no-dedup | off | Keep redundant mutants |
| --certify | off | Generate mutation certificate |
| --why <id> | none | Explain why a mutant survived |
| --expand <group> | none | Show individual mutants in a group |
| --replay <id> | none | Re-run a single mutation |
| --top <n> | 20 | Number of survivor groups to show |
| --group <key> | function | Group by: function, file, or cause |
| --observe | off | Enable runtime observability |
| --differential | off | Enable differential trace detection |
| --quarantine <id> | none | Quarantine a surviving mutant |
| --list-quarantined | off | List quarantined mutants |
| --unstable | off | Include nondeterministic mutants (skipped by default) |
| --no-manifest | off | Disable mutation manifest emission (on vary run) |
| --snapshot <tag> | none | Save a named snapshot after the run |
| --compare <tag> | none | Compare results to a previous snapshot |
| --heatmap | off | Generate per-line and per-function mutation heatmap |
| --budget <duration> | none | Time budget for the run (e.g., 30s, 5m) |
| --timeout <ms> | 5000 | Per-mutant timeout in milliseconds |
| --auto-quarantine <threshold> | none | Auto-quarantine mutants above confidence threshold |
| --no-prioritize | off | Disable data-flow-based mutant prioritization |
| --kill-map | off | Output per-test kill map |
| --baseline <path> | none | Baseline manifest JSON for delta reporting |
| --strict-tests | off | Promote missing-observe warnings to errors |
| --list-operators | off | List all available operators and group aliases |
| --strict-selection <mode> | evidence | Strict-mode test selection: evidence or reference |
| --reachability | off | Record per-test method reachability during baseline |
| --warm-workers <on\|off> | on | Reuse long-lived workers across mutant batches |
| --fresh-workers | off | Alias for --warm-workers=off |
| --backend <name> | fresh-loader | Bytecode backend: fresh-loader, hot-swap, or redefine |
| --relevance-graph-path <path> | .vary/relevance/ | Override location of the persisted relevance graph |
| --explain | off | Include selection and scheduling explanation in the report |
| --incremental-infer | off | Reuse prior mutant outcomes across runs (Incremental inference) |
| --fast-mode | off | Heuristic test-filter narrowing with validated fallback (Fast mode) |
| --fast-mode-narrow-factor <f> | 0.5 | Fraction of conservative test filter kept under fast mode |
| --fast-mode-sample-size <n> | 10 | Mutants re-run with the broader filter to measure miss rate |
| --fast-mode-miss-threshold <t> | 0.10 | Miss rate above which fast mode falls back automatically |
See Strict mode for evidence-based selection, warm workers, and bytecode backend details, Parity gate for cross-backend classification parity, Fast mode for validated test-filter narrowing, Incremental inference for per-mutant outcome reuse, Survivor tail for flake-detection rerun accounting, and Benchmark for the parity benchmark harness.