Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.

Infrastructure

This page covers the operational side of vary mutate: caching, equivalent mutant quarantine, policy gates, certificates, and the full CLI reference. For mutation operators, see Operators. For contracts, observability, and scoring, see Advanced overview and its deep-dive pages.

Caching

Bytecode method-level cache

Stored in .vary-mutation-cache-v2. Each method is individually hashed. When a method's hash matches the cache, its results are reused. Only changed methods get re-tested. Use --all to skip the cache and test everything.
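The reuse decision can be sketched in Python. This is a minimal illustration. It assumes each method's bytecode can be reduced to a string and that the cache maps method names to SHA-256 digests. The real layout of .vary-mutation-cache-v2 and its hash function are not documented here, so `method_hash` and `plan_methods` are hypothetical helpers.

```python
import hashlib


def method_hash(body: str) -> str:
    """SHA-256 of one method's bytecode, modelled here as a string (assumption)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def plan_methods(methods, cache, test_all=False):
    """Split methods into (reused, retest) name lists.

    methods:  method name -> current bytecode (string stand-in)
    cache:    method name -> hash recorded on the previous run (hypothetical layout)
    test_all: mirrors --all, which ignores the cache entirely
    """
    reused, retest = [], []
    for name, body in methods.items():
        if not test_all and cache.get(name) == method_hash(body):
            reused.append(name)   # hash unchanged: reuse cached results
        else:
            retest.append(name)   # new or changed method: mutate and test again
    return reused, retest
```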

AST mutation cache

Stored in .vary-mutation-cache. Cache key is SHA-256 of source + test content + operator set. Enabled with --incremental.
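Here is a sketch of how such a key could be built with Python's hashlib. The length prefixes and the sorting of the operator set are assumptions. They prevent different inputs from colliding through concatenation and make operator order irrelevant. `ast_cache_key` is a hypothetical name, not Vary's API.

```python
import hashlib


def ast_cache_key(source: str, tests: str, operators) -> str:
    """Hypothetical AST cache key: SHA-256 over source, tests, and operator set."""
    h = hashlib.sha256()
    # Assumption: operators are sorted so the set, not its order, drives the key.
    parts = (source, tests, ",".join(sorted(operators)))
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

Any edit to the source, the tests, or the operator selection produces a new key, so a stale cached result is never reused.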

Equivalent mutant quarantine

Some mutants are semantically equivalent to the original (e.g., x + 0 == x) and can never be killed. Quarantine them:

vary mutate source.vary --quarantine "mutant-id" --quarantine-reason "equivalent: x+0 == x"
vary mutate source.vary --list-quarantined
vary mutate source.vary --unquarantine "mutant-id"

Quarantined mutants are excluded from the score and survivor lists.
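The following Python sketch shows the effect on the score concretely. It assumes the score is killed mutants divided by all non-quarantined mutants. The `outcomes` mapping and the 100.0 result for an empty set are illustrative assumptions.

```python
def mutation_score(outcomes, quarantined):
    """Percent of non-quarantined mutants that were killed.

    outcomes:    mutant id -> "killed" or "survived" (hypothetical shape)
    quarantined: set of mutant ids excluded from scoring
    """
    counted = {mid: o for mid, o in outcomes.items() if mid not in quarantined}
    if not counted:
        return 100.0  # assumption: nothing left to score counts as a full score
    killed = sum(1 for o in counted.values() if o == "killed")
    return 100.0 * killed / len(counted)
```

Quarantining one equivalent survivor out of four mutants raises the score from 50% to about 66.7%, because the mutant leaves the denominator rather than being counted as killed.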

Policy gates

Configure in vary.toml:

[mutation]
min_score = 80.0        # Minimum mutation score (0-100)
fail_on_decrease = true  # Fail if score decreased from previous run
max_survivors = 5        # Maximum surviving mutants allowed

If any constraint is violated, the command exits with code 1.
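The three checks can be expressed as one function. This Python sketch assumes `previous_score` is `None` on a first run, so `fail_on_decrease` cannot trigger. It also assumes all comparisons are strict. `evaluate_gates` is hypothetical.

```python
def evaluate_gates(score, survivors, previous_score,
                   min_score=80.0, fail_on_decrease=True, max_survivors=5):
    """Return (exit_code, violations) for one run against the [mutation] gates."""
    violations = []
    if score < min_score:
        violations.append(f"score {score:.1f} < min_score {min_score:.1f}")
    # Assumption: no previous run means there is nothing to decrease from.
    if fail_on_decrease and previous_score is not None and score < previous_score:
        violations.append(f"score decreased from {previous_score:.1f}")
    if survivors > max_survivors:
        violations.append(f"{survivors} survivors > max_survivors {max_survivors}")
    return (1 if violations else 0), violations
```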

Certificates

A mutation certificate is a tamper-evident record of a run:

vary mutate source.vary --tests test.vary --certify

Stored at .vary/runs/{runId}/certificate.json. The certificate contains the source hash, test hash, compiler version, and score, plus a hash over all of these fields, so changing any value without updating that hash invalidates the certificate.
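The following Python sketch seals and verifies a certificate. The field names (`sourceHash`, `testHash`, `compilerVersion`, `score`, `certificateHash`) and the canonical JSON encoding are assumptions. They are not the documented certificate.json schema.

```python
import hashlib
import json

# Hypothetical field names; the real certificate.json schema may differ.
FIELDS = ("sourceHash", "testHash", "compilerVersion", "score")


def seal(cert):
    """Return a copy of cert with a SHA-256 over canonical JSON of FIELDS."""
    payload = json.dumps({k: cert[k] for k in FIELDS},
                         sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {**cert, "certificateHash": digest}


def verify(cert):
    """True if the stored hash still matches the recorded fields."""
    return seal(cert)["certificateHash"] == cert.get("certificateHash")
```

Anyone can recompute an unkeyed SHA-256, so this sketch only detects edits made without updating the hash. Resisting a deliberate forger would need a keyed hash or a signature.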

Result storage

Each source directory contains a .vary/ directory for testing artifacts:

.vary/
  mutation-manifest.json       # Discovery + results manifest (emitted on every build)
  status.json                  # Latest run summary
  history.jsonl                # Append-only ledger of all runs
  equivalent-mutants.json      # Quarantined mutant IDs
  runs/
    <run-id>/
      certificate.json         # Mutation certificate

Module mutation budgets

Mutation thresholds can be defined per module or path. The first matching pattern wins.

[mutation]
min_kill = 70

[[mutation.module]]
pattern = "core/**"
min_kill = 90

[[mutation.module]]
pattern = "legacy/**"
min_kill = 40

When vary mutate runs in project mode, each module's score is checked against its threshold. If any module falls below, the command exits with code 1. Set a low bar for legacy code, a high bar for critical paths, and ratchet up over time.
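The following Python sketch implements threshold lookup with first-match-wins semantics. It uses `fnmatch.fnmatchcase` as a stand-in for Vary's glob rules. In `fnmatchcase`, `*` also matches `/`, while Vary may treat `**` differently. `threshold_for` and `failing_modules` are hypothetical.

```python
from fnmatch import fnmatchcase


def threshold_for(path, modules, default):
    """First [[mutation.module]] whose pattern matches wins; else the global min_kill."""
    for module in modules:  # declaration order matters
        # Assumption: fnmatch globbing approximates Vary's pattern matching.
        if fnmatchcase(path, module["pattern"]):
            return module["min_kill"]
    return default


def failing_modules(scores, modules, default):
    """Paths whose kill score is below their resolved threshold."""
    return [p for p, s in scores.items() if s < threshold_for(p, modules, default)]
```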

Operator controls

Select which mutation operators run globally or per module:

[mutation]
operators = ["CLASSIC", "CONTRACT"]

[[mutation.module]]
pattern = "api/**"
disable = ["ENUM_REPLACE"]

Operator controls let teams tune mutation noise and isolate new operator families. The --operators CLI flag overrides vary.toml for a single run. Use --list-operators to see all available operators and group aliases.
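The following sketch shows how the effective operator list for one module might be resolved. The contents of the `CLASSIC` and `CONTRACT` aliases below are invented for illustration; use --list-operators for the real ones. The sketch also assumes the --operators list replaces every vary.toml operator setting, including per-module `disable` lists.

```python
# Hypothetical alias contents, for illustration only.
GROUPS = {
    "CLASSIC": ["ARITHMETIC", "BOUNDARY", "ENUM_REPLACE"],
    "CONTRACT": ["REMOVE_PRECONDITION"],  # invented operator name
}


def expand(names):
    """Replace group aliases with their members, keeping first-seen order."""
    out = []
    for name in names:
        for op in GROUPS.get(name, [name]):
            if op not in out:
                out.append(op)
    return out


def effective_operators(global_ops, module_disable=(), cli_ops=None):
    if cli_ops is not None:
        # Assumption: --operators replaces all vary.toml operator settings for the run.
        return expand(cli_ops)
    disabled = set(module_disable)
    return [op for op in expand(global_ops) if op not in disabled]
```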

Timeout-aware diagnosis

Mutants that exceed the per-mutant timeout are classified separately from killed and survived:

Timeout mutants: 4
  math.vary:23  factorial()  ARITHMETIC  (likely infinite loop)
  math.vary:45  power()      BOUNDARY    (likely infinite loop)

Timeouts often indicate infinite-loop mutants (e.g., a loop condition flipped from < to >=). These are not counted as survivors but are reported so you can verify they are not masking real bugs. Configure the timeout with --timeout <ms> (default: 5000ms) or set a run-level time budget with --budget (e.g., --budget 2m).

Trend snapshots

Save a named snapshot after a mutation run and compare it to previous snapshots:

vary mutate src/ --snapshot v1.0
vary mutate src/ --snapshot v1.1 --compare v1.0

Snapshots are stored in .vary/runs/ and include the kill rate, timeout count, and classification breakdown per module. The comparison output shows how each metric moved between snapshots, so regressions stand out:

Mutation trend (v1.0 → v1.1)

  Kill rate:              82% → 87%  (+5%)
  Timeout mutants:         6 → 2    (-4)
  Weak-oracle survivors:  11 → 7    (-4)

Use --trend without --compare to compare against the most recent previous snapshot.
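The deltas in that output are simple differences between two snapshots. This Python sketch assumes a snapshot is a flat mapping of integer metrics; the keys `killRate`, `timeouts`, and `weakOracle` are hypothetical. A lower kill rate, or more timeouts or weak-oracle survivors, counts as a regression.

```python
# Hypothetical snapshot keys; True means a higher value is better.
HIGHER_IS_BETTER = {"killRate": True, "timeouts": False, "weakOracle": False}


def describe(label, old, new, unit=""):
    """Format one metric in the style of the sample output."""
    return f"{label}: {old}{unit} → {new}{unit} ({new - old:+d}{unit})"


def regressions(old, new):
    """Return the metric keys that moved in the wrong direction."""
    worse = []
    for key, higher_is_better in HIGHER_IS_BETTER.items():
        delta = new[key] - old[key]
        if (delta < 0) if higher_is_better else (delta > 0):
            worse.append(key)
    return worse
```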

Mutation heatmaps

The --heatmap flag generates a per-line and per-function mutation density report:

Heatmap: math.vary

  Line  Source                          Mutants  Killed  Survived
  23    let total = price * qty         5        3       2        ██░░
  45    if amount > threshold {         3        3       0        ████
  67    return balance - fee            4        1       3        █░░░

Hotspot functions:
  factorial()    60% kill rate (3/5)
  withdraw()     85% kill rate (17/20)

Lines with a low kill rate (shown in red, with mostly empty bars) are the weakest regions in the file. Focus testing effort there first.
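The bars in the sample above are consistent with a four-cell bar filled in proportion to the kill rate. Here is a Python sketch of that rendering rule. The rule is inferred from the sample, not documented, and `kill_bar` is hypothetical.

```python
def kill_bar(killed, total, width=4):
    """Render a kill-rate bar: filled cells for killed, hollow cells for the rest.

    Assumption inferred from the sample output; round() rounds halves to even.
    """
    filled = round(width * killed / total) if total else 0
    return "█" * filled + "░" * (width - filled)
```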

Quick mode

--quick limits mutation to relational and literal operators and caps each file at 20 mutants. Use it for a rough score without waiting for every operator to run.

Default file cap and `--all`

By default, mutation runs cap at 200 mutants per file to keep interactive runs fast on large files. Use --all to restore exhaustive behavior (every possible mutant), or --quick to drop to 20.
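The following Python sketch shows the cap selection. It assumes the cap keeps the first N candidates in scheduling order and treats passing both --quick and --all as an error. This page specifies neither detail, and both helpers are hypothetical.

```python
def mutant_cap(quick=False, exhaustive=False):
    """Per-file mutant cap: 20 with --quick, unlimited with --all, else 200."""
    if quick and exhaustive:
        # Assumption: the two flags are treated as conflicting.
        raise ValueError("--quick and --all are mutually exclusive in this sketch")
    if exhaustive:
        return None  # no cap
    return 20 if quick else 200


def select_mutants(candidates, cap):
    """Assumption: keep the first `cap` candidates in scheduling order."""
    return list(candidates) if cap is None else list(candidates)[:cap]
```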

Output modes

--output <text|log|json|html> controls how results are rendered during and after the run.

Mode	Behavior
`text` (default)	Live in-place spinner with elapsed time, tested/total mutant counters, and a refining ETA. Uses ANSI escape codes and installs a Ctrl+C shutdown hook to restore the terminal cleanly.
`log`	Plain lines, one per event, suitable for CI log scraping.
`json`	JSONL event stream for programmatic consumers.
`html`	Writes a report file.

Before the run starts, the engine pre-scans all files to count mutants and combines that count with the avgMutantMs recorded in the run history to print an up-front wall-clock estimate. The ETA refines as actual throughput comes in.
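Here is a minimal sketch of an up-front estimate and its refinement. It assumes work divides evenly across workers and that the refined ETA uses only observed throughput. The real engine may weight history and live measurements differently, and both function names are hypothetical.

```python
def initial_estimate_ms(total_mutants, avg_mutant_ms, workers):
    """Up-front estimate from the pre-scan count and historical avgMutantMs.

    Assumption: perfect parallelism across workers.
    """
    return total_mutants * avg_mutant_ms / max(workers, 1)


def refined_eta_ms(tested, total, elapsed_ms, prior_estimate_ms):
    """Remaining time from observed wall-clock throughput so far."""
    if tested == 0:
        return prior_estimate_ms  # nothing observed yet: keep the historical estimate
    ms_per_mutant = elapsed_ms / tested
    return (total - tested) * ms_per_mutant
```

For example, 400 mutants at a historical 50 ms each on 4 workers gives a 5000 ms estimate. If the first 100 mutants actually take 2000 ms, the ETA for the remaining 300 becomes 6000 ms.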

Performance

Optimization	Effect
Shared module resolution	Imported modules are type-checked once per run rather than once per mutant.
Early exit on failure	Remaining tests are skipped when a mutant is already proven killed.
Adaptive per-test timeout	10× the baseline execution time, clamped between 1s and 5s. Prevents infinite loops while giving legitimately slow tests room.
Warm in-process workers	Long-lived workers keep a compiled classloader resident across mutant batches instead of rebuilding per mutant.
Worker-local test invocation cache	Reflective discovery of `__vtest_names__`, `__vtest_methods__`, and the per-test timeout executor is paid once per worker instead of once per mutant.
`redefine` first	When an instrumentation agent is attached, eligible mutants take the in-process `redefineClasses` path before considering a fresh classloader.
Patch-plan reuse	Sibling mutants on the same compiled method share a cached ASM `ClassNode` so parsing and tree-building happen once per method, not once per sibling.
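The adaptive per-test timeout in the table is a clamp. Here it is in Python, using the 10×, 1 s, and 5 s figures from the table as defaults. The function name is hypothetical.

```python
def per_test_timeout_ms(baseline_ms, factor=10.0, floor_ms=1000.0, ceiling_ms=5000.0):
    """10x the baseline test time, clamped to the range [1 s, 5 s]."""
    return min(max(baseline_ms * factor, floor_ms), ceiling_ms)
```

A 20 ms test gets the 1 s floor, a 250 ms test gets 2.5 s, and a 900 ms test is capped at 5 s.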

Warm in-process worker model

The compiled bytecode backends (hot-swap, redefine) execute every mutant inside a warm, in-process worker. The worker is created once per batch: it compiles the target module, loads the baseline class bytes into a MutationClassLoader, runs the baseline test pass, and then cycles through mutants without tearing the classloader down between them. Full-loader rebuilds are attributed to two explicit causes only: worker recreation after poison recovery (wall-clock escape hatch or state-leak detection) or an explicit fresh-loader fallback.

Every artifact produced by a compiled run surfaces initialRebuild, poisonRebuilds, baseLoaderRebuilds, and unaccountedRebuilds so the warm-worker contract is visible in telemetry. Outside poison recovery and documented fallbacks, unaccountedRebuilds is always 0 on the pinned short-loop corpus.
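The following toy Python model illustrates the warm-worker contract: the loader is built once per batch and rebuilt only when a mutant poisons it. `WarmWorker`, the `(outcome, poisoned)` pairs, and the `object()` stand-in for a MutationClassLoader are all hypothetical. No real class loading happens here.

```python
class WarmWorker:
    """Toy model: one loader per batch, rebuilt only after a poisoned mutant."""

    def __init__(self):
        self.loader_builds = {"initial": 0, "poison": 0}
        self._build("initial")

    def _build(self, cause):
        self.loader_builds[cause] += 1
        self.loader = object()  # stand-in for a MutationClassLoader

    def run_batch(self, mutants):
        """mutants: hypothetical (outcome, poisoned) pairs."""
        results = []
        for outcome, poisoned in mutants:
            results.append(outcome)
            if poisoned:
                self._build("poison")  # the only rebuild cause modelled here
        return results
```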

Worker-local test invocation cache

Reflective discovery of the generated test dispatcher (__vtest_names__, __vtest_methods__, and each individual test method handle) used to run once per mutant. The warm worker now opens a single invocation cache after baseline setup, reuses it across every mutant in the batch, and closes it in the worker's teardown path. The daemon executor used for per-test timeout dispatch is bound to the same cache so it is not rebuilt around every test run either.

The effect shows up in the testDispatchMs phase of phaseTimings. On the pinned short-loop fixture, the accumulated time drops by about 30% versus the pre-optimization baseline. The structural counters (namesInvocationCount, methodsInvocationCount, executorReuseCount) confirm that the cache is in use regardless of timing noise.
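The following Python sketch models a per-worker invocation cache with the three counters named above. It assumes each counter counts real discovery calls or executor reuses. `discover_names` and `discover_methods` are hypothetical stand-ins for the reflective lookups. A Python thread cannot be interrupted on timeout, so the sketch only models the dispatch.

```python
from concurrent.futures import ThreadPoolExecutor


class InvocationCache:
    """Per-worker cache: discovery runs once, the timeout executor is reused."""

    def __init__(self, discover_names, discover_methods):
        # Hypothetical callables standing in for __vtest_names__ / __vtest_methods__.
        self._discover_names = discover_names
        self._discover_methods = discover_methods
        self._names = None
        self._methods = None
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._executor_used = False
        self.namesInvocationCount = 0
        self.methodsInvocationCount = 0
        self.executorReuseCount = 0

    def names(self):
        if self._names is None:
            self.namesInvocationCount += 1
            self._names = self._discover_names()
        return self._names

    def methods(self):
        if self._methods is None:
            self.methodsInvocationCount += 1
            self._methods = self._discover_methods()
        return self._methods

    def run_test(self, name, timeout_s):
        if self._executor_used:
            self.executorReuseCount += 1  # same executor, not rebuilt per test
        self._executor_used = True
        return self._executor.submit(self.methods()[name]).result(timeout=timeout_s)

    def close(self):
        """Called from the worker's teardown path."""
        self._executor.shutdown(wait=True)
```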

`redefine`-first policy

When --backend redefine is selected and an instrumentation agent is attached, eligible mutants take the Instrumentation.redefineClasses path as the normal execution mode. The fresh-loader path becomes a fallback rather than the primary loop. Every per-mutant result records one of redefine, hotswap, ineligible:<reason>, fallback:<detail>, or fresh-loader in swapOutcome so eligible versus ineligible versus runtime-triggered fallback are never aggregated together.
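Because each outcome carries its own prefix, buckets can be counted without merging categories. Here is a Python sketch. The `<reason>` and `<detail>` strings in the test are invented for illustration.

```python
from collections import Counter


def bucket(outcome):
    """Collapse 'ineligible:<reason>' and 'fallback:<detail>' to their prefix."""
    return outcome.split(":", 1)[0]


def summarize(outcomes):
    """Count swapOutcome values per bucket, keeping the five categories distinct."""
    return Counter(bucket(o) for o in outcomes)
```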

Ineligible mutants (for example structural changes, constructor edits, or <clinit> mutations) are pre-filtered and recorded as ineligible:<reason> so they do not inflate the runtime-fallback bucket. Worker recreation after poison does not force unnecessary fresh-loader use: when the worker is retired and recreated, the new warm loader continues on the redefine hot path. The redefine hot path also batches a revert-of-previous-class and the next mutation into a single redefineClasses varargs call when consecutive mutants target different classes; for same-class sibling mutants the new mutation implicitly overwrites the previous one and no revert is owed.
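The batching rule for consecutive mutants can be sketched as a plan of `redefineClasses` batches. This Python model is hypothetical: each batch is a list of tuples rather than real class definitions, and a final revert after the last mutant is not modelled.

```python
def redefine_calls(mutants):
    """mutants: (class_name, mutant_id) pairs in execution order.

    Returns one list of actions per redefineClasses call.
    """
    calls = []
    prev_class = None
    for cls, mutant_id in mutants:
        if prev_class is not None and prev_class != cls:
            # Different class: revert the previous one and apply the next in one call.
            calls.append([("revert", prev_class), ("mutate", cls, mutant_id)])
        else:
            # First mutant, or a same-class sibling whose bytes overwrite the last ones.
            calls.append([("mutate", cls, mutant_id)])
        prev_class = cls
    return calls
```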

See Strict mode → Bytecode backends for backend selection and the full swap-outcome table, Parity gate for the cross-backend classification invariant, and Benchmark for the pinned benchmark matrix that measures redefine-vs-fresh-loader wall time.

Deduplication

After testing, killed mutants with identical kill signatures (same operator type, same set of killing tests) are deduplicated. Only one representative per signature is kept. Survivors and errors are never deduped. Disable with --no-dedup.
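The following Python sketch implements signature-based deduplication. It assumes the first killed mutant seen for each signature is the one kept. The dictionary keys (`id`, `status`, `operator`, `killing_tests`) are hypothetical.

```python
def dedup(results):
    """Keep one killed mutant per (operator, set of killing tests) signature."""
    seen = set()
    kept = []
    for r in results:
        if r["status"] != "killed":
            kept.append(r)  # survivors and errors are never deduplicated
            continue
        signature = (r["operator"], frozenset(r["killing_tests"]))
        if signature not in seen:  # assumption: first occurrence is the representative
            seen.add(signature)
            kept.append(r)
    return kept
```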

CLI reference

Flag	Default	Description
`--level <ast|bytecode|both>`	`bytecode`	Mutation level
`--operators <list>`	all 27	AST operators (comma-separated, or group aliases)
`--bc-operators <list>`	all 6	Bytecode operators (comma-separated)
`--output <text|log|json|html>`	`text`	Output mode (live spinner / CI log / JSONL / HTML report)
`-v`, `--verbose`	off	Verbose output with OP and HINT columns
`-i`, `--incremental`	off	Reuse cached AST results
`--quick`	off	Quick mode: relational + literal only, max 20 mutants/file
`--all`	off	Exhaustive mode: override the default 200 mutants/file cap
`-j`, `--parallel <n>`	CPU count	Parallel worker count (bytecode level)
`--no-dedup`	off	Keep redundant mutants
`--certify`	off	Generate mutation certificate
`--why <id>`	none	Explain why a mutant survived
`--expand <group>`	none	Show individual mutants in a group
`--replay <id>`	none	Re-run a single mutation
`--top <n>`	20	Number of survivor groups to show
`--group <key>`	`function`	Group by: function, file, or cause
`--observe`	off	Enable runtime observability
`--differential`	off	Enable differential trace detection
`--quarantine <id>`	none	Quarantine a surviving mutant
`--list-quarantined`	off	List quarantined mutants
`--unstable`	off	Include nondeterministic mutants (skipped by default)
`--no-manifest`	off	Disable mutation manifest emission (on `vary run`)
`--snapshot <tag>`	none	Save a named snapshot after the run
`--compare <tag>`	none	Compare results to a previous snapshot
`--heatmap`	off	Generate per-line and per-function mutation heatmap
`--budget <duration>`	none	Time budget for the run (e.g., `30s`, `5m`)
`--timeout <ms>`	5000	Per-mutant timeout in milliseconds
`--auto-quarantine <threshold>`	none	Auto-quarantine mutants above confidence threshold
`--no-prioritize`	off	Disable data-flow-based mutant prioritization
`--kill-map`	off	Output per-test kill map
`--baseline <path>`	none	Baseline manifest JSON for delta reporting
`--strict-tests`	off	Promote missing-observe warnings to errors
`--list-operators`	off	List all available operators and group aliases
`--strict-selection <mode>`	`evidence`	Strict-mode test selection: `evidence` or `reference`
`--reachability`	off	Record per-test method reachability during baseline
`--warm-workers <on|off>`	`on`	Reuse long-lived workers across mutant batches
`--fresh-workers`	off	Alias for `--warm-workers=off`
`--backend <name>`	`fresh-loader`	Bytecode backend: `fresh-loader`, `hot-swap`, or `redefine`
`--relevance-graph-path <path>`	`.vary/relevance/`	Override location of the persisted relevance graph
`--explain`	off	Include selection and scheduling explanation in the report
`--incremental-infer`	off	Reuse prior mutant outcomes across runs (Incremental inference)
`--fast-mode`	off	Heuristic test-filter narrowing with validated fallback (Fast mode)
`--fast-mode-narrow-factor <f>`	`0.5`	Fraction of conservative test filter kept under fast mode
`--fast-mode-sample-size <n>`	`10`	Mutants re-run with the broader filter to measure miss rate
`--fast-mode-miss-threshold <t>`	`0.10`	Miss rate above which fast mode falls back automatically

See Strict mode for evidence-based selection, warm workers, and bytecode backend details, Parity gate for cross-backend classification parity, Fast mode for validated test-filter narrowing, Incremental inference for per-mutant outcome reuse, Survivor tail for flake-detection rerun accounting, and Benchmark for the parity benchmark harness.
