Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.

Infrastructure

This page covers the operational side of vary mutate: caching, equivalent mutant quarantine, policy gates, certificates, and the full CLI reference. For mutation operators, see Operators. For contracts, observability, and scoring, see Advanced overview and its deep-dive pages.

Caching

Bytecode method-level cache

Stored in .vary-mutation-cache-v2. Each method is individually hashed. When a method's hash matches the cache, its results are reused. Only changed methods get re-tested. Use --all to skip the cache and test everything.
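
The reuse decision described above can be sketched as follows. This is a minimal illustration, not the on-disk format of .vary-mutation-cache-v2: the cache shape (method name mapped to a hash and stored results), the use of SHA-256, and the names method_hash and split_by_cache are all assumptions.

```python
import hashlib


def method_hash(bytecode: bytes) -> str:
    """Hash one method's bytecode. SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(bytecode).hexdigest()


def split_by_cache(methods: dict, cache: dict, test_all: bool = False):
    """Split methods into (reused results, names that must be re-tested).

    methods maps method name -> bytecode; cache maps method name ->
    {"hash": ..., "results": ...}. test_all mimics --all by ignoring the cache.
    """
    reused, to_test = {}, []
    for name, code in methods.items():
        entry = cache.get(name)
        if not test_all and entry is not None and entry["hash"] == method_hash(code):
            reused[name] = entry["results"]  # unchanged method: reuse results
        else:
            to_test.append(name)  # new or changed method: test again
    return reused, to_test
```

Only the methods returned in the second list would be mutated and tested again; everything else keeps its cached outcome.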

AST mutation cache

Stored in .vary-mutation-cache. The cache key is the SHA-256 of the source, the test content, and the operator set. Enable it with --incremental (-i).
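
A hedged sketch of how such a key could be derived. The exact byte layout Vary uses is not documented here, so the length-prefixing, the sorted operator list, and the function name ast_cache_key are assumptions made for illustration.

```python
import hashlib


def ast_cache_key(source: str, tests: str, operators: list) -> str:
    """Combine source, test content, and operator set into one SHA-256 key."""
    digest = hashlib.sha256()
    # Sort operators so the key depends on the set, not on the listing order.
    parts = (source, tests, ",".join(sorted(operators)))
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Any change to the source, the tests, or the selected operators yields a different key, so a stale entry is never reused.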

Equivalent mutant quarantine

Some mutants are semantically equivalent to the original (e.g., x + 0 == x) and can never be killed. Quarantine them:

vary mutate source.vary --quarantine "mutant-id" --quarantine-reason "equivalent: x+0 == x"
vary mutate source.vary --list-quarantined
vary mutate source.vary --unquarantine "mutant-id"

Quarantined mutants are excluded from the score and survivor lists.
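
To illustrate the effect on reporting, here is a small sketch. It assumes the score is the percentage of killed mutants among the non-quarantined ones; the actual scoring formula is described on the scoring page, and the function name and the empty-input result are hypothetical.

```python
def score_without_quarantined(results: dict, quarantined: set):
    """results maps mutant id -> "killed" or "survived".

    Returns (score, survivors) with quarantined mutants removed from both.
    """
    active = {mid: status for mid, status in results.items() if mid not in quarantined}
    survivors = sorted(mid for mid, status in active.items() if status == "survived")
    killed = sum(1 for status in active.values() if status == "killed")
    # Assumption: an empty set of active mutants scores 100.
    score = 100.0 * killed / len(active) if active else 100.0
    return score, survivors
```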

Policy gates

Configure in vary.toml:

[mutation]
min_score = 80.0        # Minimum mutation score (0-100)
fail_on_decrease = true  # Fail if score decreased from previous run
max_survivors = 5        # Maximum surviving mutants allowed

If any constraint is violated, the command exits with code 1.
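
The gate evaluation can be sketched like this. The three settings mirror the vary.toml keys above; the function names, the message wording, and the treatment of a missing previous run (no decrease check) are assumptions.

```python
def check_gates(score, survivors, previous_score,
                min_score=80.0, fail_on_decrease=True, max_survivors=5):
    """Return a list of violated constraints (empty when all gates pass)."""
    violations = []
    if score < min_score:
        violations.append(f"score {score} is below min_score {min_score}")
    # Assumption: with no previous run there is nothing to compare against.
    if fail_on_decrease and previous_score is not None and score < previous_score:
        violations.append(f"score decreased from {previous_score} to {score}")
    if survivors > max_survivors:
        violations.append(f"{survivors} survivors exceed max_survivors {max_survivors}")
    return violations


def exit_code(violations) -> int:
    """Any violated constraint maps to exit code 1."""
    return 1 if violations else 0
```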

Certificates

A mutation certificate is a tamper-evident record of a run:

vary mutate source.vary --tests test.vary --certify

The certificate is stored at .vary/runs/{runId}/certificate.json. It contains the source hash, test hash, compiler version, and score. A hash over all fields is included, so changing any value invalidates the certificate.
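
A minimal sketch of the tamper-evidence idea, assuming a SHA-256 over a canonical JSON encoding of the fields. The field names (sourceHash, certificateHash, and so on) and the canonicalization are hypothetical; the real certificate.json layout may differ.

```python
import hashlib
import json

HASH_FIELD = "certificateHash"  # hypothetical field name


def seal(fields: dict) -> dict:
    """Return a copy of fields with a hash over all of them attached."""
    # Sorted keys and fixed separators give a stable byte encoding.
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**fields, HASH_FIELD: hashlib.sha256(payload).hexdigest()}


def verify(cert: dict) -> bool:
    """Recompute the hash from the other fields and compare."""
    fields = {key: value for key, value in cert.items() if key != HASH_FIELD}
    return seal(fields)[HASH_FIELD] == cert.get(HASH_FIELD)
```

Editing the score (or any other field) after sealing makes verify return False, because the recomputed hash no longer matches the stored one.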

Result storage

Each source directory contains a .vary/ directory for testing artifacts:

.vary/
  mutation-manifest.json       # Discovery + results manifest (emitted on every build)
  status.json                  # Latest run summary
  history.jsonl                # Append-only ledger of all runs
  equivalent-mutants.json      # Quarantined mutant IDs
  runs/
    <run-id>/
      certificate.json         # Mutation certificate

Module mutation budgets

Mutation thresholds can be defined per module or path. The first matching pattern wins.

[mutation]
min_kill = 70

[[mutation.module]]
pattern = "core/**"
min_kill = 90

[[mutation.module]]
pattern = "legacy/**"
min_kill = 40

When vary mutate runs in project mode, each module's score is checked against its threshold. If any module falls below, the command exits with code 1. Set a low bar for legacy code, a high bar for critical paths, and ratchet up over time.
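
The first-match rule can be sketched as below. Python's fnmatch is used as a stand-in for Vary's pattern matching, which is an assumption (here * also matches path separators), and the function names are illustrative.

```python
from fnmatch import fnmatchcase


def threshold_for(path: str, modules: list, default: float) -> float:
    """Return min_kill for path: the first matching pattern wins.

    modules is an ordered list of (pattern, min_kill) pairs, in the order
    the [[mutation.module]] tables appear in vary.toml.
    """
    for pattern, min_kill in modules:
        if fnmatchcase(path, pattern):
            return min_kill
    return default  # falls back to [mutation] min_kill


def failing_modules(scores: dict, modules: list, default: float) -> list:
    """List paths whose score is below their threshold."""
    return [path for path, score in scores.items()
            if score < threshold_for(path, modules, default)]
```

With the configuration above, a file under core/ must reach 90, a file under legacy/ only 40, and everything else 70. A non-empty result from failing_modules corresponds to exit code 1.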

Operator controls

Select which mutation operators run globally or per module:

[mutation]
operators = ["CLASSIC", "CONTRACT"]

[[mutation.module]]
pattern = "api/**"
disable = ["ENUM_REPLACE"]

Operator controls let teams tune mutation noise and isolate new operator families. The --operators CLI flag overrides vary.toml for a single run. Use --list-operators to see all available operators and group aliases.

Timeout-aware diagnosis

Mutants that exceed the per-mutant timeout are classified separately from killed and survived:

Timeout mutants: 4
  math.vary:23  factorial()  ARITHMETIC  (likely infinite loop)
  math.vary:45  power()      BOUNDARY    (likely infinite loop)

Timeouts often indicate infinite-loop mutants (e.g., a loop condition flipped from < to >=). These are not counted as survivors but are reported so you can verify they are not masking real bugs. Configure the timeout with --timeout <ms> (default: 5000ms) or set a run-level time budget with --budget (e.g., --budget 2m).
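
A small sketch of the three-way classification and of parsing a --budget value. The unit set (s and m, as used in the examples on this page) and both function names are assumptions; Vary may accept other units.

```python
import re


def classify(test_failed: bool, elapsed_ms: float, timeout_ms: float = 5000) -> str:
    """Classify one mutant; timeouts are kept apart from killed and survived."""
    if elapsed_ms > timeout_ms:
        return "timeout"
    return "killed" if test_failed else "survived"


def parse_budget_ms(text: str) -> int:
    """Parse durations such as "30s" or "2m" into milliseconds."""
    match = re.fullmatch(r"(\d+)(s|m)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration in this sketch: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * (1000 if unit == "s" else 60_000)
```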

Trend snapshots

Save a named snapshot after a mutation run and compare it to previous snapshots:

vary mutate src/ --snapshot v1.0
vary mutate src/ --snapshot v1.1 --compare v1.0

Snapshots are stored in .vary/runs/ and include kill rate, timeout count, and classification breakdown per module. The comparison output shows how each metric changed between the two snapshots:

Mutation trend (v1.0 → v1.1)

  Kill rate:              82% → 87%  (+5%)
  Timeout mutants:         6 → 2    (-4)
  Weak-oracle survivors:  11 → 7    (-4)

Use --trend without --compare to compare against the most recent previous snapshot.
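
The comparison boils down to per-metric deltas. The sketch below uses the figures from the sample output; the metric keys and the rule for what counts as a regression (a lower kill rate, or a higher count of anything else) are assumptions.

```python
def trend(old: dict, new: dict) -> dict:
    """Per-metric change from the old snapshot to the new one."""
    return {name: new[name] - old[name] for name in old if name in new}


def regressions(deltas: dict, higher_is_better=frozenset({"kill_rate"})) -> list:
    """Metrics that moved in the wrong direction."""
    bad = []
    for name, delta in deltas.items():
        worse = delta < 0 if name in higher_is_better else delta > 0
        if worse:
            bad.append(name)
    return bad


v1_0 = {"kill_rate": 82, "timeouts": 6, "weak_oracle_survivors": 11}
v1_1 = {"kill_rate": 87, "timeouts": 2, "weak_oracle_survivors": 7}
deltas = trend(v1_0, v1_1)
```

For the two snapshots above, deltas holds +5 for the kill rate and -4 for both counts, so regressions(deltas) is empty.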

Mutation heatmaps

The --heatmap flag generates a per-line and per-function mutation density report:

Heatmap: math.vary

  Line  Source                          Mutants  Killed  Survived
  23    let total = price * qty         5        3       2        ██░░
  45    if amount > threshold {         3        3       0        ████
  67    return balance - fee            4        1       3        █░░░

Hotspot functions:
  factorial()    60% kill rate (3/5)
  withdraw()     85% kill rate (17/20)

Red-heavy lines (low kill rate) are the weakest regions in the file. Focus testing effort there first.
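
The per-line bar can be reproduced with a few lines of code. The four-cell width and round-half-up rule are inferred from the sample report above and are assumptions about the renderer, as are the function names.

```python
import math


def bar(killed: int, total: int, width: int = 4) -> str:
    """Render a kill-rate bar: filled cells for killed, light cells for survived."""
    filled = math.floor(width * killed / total + 0.5) if total else 0
    return "█" * filled + "░" * (width - filled)


def weakest_first(lines: list) -> list:
    """Sort (line_no, killed, total) rows by ascending kill rate."""
    return sorted(lines, key=lambda row: row[1] / row[2])


rows = [(23, 3, 5), (45, 3, 3), (67, 1, 4)]
ordered = weakest_first(rows)
```

Sorting the sample rows this way puts line 67 first, then line 23, then line 45, which is the order in which additional tests are most likely to pay off.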

Quick mode

--quick limits mutations to relational and literal operators and caps the run at 20 mutants per file. It gives a rough score without waiting for all operators.

Default file cap and --all

By default, mutation runs cap at 200 mutants per file to keep interactive runs fast on large files. Use --all to restore exhaustive behavior (every possible mutant), or --quick to drop to 20.

Output modes

--output <text|log|json|html> controls how results are rendered during and after the run.

| Mode | Behavior |
| --- | --- |
| text (default) | Live in-place spinner with elapsed time, tested/total mutant counters, and a refining ETA. Uses ANSI escape codes and installs a Ctrl+C shutdown hook to restore the terminal cleanly. |
| log | Plain lines, one per event, suitable for CI log scraping. |
| json | JSONL event stream for programmatic consumers. |
| html | Writes a report file. |

Before the run starts, the engine pre-scans all files to count mutants and uses historical avgMutantMs from the run history to print an up-front wall-clock estimate. The ETA refines as actual throughput comes in.
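
A rough sketch of the two estimates. Dividing by the worker count and using a simple running average for the refinement are assumptions; the engine's actual estimator is not specified on this page, and the function names are hypothetical.

```python
def upfront_estimate_ms(mutant_count: int, avg_mutant_ms: float, workers: int = 1) -> float:
    """Wall-clock estimate before the run, from the historical avgMutantMs."""
    return mutant_count * avg_mutant_ms / max(workers, 1)


def refined_eta_ms(tested: int, total: int, elapsed_ms: float):
    """Remaining time based on observed throughput; None until data exists."""
    if tested == 0:
        return None
    return (total - tested) * (elapsed_ms / tested)
```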

Performance

| Optimization | Effect |
| --- | --- |
| Shared module resolution | Imported modules are type-checked once per run rather than once per mutant. |
| Early exit on failure | Remaining tests are skipped when a mutant is already proven killed. |
| Adaptive per-test timeout | 10× the baseline execution time, clamped between 1s and 5s. Prevents infinite loops while giving legitimately slow tests room. |
| Warm in-process workers | Long-lived workers keep a compiled classloader resident across mutant batches instead of rebuilding per mutant. |
| Worker-local test invocation cache | Reflective discovery of __vtest_names__, __vtest_methods__, and the per-test timeout executor is paid once per worker instead of once per mutant. |
| redefine first | When an instrumentation agent is attached, eligible mutants take the in-process redefineClasses path before considering a fresh classloader. |
| Patch-plan reuse | Sibling mutants on the same compiled method share a cached ASM ClassNode so parsing and tree-building happen once per method, not once per sibling. |
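
The adaptive per-test timeout rule from the table is a simple clamp. A sketch, with a hypothetical function name and times in seconds:

```python
def adaptive_timeout_s(baseline_s: float) -> float:
    """10x the baseline execution time, clamped between 1s and 5s."""
    return min(5.0, max(1.0, 10.0 * baseline_s))
```

A test with a 10 ms baseline gets the 1s floor, a 200 ms baseline gets 2s, and anything with a baseline above 500 ms is capped at 5s.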

Warm in-process worker model

The compiled bytecode backends (hot-swap, redefine) execute every mutant inside a warm, in-process worker. The worker is created once per batch: it compiles the target module, loads the baseline class bytes into a MutationClassLoader, runs the baseline test pass, and then cycles through mutants without tearing the classloader down between them. Full-loader rebuilds are attributed to two explicit causes only: worker recreation after poison recovery (wall-clock escape hatch or state-leak detection) or an explicit fresh-loader fallback.

Every artifact produced by a compiled run surfaces initialRebuild, poisonRebuilds, baseLoaderRebuilds, and unaccountedRebuilds so the warm-worker contract is visible in telemetry. Outside poison recovery and documented fallbacks, unaccountedRebuilds is always 0 on the pinned short-loop corpus.

Worker-local test invocation cache

Reflective discovery of the generated test dispatcher (__vtest_names__, __vtest_methods__, and each individual test method handle) used to run once per mutant. The warm worker now opens a single invocation cache after baseline setup, reuses it across every mutant in the batch, and closes it in the worker's teardown path. The daemon executor used for per-test timeout dispatch is bound to the same cache so it is not rebuilt around every test run either.

The effect is visible as the testDispatchMs phase in phaseTimings: on the pinned short-loop fixture the accumulator drops ~30% versus the pre-wave baseline, and the structural counters (namesInvocationCount, methodsInvocationCount, executorReuseCount) prove the cache is wired regardless of timing noise.

redefine-first policy

When --backend redefine is selected and an instrumentation agent is attached, eligible mutants take the Instrumentation.redefineClasses path as the normal execution mode. The fresh-loader path becomes a fallback rather than the primary loop. Every per-mutant result records one of redefine, hotswap, ineligible:<reason>, fallback:<detail>, or fresh-loader in swapOutcome so eligible versus ineligible versus runtime-triggered fallback are never aggregated together.

Ineligible mutants (for example structural changes, constructor edits, or <clinit> mutations) are pre-filtered and recorded as ineligible:<reason> so they do not inflate the runtime-fallback bucket. Worker recreation after poison does not force unnecessary fresh-loader use: when the worker is retired and recreated, the new warm loader continues on the redefine hot path. The redefine hot path also batches a revert-of-previous-class and the next mutation into a single redefineClasses varargs call when consecutive mutants target different classes; for same-class sibling mutants the new mutation implicitly overwrites the previous one and no revert is owed.
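
The batching rule in the previous paragraph can be sketched as a planning step. This models only which class definitions go into each redefineClasses call; the action strings and the function name are illustrative, and cleanup after the final mutant is left out because the page does not describe it.

```python
def plan_redefine_calls(mutant_classes: list) -> list:
    """Plan one redefineClasses call per mutant.

    mutant_classes lists the target class of each mutant in execution order.
    Each call is a list of actions: "revert:<class>" restores baseline bytes,
    "mutate:<class>" installs the mutated bytes.
    """
    calls = []
    previous = None
    for cls in mutant_classes:
        call = []
        if previous is not None and previous != cls:
            # Different class: revert the previous one in the same call.
            call.append(f"revert:{previous}")
        # Same class: the new mutation overwrites the old one, no revert owed.
        call.append(f"mutate:{cls}")
        calls.append(call)
        previous = cls
    return calls
```

For the sequence Account, Account, Ledger this yields three calls, and only the third carries two class definitions: the revert of Account and the mutation of Ledger.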

See Strict mode → Bytecode backends for backend selection and the full swap-outcome table, Parity gate for the cross-backend classification invariant, and Benchmark for the pinned benchmark matrix that measures redefine-vs-fresh-loader wall time.

Deduplication

After testing, killed mutants with identical kill signatures (same operator type, same set of killing tests) are deduplicated. Only one representative per signature is kept. Survivors and errors are never deduped. Disable with --no-dedup.
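
A minimal sketch of signature-based deduplication. The Result type and its field names are hypothetical; the signature (operator type plus the set of killing tests) and the rule that survivors and errors are never dropped follow the description above. Keeping the first mutant seen as the representative is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    mutant_id: str
    operator: str
    status: str  # "killed", "survived", or "error"
    killing_tests: frozenset = frozenset()


def dedup(results: list) -> list:
    """Keep one killed mutant per (operator, killing tests) signature."""
    seen = set()
    kept = []
    for result in results:
        if result.status == "killed":
            signature = (result.operator, result.killing_tests)
            if signature in seen:
                continue  # redundant kill: same operator, same killing tests
            seen.add(signature)
        kept.append(result)  # survivors and errors always pass through
    return kept
```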

CLI reference

| Flag | Default | Description |
| --- | --- | --- |
| --level <ast\|bytecode\|both> | bytecode | Mutation level |
| --operators <list> | all 27 | AST operators (comma-separated, or group aliases) |
| --bc-operators <list> | all 6 | Bytecode operators (comma-separated) |
| --output <text\|log\|json\|html> | text | Output mode (live spinner / CI log / JSONL / HTML report) |
| -v, --verbose | off | Verbose output with OP and HINT columns |
| -i, --incremental | off | Reuse cached AST results |
| --quick | off | Quick mode: relational + literal only, max 20 mutants/file |
| --all | off | Exhaustive mode: override the default 200 mutants/file cap |
| -j, --parallel <n> | CPU count | Parallel worker count (bytecode level) |
| --no-dedup | off | Keep redundant mutants |
| --certify | off | Generate mutation certificate |
| --why <id> | none | Explain why a mutant survived |
| --expand <group> | none | Show individual mutants in a group |
| --replay <id> | none | Re-run a single mutation |
| --top <n> | 20 | Number of survivor groups to show |
| --group <key> | function | Group by: function, file, or cause |
| --observe | off | Enable runtime observability |
| --differential | off | Enable differential trace detection |
| --quarantine <id> | none | Quarantine a surviving mutant |
| --list-quarantined | off | List quarantined mutants |
| --unstable | off | Include nondeterministic mutants (skipped by default) |
| --no-manifest | off | Disable mutation manifest emission (on vary run) |
| --snapshot <tag> | none | Save a named snapshot after the run |
| --compare <tag> | none | Compare results to a previous snapshot |
| --heatmap | off | Generate per-line and per-function mutation heatmap |
| --budget <duration> | none | Time budget for the run (e.g., 30s, 5m) |
| --timeout <ms> | 5000 | Per-mutant timeout in milliseconds |
| --auto-quarantine <threshold> | none | Auto-quarantine mutants above confidence threshold |
| --no-prioritize | off | Disable data-flow-based mutant prioritization |
| --kill-map | off | Output per-test kill map |
| --baseline <path> | none | Baseline manifest JSON for delta reporting |
| --strict-tests | off | Promote missing-observe warnings to errors |
| --list-operators | off | List all available operators and group aliases |
| --strict-selection <mode> | evidence | Strict-mode test selection: evidence or reference |
| --reachability | off | Record per-test method reachability during baseline |
| --warm-workers <on\|off> | on | Reuse long-lived workers across mutant batches |
| --fresh-workers | off | Alias for --warm-workers=off |
| --backend <name> | fresh-loader | Bytecode backend: fresh-loader, hot-swap, or redefine |
| --relevance-graph-path <path> | .vary/relevance/ | Override location of the persisted relevance graph |
| --explain | off | Include selection and scheduling explanation in the report |
| --incremental-infer | off | Reuse prior mutant outcomes across runs (Incremental inference) |
| --fast-mode | off | Heuristic test-filter narrowing with validated fallback (Fast mode) |
| --fast-mode-narrow-factor <f> | 0.5 | Fraction of conservative test filter kept under fast mode |
| --fast-mode-sample-size <n> | 10 | Mutants re-run with the broader filter to measure miss rate |
| --fast-mode-miss-threshold <t> | 0.10 | Miss rate above which fast mode falls back automatically |

See Strict mode for evidence-based selection, warm workers, and bytecode backend details, Parity gate for cross-backend classification parity, Fast mode for validated test-filter narrowing, Incremental inference for per-mutant outcome reuse, Survivor tail for flake-detection rerun accounting, and Benchmark for the parity benchmark harness.