
This page covers the operational side of `vary mutate`: caching, equivalent mutant quarantine, policy gates, certificates, and the full CLI reference. For mutation operators, see [Operators](/docs/mutation/operators/). For contracts, observability, and scoring, see [Advanced overview](/docs/mutation/advanced/) and its deep-dive pages.

## Caching

### Bytecode method-level cache

Stored in `.vary-mutation-cache-v2`. Each method is individually hashed. When a method's hash matches the cache, its results are reused. Only changed methods get re-tested. Use `--all` to skip the cache and test everything.

### AST mutation cache

Stored in `.vary-mutation-cache`. Cache key is SHA-256 of source + test content + operator set. Enabled with `--incremental`.
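
As a rough illustration of how a key of this shape can be derived, here is a minimal Python sketch. The function name `ast_cache_key`, the length-prefixed framing, and the sorted, comma-joined operator list are assumptions made for the example; the actual byte layout that `vary` hashes is not documented here.

```python
import hashlib


def ast_cache_key(source: str, tests: str, operators: list[str]) -> str:
    """Hypothetical key: SHA-256 over source, test content, and operator set."""
    digest = hashlib.sha256()
    # Sort and de-duplicate so the operator *set* (not its order) drives the key.
    operator_part = ",".join(sorted(set(operators)))
    for part in (source, tests, operator_part):
        data = part.encode("utf-8")
        # Length-prefix each part so bytes cannot shift between parts unnoticed.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

With a key like this, editing the source, the tests, or the selected operators produces a different key, so a stale cache entry is never reused.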

## Equivalent mutant quarantine

Some mutants are semantically equivalent to the original (for example, a mutant that differs only by `x + 0` versus `x`), so no test can ever kill them. Quarantine them by ID and record the reason:

```bash
vary mutate source.vary --quarantine "mutant-id" --quarantine-reason "equivalent: x+0 == x"
vary mutate source.vary --list-quarantined
vary mutate source.vary --unquarantine "mutant-id"
```

Quarantined mutants are excluded from the score and survivor lists.
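
The exclusion can be pictured with a short Python sketch. It assumes the score is `killed / (killed + survived)` expressed as a percentage and that an empty run scores 100; both are assumptions for illustration, as are the function name and the outcome strings.

```python
def score_without_quarantined(
    outcomes: dict[str, str], quarantined: set[str]
) -> tuple[float, list[str]]:
    """Return (score, survivor IDs) after dropping quarantined mutants.

    `outcomes` maps a mutant ID to "killed" or "survived" (hypothetical labels).
    """
    active = {mid: o for mid, o in outcomes.items() if mid not in quarantined}
    killed = sum(1 for o in active.values() if o == "killed")
    survivors = sorted(mid for mid, o in active.items() if o == "survived")
    total = killed + len(survivors)
    # Assumption: a run with no scoreable mutants counts as a perfect score.
    score = 100.0 * killed / total if total else 100.0
    return score, survivors
```

Quarantining an equivalent survivor removes it from both the numerator's denominator and the survivor list, so the score rises without any test changes.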

## Policy gates

Configure in `vary.toml`:

```toml
[mutation]
min_score = 80.0        # Minimum mutation score (0-100)
fail_on_decrease = true  # Fail if score decreased from previous run
max_survivors = 5        # Maximum surviving mutants allowed
```

If any constraint is violated, the command exits with code 1.
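
The three constraints above can be read as independent checks. The following Python sketch shows one plausible evaluation order; the `Policy` class, `gate_violations`, and `exit_code` are hypothetical names for the example, not part of `vary`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Mirrors the [mutation] keys shown above."""
    min_score: float = 0.0
    fail_on_decrease: bool = False
    max_survivors: Optional[int] = None


def gate_violations(
    policy: Policy, score: float, survivors: int,
    previous_score: Optional[float] = None,
) -> list[str]:
    """Collect every violated constraint instead of stopping at the first."""
    violations = []
    if score < policy.min_score:
        violations.append("min_score")
    if (policy.fail_on_decrease and previous_score is not None
            and score < previous_score):
        violations.append("fail_on_decrease")
    if policy.max_survivors is not None and survivors > policy.max_survivors:
        violations.append("max_survivors")
    return violations


def exit_code(violations: list[str]) -> int:
    # Any violation maps to exit code 1, matching the documented behavior.
    return 1 if violations else 0
```

Reporting all violations at once, rather than the first one, is a design choice of this sketch; it saves a rerun when several gates fail together.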

## Certificates

A mutation certificate is a tamper-evident record of a run:

```bash
vary mutate source.vary --tests test.vary --certify
```

Stored at `.vary/runs/<run-id>/certificate.json`. The certificate contains the source hash, test hash, compiler version, and score. A hash of all fields is included, so tampering with any value invalidates the certificate.
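
To show why a hash over all fields makes the record tamper-evident, here is a minimal Python sketch. The field names, the `digest` key, the use of SHA-256, and the canonical-JSON encoding are all assumptions for illustration; the real certificate schema may differ.

```python
import hashlib
import json


def certificate_digest(fields: dict) -> str:
    """Hash a canonical JSON encoding so key order cannot change the result."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(fields: dict) -> dict:
    """Attach the digest of all other fields under a hypothetical 'digest' key."""
    return {**fields, "digest": certificate_digest(fields)}


def verify(certificate: dict) -> bool:
    """Recompute the digest from the stored fields and compare."""
    fields = {k: v for k, v in certificate.items() if k != "digest"}
    return certificate.get("digest") == certificate_digest(fields)
```

Changing any stored value, such as the score, makes the recomputed digest differ from the stored one, so `verify` returns `False`.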

## Result storage

Each source directory contains a `.vary/` directory for testing artifacts:

```text
.vary/
  mutation-manifest.json       # Discovery + results manifest (emitted on every build)
  status.json                  # Latest run summary
  history.jsonl                # Append-only ledger of all runs
  equivalent-mutants.json      # Quarantined mutant IDs
  runs/
    <run-id>/
      certificate.json         # Mutation certificate
```

## Module mutation budgets

Mutation thresholds can be defined per module or path. The first matching pattern wins.

```toml
[mutation]
min_kill = 70

[[mutation.module]]
pattern = "core/**"
min_kill = 90

[[mutation.module]]
pattern = "legacy/**"
min_kill = 40
```

When `vary mutate` runs in project mode, each module's score is checked against its threshold. If any module falls below, the command exits with code 1. Set a low bar for legacy code, a high bar for critical paths, and ratchet up over time.
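
The first-match-wins rule can be sketched in a few lines of Python. This example uses `fnmatch` from the standard library as a stand-in for the pattern matcher; the exact glob semantics `vary` applies to `**` are not specified here, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def threshold_for(
    path: str, modules: list[tuple[str, float]], default_min_kill: float
) -> float:
    """Return the min_kill of the first matching pattern, else the global one."""
    for pattern, min_kill in modules:
        if fnmatchcase(path, pattern):
            return min_kill
    return default_min_kill


def failing_modules(
    scores: dict[str, float],
    modules: list[tuple[str, float]],
    default_min_kill: float,
) -> list[str]:
    """List every module whose score is below its resolved threshold."""
    return sorted(
        path for path, score in scores.items()
        if score < threshold_for(path, modules, default_min_kill)
    )
```

Because the first match wins, more specific patterns should be listed before broader ones; otherwise a broad pattern such as `core/**` shadows a later `core/io/**` entry.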

## Operator controls

Select which mutation operators run globally or per module:

```toml
[mutation]
operators = ["CLASSIC", "CONTRACT"]

[[mutation.module]]
pattern = "api/**"
disable = ["ENUM_REPLACE"]
```

Operator controls let teams tune mutation noise and isolate new operator families. The `--operators` CLI flag overrides `vary.toml` for a single run. Use `--list-operators` to see all available operators and group aliases.

## Timeout-aware diagnosis

Mutants that exceed the per-mutant timeout are classified separately from killed and survived:

```text
Timeout mutants: 4
  math.vary:23  factorial()  ARITHMETIC  (likely infinite loop)
  math.vary:45  power()      BOUNDARY    (likely infinite loop)
```

Timeouts often indicate infinite-loop mutants (e.g., a loop condition flipped from `<` to `>=`). These are not counted as survivors but are reported so you can verify they are not masking real bugs. Configure the timeout with `--timeout <ms>` (default: 5000ms) or set a run-level time budget with `--budget` (e.g., `--budget 2m`).

## Trend snapshots

Save a named snapshot after a mutation run and compare it to previous snapshots:

```bash
vary mutate src/ --snapshot v1.0
vary mutate src/ --snapshot v1.1 --compare v1.0
```

Snapshots are stored in `.vary/runs/` and include the kill rate, timeout count, and classification breakdown per module. The comparison output shows how each metric changed between the two snapshots, which makes regressions easy to spot:

```text
Mutation trend (v1.0 → v1.1)

  Kill rate:              82% → 87%  (+5%)
  Timeout mutants:         6 → 2    (-4)
  Weak-oracle survivors:  11 → 7    (-4)
```

Use `--trend` without `--compare` to compare against the most recent previous snapshot.
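
The comparison amounts to a per-metric delta between two snapshots. The Python sketch below reproduces the numbers from the example output; the metric keys and the rule that only the kill rate is "higher is better" are assumptions for illustration.

```python
# Assumption: kill rate should go up; every other metric here should go down.
HIGHER_IS_BETTER = {"kill_rate"}


def deltas(prev: dict[str, float], curr: dict[str, float]) -> dict[str, float]:
    """Change in each metric present in both snapshots (current minus previous)."""
    return {name: curr[name] - prev[name] for name in prev if name in curr}


def regressions(prev: dict[str, float], curr: dict[str, float]) -> list[str]:
    """Names of metrics that moved in the wrong direction."""
    worse = []
    for name, delta in deltas(prev, curr).items():
        if (delta < 0) if name in HIGHER_IS_BETTER else (delta > 0):
            worse.append(name)
    return sorted(worse)


v1_0 = {"kill_rate": 82, "timeouts": 6, "weak_oracle_survivors": 11}
v1_1 = {"kill_rate": 87, "timeouts": 2, "weak_oracle_survivors": 7}
print(deltas(v1_0, v1_1))
print(regressions(v1_0, v1_1))
```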

## Mutation heatmaps

The `--heatmap` flag generates a per-line and per-function mutation density report:

```text
Heatmap: math.vary

  Line  Source                          Mutants  Killed  Survived
  23    let total = price * qty         5        3       2        ██░░
  45    if amount > threshold {         3        3       0        ████
  67    return balance - fee            4        1       3        █░░░

Hotspot functions:
  factorial()    60% kill rate (3/5)
  withdraw()     85% kill rate (17/20)
```

Red-heavy lines, meaning lines with a low kill rate and a mostly empty bar (line 67 above), are the weakest regions in the file. Focus testing effort there first.
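
The bars in the sample report are consistent with a four-cell bar filled in proportion to the kill rate and rounded down. The Python sketch below reproduces them under that assumption; the bar width, the rounding rule, and the function names are inferred from the example rather than taken from the tool.

```python
def bar(killed: int, total: int, width: int = 4) -> str:
    """Render a kill-rate bar: filled cells for kills, empty cells for survivors."""
    filled = int(width * killed / total) if total else 0
    return "█" * filled + "░" * (width - filled)


def weakest_first(lines: dict[int, tuple[int, int]]) -> list[int]:
    """Sort line numbers by ascending kill rate; values are (killed, total)."""
    return sorted(lines, key=lambda n: lines[n][0] / lines[n][1])


# (killed, total) per line, copied from the sample heatmap.
sample = {23: (3, 5), 45: (3, 3), 67: (1, 4)}
for line in weakest_first(sample):
    killed, total = sample[line]
    print(line, bar(killed, total))
```

Sorting by ascending kill rate puts line 67 first, which matches the advice to start with the weakest lines.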

## Quick mode

`--quick` restricts the run to the relational and literal operators and caps it at 20 mutants per file. Use it for a rough score without waiting for the full operator set.

## Default file cap and `--all`

By default, mutation runs cap at 200 mutants per file to keep interactive runs fast on large files. Use `--all` to restore exhaustive behavior (every possible mutant), or `--quick` to drop to 20.

## Output modes

`--output <text|log|json|html>` controls how results are rendered during and after the run.

| Mode | Behavior |
|------|----------|
| `text` (default) | Live in-place spinner with elapsed time, tested/total mutant counters, and a refining ETA. Uses ANSI escape codes and installs a Ctrl+C shutdown hook to restore the terminal cleanly. |
| `log` | Plain lines, one per event, suitable for CI log scraping. |
| `json` | JSONL event stream for programmatic consumers. |
| `html` | Writes a report file. |

Before the run starts, the engine pre-scans all files to count mutants and uses historical `avgMutantMs` from the run history to print an up-front wall-clock estimate. The ETA refines as actual throughput comes in.
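
One way to picture the two estimates is the Python sketch below. It assumes the up-front figure is the mutant count times the historical average, divided by the worker count, and that the refined ETA extrapolates from observed throughput; both formulas and all names are assumptions, not the engine's documented algorithm.

```python
def upfront_estimate_ms(
    total_mutants: int, avg_mutant_ms: float, workers: int = 1
) -> float:
    """Estimate wall-clock time before any mutant has run."""
    # Assumption: work divides evenly across parallel workers.
    return total_mutants * avg_mutant_ms / max(workers, 1)


def refined_eta_ms(
    total_mutants: int, tested: int, elapsed_ms: float,
    avg_mutant_ms: float, workers: int = 1,
) -> float:
    """Estimate remaining time, preferring observed throughput once available."""
    remaining = total_mutants - tested
    if tested == 0:
        # Nothing measured yet: fall back to the historical average.
        return upfront_estimate_ms(remaining, avg_mutant_ms, workers)
    # Observed wall-clock cost per mutant already reflects parallelism.
    return remaining * (elapsed_ms / tested)
```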

## Performance

| Optimization | Effect |
|--------------|--------|
| Shared module resolution | Imported modules are type-checked once per run rather than once per mutant. |
| Early exit on failure | Remaining tests are skipped when a mutant is already proven killed. |
| Adaptive per-test timeout | 10× the baseline execution time, clamped between 1s and 5s. Prevents infinite loops while giving legitimately slow tests room. |
| Warm in-process workers | Long-lived workers keep a compiled classloader resident across mutant batches instead of rebuilding per mutant. |
| Worker-local test invocation cache | Reflective discovery of `__vtest_names__`, `__vtest_methods__`, and the per-test timeout executor is paid once per worker instead of once per mutant. |
| `redefine` first | When an instrumentation agent is attached, eligible mutants take the in-process `redefineClasses` path before considering a fresh classloader. |
| Patch-plan reuse | Sibling mutants on the same compiled method share a cached ASM `ClassNode` so parsing and tree-building happen once per method, not once per sibling. |
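
The adaptive per-test timeout in the table is a clamp. A minimal Python sketch, assuming times are measured in milliseconds (the function name is hypothetical):

```python
def per_test_timeout_ms(baseline_ms: float) -> float:
    """10x the baseline execution time, clamped to the range 1s..5s."""
    return min(max(10 * baseline_ms, 1000), 5000)
```

A test with a 20 ms baseline gets the 1 s floor, a 250 ms baseline gets 2.5 s, and anything with a baseline above 500 ms is capped at 5 s.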

## Warm in-process worker model

The compiled bytecode backends (`hot-swap`, `redefine`) execute every mutant inside a warm, in-process worker. The worker is created once per batch: it compiles the target module, loads the baseline class bytes into a `MutationClassLoader`, runs the baseline test pass, and then cycles through mutants without tearing the classloader down between them. Full-loader rebuilds are attributed to two explicit causes only: worker recreation after poison recovery (wall-clock escape hatch or state-leak detection) or an explicit fresh-loader fallback.

Every artifact produced by a compiled run surfaces `initialRebuild`, `poisonRebuilds`, `baseLoaderRebuilds`, and `unaccountedRebuilds` so the warm-worker contract is visible in telemetry. Outside poison recovery and documented fallbacks, `unaccountedRebuilds` is always `0` on the pinned short-loop corpus.

### Worker-local test invocation cache

Reflective discovery of the generated test dispatcher (`__vtest_names__`, `__vtest_methods__`, and each individual test method handle) used to run once per mutant. The warm worker now opens a single invocation cache after baseline setup, reuses it across every mutant in the batch, and closes it in the worker's teardown path. The daemon executor used for per-test timeout dispatch is bound to the same cache so it is not rebuilt around every test run either.

The effect is visible as the `testDispatchMs` phase in `phaseTimings`: on the pinned short-loop fixture the accumulator drops by roughly 30% compared with the baseline measured before the cache was introduced. The structural counters (`namesInvocationCount`, `methodsInvocationCount`, `executorReuseCount`) confirm that the cache is in use regardless of timing noise.

## `redefine`-first policy

When `--backend redefine` is selected and an instrumentation agent is attached, eligible mutants take the `Instrumentation.redefineClasses` path as the normal execution mode. The fresh-loader path becomes a fallback rather than the primary loop. Every per-mutant result records one of `redefine`, `hotswap`, `ineligible:<reason>`, `fallback:<detail>`, or `fresh-loader` in `swapOutcome`, so that eligible mutants, ineligible mutants, and runtime-triggered fallbacks are never aggregated together.

Ineligible mutants (for example structural changes, constructor edits, or `<clinit>` mutations) are pre-filtered and recorded as `ineligible:<reason>` so they do not inflate the runtime-fallback bucket. Worker recreation after poison does not force unnecessary fresh-loader use: when the worker is retired and recreated, the new warm loader continues on the `redefine` hot path. The redefine hot path also batches a revert-of-previous-class and the next mutation into a single `redefineClasses` varargs call when consecutive mutants target different classes; for same-class sibling mutants the new mutation implicitly overwrites the previous one and no revert is owed.
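
When post-processing results, the `swapOutcome` values can be tallied into separate buckets so the categories stay distinct. The Python sketch below shows one way to do that; the reason and detail strings in the sample (`clinit`, `verify-error`) are hypothetical placeholders, and the function names are not part of `vary`.

```python
from collections import Counter


def bucket(swap_outcome: str) -> str:
    """Collapse 'ineligible:<reason>' and 'fallback:<detail>' to their category."""
    for prefix in ("ineligible", "fallback"):
        if swap_outcome.startswith(prefix + ":"):
            return prefix
    # 'redefine', 'hotswap', and 'fresh-loader' are already category names.
    return swap_outcome


def tally(outcomes: list[str]) -> Counter:
    """Count mutants per bucket without merging ineligible and fallback."""
    return Counter(bucket(o) for o in outcomes)


sample = [
    "redefine", "redefine", "hotswap",
    "ineligible:clinit",      # hypothetical reason string
    "fallback:verify-error",  # hypothetical detail string
    "fresh-loader",
]
print(tally(sample))
```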

See [Strict mode → Bytecode backends](/docs/mutation/strict-mode/) for backend selection and the full swap-outcome table, [Parity gate](/docs/mutation/parity-gate/) for the cross-backend classification invariant, and [Benchmark](/docs/mutation/benchmark/) for the pinned benchmark matrix that measures `redefine`-vs-`fresh-loader` wall time.

## Deduplication

After testing, killed mutants with identical kill signatures (same operator type, same set of killing tests) are deduplicated. Only one representative per signature is kept. Survivors and errors are never deduplicated. Disable with `--no-dedup`.
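
The rule can be sketched in Python as follows. The record layout (`status`, `operator`, `killed_by`) and the choice to keep the first mutant seen for each signature are assumptions for illustration.

```python
def dedupe(results: list[dict]) -> list[dict]:
    """Keep one killed mutant per (operator, killing-test set) signature.

    Survivors and errors always pass through unchanged.
    """
    seen: set[tuple[str, frozenset]] = set()
    kept = []
    for result in results:
        if result["status"] == "killed":
            # frozenset makes the signature independent of test order.
            signature = (result["operator"], frozenset(result["killed_by"]))
            if signature in seen:
                continue
            seen.add(signature)
        kept.append(result)
    return kept
```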

## CLI reference

| Flag | Default | Description |
|------|---------|-------------|
| `--level <ast\|bytecode\|both>` | `bytecode` | Mutation level |
| `--operators <list>` | all 27 | AST operators (comma-separated, or group aliases) |
| `--bc-operators <list>` | all 6 | Bytecode operators (comma-separated) |
| `--output <text\|log\|json\|html>` | `text` | Output mode (live spinner / CI log / JSONL / HTML report) |
| `-v`, `--verbose` | off | Verbose output with OP and HINT columns |
| `-i`, `--incremental` | off | Reuse cached AST results |
| `--quick` | off | Quick mode: relational + literal only, max 20 mutants/file |
| `--all` | off | Exhaustive mode: override the default 200 mutants/file cap |
| `-j`, `--parallel <n>` | CPU count | Parallel worker count (bytecode level) |
| `--no-dedup` | off | Keep redundant mutants |
| `--certify` | off | Generate mutation certificate |
| `--why <id>` | none | Explain why a mutant survived |
| `--expand <group>` | none | Show individual mutants in a group |
| `--replay <id>` | none | Re-run a single mutation |
| `--top <n>` | 20 | Number of survivor groups to show |
| `--group <key>` | `function` | Group by: function, file, or cause |
| `--observe` | off | Enable runtime observability |
| `--differential` | off | Enable differential trace detection |
| `--quarantine <id>` | none | Quarantine a surviving mutant |
| `--list-quarantined` | off | List quarantined mutants |
| `--unstable` | off | Include nondeterministic mutants (skipped by default) |
| `--no-manifest` | off | Disable mutation manifest emission (on `vary run`) |
| `--snapshot <tag>` | none | Save a named snapshot after the run |
| `--compare <tag>` | none | Compare results to a previous snapshot |
| `--heatmap` | off | Generate per-line and per-function mutation heatmap |
| `--budget <duration>` | none | Time budget for the run (e.g., `30s`, `5m`) |
| `--timeout <ms>` | 5000 | Per-mutant timeout in milliseconds |
| `--auto-quarantine <threshold>` | none | Auto-quarantine mutants above confidence threshold |
| `--no-prioritize` | off | Disable data-flow-based mutant prioritization |
| `--kill-map` | off | Output per-test kill map |
| `--baseline <path>` | none | Baseline manifest JSON for delta reporting |
| `--strict-tests` | off | Promote missing-observe warnings to errors |
| `--list-operators` | off | List all available operators and group aliases |
| `--strict-selection <mode>` | `evidence` | Strict-mode test selection: `evidence` or `reference` |
| `--reachability` | off | Record per-test method reachability during baseline |
| `--warm-workers <on\|off>` | `on` | Reuse long-lived workers across mutant batches |
| `--fresh-workers` | off | Alias for `--warm-workers=off` |
| `--backend <name>` | `fresh-loader` | Bytecode backend: `fresh-loader`, `hot-swap`, or `redefine` |
| `--relevance-graph-path <path>` | `.vary/relevance/` | Override location of the persisted relevance graph |
| `--explain` | off | Include selection and scheduling explanation in the report |
| `--incremental-infer` | off | Reuse prior mutant outcomes across runs ([Incremental inference](/docs/mutation/incremental-inference/)) |
| `--fast-mode` | off | Heuristic test-filter narrowing with validated fallback ([Fast mode](/docs/mutation/fast-mode/)) |
| `--fast-mode-narrow-factor <f>` | `0.5` | Fraction of conservative test filter kept under fast mode |
| `--fast-mode-sample-size <n>` | `10` | Mutants re-run with the broader filter to measure miss rate |
| `--fast-mode-miss-threshold <t>` | `0.10` | Miss rate above which fast mode falls back automatically |

See [Strict mode](/docs/mutation/strict-mode/) for evidence-based selection, warm workers, and bytecode backend details, [Parity gate](/docs/mutation/parity-gate/) for cross-backend classification parity, [Fast mode](/docs/mutation/fast-mode/) for validated test-filter narrowing, [Incremental inference](/docs/mutation/incremental-inference/) for per-mutant outcome reuse, [Survivor tail](/docs/mutation/survivor-tail/) for flake-detection rerun accounting, and [Benchmark](/docs/mutation/benchmark/) for the parity benchmark harness.
