
`vary mutate-parity` defends performance-oriented bytecode backends (`redefine`, `hot-swap`) against the reference `fresh-loader` backend. Any time the two backends classify the same mutant differently (one `killed`, the other `survived`), the parity run fails loudly so the discrepancy is caught before the performance backend can become a default.

For the backends themselves, see [Strict mode → Bytecode backends](/docs/mutation/strict-mode/).

## Why

The `fresh-loader` backend is the correctness source of truth. It constructs a brand-new classloader for every mutant, which is slow but gives each mutant a clean execution environment. Other backends are strictly performance optimisations: every kill-vs-survive classification must match what the reference backend would have produced. The parity gate is the continuous defence of that invariant.

## CLI

```bash
vary mutate-parity <source> [-t <tests>] [--candidate redefine|hot-swap]
vary mutate-parity <directory> --corpus [--candidate redefine]
```

Single-file mode:

```bash
vary mutate-parity src/utils.vary -t tests/test_utils.vary
```

Pinned-corpus mode (CI):

```bash
vary mutate-parity programs/frugal --corpus
```

| Flag | Default | Purpose |
|------|---------|---------|
| `-t, --tests` | auto | Test file/directory (single-file mode) |
| `--candidate` | `redefine` | Candidate backend: `redefine` or `hot-swap` |
| `--corpus` | off | Walk a project directory as a pinned corpus |
| `--json` | off | Echo the parity report to stdout |

## Output

Human-readable text (default) and `mutation-reports/parity-report.json` are emitted in both modes. With `--json`, the JSON is also printed to stdout.

Each mismatch carries:

| Field | Meaning |
|-------|---------|
| `mutantId` | Stable mutation ID (same across backends) |
| `location` | `<class>#<method><descriptor>@<instructionIndex>` |
| `referenceOutcome` / `candidateOutcome` | `killed` or `survived` under each backend |
| `candidateSwapOutcome` | `redefine`, `hotswap`, `fallback:*`, or `fresh-loader` |
| `referenceKilledBy` / `candidateKilledBy` | Tests that failed under each backend |

A non-empty `mismatches` array exits the CLI non-zero.

## Default backend

`fresh-loader` remains the default `--backend` on `vary mutate`. The `redefine` backend will not become the default until the parity gate is stable on the pinned corpus. Until that happens, the parity gate is the only CI signal that the `redefine` backend is trustworthy.

## CI

The nightly `nightly-reference-parity` job runs parity on a pinned corpus:

```bash
vary mutate-parity programs/frugal --corpus --json
```

A non-zero exit causes the nightly run to fail. The `mutation-reports/parity-report.json` artifact is uploaded for triage.

## What's not a mismatch

| Case | Why | Treatment |
|------|-----|-----------|
| Mutant errors in either backend | Engine failure, not a classification claim | Excluded from the comparison |
| Mutant was generated by only one backend | Rare, since both use the same generator | Skipped |

If the candidate backend has no `Instrumentation` handle available (no `-javaagent:` and self-attach failed), every mutant routes through fresh-loader fallback. The parity report still runs to completion; classifications then agree by construction.
