This page walks through a complete mutation testing session, from a weak test suite to a strong one. Mutation testing does not measure whether your code runs. It measures whether your tests would notice if the code changed.

This workflow is especially useful when tests are generated by AI, where surface coverage may be high but behavioural guarantees are weak.

You can follow along using `examples/mutation-workflow/` from the repository. For an introduction, see [Introduction](/docs/mutation/testing/). For the full reference, see [Advanced overview](/docs/mutation/advanced/).

## The code

`scoring.vary` has five small functions:

```vary
def add(a: Int, b: Int) -> Int {
    return a + b
}

def subtract(a: Int, b: Int) -> Int {
    return a - b
}

def clamp(value: Int, low: Int, high: Int) -> Int {
    if value < low {
        return low
    }
    if value > high {
        return high
    }
    return value
}

def abs_val(n: Int) -> Int {
    if n < 0 {
        return -n
    }
    return n
}

def is_passing(score: Int) -> Bool {
    return score >= 60 and score <= 100
}
```

`test_scoring.vary` has deliberately weak tests. They call every function but use loose assertions:

```vary-snippet
from scoring import add, subtract, clamp, abs_val, is_passing

test "add positive" {
    observe add(2, 3) > 0
}

test "subtract positive" {
    observe subtract(10, 3) > 0
}

test "clamp middle" {
    observe clamp(5, 0, 10) == 5
}

test "abs positive" {
    observe abs_val(5) == 5
}

test "is passing" {
    observe is_passing(75)
}
```

## Running the examples

Verify the code works before mutating it:

```bash
vary run examples/mutation-workflow/scoring.vary
vary test examples/mutation-workflow/test_scoring.vary
```

Both commands should exit cleanly. The test command runs all five tests.

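## Step 1: Get the score

From inside `examples/mutation-workflow/`, run the mutation pass against the source file and its tests:

```bash
vary mutate scoring.vary --tests test_scoring.vary
```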

Output:

```text
Score: 27% killed (13/48)
Test Strength: Weak (27%)

Early-stage test depth. The leverage fixes below are the fastest path to stronger test signal.

Biggest Leverage Fixes

1) Add scoring edge-case tests
   Impact: ~14 mutants  Projected after (approx): 56% (+29pp)
   Location: scoring

2) Assert scoring outputs and constants
   Impact: ~9 mutants  Projected after (approx): 75% (+48pp)
   Location: scoring

3) Assert scoring events and state changes
   Impact: ~9 mutants  Projected after (approx): 94% (+67pp)
   Location: scoring

4) Pin scoring return values
   Impact: ~3 mutants  Projected after (approx): 100% (+73pp)
   Location: scoring

Start here: address #1 in scoring. (~14 mutants, projected to 56%)
  vary mutate . --expand "scoring"

Top survivor groups (4 of 4)

GROUP               FILE:LINE           SURV  CAUSE
scoring#clamp       scoring.vary:10-11    14  ASSERT_EFFECT
scoring#is_passing  scoring.vary:27       11  ASSERT_VALUE
scoring#abs_val     scoring.vary:20-21     9  ASSERT_EFFECT
scoring#subtract    scoring.vary:6         1  ASSERT_MATH

Why Survivors Exist

   40% (14) Branch conditions not covered
   26% (9) Values changed but never asserted
   26% (9) Weak assertions only
    9% (3) Return values not pinned
```

27% means the tests killed 13 of the 48 generated mutants. Roughly three out of four changes to the code would go unnoticed.

The leverage fixes are cumulative. Fix #1 alone projects the score to 56%; fixes #1 and #2 together project 75%. The `(+Npp)` figure is the projected gain in percentage points over the current 27%, so `+48pp` on fix #2 means 27% + 48 = 75%. The "Start here" line names the fix to tackle first and gives the command to inspect it.
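
The arithmetic behind the report can be checked by hand. The Python sketch below assumes the tool rounds to whole percentages and computes each projection as (killed + cumulative impact) / total. That assumption reproduces the numbers in the report, but it is not taken from the tool's source.

```python
# Numbers copied from the report above.
total_mutants = 48
killed = 13
fix_impacts = [14, 9, 9, 3]   # "Impact: ~N mutants" for fixes 1 to 4

def pct(n):
    # Whole-percent share of all mutants.
    return round(100 * n / total_mutants)

current = pct(killed)
print(f"current: {current}%")

projections = []
running = killed
for i, impact in enumerate(fix_impacts, start=1):
    running += impact                      # projections accumulate
    projected = pct(running)
    projections.append(projected)
    print(f"after fix #{i}: {projected}% (+{projected - current}pp)")
```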

The CAUSE column in the survivor table tells you why each group survived. `ASSERT_VALUE` means a value was changed but never asserted. `ASSERT_EFFECT` means side effects were not observed.

The "Why Survivors Exist" breakdown gives the broader categories. "Values changed but never asserted" and "Weak assertions only" both point to the same root cause: the tests use `observe x > 0` where they should use `observe x == 5`.
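
The percentages in the breakdown appear to be shares of the 35 survivors, not of all 48 mutants. A quick Python check under that assumption (again, not part of the tool):

```python
total_mutants = 48
killed = 13
survivors = total_mutants - killed

# Counts from the "Why Survivors Exist" section of the report.
causes = {
    "Branch conditions not covered": 14,
    "Values changed but never asserted": 9,
    "Weak assertions only": 9,
    "Return values not pinned": 3,
}

# The four categories account for every survivor.
assert sum(causes.values()) == survivors

shares = {cause: round(100 * n / survivors) for cause, n in causes.items()}
for cause, share in shares.items():
    print(f"{share:>3}% ({causes[cause]}) {cause}")
```

The rounded shares add up to 101% rather than 100%, which is a rounding artefact and not a counting error.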

## Step 2: Read the leverage fixes

The "Biggest Leverage Fixes" section ranks which changes would kill the most mutants. The projections are cumulative: if you address fix #1, the score reaches ~56%; if you also address fix #2, it reaches ~75%.

Start with fix #1. Here it lines up with the largest survivor group, `scoring#clamp` (14 survivors): the only `clamp` test checks a value in the middle of the range and never touches the boundaries.

## Step 3: Expand a group

Pick `scoring#clamp` and see what survived:

```bash
vary mutate scoring.vary --tests test_scoring.vary --expand "scoring#clamp"
```

```text
Expanded: scoring#clamp (14 mutants)

  scoring.vary:10  Replace < with <=
    vary mutate scoring.vary --replay clamp:REL_LT_TO_LE:a8c21b3f
  scoring.vary:10  Replace low with 0
    vary mutate scoring.vary --replay clamp:LIT_CHANGE:b3e48d12:1
  scoring.vary:13  Replace > with >=
    vary mutate scoring.vary --replay clamp:REL_GT_TO_GE:c4d51e2a
  ...
```

Each entry shows what changed and where, followed by a `--replay` command that re-runs just that mutant if you want to reproduce it.

## Step 4: Explain a survivor

Pick a mutant and ask why it survived:

```bash
vary mutate scoring.vary --tests test_scoring.vary --why "clamp:REL_LT_TO_LE:a8c21b3f"
```

```text
Mutant: The comparison operator was changed but no test covers the boundary
Location: scoring.vary line 10, in clamp

Change:
  Replace < with <=

Why it survived:
  The comparison operator was changed but no test covers the boundary.

Fix:
  Add a test at the boundary value where < and <= differ.
  Example: observe clamp(0, 0, 10) == 0
```

The existing test, `observe clamp(5, 0, 10) == 5`, passes whether the comparison is `<` or `<=`, because 5 lies strictly inside the range. No test exercises the boundary itself.
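
It can help to check where the original and the mutant actually disagree before writing the test. The sketch below is a Python model of `clamp` and of the `<` to `<=` mutant, not Vary code. It assumes integer comparison behaves the same way in both languages and only searches a small grid.

```python
from itertools import product

def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value

def clamp_le_mutant(value, low, high):
    if value <= low:      # "<" replaced with "<="
        return low
    if value > high:
        return high
    return value

# The existing test input sits in the middle of the range,
# so both versions agree.
print(clamp(5, 0, 10), clamp_le_mutant(5, 0, 10))

# Search a small grid of integers for inputs where the two versions disagree.
grid = range(-3, 4)
differing = [
    (v, lo, hi)
    for v, lo, hi in product(grid, repeat=3)
    if clamp(v, lo, hi) != clamp_le_mutant(v, lo, hi)
]
print(differing[:3])
print(all(lo > hi for _, lo, hi in differing))
```

In this model the two versions disagree only when `low > high`. With a well-formed range, `value == low` produces the same number through either path, so this particular mutant may stay alive as an equivalent mutant. Boundary tests are still worth adding because they kill neighbouring mutants, such as a changed return value.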

## Step 5: Write a better test

Add boundary tests to `test_scoring.vary`:

```vary-snippet
test "clamp at boundaries" {
    observe clamp(0, 0, 10) == 0
    observe clamp(10, 0, 10) == 10
    observe clamp(-5, 0, 10) == 0
    observe clamp(15, 0, 10) == 10
}
```
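
Here is a Python model of what these four assertions do and do not catch. The two mutants are hand-written stand-ins. In particular, it is an assumption that "Replace low with 0" on line 10 turns the condition into `value < 0`. The extra test case is hypothetical and not part of the example repository.

```python
def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value

def mutant_return_high(value, low, high):
    if value < low:
        return high       # "return low" replaced with "return high"
    if value > high:
        return high
    return value

def mutant_low_is_zero(value, low, high):
    if value < 0:         # "low" replaced with the literal 0
        return low
    if value > high:
        return high
    return value

# (arguments, expected result) pairs from the "clamp at boundaries" test.
step5_cases = [
    ((0, 0, 10), 0),
    ((10, 0, 10), 10),
    ((-5, 0, 10), 0),
    ((15, 0, 10), 10),
]

def killed_by(fn, cases):
    # True when at least one assertion fails against fn.
    return any(fn(*args) != expected for args, expected in cases)

print(killed_by(clamp, step5_cases))               # False: the original passes
print(killed_by(mutant_return_high, step5_cases))  # True
print(killed_by(mutant_low_is_zero, step5_cases))  # False: every case uses low = 0

# Hypothetical extra case with a non-zero lower bound.
extra_case = [((1, 3, 10), 3)]
print(killed_by(mutant_low_is_zero, step5_cases + extra_case))  # True
```

If a literal-replacement mutant like this one is still listed after the re-run, a case with a non-zero lower bound is one way to kill it.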

And pin the `is_passing` boundaries:

```vary-snippet
test "is passing at boundary" {
    observe is_passing(60)
    observe not is_passing(59)
    observe is_passing(100)
    observe not is_passing(101)
}
```
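
The effect of the four boundary assertions can be modelled in Python. The three mutants below are illustrative guesses at the kind of change a mutation tool makes to `is_passing`. They are not taken from the tool's output.

```python
def is_passing(score):
    return score >= 60 and score <= 100

# Hand-written mutants; each name describes the change.
mutants = {
    ">= to >": lambda s: s > 60 and s <= 100,
    "<= to <": lambda s: s >= 60 and s < 100,
    "and to or": lambda s: s >= 60 or s <= 100,
}

# Each test is a list of (input, expected) pairs.
original_test = [(75, True)]
boundary_test = [(60, True), (59, False), (100, True), (101, False)]

def survivors(tests):
    # A mutant survives when it satisfies every assertion.
    return [
        name for name, fn in mutants.items()
        if all(fn(score) == expected for score, expected in tests)
    ]

print(survivors(original_test))
print(survivors(original_test + boundary_test))
```

All three modelled mutants survive the single `is_passing(75)` check, and none survive once the boundary cases are added. Each of 60, 59, 100 and 101 sits on one side of a comparison that a mutant shifts.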

Re-run:

```bash
vary mutate scoring.vary --tests test_scoring.vary
```

The `scoring#clamp` and `scoring#is_passing` groups shrink and the score goes up.

## Step 6: Fix the weak assertions

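The `add` and `subtract` tests only check that the result is positive (`observe add(2, 3) > 0`, `observe subtract(10, 3) > 0`), so a wrong but positive result still passes. Replace them with exact checks: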

```vary-snippet
test "add returns sum" {
    observe add(2, 3) == 5
    observe add(0, 5) == 5
    observe add(-1, 1) == 0
}

test "subtract returns difference" {
    observe subtract(10, 3) == 7
    observe subtract(5, 5) == 0
}
```

Re-run. The "Weak assertions only" category should shrink or disappear from the breakdown.

## Step 7: Cover the remaining branches

For `abs_val`, test a negative input and zero:

```vary-snippet
test "abs negative" {
    observe abs_val(-3) == 3
    observe abs_val(0) == 0
}
```
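
This Python model shows why the negative input matters. The two mutants are hand-written assumptions about what a tool might generate for `abs_val`.

```python
def abs_val(n):
    if n < 0:
        return -n
    return n

def mutant_drop_negation(n):
    if n < 0:
        return n          # "return -n" replaced with "return n"
    return n

def mutant_flip_condition(n):
    if n > 0:             # "<" replaced with ">"
        return -n
    return n

# (input, expected) pairs: the original test, then the cases added in this step.
original_cases = [(5, 5)]
step7_cases = [(-3, 3), (0, 0)]

def passes(fn, cases):
    return all(fn(n) == expected for n, expected in cases)

for name, fn in [("drop negation", mutant_drop_negation),
                 ("flip condition", mutant_flip_condition)]:
    before = "survives" if passes(fn, original_cases) else "killed"
    after = "survives" if passes(fn, original_cases + step7_cases) else "killed"
    print(f"{name}: {before} -> {after}")
```

The "drop negation" mutant survives the original positive-only test because the negative branch is never executed. It is killed as soon as `abs_val(-3) == 3` is asserted.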

## Step 8: Confirm the score

```bash
vary mutate scoring.vary --tests test_scoring.vary
```

The score should be above 90%. Any remaining survivors are either equivalent mutants (changes that don't affect observable behaviour) or edge cases worth investigating with `--why`.

## Adding contracts for free kills

You can strengthen the mutation score without writing more tests by adding contracts:

```vary
def abs_val(n: Int) -> Int {
    out (r) {
        r >= 0
    }
    if n < 0 {
        return -n
    }
    return n
}
```

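Now a mutant that changes `return -n` to `return n` breaks the postcondition whenever `abs_val` is called with a negative input, even if the calling test asserts nothing about the result. The mutation runner counts contract violations as kills. A test still has to exercise the negative branch; the original suite only calls `abs_val(5)`.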
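
The mechanism can be sketched in Python with a wrapper that enforces `r >= 0`. This is only a model of the idea. The wrapper, the helper names, and the use of `AssertionError` are assumptions, and Vary's contract checking may work differently.

```python
def checked_abs(impl):
    # Wrap an implementation with the postcondition r >= 0,
    # loosely mirroring the `out (r)` block above.
    def wrapper(n):
        r = impl(n)
        if not r >= 0:
            raise AssertionError(f"postcondition failed: {r} >= 0")
        return r
    return wrapper

def abs_val(n):
    if n < 0:
        return -n
    return n

def abs_val_mutant(n):
    if n < 0:
        return n          # "return -n" replaced with "return n"
    return n

def loose_test(fn):
    # Calls the function on a negative input but asserts nothing about the result.
    fn(-3)

def is_killed(fn):
    try:
        loose_test(fn)
    except AssertionError:
        return True
    return False

print(is_killed(abs_val_mutant))               # False: no contract, no assertion
print(is_killed(checked_abs(abs_val_mutant)))  # True: the postcondition fires
print(is_killed(checked_abs(abs_val)))         # False: the original satisfies it
```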

## Summary

| Step | What you do | What it tells you |
|------|-------------|-------------------|
| 1 | `vary mutate file.vary` | Overall score and survivor breakdown |
| 2 | Read leverage fixes | Cumulative projected scores for each fix |
| 3 | `--expand` a group | Individual surviving mutants |
| 4 | `--why` on a mutant | Root cause and suggested fix |
| 5 to 7 | Write the missing tests, re-run | Whether the groups shrank and the score improved |
| 8 | Confirm the score | Whether the remaining survivors are equivalent mutants or real gaps |

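## CLI flags used in this walkthrough

Of the flags below, `--expand`, `--why` and `--replay` appear in the steps above. The rest control how survivors are displayed and how many mutants are generated.

| Flag | What it does |
|------|-------------|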
| `--expand GROUP` | Show individual mutants in a group |
| `--why ID` | Explain why a specific mutant survived |
| `--replay ID` | Re-run a single mutant |
| `--top N` | Change how many groups the table shows |
| `--group MODE` | Group survivors by function, file, or cause |
| `--quick` | Fast mode: relational + literal operators, max 20 mutants/file |
| `--all` | Exhaustive mode: override the default 200 mutants/file cap |
| `--output MODE` | Output mode: `text` (default, live spinner), `log`, `json`, or `html` |
| `-v` | Verbose output with OP and HINT columns |