
The [companion article](/articles/bytecode-mutation-is-why-vary-uses-the-jvm/) explains *why* Vary does bytecode mutation. This one explains *how*. A [follow-up article](/articles/bytecode-mutation-under-the-hood/) covers the implementation internals for readers who want to go deeper.

## What mutation testing actually is

Code coverage tells you which lines your tests execute. It does not tell you whether the tests would notice if those lines were wrong.

Mutation testing closes that gap. Make a small, deliberate change to your code (a "mutant"), then run your tests. If at least one test fails, the mutant is "killed": your tests caught the change. If every test still passes, the mutant "survived" and there is a hole in your test suite.

Think of it like a smoke detector check. You press the test button (introduce a known fault) and see whether the alarm goes off (a test fails). If the alarm stays silent, the detector is not doing its job.

A mutation testing tool does this hundreds or thousands of times, with different small changes: swapping `+` for `-`, changing `<` to `<=`, replacing a return value with zero. The percentage of mutants your tests kill is the mutation score. A high score means your tests are actually verifying behaviour, not just touching code.
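The difference between executing a line and checking it can be shown with a toy model. The sketch below is Python rather than Vary, and every name in it (`total`, `coverage_only_test`, `is_killed`, and so on) is hypothetical, chosen only for illustration.

```python
# Toy model in Python (not Vary): a mutant is the same function with one small change.
def total(a, b):
    return a + b

def total_mutant(a, b):
    return a - b  # the deliberate fault: + swapped for -

def coverage_only_test(fn):
    fn(2, 3)      # executes the line under test but checks nothing
    return True

def checking_test(fn):
    return fn(2, 3) == 5

def is_killed(test, mutant):
    # A mutant is killed when the test fails against it.
    return not test(mutant)

def mutation_score(killed, total_mutants):
    return killed / total_mutants

print(is_killed(coverage_only_test, total_mutant))  # False: the mutant survived
print(is_killed(checking_test, total_mutant))       # True: the mutant was killed
print(mutation_score(450, 500))                     # 0.9
```

Both tests give the same line coverage, but only the second one notices the fault.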

## Why bytecode

Most mutation tools work at the source level. They edit your source code, recompile, run the tests, then undo the edit and repeat. The problem is that recompiling for every single mutant is slow. If you have 500 mutants and each one needs a full compile, the wait adds up fast.

Vary takes a different approach. It compiles your source code once, all the way down to JVM bytecode (the low-level instructions the Java Virtual Machine executes). Then it makes mutations directly in the compiled bytecode, without going back through the compiler. Patching one instruction in an already-compiled program is cheaper than re-parsing, re-type-checking, and re-compiling from scratch.

The analogy: imagine you have a printed book and you want to test whether a proofreader notices a typo. Source-level mutation reprints the entire book for each typo. Bytecode mutation uses white-out on one word and photocopies that page.

## The pipeline

When you run `vary mutate add.vary --tests test_add.vary`, six things happen:

| Step | What happens |
|---|---|
| 1. Compile source | Vary source becomes JVM bytecode |
| 2. Compile tests | Same process for the test file |
| 3. Run baseline | Load both into memory, run every test, record which ones pass |
| 4. Generate mutations | Walk every instruction in the compiled code and identify places where a small change would be meaningful |
| 5. Test each mutation | Patch one instruction, load the patched version, run the same tests, check whether any test that passed before now fails. If so, the mutant is killed |
| 6. Report | Mutation score = killed / total |

Steps 1 through 4 happen once. Step 5 repeats for every mutant, but it never reparses source, never type-checks, never regenerates bytecode from scratch. It patches one instruction and asks one question: did the tests notice?
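The baseline and the per-mutant loop can be sketched in a few lines. This is a simplified Python model, not Vary's implementation: mutants are ordinary functions instead of patched bytecode, and `passing_tests` and `mutation_run` are hypothetical names.

```python
def passing_tests(tests, impl):
    """Return the names of the tests that pass against one implementation."""
    passed = set()
    for name, check in tests.items():
        try:
            if check(impl):
                passed.add(name)
        except Exception:
            pass  # a crash counts as a failure
    return passed

def mutation_run(original, mutants, tests):
    baseline = passing_tests(tests, original)        # "Run baseline": done once
    killed = 0
    for mutant in mutants:                           # "Test each mutation": per mutant
        # Killed if any test that passed at baseline no longer passes.
        if baseline - passing_tests(tests, mutant):
            killed += 1
    return killed, len(mutants)

tests = {"add sums two ints": lambda add: add(2, 3) == 5}
mutants = [lambda a, b: a - b, lambda a, b: 0]
killed, total = mutation_run(lambda a, b: a + b, mutants, tests)
print(f"{killed}/{total}")  # 2/2
```

Comparing against the baseline, rather than against "all tests pass", means a test that was already failing before any mutation cannot be credited with a kill.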

## One concrete example, all the way down

Start with this Vary source:

```vary
def add(a: Int, b: Int) -> Int {
    return a + b
}

test "add sums two ints" {
    observe add(2, 3) == 5
}
```

### What the compiler emits

When Vary compiles this function, it produces JVM bytecode: a sequence of low-level instructions that the Java Virtual Machine knows how to execute. You do not normally see these instructions, but they are what actually runs when your program executes.

Vary's `Int` type maps to JVM `long` (a 64-bit integer). This means integer arithmetic uses the `L`-prefix instructions. Here is the bytecode for `add`:

```text
public static long add(long, long)
  0: LLOAD 0       // load parameter a onto the stack
  1: LLOAD 2       // load parameter b onto the stack
  2: LADD          // pop both values, add them, push the result
  3: LRETURN       // return the value on top of the stack
```

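Four instructions. `LLOAD` loads a long value from a numbered local-variable slot. Parameter `b` sits in slot 2 rather than slot 1 because a `long` occupies two slots, so `a` takes slots 0 and 1. `LADD` adds two longs. `LRETURN` returns a long. The JVM is a stack machine, so values get pushed onto a stack, operations consume values from the top, and results get pushed back.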
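To make the stack behaviour concrete, here is a tiny interpreter for just these three opcodes. It is a Python sketch for illustration, not how the JVM or Vary is implemented; `execute` and the tuple encoding of instructions are assumptions, and 64-bit overflow is ignored.

```python
def execute(instructions, local_slots):
    """Run a list of (opcode, operand...) tuples on a simple operand stack."""
    stack = []
    for opcode, *operands in instructions:
        if opcode == "LLOAD":
            stack.append(local_slots[operands[0]])  # push a local onto the stack
        elif opcode == "LADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)              # pop two, push the sum
        elif opcode == "LRETURN":
            return stack.pop()                      # result is the top of the stack
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    raise ValueError("method ended without a return")

ADD = [("LLOAD", 0), ("LLOAD", 2), ("LADD",), ("LRETURN",)]

# a lives in slot 0 and b in slot 2, mirroring the two-slot layout of longs.
result = execute(ADD, {0: 2, 2: 3})
print(result)  # 5
```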

### What a mutation looks like

The arithmetic mutation operator scans the instruction list. When it reaches `LADD` at index 2, it knows it can swap addition for subtraction. The mutated bytecode becomes:

```text
public static long add(long, long)
  0: LLOAD 0       // load a
  1: LLOAD 2       // load b
  2: LSUB          // subtract instead of add    <-- the mutation
  3: LRETURN       // return
```

One instruction changed. Everything else is identical. The mutation engine did not re-read the source file, did not re-run the type checker, did not regenerate the other three instructions. It swapped one byte.
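At the byte level, the swap looks like the sketch below. The opcode values are the standard JVM ones (`ladd` is `0x61`, `lsub` is `0x65`), and the code assumes the compact one-byte `lload_0` and `lload_2` encodings. A real class file wraps the method body in much more structure, so treat this Python snippet as an illustration of the idea only; `patch` is a hypothetical helper.

```python
# Standard JVM opcode values for the instructions in add().
LLOAD_0, LLOAD_2, LADD, LSUB, LRETURN = 0x1E, 0x20, 0x61, 0x65, 0xAD

original = bytes([LLOAD_0, LLOAD_2, LADD, LRETURN])

def patch(code, index, new_opcode):
    patched = bytearray(code)      # work on a copy, the original stays intact
    patched[index] = new_opcode
    return bytes(patched)

mutant = patch(original, 2, LSUB)

print(original.hex())  # 1e2061ad
print(mutant.hex())    # 1e2065ad
```

Copying before patching matters: every mutant starts from the same pristine bytes, so one mutation never leaks into the next.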

### What happens when the tests run

The mutated bytecode gets loaded into a fresh, isolated environment (a new JVM classloader, which is the JVM's way of loading compiled code into memory). The test calls `add(2, 3)`. The original would return `5`. The mutant computes `2 - 3 = -1`. The `observe` statement checks `-1 == 5`, which is false, so the test fails. The mutant is killed.

Even a somewhat weaker test, like `observe add(2, 3) > 0`, would still kill this mutant: `-1 > 0` is false, so the test fails. But `observe add(2, 3) > -10` would not, since `-1 > -10` is still true, and the mutant would survive. Mutation testing finds exactly these kinds of gaps.
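The three assertions can be compared side by side. This is a Python stand-in for the Vary `observe` statements, with `add_mutant` playing the role of the `LSUB` mutant; the dictionary of checks is purely illustrative.

```python
def add(a, b):
    return a + b

def add_mutant(a, b):
    return a - b  # stands in for the LADD -> LSUB mutant

assertions = {
    "== 5": lambda r: r == 5,
    "> 0": lambda r: r > 0,
    "> -10": lambda r: r > -10,
}

verdicts = {}
for label, check in assertions.items():
    passes_original = check(add(2, 3))       # baseline: result is 5
    passes_mutant = check(add_mutant(2, 3))  # mutant: result is -1
    killed = passes_original and not passes_mutant
    verdicts[label] = "killed" if killed else "survived"

print(verdicts)  # {'== 5': 'killed', '> 0': 'killed', '> -10': 'survived'}
```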

## The six mutation operators

Vary has six types of bytecode mutations, each targeting a different kind of instruction.

### Arithmetic: swap math operations

Replaces one arithmetic operation with another. `+` becomes `-`, `*` becomes `/`, and so on.

| What you wrote | What the mutant does |
|---|---|
| `a + b` | `a - b` |
| `a - b` | `a + b` |
| `a * b` | `a / b` |
| `a / b` | `a * b` |
| `a % b` | `a * b` or `a / b` |

At the bytecode level, this is a single opcode swap: `LADD` becomes `LSUB`. The stack shape stays the same (two values in, one value out), so nothing else in the method needs to change.
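One way to picture the operator is as a lookup table applied to each instruction in turn. The sketch below is Python, mirrors the table above, and simplifies instructions to bare opcode names without operands. `ARITHMETIC_SWAPS` and `arithmetic_mutants` are hypothetical names, not Vary's API.

```python
# Replacement table for the long-arithmetic opcodes, following the table above.
ARITHMETIC_SWAPS = {
    "LADD": ["LSUB"],
    "LSUB": ["LADD"],
    "LMUL": ["LDIV"],
    "LDIV": ["LMUL"],
    "LREM": ["LMUL", "LDIV"],
}

def arithmetic_mutants(instructions):
    """Return (index, patched_instructions) for every possible single swap."""
    mutants = []
    for index, opcode in enumerate(instructions):
        for replacement in ARITHMETIC_SWAPS.get(opcode, []):
            patched = list(instructions)   # copy, then change exactly one opcode
            patched[index] = replacement
            mutants.append((index, patched))
    return mutants

# Hypothetical method body for `return a * b % c`, operands omitted.
body = ["LLOAD", "LLOAD", "LMUL", "LLOAD", "LREM", "LRETURN"]
found = arithmetic_mutants(body)
for index, patched in found:
    print(index, body[index], "->", patched[index])
```

Every replacement consumes two longs and produces one, which is why the rest of the method can stay untouched.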

### Conditional: change boundary conditions

Alters comparison operators. `<` becomes `<=` or `>=`. `==` becomes `!=`. `null` checks flip.

| What you wrote | What the mutant does |
|---|---|
| `a < b` | `a <= b` or `a >= b` |
| `a <= b` | `a < b` or `a > b` |
| `a == b` | `a != b` |
| `x != None` | `x == None` |

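These mutations catch off-by-one errors and missing boundary tests. If a function returns `a < b` and your tests only try the pair `(3, 5)`, never the equal pair `(5, 5)`, a boundary mutant that changes `<` to `<=` survives, because the two operators only disagree when the operands are equal.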
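A boundary mutant only behaves differently when the two operands are equal, so only a test at the boundary can kill it. The Python sketch below uses a hypothetical `below_limit` function to show this.

```python
def below_limit(value, limit):
    return value < limit

def below_limit_mutant(value, limit):
    return value <= limit  # boundary mutation: < became <=

def typical_case(fn):
    return fn(3, 5) is True    # well inside the range

def boundary_case(fn):
    return fn(5, 5) is False   # exactly on the boundary

print(typical_case(below_limit_mutant))   # True: test passes, mutant survives
print(boundary_case(below_limit_mutant))  # False: test fails, mutant killed
```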

### Negation removal: drop the minus sign

If your code negates a value (`-x`), this mutation removes the negation, turning `-x` into just `x`. If no test checks that the sign is correct, the mutant survives.
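A test that only looks at magnitude cannot see this mutant. The Python sketch below uses hypothetical names throughout to contrast a magnitude check with a sign check.

```python
def negate(x):
    return -x

def negate_mutant(x):
    return x  # the negation has been removed

def magnitude_test(fn):
    return abs(fn(3)) == 3   # ignores the sign entirely

def sign_test(fn):
    return fn(3) == -3       # pins down the sign

print(magnitude_test(negate_mutant))  # True: test passes, mutant survives
print(sign_test(negate_mutant))       # False: test fails, mutant killed
```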

### Return value replacement: return a default instead

Ignores whatever the function computed and returns a default value instead: `0` for integers, `0.0` for floats, `null` for objects. This tests whether anything actually checks the value the function returns.

### Return poison: return a value designed to cause trouble

Similar to return value replacement, but instead of benign defaults, it returns values chosen to trigger subtle bugs: `-1` for integers, the largest possible float, empty string `""` for objects.

This catches tests that only check "not null" or "not zero" without verifying the actual value. `observe result != None` catches a `null` return, but misses an empty string.
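The two return mutations can be compared against a "not null" test and an exact-value test. This is a Python sketch in which `None` stands in for `null`; `find_owner` and the test names are hypothetical.

```python
def find_owner(record):
    return record["owner"]

def replaced(record):
    return None  # return value replacement: default for objects

def poisoned(record):
    return ""    # return poison: empty string

RECORD = {"owner": "alice"}

def not_none_test(fn):
    return fn(RECORD) is not None

def exact_value_test(fn):
    return fn(RECORD) == "alice"

for name, mutant in [("replaced", replaced), ("poisoned", poisoned)]:
    print(name,
          "not-none test kills:", not not_none_test(mutant),
          "exact test kills:", not exact_value_test(mutant))
```

The poison mutant slips past the "not null" check, and only the exact-value test kills both.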

### Call skip: pretend a method call never happened

Removes an entire method call and replaces it with a default return value. If your code calls `validate(input)` and no test notices when that call disappears, then nothing is actually checking that validation happens.
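The `validate(input)` case can be sketched as follows. This is illustrative Python with hypothetical names; here `validate` returns nothing, so skipping it needs no replacement value.

```python
def validate(name):
    if not name:
        raise ValueError("name must not be empty")

def register(name):
    validate(name)
    return f"registered {name}"

def register_mutant(name):
    # call skip: the validate(name) call has been removed
    return f"registered {name}"

def happy_path_test(fn):
    return fn("alice") == "registered alice"

def rejects_empty_test(fn):
    try:
        fn("")
    except ValueError:
        return True   # the invalid input was rejected
    return False      # nothing stopped the invalid input

print(happy_path_test(register_mutant))     # True: test passes, mutant survives
print(rejects_empty_test(register_mutant))  # False: test fails, mutant killed
```

A suite with only happy-path tests would never notice the missing validation. It takes a test that feeds in bad input to kill this mutant.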

## Putting it together

Take the `add` function from earlier. The compiler produces four bytecode instructions. The mutation engine scans all four and finds three possible mutations: one arithmetic swap on the `LADD`, one return-value replacement on the `LRETURN`, and one return-poison on the same `LRETURN`.

For each mutation, the engine copies the original bytecode, patches one instruction, loads the patched version into a fresh classloader, and runs the test suite. The arithmetic mutant (changing `+` to `-`) is killed because `add(2, 3)` returns `-1` instead of `5` and the test notices. The return-value mutant (returning `0` instead of the computed sum) is killed too, as is the return-poison mutant (returning `-1` regardless of input).

Three mutations, three kills, 100% mutation score. The tests are checking the behaviour of this function, not just executing it.

That is the point of mutation testing. Bytecode mutation is what makes it fast enough to run during normal development.

## Related reading

| Page | Focus |
|---|---|
| [Bytecode mutation under the hood](/articles/bytecode-mutation-under-the-hood/) | The implementation internals: ASM library, classloader isolation, kill detection, and stable mutation IDs |
| [Bytecode Mutation Is Why Vary Uses the JVM](/articles/bytecode-mutation-is-why-vary-uses-the-jvm/) | The architectural motivation for targeting JVM bytecode |
| [Why Mutation Testing Belongs in the Compiler](/articles/why-mutation-testing/) | Why Vary builds mutation into the language instead of treating it as a plugin |
| [How We Mutation Test the Compiler](/articles/how-we-mutation-test-the-compiler/) | How Vary uses different mutation strategies for Vary code and Kotlin compiler code |
