Bytecode mutation under the hood

Companion to How bytecode mutation testing works, which covers the concepts. This one is about implementation: which library does the patching, what actually changes inside a class file, and how the runner decides whether a mutant got caught.

If you're after the full opcode-by-opcode mapping (every arithmetic swap, every conditional flip), that lives in the bytecode operators reference where it can stay current as we add operators.

Vary types and the bytecode you actually see

Vary's primitive types map onto JVM types as follows, and that mapping determines which instructions the mutation engine actually sees in compiled code:

Vary type	JVM type	Stack slots
`Int`	`long` (64-bit)	2
`Float`	`double` (64-bit)	2
`Bool`	`boolean`	1
`Str`	`String`	1

Because Int is 64-bit, compiled Vary uses LADD and LSUB, not the 32-bit IADD and ISUB you'd see in Java code working with int. It also means the engine uses POP2 (two slots) rather than POP when it discards an Int or Float value. A small detail, but the kind of thing that bites you if you assume 32-bit arithmetic.
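A minimal sketch of the table above as code, to show why the discard opcode depends on the type. The `VaryType` enum and `discardOpcode` helper are hypothetical names for illustration, not the engine's API; the opcode values are the standard ones from the JVM specification.

```kotlin
// Hypothetical model of the Vary-to-JVM mapping in the table above.
enum class VaryType(val jvmDescriptor: String, val stackSlots: Int) {
    INT("J", 2),                    // long
    FLOAT("D", 2),                  // double
    BOOL("Z", 1),                   // boolean
    STR("Ljava/lang/String;", 1)    // String reference
}

// Opcode values as defined in the JVM specification.
object JvmOpcodes {
    const val POP = 0x57
    const val POP2 = 0x58
}

// Discarding a value means popping exactly the number of slots it occupies:
// POP2 for Int and Float, POP for everything else.
fun discardOpcode(type: VaryType): Int =
    if (type.stackSlots == 2) JvmOpcodes.POP2 else JvmOpcodes.POP
```

Getting this wrong is not a silent bug: POP only accepts a single-slot value, so using it on a long fails bytecode verification when the class loads.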

Comparisons add a wrinkle. A long comparison compiles to two instructions: LCMP, followed by a conditional jump such as IFLT. The mutation engine targets the jump, not the LCMP, so a < b becomes a <= b by swapping IFLT for IFLE and leaving the rest of the sequence alone.
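The effect of swapping only the jump can be checked without any bytecode at all. This sketch models LCMP and the two jump conditions as plain Kotlin functions (an illustration, not engine code) and shows that the original and the mutant disagree only when the operands are equal.

```kotlin
// LCMP semantics: push -1, 0, or 1 depending on how the two longs compare.
fun lcmp(a: Long, b: Long): Int = when {
    a < b -> -1
    a == b -> 0
    else -> 1
}

// The conditional jump that follows only looks at LCMP's result.
fun iflt(result: Int): Boolean = result < 0   // original: a < b
fun ifle(result: Int): Boolean = result <= 0  // mutant:   a <= b

// True when the original branch and the mutated branch go different ways.
fun branchesDiffer(a: Long, b: Long): Boolean {
    val result = lcmp(a, b)
    return iflt(result) != ifle(result)
}
```

In practice, a suite only kills this mutant if some test exercises the a == b boundary.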

ASM does the patching

Vary uses the ObjectWeb ASM library for all bytecode work. ASM has two APIs: a streaming visitor and a tree of mutable instruction nodes. The mutation engine uses the tree, which lets it walk to a specific instruction, swap it out, and write the class back.

Reading bytecode in looks roughly like this:

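import org.objectweb.asm.ClassReader
import org.objectweb.asm.tree.ClassNode

val classReader = ClassReader(bytecode)  // bytecode: the compiled class as a ByteArray
val classNode = ClassNode()              // mutable tree the mutator will walk
classReader.accept(classNode, 0)         // 0 = default parsing options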

When the modified tree is written back out through ASM's ClassWriter, the COMPUTE_FRAMES flag recalculates the stack map frames that the JVM verifier checks when it loads a class, along with each method's maximum stack size. Any mutation that inserts instructions (return-value mutation, call skip) can change the stack's shape, so it needs those frames recomputed. A pure opcode swap doesn't change the stack shape, but recomputation is cheap, so the engine always asks for it anyway.

What the mutator actually does

The mutator gets a target (class, method, instruction index) and a mutation type, walks to the right node, and patches it. The shape of the patch depends on the mutation.

Mutation	Patch shape
Arithmetic and negation	A single node swap. `LADD` becomes `LSUB`; `LNEG` becomes `NOP`.
Conditional	Replace the jump instruction, keeping the same target label.
Return value and return poison	Insert a pop (POP2 for Int and Float values) and a constant push before the return, so the computed value is discarded and a default (or adversarial) value takes its place.
Call skip	Pop the arguments and receiver off the stack, push a default return value, replace the call with `NOP`. Constructors are excluded: skipping one would leave a reference to an uninitialized object behind, which the JVM verifier rejects.
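The return-value row can be sketched on a simplified instruction list. The `Insn` class, `mutateLongReturnAt`, and the choice of `0L` as the default are assumptions for illustration; the real engine works on ASM's tree nodes, and return poison would push an adversarial constant instead of the default.

```kotlin
// Hypothetical, simplified model of a zero-operand instruction (ASM's InsnNode plays this role).
data class Insn(val opcode: Int)

// Opcode values from the JVM specification.
object Opcodes {
    const val LCONST_0 = 0x09
    const val POP2 = 0x58
    const val LADD = 0x61
    const val LRETURN = 0xAD
}

// Return-value mutation for a method returning a Vary Int (a JVM long):
// insert POP2 to discard the computed value (two slots) and LCONST_0 to push
// a default of 0L, immediately before the LRETURN at `index`.
fun mutateLongReturnAt(method: List<Insn>, index: Int): List<Insn> {
    require(method[index] == Insn(Opcodes.LRETURN)) { "instruction $index is not LRETURN" }
    val patch = listOf(Insn(Opcodes.POP2), Insn(Opcodes.LCONST_0))
    return method.subList(0, index) + patch + method.subList(index, method.size)
}
```

Because the inserted pair pops two slots and pushes two, the stack depth at LRETURN is unchanged; only the returned value differs.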

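For arithmetic and negation, the patch comes down to replacing a single node:

// insn: the AbstractInsnNode at the target index; instructions: the method's InsnList
val newInsn = InsnNode(mutation.mutatedOpcode)
instructions.set(insn, newInsn)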
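Conditional jumps need slightly more, because ASM's `InsnNode` only represents zero-operand instructions. A jump is a `JumpInsnNode` carrying its target `LabelNode`, and the replacement has to carry the same label. The sketch below models both cases on a simplified instruction list; the `Insn` types and `swapOpcodeAt` are hypothetical, not ASM classes.

```kotlin
// Hypothetical, simplified instruction model standing in for ASM tree nodes.
sealed interface Insn
data class Op(val opcode: Int) : Insn                      // zero-operand, like InsnNode
data class Jump(val opcode: Int, val label: String) : Insn // like JumpInsnNode

// Opcode values from the JVM specification.
object Opcodes {
    const val LADD = 0x61
    const val LSUB = 0x65
    const val LCMP = 0x94
    const val IFLT = 0x9B
    const val IFLE = 0x9E
}

// Replace the instruction at `index` with one using `newOpcode`.
// A zero-operand instruction is swapped outright; a jump keeps its target label.
fun swapOpcodeAt(method: MutableList<Insn>, index: Int, newOpcode: Int) {
    method[index] = when (val insn = method[index]) {
        is Op -> Op(newOpcode)
        is Jump -> Jump(newOpcode, insn.label)
    }
}
```

In ASM terms, the jump case corresponds to constructing `JumpInsnNode(newOpcode, oldJump.label)` and passing it to `InsnList.set`.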

The full opcode tables (which arithmetic ops swap to what, which conditional jumps invert which way) live in the bytecode operators reference.

Each mutant runs in its own classloader

Mutants aren't hot-swapped into classes that are already loaded. Each one is defined from raw bytes in a fresh ClassLoader:

val classLoader = object : ClassLoader() {
    fun defineClass(name: String, bytes: ByteArray): Class<*> =
        defineClass(name, bytes, 0, bytes.size)
}

The mutated source class and the test classes are loaded together. Tests are discovered through Vary's test DSL (the compiler emits parallel arrays of test names and methods, so the runner can iterate over them without looking methods up reflectively by name). Each test runs with a per-test timeout, five seconds by default, so a mutation that triggers an infinite loop is caught instead of hanging the whole suite.

Timeouts count as kills. A non-terminating mutant is still a detected mutant.
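The runner's timeout mechanism isn't spelled out here. One common way to get a per-test timeout on the JVM is to run each test on a worker thread and wait on its `Future` with a deadline; this sketch assumes that approach (`runWithTimeout` and `TestOutcome` are hypothetical names) and uses the five-second default from the text.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutionException
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

enum class TestOutcome { PASSED, FAILED, TIMED_OUT }

// Run one test on a daemon worker thread and give up after the deadline.
// Anything other than PASSED counts as "not passed", so a timeout can kill a mutant.
fun runWithTimeout(timeoutMillis: Long = 5_000, test: () -> Unit): TestOutcome {
    val executor = Executors.newSingleThreadExecutor { runnable ->
        Thread(runnable, "mutant-test").apply { isDaemon = true }
    }
    try {
        val future = executor.submit(Callable { test() })
        return try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS)
            TestOutcome.PASSED
        } catch (e: TimeoutException) {
            future.cancel(true) // interrupt the worker thread
            TestOutcome.TIMED_OUT
        } catch (e: ExecutionException) {
            TestOutcome.FAILED  // the test threw, for example a failed assertion
        }
    } finally {
        executor.shutdownNow()
    }
}
```

`cancel(true)` only interrupts the worker. A mutant stuck in a tight loop that never checks for interruption keeps spinning, which is why the thread is marked as a daemon: it won't block the JVM from exiting.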

Did the mutant get caught?

After the test suite finishes against a mutant, the runner compares its results to the baseline (the same suite against the original code). Any test that flipped from pass to fail is the mutant's killer:

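// Kill detection: collect every test that passed on the original but not on the mutant.
val killedBy = mutableListOf<String>()
for ((testName, originalPassed) in originalResults) {
    // A test with no recorded result for the mutant counts as a failure.
    val mutantPassed = mutantResults[testName] ?: false
    if (originalPassed && !mutantPassed) {
        killedBy.add(testName)
    }
}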

If at least one test killed it, the mutant is dead. If every test still passes, the mutant survived, and it goes into the report so a human can decide whether the test suite is missing an assertion.
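Wrapping the baseline comparison into a function makes the dead-or-survived decision explicit. `MutantStatus`, `Killed`, `Survived`, and `evaluateMutant` are hypothetical names; as in the loop above, a test missing from the mutant's results is treated as a failure.

```kotlin
sealed interface MutantStatus
data class Killed(val killedBy: List<String>) : MutantStatus
object Survived : MutantStatus

// originalResults is the baseline run; mutantResults is the same suite against
// the mutant. A timed-out or missing test counts as not passed.
fun evaluateMutant(
    originalResults: Map<String, Boolean>,
    mutantResults: Map<String, Boolean>,
): MutantStatus {
    // Only tests that passed on the original can kill a mutant.
    val killedBy = originalResults
        .filter { (testName, originalPassed) -> originalPassed && mutantResults[testName] != true }
        .keys
        .toList()
    return if (killedBy.isEmpty()) Survived else Killed(killedBy)
}
```

A test that already fails on the original code can never count as a killer, which keeps pre-existing failures from inflating the kill count.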

There's one small optimization worth calling out. Before the runner even loads a mutant, it checks whether the mutated bytecode is actually different from the original. ASM's frame recomputation can absorb some changes, and a NOP patch in a spot the verifier doesn't care about can produce identical output. Those "equivalent mutants" get marked and skipped before any tests run.
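The check itself is a plain byte comparison. A sketch, assuming each mutant is held as the raw bytes of its patched class (`Mutant` and `splitOutIdentical` are hypothetical names):

```kotlin
// Hypothetical holder for one mutant's patched class file.
class Mutant(val id: String, val bytes: ByteArray)

// Split mutants into (byte-identical to the original, worth testing).
// contentEquals compares array contents; == on arrays would compare references.
fun splitOutIdentical(original: ByteArray, mutants: List<Mutant>): Pair<List<Mutant>, List<Mutant>> =
    mutants.partition { it.bytes.contentEquals(original) }
```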

Stable IDs and incremental runs

Each mutation gets a content-addressed ID built from the class name, method name, descriptor, instruction index, and the specific opcode swap. Same mutation in the same place, same ID, every time.
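A minimal sketch of a content-addressed ID built from those fields. The `MutationSite` class, the length-prefixed encoding, SHA-256, and truncation to 16 hex characters are all assumptions for illustration; the page doesn't say which hash or format the engine uses.

```kotlin
import java.security.MessageDigest

// Hypothetical description of one mutation; the fields follow the list above.
data class MutationSite(
    val className: String,
    val methodName: String,
    val descriptor: String,
    val instructionIndex: Int,
    val originalOpcode: Int,
    val mutatedOpcode: Int,
)

// Content-addressed ID: hash a canonical encoding of the fields. Each field is
// length-prefixed so different field values can never produce the same string.
fun mutationId(site: MutationSite): String {
    val canonical = listOf(
        site.className,
        site.methodName,
        site.descriptor,
        site.instructionIndex.toString(),
        site.originalOpcode.toString(),
        site.mutatedOpcode.toString(),
    ).joinToString("") { "${it.length}:$it" }
    val digest = MessageDigest.getInstance("SHA-256").digest(canonical.toByteArray(Charsets.UTF_8))
    return digest.take(8).joinToString("") { "%02x".format(it) } // first 8 bytes as 16 hex chars
}
```

Nothing in the ID depends on run order or timestamps, which is what makes it stable across runs.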

That ID is the hook for incremental mutation testing. The engine hashes each method's bytecode (ignoring labels and line numbers), compares the hash to the previous run, and skips the methods whose hash hasn't changed. Most edits only touch a few methods, so most mutants don't need to re-run.
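Hashing a method while ignoring labels and line numbers can be sketched on a simplified instruction list. The `Insn` types, `methodHash`, and `changedMethods` are hypothetical; encoding jump targets as instruction positions is one assumed way to keep label names out of the hash without losing where jumps land.

```kotlin
import java.security.MessageDigest

// Hypothetical, simplified instruction model standing in for ASM tree nodes.
sealed interface Insn
data class Op(val opcode: Int) : Insn
data class Jump(val opcode: Int, val label: String) : Insn
data class Label(val name: String) : Insn
data class LineNumber(val line: Int) : Insn

// Hash a method body while ignoring labels and line numbers. Jump targets are
// encoded as positions among real instructions, so renaming a label or shifting
// source lines leaves the hash unchanged.
fun methodHash(method: List<Insn>): String {
    val labelPositions = mutableMapOf<String, Int>()
    var position = 0
    for (insn in method) {
        when (insn) {
            is Label -> labelPositions[insn.name] = position
            is LineNumber -> Unit
            is Op, is Jump -> position++
        }
    }
    val md = MessageDigest.getInstance("SHA-256")
    for (insn in method) {
        val token = when (insn) {
            is Op -> "op:${insn.opcode};"
            is Jump -> "jump:${insn.opcode}->${labelPositions.getValue(insn.label)};"
            is Label, is LineNumber -> continue
        }
        md.update(token.toByteArray(Charsets.UTF_8))
    }
    return md.digest().joinToString("") { "%02x".format(it) }
}

// Methods whose hash is new or different since the previous run need their mutants re-run.
fun changedMethods(previous: Map<String, String>, current: Map<String, String>): Set<String> =
    current.filter { (method, hash) -> previous[method] != hash }.keys
```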

Flake handling

When the suite finishes, surviving mutants get re-run once. If a survivor flips to killed on the second pass, the test that flipped is treated as flaky and the mutant is dropped from the score. It doesn't catch every flake, but it catches the obvious ones, and it stops a non-deterministic test from inflating the survival count.
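A sketch of that re-run pass, assuming the rerun can be expressed as a function from mutant ID to outcome. `recheckSurvivors`, `Rescored`, and `mutationScore` are hypothetical names, and the score formula killed / (killed + survived) is the conventional mutation score, not something this page specifies.

```kotlin
enum class Outcome { KILLED, SURVIVED }

data class Rescored(val confirmedSurvivors: List<String>, val flaky: List<String>)

// Re-run every survivor once. Anything that flips to KILLED on the second pass
// is treated as flaky and leaves the score entirely; the rest stay survivors.
fun recheckSurvivors(survivors: List<String>, rerun: (String) -> Outcome): Rescored {
    val (flaky, confirmed) = survivors.partition { rerun(it) == Outcome.KILLED }
    return Rescored(confirmedSurvivors = confirmed, flaky = flaky)
}

// Assumed score definition: killed / (killed + survived), with flaky mutants excluded.
fun mutationScore(killed: Int, survived: Int): Double {
    require(killed + survived > 0) { "no scored mutants" }
    return killed.toDouble() / (killed + survived)
}
```

With 8 killed mutants and 3 survivors, one of which flips on the re-run, the flaky mutant drops out and the score becomes 8 / (8 + 2).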

Page	Focus
How bytecode mutation testing works	The accessible introduction, with the worked `add` example
Bytecode mutation is why Vary uses the JVM	The architectural motivation for targeting JVM bytecode
Bytecode operators reference	Full per-operator opcode tables