Lie detection: deep dive — Markdown View

Scenario: Your ledger passes all 84 tests. A deposit of 50 into an account with 100 shows a balance of 150. Everything works. Then someone refactors the deposit method and accidentally changes + amount to - amount. The balance after depositing 50 is now 50 instead of 150. But the test "deposit adds something" asserts observe l.balance(a) != -1. The balance is 50, which is still not -1, so the test passes. Money is vanishing and every test is green.

This page walks through examples/ledger/, a double-entry ledger with contracts, and shows how the lie detector catches these tests. For the basics of lie detection, see Lie detection.

The ledger

ledger.vary implements a simple account ledger with contracts on every method. Contracts are preconditions (in) and postconditions (post, out) that the compiler checks at runtime. If a contract is violated, the program throws a ContractViolation error. The old() expression captures a value before the function body runs, so the postcondition can compare before and after:

class Ledger {
    mut accounts: Dict[Str, Int]
    mut names: Dict[Str, Str]
    mut next_id: Int

    def create_account(self, name: Str, initial_balance: Int) -> Str {
        in { initial_balance >= 0 }
        out (id) { id in self.accounts }
        ...
    }

    def deposit(self, id: Str, amount: Int) -> None {
        in { id in self.accounts; amount > 0 }
        post { self.accounts[id] == old(self.accounts[id]) + amount }
        ...
    }

    def withdraw(self, id: Str, amount: Int) -> None {
        in { id in self.accounts; amount > 0; amount <= self.accounts[id] }
        post { self.accounts[id] == old(self.accounts[id]) - amount }
        ...
    }

    def transfer(self, from_id: Str, to_id: Str, amount: Int) -> None {
        in { from_id in self.accounts; to_id in self.accounts
             amount > 0; amount <= self.accounts[from_id] }
        post { self.total_balance() == old(self.total_balance()) }
        ...
    }

    def balance(self, id: Str) -> Int {
        in { id in self.accounts }
        out (b) { b >= 0 }
        ...
    }
}

The post { self.total_balance() == old(self.total_balance()) } on transfer is a conservation law: total money in the system never changes. This contract acts as an oracle, a second source of truth that decides whether output is correct. It will throw ContractViolation if an arithmetic mutation changes how money moves, killing the mutant without any test assertion needed.

The lie tests

lie_tests.vary has tests in all three buckets:

from ledger import Ledger

# PLACEBO: structurally incapable of failing

test "ledger creates accounts" {
    let l = Ledger()
    l.create_account("Alice", 100)
    l.create_account("Bob", 50)
    observe True
}

# LIES: observed value changes under mutation, test still passes

test "alice has funds after transfer" {
    let l = Ledger()
    let a = l.create_account("Alice", 100)
    let b = l.create_account("Bob", 50)
    l.transfer(a, b, 30)
    observe l.balance(a) >= 0
}

test "deposit adds something" {
    let l = Ledger()
    let a = l.create_account("Alice", 100)
    l.deposit(a, 50)
    observe l.balance(a) != -1
}

# GOOD: exact assertions + contracts

test "transfer conserves total" {
    let l = Ledger()
    let a = l.create_account("Alice", 100)
    let b = l.create_account("Bob", 50)
    l.transfer(a, b, 30)
    observe l.total_balance() == 150
}

test "transfer moves exact amount" {
    let l = Ledger()
    let a = l.create_account("Alice", 100)
    let b = l.create_account("Bob", 50)
    l.transfer(a, b, 30)
    observe l.balance(a) == 70
    observe l.balance(b) == 80
}

test "deposit postcondition enforces exact addition" {
    let l = Ledger()
    let a = l.create_account("Alice", 0)
    l.deposit(a, 42)
    observe l.balance(a) == 42
}

Running the lie detector

Note: Vary's test discovery looks for files named test_*.vary or *_test.vary. The file lie_tests.vary is intentionally named outside that convention so it stays separate from the ledger's real test suite. Use -t to point at it explicitly.

Point the lie detector at the ledger source with the lie tests:

vary lie-detector examples/ledger/ledger.vary -t examples/ledger/lie_tests.vary

Default output is compact:

LIE DETECTOR (6 tests) — lies: 2  placebos: 1  gaps: 3  ok: 0
  LIE  lie_tests.vary:16  alice has funds after transfer
  LIE  lie_tests.vary:24  deposit adds something
  PLACEBO  lie_tests.vary:5  ledger creates accounts

Add -v for the full breakdown with mutation types and hints:

vary lie-detector examples/ledger/ledger.vary -t examples/ledger/lie_tests.vary -v

LIE DETECTOR (6 tests) — lies: 2  placebos: 1  gaps: 3  ok: 0

LIES (2)

  lie_tests.vary:16  alice has funds after transfer
    allows: RET_DEFAULT
  lie_tests.vary:24  deposit adds something
    allows: RET_DEFAULT

PLACEBOS (1)

  lie_tests.vary:5  ledger creates accounts
    all assertions are constant

GAPS (3)

  lie_tests.vary:35  transfer conserves total
    survives: SKIP_EFFECT (outside observed values)
    hint: observe the side effect (e.g. collection size, field value)
  lie_tests.vary:43  transfer moves exact amount
    survives: SKIP_EFFECT (outside observed values)
    hint: observe the side effect (e.g. collection size, field value)
  lie_tests.vary:52  deposit postcondition enforces exact addition
    survives: SKIP_EFFECT (outside observed values)
    hint: observe the side effect (e.g. collection size, field value)

Every test lands in one of the three buckets. The good tests are classified as GAP rather than LIE. A surviving mutant exists, but it doesn't change the values those tests observe. That's a coverage hole, not a weak assertion.

Why the LIEs are lies

"alice has funds after transfer": Alice starts with 100, transfers 30 to Bob, and has 70 left:

test "alice has funds after transfer" {
    let l = Ledger()
    let a = l.create_account("Alice", 100)
    let b = l.create_account("Bob", 50)
    l.transfer(a, b, 30)
    observe l.balance(a) >= 0
}

A RET_DEFAULT mutation (one of Vary's mutation operators) makes balance() return 0 instead of 70. The test checks 0 >= 0, still passes. The value the test claims to observe changed from 70 to 0, but the assertion was too loose to notice. That is a LIE. The fix: observe l.balance(a) == 70.

"deposit adds something": Alice starts with 100, deposits 50, and has 150:

test "deposit adds something" {
    let l = Ledger()
    let a = l.create_account("Alice", 100)
    l.deposit(a, 50)
    observe l.balance(a) != -1
}

Same pattern. The RET_DEFAULT operator makes balance() return 0. The test checks 0 != -1, still passes. The fix: observe l.balance(a) == 150.

Why the GAPs are gaps

"transfer conserves total" asserts observe l.total_balance() == 150. A SKIP_EFFECT mutation removes a side effect (e.g. skipping self.accounts[to_id] = ...). The mutation survives because the contract post { self.total_balance() == old(self.total_balance()) } catches it before the assertion runs, but the observed value (total_balance()) doesn't change from the test's perspective. This is a coverage hole, not a weakness in the assertion.

The fix is in the hint: "observe the side effect." A test that checks individual account balances after transfer (like observe l.balance(a) == 70) would catch skip-effect mutations directly.

How contracts help

Without the post { self.total_balance() == old(self.total_balance()) } contract on transfer, an ARITHMETIC mutation that changes - amount to + amount in transfer would make money appear from nowhere. The lie-detector would still catch this as a LIE if you have a loose assertion, but the contract catches it independently. A ContractViolation kills the mutant before the test assertion even runs.

Contracts act as additional oracles (independent checks that catch wrong behaviour):

Contract	What it catches
`post { self.accounts[id] == old(...) + amount }`	Any arithmetic mutation in `deposit`
`post { self.total_balance() == old(self.total_balance()) }`	Any mutation that creates or destroys money in `transfer`
`out (b) { b >= 0 }` on `balance`	Mutations that produce negative balances

The GAP classification is accurate: the exact-value tests are correct, and the remaining survivors are in code paths that contracts already guard or that need additional boundary inputs.

Fixing each category

Category	Fix
PLACEBO ("ledger creates accounts")	Replace `observe True` with `observe l.account_count() == 2`
LIE ("alice has funds after transfer")	Tighten `>= 0` to `== 70`
LIE ("deposit adds something")	Tighten `!= -1` to `== 150`
GAP (all three good tests)	Already correct assertions; add tests that observe each individual side effect

From lie-detector to mutate

The lie detector tells you which tests are weak. vary mutate tells you which code is weak. They answer different questions:

Tool	Question it answers
`vary lie-detector`	"Which tests are lying to me?"
`vary mutate`	"Which code paths survive all my tests?"

Run vary mutate on the same ledger with the same lie tests:

vary mutate examples/ledger/ledger.vary -t examples/ledger/lie_tests.vary

Score: 29% killed (23/78)
Killed by contracts: PRE: 4, POST: 8
Test Strength: Weak (29%)

Lie Risk: HIGH (1 tests)
  lie_tests.vary:5  "ledger creates accounts"
    all assertions are constant (equivalent to || true)

Top Contract Kill Sites
12 mutants killed by contract violations

  POST deposit   ledger.vary:30  (3 kills)
  POST transfer  ledger.vary:49  (3 kills)
  PRE  create_account  ledger.vary:13  (2 kills)
  ... and 4 more sites

Top survivor groups (9)

GROUP                         SURV  ACTION
ledger#Ledger.withdraw          14  defend contracts
ledger#Ledger.init              10  assert return values
ledger#Ledger.account_count      7  defend contracts
ledger#Ledger.transfer           6  defend contracts
ledger#Ledger.create_account     5  assert return values

29% killed. The lie tests leave 55 of 78 mutants alive. Three things to notice:

The lies show up here too. The "Lie Risk: HIGH" warning flags the same placebo the lie detector found. vary mutate runs the static placebo check as part of its analysis.

Contracts are doing most of the work. 12 of the 23 kills came from contract violations, not test assertions. The lie tests are so loose that the contracts catch more bugs than the tests do.

The survivor table shows the cost. withdraw has 14 survivors because no lie test touches it at all. The constructor has 10 survivors because no test checks the state after construction.

Compare this to the real test suite (ledger_test.vary, 84 tests): that scores 91% killed with only 6 survivors. The difference is exact assertions. The lie detector finds the weak tests; vary mutate shows you what that weakness costs.

The fix pattern: tighten assertions until the lies are gone, then add tests for the survivor groups until the score reaches your target.

# This test defends the deposit precondition.
# If the contract is removed by mutation, this test fails.
test "deposit rejects negative amount" {
    let l = Ledger()
    let a = l.create_account("Alice", 100)
    observe throws { l.deposit(a, -1) }
}