Scenario: Your ledger passes all 84 tests. A deposit of 50 into an account with 100 shows a balance of 150. Everything works. Then someone refactors the
depositmethod and accidentally changes+ amountto- amount. The balance after depositing 50 is now 50 instead of 150. But the test "deposit adds something" assertsobserve l.balance(a) != -1. The balance is 50, which is still not -1, so the test passes. Money is vanishing and every test is green.
This page walks through examples/ledger/, a double-entry ledger with contracts, and shows how the lie detector catches these tests. For the basics of lie detection, see Lie detection.
ledger.vary implements a simple account ledger with contracts on every method. Contracts are preconditions (in) and postconditions (post, out) that the compiler checks at runtime. If a contract is violated, the program throws a ContractViolation error. The old() expression captures a value before the function body runs, so the postcondition can compare before and after:
class Ledger {
mut accounts: Dict[Str, Int]
mut names: Dict[Str, Str]
mut next_id: Int
def create_account(self, name: Str, initial_balance: Int) -> Str {
in { initial_balance >= 0 }
out (id) { id in self.accounts }
...
}
def deposit(self, id: Str, amount: Int) -> None {
in { id in self.accounts; amount > 0 }
post { self.accounts[id] == old(self.accounts[id]) + amount }
...
}
def withdraw(self, id: Str, amount: Int) -> None {
in { id in self.accounts; amount > 0; amount <= self.accounts[id] }
post { self.accounts[id] == old(self.accounts[id]) - amount }
...
}
def transfer(self, from_id: Str, to_id: Str, amount: Int) -> None {
in { from_id in self.accounts; to_id in self.accounts
amount > 0; amount <= self.accounts[from_id] }
post { self.total_balance() == old(self.total_balance()) }
...
}
def balance(self, id: Str) -> Int {
in { id in self.accounts }
out (b) { b >= 0 }
...
}
}
The post { self.total_balance() == old(self.total_balance()) } on transfer is a conservation law: total money in the system never changes. This contract acts as an oracle, a second source of truth that decides whether output is correct. It will throw ContractViolation if an arithmetic mutation changes how money moves, killing the mutant without any test assertion needed.
lie_tests.vary has tests in all three buckets:
from ledger import Ledger
# PLACEBO: structurally incapable of failing
test "ledger creates accounts" {
let l = Ledger()
l.create_account("Alice", 100)
l.create_account("Bob", 50)
observe True
}
# LIES: observed value changes under mutation, test still passes
test "alice has funds after transfer" {
let l = Ledger()
let a = l.create_account("Alice", 100)
let b = l.create_account("Bob", 50)
l.transfer(a, b, 30)
observe l.balance(a) >= 0
}
test "deposit adds something" {
let l = Ledger()
let a = l.create_account("Alice", 100)
l.deposit(a, 50)
observe l.balance(a) != -1
}
# GOOD: exact assertions + contracts
test "transfer conserves total" {
let l = Ledger()
let a = l.create_account("Alice", 100)
let b = l.create_account("Bob", 50)
l.transfer(a, b, 30)
observe l.total_balance() == 150
}
test "transfer moves exact amount" {
let l = Ledger()
let a = l.create_account("Alice", 100)
let b = l.create_account("Bob", 50)
l.transfer(a, b, 30)
observe l.balance(a) == 70
observe l.balance(b) == 80
}
test "deposit postcondition enforces exact addition" {
let l = Ledger()
let a = l.create_account("Alice", 0)
l.deposit(a, 42)
observe l.balance(a) == 42
}
Note: Vary's test discovery looks for files named
test_*.varyor*_test.vary. The filelie_tests.varyis intentionally named outside that convention so it stays separate from the ledger's real test suite. Use-tto point at it explicitly.
Point the lie detector at the ledger source with the lie tests:
vary lie-detector examples/ledger/ledger.vary -t examples/ledger/lie_tests.vary
Default output is compact:
LIE DETECTOR (6 tests) — lies: 2 placebos: 1 gaps: 3 ok: 0
LIE lie_tests.vary:16 alice has funds after transfer
LIE lie_tests.vary:24 deposit adds something
PLACEBO lie_tests.vary:5 ledger creates accounts
Add -v for the full breakdown with mutation types and hints:
vary lie-detector examples/ledger/ledger.vary -t examples/ledger/lie_tests.vary -v
LIE DETECTOR (6 tests) — lies: 2 placebos: 1 gaps: 3 ok: 0
LIES (2)
lie_tests.vary:16 alice has funds after transfer
allows: RET_DEFAULT
lie_tests.vary:24 deposit adds something
allows: RET_DEFAULT
PLACEBOS (1)
lie_tests.vary:5 ledger creates accounts
all assertions are constant
GAPS (3)
lie_tests.vary:35 transfer conserves total
survives: SKIP_EFFECT (outside observed values)
hint: observe the side effect (e.g. collection size, field value)
lie_tests.vary:43 transfer moves exact amount
survives: SKIP_EFFECT (outside observed values)
hint: observe the side effect (e.g. collection size, field value)
lie_tests.vary:52 deposit postcondition enforces exact addition
survives: SKIP_EFFECT (outside observed values)
hint: observe the side effect (e.g. collection size, field value)
Every test lands in one of the three buckets. The good tests are classified as GAP rather than LIE. A surviving mutant exists, but it doesn't change the values those tests observe. That's a coverage hole, not a weak assertion.
"alice has funds after transfer": Alice starts with 100, transfers 30 to Bob, and has 70 left:
test "alice has funds after transfer" {
let l = Ledger()
let a = l.create_account("Alice", 100)
let b = l.create_account("Bob", 50)
l.transfer(a, b, 30)
observe l.balance(a) >= 0
}
A RET_DEFAULT mutation (one of Vary's mutation operators) makes balance() return 0 instead of 70. The test checks 0 >= 0, still passes. The value the test claims to observe changed from 70 to 0, but the assertion was too loose to notice. That is a LIE. The fix: observe l.balance(a) == 70.
"deposit adds something": Alice starts with 100, deposits 50, and has 150:
test "deposit adds something" {
let l = Ledger()
let a = l.create_account("Alice", 100)
l.deposit(a, 50)
observe l.balance(a) != -1
}
Same pattern. The RET_DEFAULT operator makes balance() return 0. The test checks 0 != -1, still passes. The fix: observe l.balance(a) == 150.
"transfer conserves total" asserts observe l.total_balance() == 150. A SKIP_EFFECT mutation removes a side effect (e.g. skipping self.accounts[to_id] = ...). The mutation survives because the contract post { self.total_balance() == old(self.total_balance()) } catches it before the assertion runs, but the observed value (total_balance()) doesn't change from the test's perspective. This is a coverage hole, not a weakness in the assertion.
The fix is in the hint: "observe the side effect." A test that checks individual account balances after transfer (like observe l.balance(a) == 70) would catch skip-effect mutations directly.
Without the post { self.total_balance() == old(self.total_balance()) } contract on transfer, an ARITHMETIC mutation that changes - amount to + amount in transfer would make money appear from nowhere. The lie-detector would still catch this as a LIE if you have a loose assertion, but the contract catches it independently. A ContractViolation kills the mutant before the test assertion even runs.
Contracts act as additional oracles (independent checks that catch wrong behaviour):
| Contract | What it catches |
|---|---|
post { self.accounts[id] == old(...) + amount } | Any arithmetic mutation in deposit |
post { self.total_balance() == old(self.total_balance()) } | Any mutation that creates or destroys money in transfer |
out (b) { b >= 0 } on balance | Mutations that produce negative balances |
The GAP classification is accurate: the exact-value tests are correct, and the remaining survivors are in code paths that contracts already guard or that need additional boundary inputs.
| Category | Fix |
|---|---|
| PLACEBO ("ledger creates accounts") | Replace observe True with observe l.account_count() == 2 |
| LIE ("alice has funds after transfer") | Tighten >= 0 to == 70 |
| LIE ("deposit adds something") | Tighten != -1 to == 150 |
| GAP (all three good tests) | Already correct assertions; add tests that observe each individual side effect |
The lie detector tells you which tests are weak. vary mutate tells you which code is weak. They answer different questions:
| Tool | Question it answers |
|---|---|
vary lie-detector | "Which tests are lying to me?" |
vary mutate | "Which code paths survive all my tests?" |
Run vary mutate on the same ledger with the same lie tests:
vary mutate examples/ledger/ledger.vary -t examples/ledger/lie_tests.vary
Score: 29% killed (23/78)
Killed by contracts: PRE: 4, POST: 8
Test Strength: Weak (29%)
Lie Risk: HIGH (1 tests)
lie_tests.vary:5 "ledger creates accounts"
all assertions are constant (equivalent to || true)
Top Contract Kill Sites
12 mutants killed by contract violations
POST deposit ledger.vary:30 (3 kills)
POST transfer ledger.vary:49 (3 kills)
PRE create_account ledger.vary:13 (2 kills)
... and 4 more sites
Top survivor groups (9)
GROUP SURV ACTION
ledger#Ledger.withdraw 14 defend contracts
ledger#Ledger.init 10 assert return values
ledger#Ledger.account_count 7 defend contracts
ledger#Ledger.transfer 6 defend contracts
ledger#Ledger.create_account 5 assert return values
29% killed. The lie tests leave 55 of 78 mutants alive. Three things to notice:
The lies show up here too. The "Lie Risk: HIGH" warning flags the same placebo the lie detector found. vary mutate runs the static placebo check as part of its analysis.
Contracts are doing most of the work. 12 of the 23 kills came from contract violations, not test assertions. The lie tests are so loose that the contracts catch more bugs than the tests do.
The survivor table shows the cost. withdraw has 14 survivors because no lie test touches it at all. The constructor has 10 survivors because no test checks the state after construction.
Compare this to the real test suite (ledger_test.vary, 84 tests): that scores 91% killed with only 6 survivors. The difference is exact assertions. The lie detector finds the weak tests; vary mutate shows you what that weakness costs.
The fix pattern: tighten assertions until the lies are gone, then add tests for the survivor groups until the score reaches your target.
# This test defends the deposit precondition.
# If the contract is removed by mutation, this test fails.
test "deposit rejects negative amount" {
let l = Ledger()
let a = l.create_account("Alice", 100)
observe throws { l.deposit(a, -1) }
}