Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.
Lie detection
Scenario: Your test suite is green. Coverage is 95%. You ship. A week later, a bug report: the order matcher is using the buy price instead of the sell price. Every trade has been overcharging customers. You check the tests, and there's one called "trade price is reasonable" that asserts observe trades[0].price > 0. It passed before the bug, during the bug, and after the fix. It would pass whether the price were 1 or a million. The test was never checking anything.
The lie detector finds these tests before your users do. It works on top of mutation testing, which makes small changes to your code (called mutants) and reruns your tests. If a test fails, the mutant is "killed" and the test is doing its job. The lie detector goes further: it checks whether the test's assertion actually depends on the values it claims to observe.
This page walks through the examples/lie_detector/ demo. For a richer demo that uses contracts (preconditions and postconditions on functions), see Lie detection: deep dive.
Four categories
The vary lie-detector command classifies every test into one of four buckets:
| Category | Meaning | Action |
|---|---|---|
| PLACEBO | Zero assertions or all constant (observe True, observe 1 == 1) | Delete or rewrite the test |
| LIE | Assertion uses a runtime value, but a mutation changes that value and the test still passes | Tighten the assertion (use == instead of >) |
| GAP | Assertion is real, but some mutations survive outside what the test observes | Add more tests or boundary inputs |
| OK | All proof mutants on observed values are killed | No action needed |
The scenario
scenario.vary implements a tiny order-matching engine:
data Trade { price: Int, qty: Int }
data Order { price: Int, qty: Int }
def make_buys() -> List[Order]
def make_sells() -> List[Order]
def match_orders(buys: List[Order], sells: List[Order]) -> List[Trade]
def total_qty(trades: List[Trade]) -> Int
match_orders walks buys against sells, executes a trade when the buy price meets or exceeds the sell price, and returns after the first fill. The trade price is the sell price, and the quantity is the smaller side.
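To make the matching rule concrete, here is a minimal Python model of the behaviour described above. It is a sketch, not the demo's Vary source: the nested-loop walk order is an assumption, the prices are taken from the "Anatomy of a LIE" section, and the order quantities are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    price: int
    qty: int


@dataclass(frozen=True)
class Trade:
    price: int
    qty: int


def match_orders(buys: List[Order], sells: List[Order]) -> List[Trade]:
    """Return at most one trade: the first buy/sell pair whose prices cross."""
    for b in buys:
        for s in sells:
            if b.price >= s.price:  # buy price meets or exceeds sell price
                # The trade executes at the sell price for the smaller quantity.
                return [Trade(s.price, min(b.qty, s.qty))]
    return []


def total_qty(trades: List[Trade]) -> int:
    """Sum the quantity across all trades."""
    return sum(t.qty for t in trades)


# Prices follow the walkthrough on this page; quantities are assumed.
buys = [Order(100, 3), Order(60, 5)]
sells = [Order(50, 4), Order(110, 2)]
trades = match_orders(buys, sells)
print(trades)             # [Trade(price=50, qty=3)]
print(total_qty(trades))  # 3
```

The first buy (100) crosses the first sell (50), so the function returns immediately with a single trade at the sell price.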
Bad tests: placebos, lies, and gaps
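bad_tests.vary has tests that demonstrate each failure mode. The listing here is an excerpt: the report further down also includes a seventh test, "transfer creates money", which is not shown: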
from scenario import make_buys, make_sells, match_orders, total_qty

# PLACEBOS: constant assertions that can never fail
test "match executes" {
    match_orders(make_buys(), make_sells())
    observe True
}

test "match returns list" {
    observe 1 == 1
}

test "total is positive" {
    let t = total_qty(match_orders(make_buys(), make_sells()))
    observe True
}

# LIES: observe a value, but assertion is too loose
test "trade price is reasonable" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].price > 0
}

test "some quantity traded" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].qty < 1000
}

# GAPS: assertions reference runtime values but don't cover the mutation
test "transfer is positive" {
    let trades = match_orders(make_buys(), make_sells())
    observe len(trades) > 0
}
OK tests
ok_tests.vary tests a separate module (ok_scenario.vary) with a classify function. Each test pins exact return values, and the boundary inputs (0) catch relational mutations:
from ok_scenario import classify
test "classify pins exact categories" {
    observe classify(-5) == "negative"
    observe classify(0) == "zero"
    observe classify(5) == "positive"
}
Every proof mutant in the exercised code is killed. The lie detector scores this test as OK.
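The effect of the boundary input can be sketched in Python. The body of classify is not shown on this page, so the implementation below is a hypothetical stand-in, and the mutant (< changed to <=) is just one example of a relational mutation.

```python
def classify(n: int) -> str:
    """Hypothetical stand-in for ok_scenario.classify."""
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def classify_mutant(n: int) -> str:
    """Same function with one relational mutation: < became <=."""
    if n <= 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def pins_exact_categories(fn) -> bool:
    """Mirror of the "classify pins exact categories" test."""
    return fn(-5) == "negative" and fn(0) == "zero" and fn(5) == "positive"


def skips_boundary(fn) -> bool:
    """A weaker variant that never passes the boundary input 0."""
    return fn(-5) == "negative" and fn(5) == "positive"


print(pins_exact_categories(classify))         # True: the original passes
print(pins_exact_categories(classify_mutant))  # False: the mutant is killed at 0
print(skips_boundary(classify_mutant))         # True: the mutant survives
```

Only the input 0 distinguishes < from <=, which is why dropping it lets the mutant through even though every remaining assertion uses ==.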
Running the lie detector
vary lie-detector examples/lie_detector/
Default output is a compact summary showing LIEs, PLACEBOs, and GAPs:
LIE DETECTOR (8 tests) — lies: 3 placebos: 3 gaps: 1 ok: 1
LIE bad_tests.vary:21 trade price is reasonable
LIE bad_tests.vary:26 some quantity traded
LIE bad_tests.vary:38 transfer creates money
PLACEBO bad_tests.vary:5 match executes
PLACEBO bad_tests.vary:10 match returns list
PLACEBO bad_tests.vary:14 total is positive
GAP bad_tests.vary:33 transfer is positive
OK tests only appear in the header count. Add -v for detailed output with mutation types, reasons, and actionable hints:
vary lie-detector examples/lie_detector/ -v
LIE DETECTOR (8 tests) — lies: 3 placebos: 3 gaps: 1 ok: 1
LIES (3)
bad_tests.vary:21 trade price is reasonable
allows: RELATIONAL
bad_tests.vary:26 some quantity traded
allows: RELATIONAL
bad_tests.vary:38 transfer creates money
allows: RELATIONAL
PLACEBOS (3)
bad_tests.vary:5 match executes
all assertions are constant
bad_tests.vary:10 match returns list
all assertions are constant
bad_tests.vary:14 total is positive
all assertions are constant
GAPS (1)
bad_tests.vary:33 transfer is positive
survives: RELATIONAL (outside observed values)
hint: add boundary input where operands can be equal
OK (1)
The summary line tells you the shape of the problem. Verbose mode explains what went wrong and what to do about it.
How LIE vs GAP works
The engine captures the left-hand side of every observe A == B or observe A > B comparison at runtime. It then runs proof mutants and compares:
| Observed value changed? | Test still passes? | Verdict |
|---|---|---|
| Yes | Yes | LIE: the assertion doesn't constrain what it claims to observe |
| No | Yes | GAP: the mutation is outside the scope of this test |
| Either | No | Mutant killed (good) |
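The table reads naturally as a small decision function. The Python sketch below is illustrative only: MutantRun, mutant_verdict, and roll_up are hypothetical names, and the roll-up precedence (any LIE outranks any GAP) is an assumption rather than documented engine behaviour. PLACEBO is absent because it is decided before any proof mutant runs.

```python
from typing import Iterable, NamedTuple


class MutantRun(NamedTuple):
    observed_changed: bool  # captured left-hand side differs from the baseline run
    test_passed: bool       # test still passes with the mutant in place


def mutant_verdict(run: MutantRun) -> str:
    """Apply the three-row table to a single proof mutant."""
    if not run.test_passed:
        return "KILLED"
    return "LIE" if run.observed_changed else "GAP"


def roll_up(runs: Iterable[MutantRun]) -> str:
    """Combine mutant verdicts into one label (precedence is an assumption)."""
    verdicts = {mutant_verdict(r) for r in runs}
    if "LIE" in verdicts:
        return "LIE"
    if "GAP" in verdicts:
        return "GAP"
    return "OK"


print(mutant_verdict(MutantRun(True, True)))    # LIE
print(mutant_verdict(MutantRun(False, True)))   # GAP
print(mutant_verdict(MutantRun(True, False)))   # KILLED
print(roll_up([MutantRun(True, False), MutantRun(False, True)]))  # GAP
```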
Anatomy of a LIE
The scenario has buys at prices 100 and 60, and sells at 50 and 110. match_orders matches the first buy (100) against the first sell (50), producing Trade(50, 3) (price is the sell price, quantity is the smaller side).
Now look at this test:
test "trade price is reasonable" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].price > 0
}
The trade price is 50. The test checks 50 > 0 and passes. Now a mutation changes the trade to use the buy price instead of the sell price: Trade(b.price, qty) instead of Trade(s.price, qty). The trade price shifts from 50 to 100. The test checks 100 > 0, still passes. The value the test claims to observe changed from 50 to 100, but the assertion was too loose to notice. That is a LIE.
The fix is to assert what you actually mean:
test "match produces trade at sell price" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].price == 50
}
Now 100 == 50 fails. The mutation is caught.
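The same walkthrough can be replayed in Python. This is a sketch under assumptions: trades are plain (price, qty) tuples, the use_buy_price flag stands in for the mutation, and the order quantities are invented.

```python
def match_orders(buys, sells, use_buy_price=False):
    """First-fill matcher over (price, qty) tuples.

    use_buy_price=True simulates the mutant that trades at the buy price.
    """
    for b_price, b_qty in buys:
        for s_price, s_qty in sells:
            if b_price >= s_price:
                price = b_price if use_buy_price else s_price
                return [(price, min(b_qty, s_qty))]
    return []


buys = [(100, 3), (60, 5)]    # prices from the text, quantities assumed
sells = [(50, 4), (110, 2)]


def loose(trades):
    """Mirror of "trade price is reasonable"."""
    return trades[0][0] > 0


def tight(trades):
    """Mirror of "match produces trade at sell price"."""
    return trades[0][0] == 50


original = match_orders(buys, sells)
mutant = match_orders(buys, sells, use_buy_price=True)

print(original[0][0], mutant[0][0])  # 50 100: the observed value changed
print(loose(mutant))                 # True: the mutant survives, so this is a LIE
print(tight(mutant))                 # False: the mutant is killed
```

Both assertions read the same field. Only the comparison operator decides whether the change from 50 to 100 is noticed.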
Anatomy of a GAP
Even the tightened test still has a GAP. A mutation in total_qty (say, changing + to -) survives because total_qty doesn't affect trades[0].price at all. The observed value didn't change, so the test can't be blamed. The fix is not to change this assertion, but to add a separate test that observes total_qty.
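A short Python sketch shows why the price assertion cannot see this mutant. The helper names check_price and check_total are hypothetical, and the single trade (50, 3) is the fill from the walkthrough.

```python
def total_qty(trades):
    """Original: add up the quantity of every (price, qty) trade."""
    total = 0
    for _price, qty in trades:
        total = total + qty
    return total


def total_qty_mutant(trades):
    """Arithmetic mutant: + became -."""
    total = 0
    for _price, qty in trades:
        total = total - qty
    return total


TRADES = [(50, 3)]  # the single fill from the walkthrough


def check_price(total_fn):
    """Price test: never calls total_fn, so it cannot see the mutation."""
    observed = TRADES[0][0]
    return observed, observed == 50


def check_total(total_fn):
    """Separate test that observes total_qty directly."""
    observed = total_fn(TRADES)
    return observed, observed == 3


print(check_price(total_qty))         # (50, True)
print(check_price(total_qty_mutant))  # (50, True): same value, still passes, so GAP
print(check_total(total_qty_mutant))  # (-3, False): the mutant is killed
```

The price test returns the same observed value with or without the mutation, which is the "No / Yes" row of the verdict table. Closing the gap takes a new test, not a tighter price assertion.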
How lie detection works (static analysis)
Before running proof mutants, the detector does a fast static pass. It reparses the test file and classifies every assertion call:
| Classification | Meaning | Example |
|---|---|---|
| CONST | Every argument is a literal or constant expression | observe True, observe 1 == 1 |
| NON_CONST | At least one argument depends on a variable or call | observe trades[0].price == 50 |
A test with zero assertions or all-CONST assertions is a PLACEBO: no proof mutation needed.
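The static pass can be approximated in Python with the standard ast module. This is an analogy, not the Vary implementation: it inspects Python assert statements instead of observe, and it treats an expression as constant when it contains no variable reference and no call.

```python
import ast
from typing import List


def is_const(expr: ast.expr) -> bool:
    """True when the expression contains no name lookup and no call."""
    return not any(isinstance(n, (ast.Name, ast.Call)) for n in ast.walk(expr))


def classify_asserts(source: str) -> List[str]:
    """Label every assert in the source as CONST or NON_CONST."""
    tree = ast.parse(source)  # parsed only, never executed
    asserts = [n for n in ast.walk(tree) if isinstance(n, ast.Assert)]
    return ["CONST" if is_const(a.test) else "NON_CONST" for a in asserts]


def is_placebo(source: str) -> bool:
    """Zero assertions or all-CONST assertions (all() of an empty list is True)."""
    return all(label == "CONST" for label in classify_asserts(source))


constant_test = "def t():\n    assert True\n    assert 1 == 1\n"
real_test = "def t():\n    trades = match()\n    assert trades[0].price == 50\n"
empty_test = "def t():\n    match()\n"

print(classify_asserts(constant_test))  # ['CONST', 'CONST']
print(classify_asserts(real_test))      # ['NON_CONST']
print(is_placebo(constant_test), is_placebo(real_test), is_placebo(empty_test))
# True False True
```

Because this check needs only the parse tree, it can flag placebos without running a single mutant.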
CLI reference
vary lie-detector [options] <target>
| Flag | Effect |
|---|---|
| -t, --tests FILE | Specify the test file to analyze |
| -v, --verbose | Show detailed per-test breakdown with hints |
| --json | Output results as JSON |
| --fail-on-lies | Exit with code 1 if lies or placebos found |
| --fail-on-gaps | Exit with code 1 if gaps found (strict mode) |
| --max-gaps N | Fail if gap count exceeds budget |
| --top-gaps N | Max gaps to display in verbose mode (default: 20) |
| --max-mutants N | Max proof mutants per test (default: 50) |
The lie detector also runs as part of vary mutate (the static PLACEBO check). The standalone command adds proof-based LIE vs GAP classification.
Lie-shape taxonomy
Beyond the four broad categories (PLACEBO, LIE, GAP, OK), the engine classifies each survivor into a named lie-shape that describes the specific weakness pattern:
| Lie shape | Meaning |
|---|---|
| Constant Assertion | Test checks a literal/constant value, so it always passes regardless of SUT behaviour |
| Shallow Boolean | Test only checks true/false or null/non-null, never pins actual values |
| Effect-Only Check | Test checks that a side effect ran but not what it produced |
| Unused Return Value | Function return value is not asserted in any test |
| Exception-Only Check | Test only asserts no crash (or that an exception is thrown), no value pinning |
| Broad Smoke Test | Test exercises flow but has too few assertions to catch mutations |
| Boundary Gap | Boundary values are untested, so off-by-one mutations survive |
| Partial Observation | Some fields/properties are checked but the mutated field is not |
Lie shapes appear in --why output and in the lie detector's verbose mode. They help prioritize which survivors to fix first: a Constant Assertion is a test that does nothing, while a Boundary Gap usually needs one extra input to close.
CI integration
Start permissive, then tighten:
# Phase 1: catch the worst offenders
vary lie-detector src/ --fail-on-lies
# Phase 2: set a gap budget
vary lie-detector src/ --fail-on-lies --max-gaps 20
# Phase 3: strict mode
vary lie-detector src/ --fail-on-lies --fail-on-gaps
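If the CI pipeline is driven from a script, the three phases can be kept in one place. The Python helper below is a sketch with hypothetical names (lie_detector_argv, run_gate). It uses only the flags listed in the CLI reference and assumes the vary executable is on PATH.

```python
import subprocess
from typing import List


def lie_detector_argv(target: str, phase: int, gap_budget: int = 20) -> List[str]:
    """Build the command line for rollout phase 1, 2, or 3."""
    argv = ["vary", "lie-detector", target, "--fail-on-lies"]
    if phase == 2:
        argv += ["--max-gaps", str(gap_budget)]  # tolerate gaps up to a budget
    elif phase >= 3:
        argv.append("--fail-on-gaps")            # strict mode: no gaps allowed
    return argv


def run_gate(target: str, phase: int) -> int:
    """Run the detector and return its exit code (non-zero fails the build)."""
    return subprocess.run(lie_detector_argv(target, phase)).returncode


for phase in (1, 2, 3):
    print(" ".join(lie_detector_argv("src/", phase)))
# vary lie-detector src/ --fail-on-lies
# vary lie-detector src/ --fail-on-lies --max-gaps 20
# vary lie-detector src/ --fail-on-lies --fail-on-gaps
```

Moving a project from one phase to the next then becomes a one-argument change in the pipeline configuration.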