Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.

Lie detection

Scenario: Your test suite is green. Coverage is 95%. You ship. A week later, a bug report arrives: the order matcher is using the buy price instead of the sell price. Every trade has been overcharging customers. You check the tests, and there's one called "trade price is reasonable" that asserts observe trades[0].price > 0. It passed before the bug, during the bug, and after the fix. It would pass for any positive price at all. The test was never checking anything.

The lie detector finds these tests before your users do. It works on top of mutation testing, which makes small changes to your code (called mutants) and reruns your tests. If a test fails, the mutant is "killed" and the test is doing its job. The lie detector goes further: it checks whether the test's assertion actually depends on the values it claims to observe.

This page walks through the examples/lie_detector/ demo. For a richer demo that uses contracts (preconditions and postconditions on functions), see Lie detection: deep dive.

Four categories

The vary lie-detector command classifies every test into one of four buckets:

Category	Meaning	Action
PLACEBO	Zero assertions or all constant (`observe True`, `observe 1 == 1`)	Delete or rewrite the test
LIE	Assertion uses a runtime value, but a mutation changes that value and the test still passes	Tighten the assertion (use `==` instead of `>`)
GAP	Assertion is real, but some mutations survive outside what the test observes	Add more tests or boundary inputs
OK	All proof mutants on observed values are killed	No action needed

The scenario

scenario.vary implements a tiny order-matching engine:

data Trade { price: Int, qty: Int }
data Order { price: Int, qty: Int }

def make_buys() -> List[Order]
def make_sells() -> List[Order]
def match_orders(buys: List[Order], sells: List[Order]) -> List[Trade]
def total_qty(trades: List[Trade]) -> Int

match_orders walks buys against sells, executes a trade when the buy price meets or exceeds the sell price, and returns after the first fill. The trade price is the sell price, and the quantity is the smaller side.
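As a rough model of the behaviour just described, the matching rule can be sketched in Python. This is not the contents of scenario.vary: the loop order (buys in the outer loop, sells in the inner loop) and the empty result when nothing crosses are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    price: int
    qty: int


@dataclass(frozen=True)
class Trade:
    price: int
    qty: int


def match_orders(buys: List[Order], sells: List[Order]) -> List[Trade]:
    """Return at most one trade: the first buy/sell pair that crosses."""
    for b in buys:  # assumed order: buys outer, sells inner
        for s in sells:
            if b.price >= s.price:
                # Trade at the sell price, for the smaller of the two quantities.
                return [Trade(s.price, min(b.qty, s.qty))]
    return []  # assumption: no crossing pair means no trades


def total_qty(trades: List[Trade]) -> int:
    """Sum the quantity across all trades."""
    return sum(t.qty for t in trades)
```

With hypothetical quantities, a buy of Order(100, 3) against a sell of Order(50, 4) fills at the sell price for the smaller quantity, giving a single Trade(50, 3).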

Bad tests: placebos, lies, and gaps

bad_tests.vary has tests that demonstrate each failure mode:

from scenario import make_buys, make_sells, match_orders, total_qty

# PLACEBOS: constant assertions that can never fail

test "match executes" {
    match_orders(make_buys(), make_sells())
    observe True
}

test "match returns list" {
    observe 1 == 1
}

test "total is positive" {
    let t = total_qty(match_orders(make_buys(), make_sells()))
    observe True
}

# LIES: observe a value, but assertion is too loose

test "trade price is reasonable" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].price > 0
}

test "some quantity traded" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].qty < 1000
}

# GAPS: assertions reference runtime values but don't cover the mutation

test "transfer is positive" {
    let trades = match_orders(make_buys(), make_sells())
    observe len(trades) > 0
}

OK tests

ok_tests.vary tests a separate module (ok_scenario.vary) with a classify function. The test pins exact return values, and the boundary input (0) catches relational mutations:

from ok_scenario import classify

test "classify pins exact categories" {
    observe classify(-5) == "negative"
    observe classify(0) == "zero"
    observe classify(5) == "positive"
}

Every proof mutant in the exercised code is killed. The lie detector scores this test as OK.
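To see why the boundary input matters, here is a small Python model. The body of classify is not shown on this page, so the implementation below is a hypothetical stand-in, and the mutant (turning `<` into `<=`) is one plausible relational mutation rather than a record of what the engine generates.

```python
def classify(n: int) -> str:
    """Hypothetical stand-in for ok_scenario.classify."""
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def classify_mutant(n: int) -> str:
    """Relational mutant: the first comparison `<` becomes `<=`."""
    if n <= 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def pinned_with_boundary(fn) -> bool:
    """The three observations from the test above, including the boundary 0."""
    return fn(-5) == "negative" and fn(0) == "zero" and fn(5) == "positive"


def pinned_without_boundary(fn) -> bool:
    """The same test with the boundary observation removed."""
    return fn(-5) == "negative" and fn(5) == "positive"


# The mutant only misbehaves at 0, so only the boundary observation kills it.
mutant_killed_with_boundary = not pinned_with_boundary(classify_mutant)
mutant_killed_without_boundary = not pinned_without_boundary(classify_mutant)
```

In this model the mutant is killed only when 0 is among the inputs. Dropping that one observation lets it survive.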

Running the lie detector

vary lie-detector examples/lie_detector/

Default output is a compact summary showing LIEs, PLACEBOs, and GAPs:

LIE DETECTOR (8 tests) — lies: 3  placebos: 3  gaps: 1  ok: 1
  LIE  bad_tests.vary:21  trade price is reasonable
  LIE  bad_tests.vary:26  some quantity traded
  LIE  bad_tests.vary:38  transfer creates money
  PLACEBO  bad_tests.vary:5  match executes
  PLACEBO  bad_tests.vary:10  match returns list
  PLACEBO  bad_tests.vary:14  total is positive
  GAP  bad_tests.vary:33  transfer is positive

OK tests only appear in the header count. Add -v for detailed output with mutation types, reasons, and actionable hints:

vary lie-detector examples/lie_detector/ -v

LIE DETECTOR (8 tests) — lies: 3  placebos: 3  gaps: 1  ok: 1

LIES (3)

  bad_tests.vary:21  trade price is reasonable
    allows: RELATIONAL
  bad_tests.vary:26  some quantity traded
    allows: RELATIONAL
  bad_tests.vary:38  transfer creates money
    allows: RELATIONAL

PLACEBOS (3)

  bad_tests.vary:5  match executes
    all assertions are constant
  bad_tests.vary:10  match returns list
    all assertions are constant
  bad_tests.vary:14  total is positive
    all assertions are constant

GAPS (1)

  bad_tests.vary:33  transfer is positive
    survives: RELATIONAL (outside observed values)
    hint: add boundary input where operands can be equal

OK (1)

The summary line tells you the shape of the problem. Verbose mode explains what went wrong and what to do about it.

How LIE vs GAP works

The engine captures the left-hand side of every observe A == B or observe A > B comparison at runtime. It then runs proof mutants and compares:

Observed value changed?	Test still passes?	Verdict
Yes	Yes	LIE: the assertion doesn't constrain what it claims to observe
No	Yes	GAP: the mutation is outside the scope of this test
Either	No	Mutant killed (good)
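The table above translates directly into a small decision function. The Python sketch below is a model of that table, not the engine's code. The second function, which folds per-mutant verdicts into one label per test, assumes that LIE outranks GAP. That precedence is not stated on this page.

```python
from typing import Iterable, Tuple


def mutant_verdict(observed_changed: bool, still_passes: bool) -> str:
    """Map one proof-mutant run to a verdict, following the table above."""
    if not still_passes:
        return "KILLED"
    return "LIE" if observed_changed else "GAP"


def categorize(runs: Iterable[Tuple[bool, bool]]) -> str:
    """Fold (observed_changed, still_passes) pairs into one label per test.

    Assumption: any LIE survivor outranks a GAP survivor, and a test
    with no survivors is OK. PLACEBO is decided statically beforehand,
    so it does not appear here.
    """
    verdicts = {mutant_verdict(changed, passed) for changed, passed in runs}
    if "LIE" in verdicts:
        return "LIE"
    if "GAP" in verdicts:
        return "GAP"
    return "OK"
```

For example, a test whose only survivor left the observed value unchanged would be labelled GAP under this model.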

Anatomy of a LIE

The scenario has buys at prices 100 and 60, and sells at 50 and 110. match_orders matches the first buy (100) against the first sell (50), producing Trade(50, 3) (price is the sell price, quantity is the smaller side).

Now look at this test:

test "trade price is reasonable" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].price > 0
}

The trade price is 50. The test checks 50 > 0 and passes. Now a mutation changes the trade to use the buy price instead of the sell price: Trade(b.price, qty) instead of Trade(s.price, qty). The trade price shifts from 50 to 100. The test checks 100 > 0, still passes. The value the test claims to observe changed from 50 to 100, but the assertion was too loose to notice. That is a LIE.

The fix is to assert what you actually mean:

test "match produces trade at sell price" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].price == 50
}

Now 100 == 50 fails. The mutation is caught.
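The same reasoning can be replayed in a few lines of Python. This is a sketch that reduces the scenario to the one expression being mutated. The function names are hypothetical, and only the prices 100 and 50 come from the walkthrough.

```python
def trade_price(buy_price: int, sell_price: int) -> int:
    """Original behaviour: trade at the sell price."""
    return sell_price


def trade_price_mutant(buy_price: int, sell_price: int) -> int:
    """Mutant: trade at the buy price instead."""
    return buy_price


def loose_assertion(price: int) -> bool:
    return price > 0  # "trade price is reasonable"


def tight_assertion(price: int) -> bool:
    return price == 50  # "match produces trade at sell price"


original = trade_price(100, 50)
mutated = trade_price_mutant(100, 50)

# The observed value moved, so any surviving mutant counts as a LIE.
observed_changed = original != mutated

# A mutant survives when the assertion holds on both versions.
loose_survives = loose_assertion(original) and loose_assertion(mutated)
tight_survives = tight_assertion(original) and tight_assertion(mutated)
```

The loose assertion lets the mutant through even though the observed value changed. The tight assertion does not.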

Anatomy of a GAP

Even the tightened test still has a GAP. A mutation in total_qty (say, changing + to -) survives because total_qty doesn't affect trades[0].price at all. The observed value didn't change, so the test can't be blamed. The fix is not to change this assertion, but to add a separate test that observes total_qty.
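A short Python model shows why the price test cannot see this mutant and why a dedicated test can. The body of total_qty is not shown on this page, so the accumulating loop below is an assumption. The trade (50, 3) is the one from the scenario.

```python
from typing import List, Tuple

Trade = Tuple[int, int]  # (price, qty)


def total_qty(trades: List[Trade]) -> int:
    """Assumed original: add up the quantities."""
    total = 0
    for _, qty in trades:
        total = total + qty
    return total


def total_qty_mutant(trades: List[Trade]) -> int:
    """Mutant: the + has become -."""
    total = 0
    for _, qty in trades:
        total = total - qty
    return total


trades = [(50, 3)]  # the single trade from the scenario

# The price test observes only this value, which neither version of
# total_qty reads or writes, so the mutant survives as a GAP.
observed_price = trades[0][0]

# A separate test that pins total_qty passes on the original
# and fails on the mutant, which closes the gap.
pinned_value = 3
passes_on_original = total_qty(trades) == pinned_value
passes_on_mutant = total_qty_mutant(trades) == pinned_value
```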

How lie detection works (static analysis)

Before running proof mutants, the detector does a fast static pass. It reparses the test file and classifies every assertion call:

Classification	Meaning	Example
CONST	Every argument is a literal or constant expression	`observe True`, `observe 1 == 1`
NON_CONST	At least one argument depends on a variable or call	`observe trades[0].price == 50`

A test with zero assertions or all-CONST assertions is a PLACEBO: no proof mutation needed.
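The idea behind the static pass can be illustrated with Python's own ast module. This is only an analogy: it parses Python expressions, not Vary source, and the choice of node types that count as "runtime" is an assumption. The example assertions happen to be valid in both syntaxes.

```python
import ast
from typing import List

# Node types that make an expression depend on runtime state (assumed set).
_RUNTIME_NODES = (ast.Name, ast.Call, ast.Attribute, ast.Subscript)


def classify_assertion(expr: str) -> str:
    """Return "CONST" if the expression holds only literals, else "NON_CONST"."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, _RUNTIME_NODES):
            return "NON_CONST"
    return "CONST"


def is_placebo(assertions: List[str]) -> bool:
    """True for zero assertions or when every assertion is constant."""
    # all() over an empty list is True, which covers the zero-assertion case.
    return all(classify_assertion(a) == "CONST" for a in assertions)
```

Because the check never executes the test, it can flag placebos without running a single mutant.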

CLI reference

vary lie-detector [options] <target>

Flag	Effect
`-t`, `--tests FILE`	Specify the test file to analyze
`-v`, `--verbose`	Show detailed per-test breakdown with hints
`--json`	Output results as JSON
`--fail-on-lies`	Exit with code 1 if lies or placebos found
`--fail-on-gaps`	Exit with code 1 if gaps found (strict mode)
`--max-gaps N`	Fail if gap count exceeds budget
`--top-gaps N`	Max gaps to display in verbose mode (default: 20)
`--max-mutants N`	Max proof mutants per test (default: 50)

The static PLACEBO check also runs as part of vary mutate. The standalone vary lie-detector command adds the proof-based LIE vs GAP classification.

Lie-shape taxonomy

Beyond the four broad categories (PLACEBO, LIE, GAP, OK), the engine classifies each survivor into a named lie-shape that describes the specific weakness pattern:

Lie shape	Meaning
Constant Assertion	Test checks a literal/constant value, so it always passes regardless of SUT behaviour
Shallow Boolean	Test only checks true/false or null/non-null, never pins actual values
Effect-Only Check	Test checks that a side effect ran but not what it produced
Unused Return Value	Function return value is not asserted in any test
Exception-Only Check	Test only asserts no crash (or that an exception is thrown), no value pinning
Broad Smoke Test	Test exercises flow but has too few assertions to catch mutations
Boundary Gap	Boundary values are untested, so off-by-one mutations survive
Partial Observation	Some fields/properties are checked but the mutated field is not

Lie shapes appear in --why output and in the lie detector's verbose mode. They help prioritize which survivors to fix first: a Constant Assertion is a test that does nothing, while a Boundary Gap usually needs one extra input to close.

CI integration

Start permissive, then tighten:

# Phase 1: catch the worst offenders
vary lie-detector src/ --fail-on-lies

# Phase 2: set a gap budget
vary lie-detector src/ --fail-on-lies --max-gaps 20

# Phase 3: strict mode
vary lie-detector src/ --fail-on-lies --fail-on-gaps
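If the CI job is driven from a script, the three phases can be expressed as one helper that builds the argument list. This is a sketch in Python: the helper name and its parameters are hypothetical, and it uses only the flags listed in the CLI reference.

```python
from typing import List, Optional


def lie_detector_command(
    target: str,
    max_gaps: Optional[int] = None,
    strict: bool = False,
) -> List[str]:
    """Build the argument list for one rollout phase."""
    cmd = ["vary", "lie-detector", target, "--fail-on-lies"]
    if strict:
        # Phase 3: any gap fails the build.
        cmd.append("--fail-on-gaps")
    elif max_gaps is not None:
        # Phase 2: fail only when the gap budget is exceeded.
        cmd += ["--max-gaps", str(max_gaps)]
    return cmd


phase_1 = lie_detector_command("src/")
phase_2 = lie_detector_command("src/", max_gaps=20)
phase_3 = lie_detector_command("src/", strict=True)
```

Each list can be passed to subprocess.run, and the resulting return code forwarded as the job's exit status.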
