Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.

Lie detection

Scenario: Your test suite is green. Coverage is 95%. You ship. A week later, a bug report arrives: the order matcher has been using the buy price instead of the sell price, and every trade has overcharged customers. You check the tests and find one called "trade price is reasonable" that asserts observe trades[0].price > 0. It passed before the bug, during the bug, and after the fix. It would have passed for any positive price, even a million. The test was never checking anything.

The lie detector finds these tests before your users do. It builds on mutation testing, which makes small changes to your code (each changed version is called a mutant) and reruns your tests against each one. If a test fails, the mutant is "killed" and the test is doing its job. The lie detector goes further: it checks whether the test's assertion actually depends on the values it claims to observe.

This page walks through the examples/lie_detector/ demo. For a richer demo that uses contracts (preconditions and postconditions on functions), see Lie detection: deep dive.

Four categories

The vary lie-detector command classifies every test into one of four buckets:

| Category | Meaning | Action |
|---|---|---|
| PLACEBO | Zero assertions, or only constant ones (observe True, observe 1 == 1) | Delete or rewrite the test |
| LIE | The assertion uses a runtime value, but a mutation changes that value and the test still passes | Tighten the assertion (for example, use == instead of >) |
| GAP | The assertion is real, but some mutations survive outside what the test observes | Add more tests or boundary inputs |
| OK | All proof mutants on observed values are killed | No action needed |
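One way to read the table is as a decision procedure over a test's proof mutants. The Python sketch below is an interpretation, not Vary's implementation. The ProofResult record is hypothetical, and so is the rule that a single LIE-style survivor outranks any GAP-style survivors.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProofResult:
    observed_changed: bool  # did the captured observed value differ from the unmutated run?
    survived: bool          # did the test still pass against this mutant?


def categorize(non_const_assertions: int, results: List[ProofResult]) -> str:
    if non_const_assertions == 0:
        return "PLACEBO"  # nothing real is being asserted
    survivors = [r for r in results if r.survived]
    if any(r.observed_changed for r in survivors):
        return "LIE"      # an observed value moved and the test didn't notice
    if survivors:
        return "GAP"      # survivors exist, but outside what the test observes
    return "OK"           # every proof mutant was killed
```

For example, categorize(1, [ProofResult(observed_changed=True, survived=True)]) returns "LIE". A test with no non-constant assertions is a PLACEBO before any mutant runs.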

The scenario

scenario.vary implements a tiny order-matching engine:

data Trade { price: Int, qty: Int }
data Order { price: Int, qty: Int }

def make_buys() -> List[Order]
def make_sells() -> List[Order]
def match_orders(buys: List[Order], sells: List[Order]) -> List[Trade]
def total_qty(trades: List[Trade]) -> Int

match_orders walks buys against sells, executes a trade when the buy price meets or exceeds the sell price, and returns after the first fill. The trade price is the sell price, and the quantity is the smaller side.
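If you want to trace the walkthrough by hand, here is a rough Python model of the scenario. It is a sketch under stated assumptions, not scenario.vary. The prices (buys at 100 and 60, sells at 50 and 110) come from the scenario. The nested-loop iteration order is an assumption, and so are the order quantities (3, 5, 4, 2), which were chosen only so that the first fill comes out as Trade(50, 3).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    price: int
    qty: int


@dataclass(frozen=True)
class Trade:
    price: int
    qty: int


def make_buys() -> List[Order]:
    # Prices from the scenario; quantities are hypothetical.
    return [Order(100, 3), Order(60, 5)]


def make_sells() -> List[Order]:
    # Prices from the scenario; quantities are hypothetical.
    return [Order(50, 4), Order(110, 2)]


def match_orders(buys: List[Order], sells: List[Order]) -> List[Trade]:
    for b in buys:
        for s in sells:
            if b.price >= s.price:  # buy meets or exceeds sell
                # Fill at the sell price for the smaller side, then stop.
                return [Trade(s.price, min(b.qty, s.qty))]
    return []  # nothing crossed


def total_qty(trades: List[Trade]) -> int:
    total = 0
    for t in trades:
        total = total + t.qty
    return total


print(match_orders(make_buys(), make_sells()))
```

This prints [Trade(price=50, qty=3)]. Only one trade is produced, because the model returns after the first fill.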

Bad tests: placebos, lies, and gaps

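bad_tests.vary has tests that demonstrate each failure mode. The excerpt below shows six of its seven tests. The seventh, "transfer creates money" (line 38), is another LIE and appears in the output further down: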

from scenario import make_buys, make_sells, match_orders, total_qty

# PLACEBOS: constant assertions that can never fail

test "match executes" {
    match_orders(make_buys(), make_sells())
    observe True
}

test "match returns list" {
    observe 1 == 1
}

test "total is positive" {
    let t = total_qty(match_orders(make_buys(), make_sells()))
    observe True
}

# LIES: observe a value, but assertion is too loose

test "trade price is reasonable" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].price > 0
}

test "some quantity traded" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].qty < 1000
}

# GAPS: assertions reference runtime values but don't cover the mutation

test "transfer is positive" {
    let trades = match_orders(make_buys(), make_sells())
    observe len(trades) > 0
}

OK tests

ok_tests.vary tests a separate module (ok_scenario.vary) that defines a classify function. Each assertion pins an exact return value, and the boundary input 0 catches relational mutations:

from ok_scenario import classify

test "classify pins exact categories" {
    observe classify(-5) == "negative"
    observe classify(0) == "zero"
    observe classify(5) == "positive"
}

Every proof mutant in the exercised code is killed. The lie detector scores this test as OK.
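Why does the input 0 matter? A small Python model makes it concrete. Both the classify body below and the specific relational mutant (< swapped for <=) are assumptions, since ok_scenario.vary is not shown. Only the boundary assertion can tell the two versions apart.

```python
def classify(n: int) -> str:
    # Hypothetical stand-in for ok_scenario.classify.
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def classify_mutant(n: int) -> str:
    # Relational mutant: `n < 0` became `n <= 0`.
    if n <= 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def pins_all_three(fn) -> bool:
    # Mirrors the three observe lines in "classify pins exact categories".
    return fn(-5) == "negative" and fn(0) == "zero" and fn(5) == "positive"


def skips_boundary(fn) -> bool:
    # The same test without the boundary input 0.
    return fn(-5) == "negative" and fn(5) == "positive"


print(pins_all_three(classify), pins_all_three(classify_mutant))  # True False
print(skips_boundary(classify), skips_boundary(classify_mutant))  # True True
```

Without the 0 input, the mutant survives and the observed values for -5 and 5 stay the same. By the rules in the table above, that survivor would be a GAP rather than a LIE.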

Running the lie detector

vary lie-detector examples/lie_detector/

Default output is a compact summary showing LIEs, PLACEBOs, and GAPs:

LIE DETECTOR (8 tests) — lies: 3  placebos: 3  gaps: 1  ok: 1
  LIE  bad_tests.vary:21  trade price is reasonable
  LIE  bad_tests.vary:26  some quantity traded
  LIE  bad_tests.vary:38  transfer creates money
  PLACEBO  bad_tests.vary:5  match executes
  PLACEBO  bad_tests.vary:10  match returns list
  PLACEBO  bad_tests.vary:14  total is positive
  GAP  bad_tests.vary:33  transfer is positive

OK tests only appear in the header count. Add -v for detailed output with mutation types, reasons, and actionable hints:

vary lie-detector examples/lie_detector/ -v
LIE DETECTOR (8 tests) — lies: 3  placebos: 3  gaps: 1  ok: 1

LIES (3)

  bad_tests.vary:21  trade price is reasonable
    allows: RELATIONAL
  bad_tests.vary:26  some quantity traded
    allows: RELATIONAL
  bad_tests.vary:38  transfer creates money
    allows: RELATIONAL

PLACEBOS (3)

  bad_tests.vary:5  match executes
    all assertions are constant
  bad_tests.vary:10  match returns list
    all assertions are constant
  bad_tests.vary:14  total is positive
    all assertions are constant

GAPS (1)

  bad_tests.vary:33  transfer is positive
    survives: RELATIONAL (outside observed values)
    hint: add boundary input where operands can be equal

OK (1)

The summary line tells you the shape of the problem. Verbose mode explains what went wrong and what to do about it.

How LIE vs GAP works

The engine captures the left-hand side of every observe A == B or observe A > B comparison at runtime. It then runs each proof mutant and compares both the captured value and the test result against the unmutated run:

| Observed value changed? | Test still passes? | Verdict |
|---|---|---|
| Yes | Yes | LIE: the assertion doesn't constrain what it claims to observe |
| No | Yes | GAP: the mutation is outside the scope of this test |
| Either | No | Mutant killed (good) |
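The following Python sketch models that comparison for a single proof mutant. Everything in it is hypothetical. Tests are functions that receive the implementation and an observe(lhs, op, rhs) callback, the callback records its left-hand side, and verdict applies the table row by row. The demo uses the match condition (buy price >= sell price) and a relational mutant that turns >= into >.

```python
import operator
from typing import Any, Callable, List, Tuple

# Comparisons this sketch's `observe` understands.
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}


def run_capturing(test: Callable, impl: Callable) -> Tuple[bool, List[Any]]:
    """Run `test` against `impl`, recording the left-hand side of each observe."""
    captured: List[Any] = []
    failures = 0

    def observe(lhs: Any, op: str, rhs: Any) -> None:
        nonlocal failures
        captured.append(lhs)
        if not OPS[op](lhs, rhs):
            failures += 1

    test(impl, observe)
    return failures == 0, captured


def verdict(test: Callable, original: Callable, mutant: Callable) -> str:
    _, baseline = run_capturing(test, original)
    passed, under_mutant = run_capturing(test, mutant)
    if not passed:
        return "KILLED"
    return "LIE" if under_mutant != baseline else "GAP"


def crosses(buy: int, sell: int) -> int:
    return int(buy >= sell)  # 1 trade if the prices cross, else 0


def crosses_mutant(buy: int, sell: int) -> int:
    return int(buy > sell)  # relational mutant: >= became >


def count_test(impl, observe):
    observe(impl(100, 50), ">", 0)  # like `observe len(trades) > 0`


def boundary_test(impl, observe):
    observe(impl(50, 50), "==", 1)  # equal prices must still trade


print(verdict(count_test, crosses, crosses_mutant))     # GAP
print(verdict(boundary_test, crosses, crosses_mutant))  # KILLED
```

The count test sees 1 trade under both versions, so the survivor is a GAP. The boundary test uses equal operands, the kind of input the verbose output's hint suggests, and it kills the mutant.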

Anatomy of a LIE

The scenario has buys at prices 100 and 60, and sells at 50 and 110. match_orders matches the first buy (100) against the first sell (50) and produces Trade(50, 3): the price is the sell price, and the quantity is the smaller side.

Now look at this test:

test "trade price is reasonable" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].price > 0
}

The trade price is 50. The test checks 50 > 0 and passes. Now a mutation changes the trade to use the buy price instead of the sell price: Trade(b.price, qty) instead of Trade(s.price, qty). The trade price shifts from 50 to 100. The test checks 100 > 0 and still passes. The value the test claims to observe changed from 50 to 100, but the assertion was too loose to notice. That is a LIE.

The fix writes what you actually mean:

test "match produces trade at sell price" {
    let trades = match_orders(make_buys(), make_sells())
    observe trades[0].price == 50
}

Now 100 == 50 fails. The mutation is caught.
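The arithmetic of this example fits in a few lines of Python. The two functions are hypothetical stand-ins for the price expression inside match_orders before and after the mutation. The values 100 and 50 are the first buy and sell prices from the scenario.

```python
def trade_price(buy_price: int, sell_price: int) -> int:
    return sell_price  # original: trade at the sell price


def trade_price_mutant(buy_price: int, sell_price: int) -> int:
    return buy_price  # proof mutant: buy price instead of sell price


for name, impl in [("original", trade_price), ("mutant", trade_price_mutant)]:
    price = impl(100, 50)
    loose = price > 0    # "trade price is reasonable"
    tight = price == 50  # "match produces trade at sell price"
    print(f"{name}: price={price} loose={loose} tight={tight}")
```

It prints "original: price=50 loose=True tight=True" and "mutant: price=100 loose=True tight=False". The loose assertion holds for both versions, while the tight one holds only for the original.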

Anatomy of a GAP

Even the fixed test is still a GAP. A mutation in total_qty (say, changing + to -) survives because total_qty doesn't affect trades[0].price at all. The observed value didn't change, so this test isn't at fault. The fix is not to change this assertion but to add a separate test that observes total_qty.
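Here is a Python sketch of why the fixed price test cannot see this mutant, and what a companion test catches. The list-of-tuples representation and the loop body of total_qty are assumptions. Trade(50, 3) is the first fill from the scenario.

```python
from typing import List


def total_qty(qtys: List[int]) -> int:
    total = 0
    for q in qtys:
        total = total + q
    return total


def total_qty_mutant(qtys: List[int]) -> int:
    total = 0
    for q in qtys:
        total = total - q  # arithmetic mutant: + became -
    return total


trades = [(50, 3)]  # Trade(50, 3) as (price, qty)

# The price test reads trades[0] and never calls total_qty, so its observed
# value is 50 under both versions: the mutant survives as a GAP.
observed_price = trades[0][0]

# A separate test that observes total_qty kills the mutant (3 vs -3).
qtys = [qty for _, qty in trades]
print(observed_price == 50, total_qty(qtys) == 3, total_qty_mutant(qtys) == 3)
```

This prints True True False. The price assertion is unaffected, but an assertion on the total fails under the mutant.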

How lie detection works (static analysis)

Before running proof mutants, the detector does a fast static pass. It reparses the test file and classifies every assertion call:

| Classification | Meaning | Example |
|---|---|---|
| CONST | Every argument is a literal or constant expression | observe True, observe 1 == 1 |
| NON_CONST | At least one argument depends on a variable or call | observe trades[0].price == 50 |

A test with zero assertions or only CONST assertions is a PLACEBO. No proof mutants need to run to reach that verdict.
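The static pass can be approximated in Python with the standard ast module. This is a swapped-in illustration, not Vary's parser. It treats each observe argument as a Python expression and calls it CONST when the syntax tree contains no names, attribute accesses, or calls.

```python
import ast
from typing import List


def is_const(expr: str) -> bool:
    """CONST if the expression is built only from literals and operators."""
    tree = ast.parse(expr, mode="eval")
    return not any(
        isinstance(node, (ast.Name, ast.Attribute, ast.Call))
        for node in ast.walk(tree)
    )


def is_placebo(observed: List[str]) -> bool:
    # all() of an empty list is True, so zero assertions also counts.
    return all(is_const(e) for e in observed)


print(is_placebo(["True"]))                 # True  ("match executes")
print(is_placebo(["1 == 1"]))               # True  ("match returns list")
print(is_placebo(["trades[0].price > 0"]))  # False ("trade price is reasonable")
print(is_placebo([]))                       # True  (no assertions)
```

Because the check only inspects syntax, it is cheap enough to run before any proof mutants.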

CLI reference

vary lie-detector [options] <target>
| Flag | Effect |
|---|---|
| -t, --tests FILE | Specify the test file to analyze |
| -v, --verbose | Show a detailed per-test breakdown with hints |
| --json | Output results as JSON |
| --fail-on-lies | Exit with code 1 if lies or placebos are found |
| --fail-on-gaps | Exit with code 1 if gaps are found (strict mode) |
| --max-gaps N | Fail if the gap count exceeds the budget N |
| --top-gaps N | Maximum gaps to display in verbose mode (default: 20) |
| --max-mutants N | Maximum proof mutants per test (default: 50) |

vary mutate also runs the lie detector's static PLACEBO check. The standalone command adds the proof-based LIE vs GAP classification.

Lie-shape taxonomy

Beyond the four broad categories (PLACEBO, LIE, GAP, OK), the engine classifies each survivor into a named lie-shape that describes the specific weakness pattern:

| Lie shape | Meaning |
|---|---|
| Constant Assertion | Test checks a literal or constant value, so it always passes regardless of SUT behaviour |
| Shallow Boolean | Test only checks true/false or null/non-null and never pins actual values |
| Effect-Only Check | Test checks that a side effect ran, but not what it produced |
| Unused Return Value | Function return value is not asserted in any test |
| Exception-Only Check | Test only asserts that nothing crashed (or that an exception is thrown), with no value pinning |
| Broad Smoke Test | Test exercises the flow but has too few assertions to catch mutations |
| Boundary Gap | Boundary values are untested, so off-by-one mutations survive |
| Partial Observation | Some fields or properties are checked, but the mutated one is not |

Lie shapes appear in --why output and in the lie detector's verbose mode. They help prioritize which survivors to fix first: a Constant Assertion is a test that does nothing, while a Boundary Gap usually needs one extra input to close.

CI integration

Start permissive, then tighten:

# Phase 1: catch the worst offenders
vary lie-detector src/ --fail-on-lies

# Phase 2: set a gap budget
vary lie-detector src/ --fail-on-lies --max-gaps 20

# Phase 3: strict mode
vary lie-detector src/ --fail-on-lies --fail-on-gaps
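The three phases differ only in which counts can fail the build. Below is a hypothetical Python model of that gating, based on the flag descriptions in the CLI reference. The exit code 1 for an exceeded --max-gaps budget is an assumption, because the reference only says the command fails.

```python
from typing import Optional


def exit_code(lies: int, placebos: int, gaps: int, *,
              fail_on_lies: bool = False,
              fail_on_gaps: bool = False,
              max_gaps: Optional[int] = None) -> int:
    if fail_on_lies and (lies > 0 or placebos > 0):
        return 1  # --fail-on-lies covers both lies and placebos
    if fail_on_gaps and gaps > 0:
        return 1  # strict mode: any gap fails
    if max_gaps is not None and gaps > max_gaps:
        return 1  # gap count exceeds the budget (exit code assumed)
    return 0
```

Take the counts from the example run (3 lies, 3 placebos, 1 gap). Phase 1 already fails. Once the lies and placebos are fixed, phase 2 passes with 1 gap under a budget of 20, and phase 3 fails until that last gap is closed.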