Most AI coding loops waste time running expensive verifiers too early: the model writes some code, the tool runs the full test suite, half the tests fail for structural reasons, and the loop thrashes trying to fix a noisy failure surface all at once.
There is a simpler policy: run the cheapest command that can actually tell you something, and only escalate when that stage is clean.
## The ladder
| Stage | Command | What it answers |
|---|---|---|
| 1 | `vary check` | Is the code structurally sane enough to keep working on? |
| 2 | `vary test` | Does the code behave correctly on the cases we wrote down? |
| 3 | `vary mutate --quick` | Are those tests strong enough to catch realistic faults? |
| 4 | `vary validate` | Has this change met the final local or CI policy bar? |
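The ladder reads naturally as an escalation policy: run each stage in order, and stop at the first one that is not clean. A minimal shell sketch of that policy, with stages passed in as command strings (the `run_ladder` helper is hypothetical, not part of `vary`):

```shell
#!/bin/sh
# Escalation policy: run the cheapest verifier first and stop at the
# first stage that fails, so feedback stays narrow and cheap.
run_ladder() {
    for stage in "$@"; do
        echo "running: $stage"
        if ! sh -c "$stage"; then
            echo "stopped at: $stage (fix this before escalating)"
            return 1
        fi
    done
    echo "all stages clean"
}

# With the ladder above, the stages would be:
#   run_ladder "vary check src/" \
#              "vary test tests/" \
#              "vary mutate src/foo.vary --quick" \
#              "vary validate . --profile local"
```

The property that matters is that a failure at stage N prevents stage N+1 from ever running, so the model only sees one failure surface at a time.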
## How to use it
Start with `vary check` while the code is still changing shape.
```shell
vary check src/
vary check src/ --plan
vary check src/ --fix
```
If the checker is still finding structural problems, stay there. That feedback is cheaper and more local than test failures. If a rule is unclear, ask the toolchain directly:
```shell
vary explain VCI001
```
When the structure is clean enough, move to behaviour:
```shell
vary test tests/
vary test tests/ --only auth::test_login --trace
```
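When a test fails, the narrow replay above can be wrapped in a narrow-then-widen helper: re-run only the failing case, and re-run the whole stage once it passes. A sketch with stand-in command strings in place of `vary test` (the `fix_loop` name is hypothetical):

```shell
#!/bin/sh
# Narrow-then-widen: confirm the single failing behaviour first, and
# only re-run the whole stage once the narrow replay is green.
fix_loop() {
    narrow="$1"   # e.g. "vary test tests/ --only auth::test_login --trace"
    wide="$2"     # e.g. "vary test tests/"
    if ! sh -c "$narrow"; then
        echo "narrow replay still failing; edit and re-run"
        return 1
    fi
    sh -c "$wide" && echo "stage clean"
}
```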
If tests fail, do not jump ahead. Replay the narrowest failing behaviour and fix that. Only after tests pass should you ask whether the tests themselves are any good.
```shell
vary mutate src/foo.vary --quick
```
This catches confidence theatre. A suite can pass and still be too weak to notice small but realistic faults. Mutation tells you whether "tests pass" actually means anything.
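As a toy illustration of what the mutation stage measures (not how `vary mutate` works internally, and with hypothetical file names): flip one operator in the code and check whether the tests notice.

```shell
#!/bin/sh
# Toy mutation check: introduce one realistic fault (+ becomes -) and
# run the tester against the mutant. If the tester still passes, the
# suite was too weak to notice the fault.
mutant_caught() {
    src="$1"      # code under test
    tester="$2"   # script that takes a code file and exits nonzero on failure
    mutant="${src}.mutant"
    sed 's/+/-/' "$src" > "$mutant"
    if sh "$tester" "$mutant"; then
        echo "mutant survived: tests pass but would miss this fault"
        return 1
    fi
    rm -f "$mutant"
    echo "mutant caught"
}
```

A tester that only checks the code runs at all would let this mutant survive, which is exactly the kind of weakness a quick mutation pass is meant to surface.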
When the change is ready for handoff, run the policy bar:
```shell
vary validate . --profile local
vary validate . --profile ci
```
`vary validate` is the closeout step, not something you run on every edit.
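Since it is a closeout step, the policy bar fits naturally in a handoff script or pre-push hook rather than an editor loop. A sketch of a gate wrapper (the `closeout_gate` helper and its placement are assumptions; in practice the command would be `vary validate . --profile local`):

```shell
#!/bin/sh
# Closeout gate: run the policy bar once, at handoff (e.g. from a
# pre-push hook), never on every edit.
closeout_gate() {
    cmd="${1:-vary validate . --profile local}"
    if sh -c "$cmd"; then
        echo "policy bar met: ready for handoff"
    else
        echo "policy bar not met: stay on the ladder" >&2
        return 1
    fi
}
```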
## The point
`check` catches shape problems. `test` checks behaviour. `mutate` checks test strength. `validate` applies the final gate. Always run the cheapest next verifier that can actually reduce uncertainty, and stop tools from bouncing between vague edits and expensive checks.
## Related reading
| Article | Focus |
|---|---|
| From generated code to confidence at scale | The confidence workflow that strengthens generated code before deeper verification |
| Human-readable, AI-written, confidence at scale | The product direction behind confidence-building in AI-assisted Vary |