Most AI coding loops waste time running expensive verifiers too early: the model writes some code, the tool runs the full test suite, half the tests fail for structural reasons, and the loop thrashes trying to fix a noisy failure surface all at once.
There is a simpler policy: run the cheapest command that can actually tell you something, and only escalate when that stage is clean.
## The ladder
| Stage | Command | What it answers |
|---|---|---|
| 1 | `vary check` | Is the code structurally sane enough to keep working on? |
| 2 | `vary test` | Does the code behave correctly on the cases we wrote down? |
| 3 | `vary mutate --quick` | Are those tests strong enough to catch realistic faults? |
| 4 | `vary validate` | Has this change met the final local or CI policy bar? |
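The ladder reads naturally as an escalation policy: run each stage in order, and stop at the first one that is not clean. A minimal shell sketch of that policy, with stages passed in as command strings (the `run_ladder` helper is hypothetical, not part of `vary`):

```shell
#!/bin/sh
# Escalation policy: run the cheapest verifier first and stop at the
# first stage that fails, so feedback stays narrow and cheap.
run_ladder() {
    for stage in "$@"; do
        echo "running: $stage"
        if ! sh -c "$stage"; then
            echo "stopped at: $stage (fix this before escalating)"
            return 1
        fi
    done
    echo "all stages clean"
}

# With the ladder above, the stages would be:
#   run_ladder "vary check src/" \
#              "vary test tests/" \
#              "vary mutate src/foo.vary --quick" \
#              "vary validate . --profile local"
```

The property that matters is that a failure at stage N prevents stage N+1 from ever running, so the model only sees one failure surface at a time.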
## How to use it
Start with `vary check` while the code is still changing shape.
```shell
vary check src/
vary check src/ --plan
vary check src/ --fix
```
If the checker is still finding structural problems, stay there. That feedback is cheaper and more local than test failures. If a rule is unclear, ask the toolchain directly:
```shell
vary explain VCI001
```
When the structure is clean enough, move to behaviour:
```shell
vary test tests/
vary test tests/ --only auth::test_login --trace
```
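When a test fails, the narrow replay above can be wrapped in a narrow-then-widen helper: re-run only the failing case, and re-run the whole stage once it passes. A sketch with stand-in command strings in place of `vary test` (the `fix_loop` name is hypothetical):

```shell
#!/bin/sh
# Narrow-then-widen: confirm the single failing behaviour first, and
# only re-run the whole stage once the narrow replay is green.
fix_loop() {
    narrow="$1"   # e.g. "vary test tests/ --only auth::test_login --trace"
    wide="$2"     # e.g. "vary test tests/"
    if ! sh -c "$narrow"; then
        echo "narrow replay still failing; edit and re-run"
        return 1
    fi
    sh -c "$wide" && echo "stage clean"
}
```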
If tests fail, do not jump ahead. Replay the narrowest failing behaviour and fix that. Only after tests pass should you ask whether the tests themselves are any good.
```shell
vary mutate src/foo.vary --quick
```
This catches confidence theatre. A suite can pass and still be too weak to notice small but realistic faults. Mutation tells you whether "tests pass" actually means anything.
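As a toy illustration of what the mutation stage measures (not how `vary mutate` works internally, and with hypothetical file names): flip one operator in the code and check whether the tests notice.

```shell
#!/bin/sh
# Toy mutation check: introduce one realistic fault (+ becomes -) and
# run the tester against the mutant. If the tester still passes, the
# suite was too weak to notice the fault.
mutant_caught() {
    src="$1"      # code under test
    tester="$2"   # script that takes a code file and exits nonzero on failure
    mutant="${src}.mutant"
    sed 's/+/-/' "$src" > "$mutant"
    if sh "$tester" "$mutant"; then
        echo "mutant survived: tests pass but would miss this fault"
        return 1
    fi
    rm -f "$mutant"
    echo "mutant caught"
}
```

A tester that only checks the code runs at all would let this mutant survive, which is exactly the kind of weakness a quick mutation pass is meant to surface.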
When the change is ready for handoff, run the policy bar:
```shell
vary validate . --profile local
vary validate . --profile ci
```
`vary validate` is the closeout step, not something you run on every edit.
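Since it is a closeout step, the policy bar fits naturally in a handoff script or pre-push hook rather than an editor loop. A sketch of a gate wrapper (the `closeout_gate` helper and its placement are assumptions; in practice the command would be `vary validate . --profile local`):

```shell
#!/bin/sh
# Closeout gate: run the policy bar once, at handoff (e.g. from a
# pre-push hook), never on every edit.
closeout_gate() {
    cmd="${1:-vary validate . --profile local}"
    if sh -c "$cmd"; then
        echo "policy bar met: ready for handoff"
    else
        echo "policy bar not met: stay on the ladder" >&2
        return 1
    fi
}
```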
## The point
`check` catches shape problems. `test` checks behaviour. `mutate` checks test strength. `validate` applies the final gate. Always run the cheapest next verifier that can actually reduce uncertainty, and stop tools from bouncing between vague edits and expensive checks.
## Related reading
| Article | Focus |
|---|---|
| From generated code to confidence at scale | The confidence workflow that strengthens generated code before deeper verification |
| Human-readable, AI-written, confidence at scale | The product direction behind confidence-building in AI-assisted Vary |