Early exploration. Vary and mutation testing are both early in their development. These guidelines reflect what we have learned so far and will evolve as the language and tooling mature.
Varyonic programming defines the architectural style of Vary: observable behaviour, typed domains, pure logic, decisions separated from effects, explicit errors, and contracts. This guide shows how to apply that style so mutation testing can observe, challenge, and verify program behaviour.
For an introduction to mutation testing, see Introduction. For the step-by-step workflow, see Golden path.
Mutation testing rewards observability. Survivors appear when behaviour changes but tests do not notice.
Weak:
import fs
def check_site(pages: List[Str]) -> Bool {
for p in pages {
if not fs.exists(fs.path(p)) {
return False
}
}
return True
}
A boolean hides what was checked, what failed, what paths were visited, and what was skipped.
Strong:
import fs
data CheckReport {
success: Bool
issues: List[Str]
checked_files: List[Str]
broken_links: List[Str]
warnings: List[Str]
}
def check_site(pages: List[Str]) -> CheckReport {
mut issues: List[Str] = []
mut broken: List[Str] = []
for p in pages {
if not fs.exists(fs.path(p)) {
issues = issues + ["missing: " + p]
broken = broken + [p]
}
}
return CheckReport(len(issues) == 0, issues, pages, broken, [])
}
Now tests can observe report.success, report.issues, report.checked_files, report.broken_links, and report.warnings. The more structure you return, the more mutation surfaces become visible.
Prefer returning records, data types, lists of issues, or explicit result objects. Avoid returning only Bool, Int, None, or implicit success via logging.
observe is strongest when used as an oracle, not just as a prettier assert.
Weak:
observe result == true
Better:
observe report.success == true
observe report.issues.len() == 0
observe report.checked_files.len() == 5
observe report.broken_links == []
Best:
observe report.error_count == report.issues.len()
observe sitemap_urls.len() == built_pages.len()
These are semantic observations, not end-value checks. The goal is to show that behaviour remained correct under mutation.
Many mutation survivors come from partial assertions.
Weak:
observe plan is not None
Strong:
test "plan has correct fields" {
let plan = plan_deploy("prod", "/var/www/site")
observe plan.strategy == DeployStrategy.Rsync
observe plan.source_dir == "build/"
observe plan.destination == "/var/www/site"
}
Testing that construction succeeded is not enough. Testing that the constructed value has the right contents is what kills mutants.
A system that is only tested on valid input is weak, no matter how many tests it has.
Good negative-path areas:
| Area | Example |
|---|---|
| Missing files | Required input absent |
| Empty lists | Zero-length collection passed to aggregation |
| Invalid enum cases at boundaries | Unknown or out-of-range variant |
| Duplicate names | Conflicting keys or identifiers |
| Malformed config | Bad TOML, missing required fields |
| Broken links | Reference to non-existent target |
| Absent optional fields | None where a value is expected |
| Bad path normalization cases | Trailing slashes, .. segments, mixed separators |
For every important function, test at least: normal success, obvious invalid input, an edge case, a boundary condition, and "almost valid" input.
Mutation testing often survives in exactly the paths nobody bothered to assert.
A lot of correctness is relational. These make excellent observe targets:
observe report.error_count == report.issues.len()
observe sitemap_urls.len() == built_pages.len()
Whenever two outputs should agree, write an observation for that agreement.
Vary gets stronger as you extract pure helpers with clear meanings. Instead of one long function, extract:
| Helper | Responsibility |
|---|---|
resolve_source_files | Discover and filter input paths |
classify_change | Categorize a diff into a semantic change kind |
normalize_link | Canonicalize URL paths |
derive_output_path | Map source path to build output path |
validate_config | Check config fields and return structured errors |
Extracted helpers are simpler to test, contract, and mutate individually. A dense pure helper is usually better than repeating logic inside an effectful orchestrator.
Unit tests are necessary, but scenario tests often kill mutants that isolated tests miss.
A scenario test might: create a temp site, write config, add pages, run build, run check, and inspect reports and output files. This observes interactions between modules.
Use focused tests for pure helpers and end-to-end scenario tests for flow integrity. Vary benefits from both layers.
A survivor is rarely "just a missing test." It often means:
| Signal | What to fix |
|---|---|
| Behaviour hidden behind a weak return type | Return structured data instead of None or Bool |
| Data is under-structured | Use data classes or enums instead of raw strings |
| Logic and effects are mixed | Separate pure planning from effectful execution |
| Assertions are too shallow | Assert on return values, not just absence of errors |
| The wrong boundary is being tested | Move tests closer to the pure logic |
When a mutant survives, ask: should this behaviour have been observable? Is the API too weak? Should this function return structured data? Is this effect hiding core logic?
Mutation testing improves architecture, not only test coverage.
For each important module, ask:
| Question | If no |
|---|---|
| Are return values structured and inspectable? | Replace Bool/Int returns with data types |
| Do tests assert exact fields, not just success? | Pin each important field |
| Are negative paths covered? | Add invalid-input and boundary tests |
| Are relationships between outputs observed? | Add cross-field observations |
| Are orchestration functions returning reports? | Move exit() to the CLI layer |
| Do mutation survivors point to weak architecture? | Refactor, then re-test |
The goal is not more code. It is more visible correctness. Vary gives you type structure, semantic contracts, observation-based testing, and mutation pressure. When those pieces reinforce each other, mutation testing becomes a design discipline rather than a quality metric.