Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.

Designing for mutation

Early exploration. Vary and mutation testing are both early in their development. These guidelines reflect what we have learned so far and will evolve as the language and tooling mature.

Varyonic programming defines the architectural style of Vary: observable behaviour, typed domains, pure logic, decisions separated from effects, explicit errors, and contracts. This guide shows how to apply that style so mutation testing can observe, challenge, and verify program behaviour.

For an introduction to mutation testing, see Introduction. For the step-by-step workflow, see Golden path.

Design for observation

Mutation testing rewards observability. Survivors appear when behaviour changes but tests do not notice.

Weak:

import fs

def check_site(pages: List[Str]) -> Bool {
    for p in pages {
        if not fs.exists(fs.path(p)) {
            return False
        }
    }
    return True
}

A boolean hides what was checked, what failed, what paths were visited, and what was skipped.

Strong:

import fs

data CheckReport {
    success: Bool
    issues: List[Str]
    checked_files: List[Str]
    broken_links: List[Str]
    warnings: List[Str]
}

def check_site(pages: List[Str]) -> CheckReport {
    mut issues: List[Str] = []
    mut broken: List[Str] = []
    for p in pages {
        if not fs.exists(fs.path(p)) {
            issues = issues + ["missing: " + p]
            broken = broken + [p]
        }
    }
    return CheckReport(len(issues) == 0, issues, pages, broken, [])
}

Now tests can observe report.success, report.issues, report.checked_files, report.broken_links, and report.warnings. The more structure you return, the more mutation surfaces become visible.

Prefer returning records, data types, lists of issues, or explicit result objects. Avoid returning only Bool, Int, None, or implicit success via logging.

Use observe for semantics, not just values

observe is strongest when used as an oracle, not just as a prettier assert.

Weak:

observe result == True

Better:

observe report.success == True
observe report.issues.len() == 0
observe report.checked_files.len() == 5
observe report.broken_links == []

Best:

observe report.error_count == report.issues.len()
observe sitemap_urls.len() == built_pages.len()

These are semantic observations, not end-value checks. The goal is to show that behaviour remained correct under mutation.

Assert every important field

Many mutation survivors come from partial assertions.

Weak:

observe plan is not None

Strong:

test "plan has correct fields" {
    let plan = plan_deploy("prod", "/var/www/site")
    observe plan.strategy == DeployStrategy.Rsync
    observe plan.source_dir == "build/"
    observe plan.destination == "/var/www/site"
}

Testing that construction succeeded is not enough. Testing that the constructed value has the right contents is what kills mutants.

Test negative paths

A system that is only tested on valid input is weak, no matter how many tests it has.

Good negative-path areas:

- Missing files: required input absent
- Empty lists: zero-length collection passed to aggregation
- Invalid enum cases at boundaries: unknown or out-of-range variant
- Duplicate names: conflicting keys or identifiers
- Malformed config: bad TOML, missing required fields
- Broken links: reference to a non-existent target
- Absent optional fields: None where a value is expected
- Bad path normalization cases: trailing slashes, .. segments, mixed separators

For every important function, test at least: normal success, obvious invalid input, an edge case, a boundary condition, and "almost valid" input.

Mutants often survive in exactly the paths nobody bothered to assert.
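As a sketch, a negative-path test against check_site from earlier might look like this. It assumes index.html exists on disk and missing.html does not; the page names are purely illustrative.

```
test "check_site reports a missing page" {
    let report = check_site(["index.html", "missing.html"])
    observe report.success == False
    observe report.issues == ["missing: missing.html"]
    observe report.broken_links == ["missing.html"]
    observe report.checked_files.len() == 2
}
```

Note that the test pins the exact issue message and the broken-link list, not just the failure flag: a mutant that drops the message or records the wrong path cannot slip past.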

Turn implicit relationships into explicit checks

A lot of correctness is relational. These make excellent observe targets:

observe report.error_count == report.issues.len()
observe sitemap_urls.len() == built_pages.len()

Whenever two outputs should agree, write an observation for that agreement.
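For example, a test might pin the agreement between a build and its sitemap. build_site and load_sitemap here are hypothetical helpers, not part of any real API; the shape of the test is the point.

```
test "sitemap agrees with built pages" {
    let built_pages = build_site("site/")
    let sitemap_urls = load_sitemap("site/sitemap.xml")
    observe sitemap_urls.len() == built_pages.len()
    for url in sitemap_urls {
        observe built_pages.contains(url)
    }
}
```

A mutant that drops a page from either side of the relationship now breaks two observations, not zero.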

Extract small pure helpers

Vary programs get stronger as you extract pure helpers with clear responsibilities. Instead of one long function, extract:

- resolve_source_files: discover and filter input paths
- classify_change: categorize a diff into a semantic change kind
- normalize_link: canonicalize URL paths
- derive_output_path: map a source path to its build output path
- validate_config: check config fields and return structured errors

Extracted helpers are simpler to test, contract, and mutate individually. A dense pure helper is usually better than repeating logic inside an effectful orchestrator.
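As an illustration, normalize_link from the table above might be a small pure function like the following. The string methods (ends_with, strip_suffix) are assumed, not confirmed Vary APIs; the point is that the helper takes a value and returns a value, with no effects to mock.

```
def normalize_link(link: Str) -> Str {
    mut result = link
    // Canonicalize directory-index links: "/docs/index.html" -> "/docs/"
    if result.ends_with("/index.html") {
        result = result.strip_suffix("index.html")
    }
    // Collapse a doubled trailing slash left by naive joins
    if result.ends_with("//") {
        result = result.strip_suffix("/")
    }
    return result
}
```

Because the helper is pure, every mutation of it is observable through direct input/output tests, with no filesystem or network setup in the way.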

Use scenario tests alongside unit tests

Unit tests are necessary, but scenario tests often kill mutants that isolated tests miss.

A scenario test might: create a temp site, write config, add pages, run build, run check, and inspect reports and output files. This observes interactions between modules.

Use focused tests for pure helpers and end-to-end scenario tests for flow integrity. Vary benefits from both layers.
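A scenario test might be sketched as follows. Everything here is hypothetical: fs.temp_dir, fs.write, build_site, and the fields on its result are stand-ins for whatever your project's real build and check entry points expose.

```
test "build and check a minimal site" {
    // Arrange: a throwaway site with one config and one page
    let dir = fs.temp_dir()
    fs.write(fs.path(dir + "/config.toml"), "title = \"Demo\"")
    fs.write(fs.path(dir + "/pages/index.md"), "# Home")

    // Act: run the full build-then-check flow
    let build = build_site(dir)
    let report = check_site(build.output_pages)

    // Assert: observe the reports, not just the absence of a crash
    observe build.pages_built == 1
    observe report.success == True
    observe report.issues == []
    observe report.checked_files == build.output_pages
}
```

The last observation is relational: it ties the checker's inputs to the builder's outputs, so mutants that desynchronize the two stages are caught even if each stage looks fine in isolation.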

Treat survivors as design feedback

A survivor is rarely "just a missing test." It often means:

- Behaviour hidden behind a weak return type: return structured data instead of None or Bool
- Data is under-structured: use data classes or enums instead of raw strings
- Logic and effects are mixed: separate pure planning from effectful execution
- Assertions are too shallow: assert on return values, not just the absence of errors
- The wrong boundary is being tested: move tests closer to the pure logic

When a mutant survives, ask: should this behaviour have been observable? Is the API too weak? Should this function return structured data? Is this effect hiding core logic?

Mutation testing improves architecture, not only test coverage.

Checklist

For each important module, ask:

- Are return values structured and inspectable? If not, replace Bool/Int returns with data types.
- Do tests assert exact fields, not just success? If not, pin each important field.
- Are negative paths covered? If not, add invalid-input and boundary tests.
- Are relationships between outputs observed? If not, add cross-field observations.
- Are orchestration functions returning reports? If not, move exit() to the CLI layer.
- Do mutation survivors point to weak architecture? If so, refactor, then re-test.

The core idea

The goal is not more code. It is more visible correctness. Vary gives you type structure, semantic contracts, observation-based testing, and mutation pressure. When those pieces reinforce each other, mutation testing becomes a design discipline rather than a quality metric.