Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.

CI integration

VAST runs automatically in the nightly CI workflow and as part of release candidate validation. A deep check (~9,000 programs) runs nightly with 4-path validation, metamorphic testing, mutation expansion, and negative validation. A fast check (~100 programs) can be run locally before PRs. A continuous mode supports long-running exploration.

CI modes

VAST has four execution modes that control how many programs are generated, which profiles run, and what extra checks are performed.

Mode	Flag	Programs	Use case
Explore	`--mode explore`	Configurable	Manual investigation, single profile
Fast	`--mode fast`	~100	Quick smoke test, under 2 minutes
Deep	`--mode deep`	~9,000	Nightly CI, broad coverage across all profiles
Continuous	`--mode continuous`	Time-bounded	Long-running exploration, adaptive profile selection

Explore is the default when no --mode is specified. It runs a single profile with whatever flags you pass. Fast and deep are multi-profile modes that run several profiles in sequence, aggregate results, and print a dashboard. Continuous mode is time-bounded and adaptively selects profiles based on cumulative coverage gaps (see coverage and confidence).

Fast mode

Fast mode is designed as a quick smoke test. It runs four profiles with small counts:

Profile	Programs
core	40
control	20
types	20
complete	20

vary vast --mode fast

This covers straight-line programs (core), control flow with functions and loops (control), expanded core features including enums, data types, collections, and nullables (types), and the full feature set with match, exceptions, and generics (complete).

Exit code 0 means no mismatches were found. Exit code 1 means at least one disagreement occurred.

Deep mode

Deep mode is the nightly run. It tests all 12 profiles:

Profile	Programs
core	2,000
control	1,000
text through generics (8 profiles)	500 each
types	1,000
complete	1,000

vary vast --mode deep --verbose

Deep mode also generates regression artifacts when mismatches have reduced source available. These are written to the directory specified by --regression-dir.

Dashboard

Both fast and deep modes print a dashboard after all profiles complete:

VAST CI Dashboard
------------------------------------------------------------------------------------------------------------------------
Profile        Programs   Passed Mismatches   Feature  Semantic Interaction Confidence  RoundTrip Duration
------------------------------------------------------------------------------------------------------------------------
core                 40       40          0      100%      85%         62%       HIGH         OK     0.2s
control              20       20          0      100%      90%         58%       HIGH         OK     0.7s
types                20       20          0      100%      78%         71%   MODERATE         OK     0.2s
complete             20       20          0      100%      82%         65%       HIGH         OK     0.1s
------------------------------------------------------------------------------------------------------------------------
  core: seed=1773538761264
  control: seed=1773538761264
  types: seed=1773538761264
  complete: seed=1773538761264

Each row shows the profile name, programs executed, programs passed, mismatches found, feature/semantic/interaction coverage percentages, confidence level, round-trip status, and duration. See coverage and confidence for details on the coverage dimensions and confidence scoring.

The seed lines below the table show the exact seed used for each profile, so any failure can be replayed:

vary vast --profile core --seed 1773538761264 --count 1

Seed rotation

Seeds are computed deterministically from the current date and git commit hash using SHA-256(date + commit). Each new day and each new commit therefore exercises a different set of programs, while results remain fully reproducible given the same date and commit.

In CI modes (fast and deep), the --rotate-seed flag enables this automatically. The commit hash is auto-detected from the git repository, or can be specified explicitly with --commit-hash:

vary vast --mode fast --rotate-seed
vary vast --mode deep --rotate-seed --commit-hash abc1234

The --rotate-seed flag also works in explore mode, combining the base seed with the current date:

vary vast --profile core --count 100 --rotate-seed

Seeds are always printed in the dashboard and metrics output, so you can replay any run regardless of when it happened.
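The SHA-256(date + commit) derivation described in this section can be illustrated with a minimal Python sketch. The details here are assumptions rather than VAST's actual implementation: the date is rendered as ISO-8601, it is concatenated directly with the commit hash, and the first 8 bytes of the digest are read as the seed. The function name `rotated_seed` is hypothetical.

```python
import hashlib
from datetime import date


def rotated_seed(day: date, commit_hash: str) -> int:
    """Derive a deterministic seed from a date and a commit hash.

    Assumptions (not confirmed by the VAST docs): ISO-8601 date, plain
    string concatenation, and the first 8 digest bytes used as the seed.
    """
    payload = f"{day.isoformat()}{commit_hash}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big")


# Same date and commit always give the same seed; changing either changes it.
print(rotated_seed(date(2024, 3, 9), "abc1234"))
```

Because the seed depends only on the two inputs, anyone who knows the date and commit of a run can recompute it without access to the original logs.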

Feature coverage

VAST tracks which language constructs appear in generated programs. There are 22 tracked features:

Category	Constructs
Basics	variable declarations, assignments, returns, int/bool/string/float literals
Expressions	binary operators, unary operators, if-expressions, function calls
Control flow	if statements, while loops
Functions	function definitions, generic functions
Types	enum definitions, data definitions, list literals, none literals
Patterns	match statements, try/except, raise

The coverage percentage shows how many of the profile's enabled features were actually generated. A profile that enables enums but never generates one has a coverage gap.

vary vast --profile complete --count 50 --seed 42 --show-coverage

Feature coverage: 22/22 (100%)
  + variable_decl: 342
  + assignment: 87
  + if_stmt: 156
  + while_loop: 23
  ...

In CI modes, coverage is tracked automatically and reported in the dashboard.
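The coverage percentage described above is the share of a profile's enabled features that were generated at least once. A rough Python sketch of that calculation follows. The function name and the feature name `enum_def` are hypothetical, and the counts are taken from the sample output above purely for illustration.

```python
def feature_coverage(enabled, counts):
    """Return (hit, total, percent, gaps) for one profile.

    enabled: names of the features the profile enables.
    counts:  mapping of feature name -> times it was generated.
    """
    enabled = sorted(set(enabled))
    # A gap is an enabled feature that never appeared in a generated program.
    gaps = [name for name in enabled if counts.get(name, 0) == 0]
    hit = len(enabled) - len(gaps)
    percent = 100.0 * hit / len(enabled) if enabled else 100.0
    return hit, len(enabled), percent, gaps


enabled = ["variable_decl", "assignment", "if_stmt", "while_loop", "enum_def"]
counts = {"variable_decl": 342, "assignment": 87, "if_stmt": 156, "while_loop": 23}
print(feature_coverage(enabled, counts))
```

Here the profile enables enums but never generates one, so `enum_def` is reported as a gap.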

Round-trip validation

Round-trip validation checks that VAST-generated programs survive a format-parse cycle. Each program is:

Step	Description
1	Formatted to source text using the Vary formatter
2	Lexed and parsed back into an AST
3	Formatted again
4	Compared structurally against the original

This catches formatter bugs, parser regressions, and AST round-trip failures that would not be visible to the differential test alone.

vary vast --profile core --count 100 --seed 42 --round-trip

In CI modes, round-trip validation runs automatically for every generated program.
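The four steps above can be sketched in Python, with Python's own `ast` module standing in for the Vary formatter and parser. This is an analogy only: it shows the shape of a format-parse-format check, not how VAST implements it, and it needs Python 3.9 or later for `ast.unparse`.

```python
import ast


def round_trip_ok(source: str) -> bool:
    """Check that a program survives a format-parse-format cycle.

    Python's ast module is a stand-in for the Vary formatter and parser.
    """
    original = ast.parse(source)         # the starting AST
    formatted = ast.unparse(original)    # step 1: format to source text
    reparsed = ast.parse(formatted)      # step 2: lex and parse back
    reformatted = ast.unparse(reparsed)  # step 3: format again
    # Step 4: compare structurally. ast.dump omits line/column attributes
    # by default, so only the tree shape and values are compared.
    same_tree = ast.dump(original) == ast.dump(reparsed)
    # Also require the formatter to be stable across the two passes.
    return same_tree and formatted == reformatted


print(round_trip_ok("x = 1 + 2\nwhile x > 0:\n    x = x - 1\n"))
```

Comparing trees rather than raw text means whitespace and layout differences in the input do not count as failures; only a change in structure or an unstable formatter does.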

Metrics

The CI modes write per-profile metrics in JSONL format to .vast-logs/vast-metrics.jsonl by default:

{"timestamp":1710000000000,"profile":"core","programsExecuted":40,"passed":40,"mismatches":0,"pathFailures":0,"invalidCount":0,"featureCoveragePercent":100.0,"roundTripFailures":0,"durationMs":468,"seed":42,"mode":"fast","optimizerMismatches":0,"jvmUnoptPathMismatches":0,"semanticCoveragePercent":85.0,"confidenceScore":78.0,"interactionCoveragePercent":62.0}

Each line records the profile, counts, feature/semantic/interaction coverage, confidence score, optimizer mismatch counts, duration, seed, and mode. This file is append-only, so it accumulates a history across runs. The mode field distinguishes between fast, deep, and continuous runs.

Override the path with --metrics-file:

vary vast --mode fast --metrics-file /tmp/vast-metrics.jsonl
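Because the metrics file is append-only JSONL, it can be summarized with a few lines of Python. The sketch below totals programs executed and mismatches per profile. The field names come from the sample record above; the function name and the `SAMPLE` lines are made up for illustration and are not real run data.

```python
import json
from collections import defaultdict


def summarize_metrics(lines):
    """Aggregate programs executed and mismatches per profile."""
    totals = defaultdict(lambda: {"programsExecuted": 0, "mismatches": 0})
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the log
        record = json.loads(raw)
        entry = totals[record["profile"]]
        entry["programsExecuted"] += record["programsExecuted"]
        entry["mismatches"] += record["mismatches"]
    return dict(totals)


# Illustrative records only (trimmed to the fields used here).
SAMPLE = [
    '{"profile":"core","programsExecuted":40,"mismatches":0,"mode":"fast"}',
    '{"profile":"core","programsExecuted":2000,"mismatches":1,"mode":"deep"}',
    '{"profile":"control","programsExecuted":20,"mismatches":0,"mode":"fast"}',
]

for profile, counts in summarize_metrics(SAMPLE).items():
    print(profile, counts)
```

To read a real log, pass an open file handle instead of `SAMPLE`, for example `summarize_metrics(open(".vast-logs/vast-metrics.jsonl"))`.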

Regression artifacts

When deep mode finds a mismatch that has a reduced source (from the shrinking pass), it writes regression artifacts to the --regression-dir directory:

File	Contents
`.vary`	Minimized source with a header comment (seed, profile, verdict, date)
`.json`	Machine-readable metadata (seed, profile, verdict, blame, per-path outcomes)

vary vast --mode deep --regression-dir .vast-regressions/

These artifacts are useful for tracking known bugs and verifying fixes.

Nightly CI

The nightly GitHub Actions workflow runs VAST across 13 lanes organized into four parallel jobs. Each lane has a classification (regression, discovery, corpus growth) and a policy (blocker, warning, info) that determines whether failures block the nightly result.

Lane model

Lane	Type	Policy	What it does
`vast-deep-differential`	Regression	Blocker	~9,000 programs, 4-path validation, seed rotation, reduction
`vast-metamorphic`	Regression	Blocker	Metamorphic + round-trip + coverage validation
`vast-coverage`	Regression	Blocker	Semantic + interaction coverage reporting
`vast-negative`	Regression	Blocker	Sabotage-based negative validation probes
`vast-mutation`	Discovery	Blocker	Mutation expansion testing
`vast-ir-check`	Discovery	Blocker	IR translation equivalence + pass verification
`vast-stateful`	Discovery	Warning	Stateful program generation; search slices rotate
`vast-aliasing`	Discovery	Warning	Heap aliasing verification; search slices rotate
`vast-exception`	Discovery	Warning	Exception propagation verification
`vast-concurrency`	Discovery	Warning	Concurrency semantics (scheduler simulation)
`vast-symbolic`	Discovery	Warning	Symbolic input guidance
`vast-large-programs`	Discovery	Warning	Large program (500-10000 AST nodes) stress testing
`vast-continuous`	Corpus Growth	Info	Time-bounded adaptive exploration

Policy enforcement: blocker lane failures fail the nightly job. Warning lane failures are reported but don't block. Info lanes are informational only.
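The policy rule above can be expressed as a small Python sketch. It is an illustration of the rule as stated, not the workflow's actual code. The `Policy` enum and `nightly_outcome` function are hypothetical names, and the pass/fail values are invented for the example.

```python
from enum import Enum


class Policy(Enum):
    BLOCKER = "blocker"
    WARNING = "warning"
    INFO = "info"


def nightly_outcome(lane_results):
    """Decide the nightly result from (lane, policy, passed) triples.

    Returns (ok, blocking, warnings): only failed blocker lanes make ok
    False; failed warning lanes are reported; info lanes never affect it.
    """
    blocking = [lane for lane, policy, passed in lane_results
                if not passed and policy is Policy.BLOCKER]
    warnings = [lane for lane, policy, passed in lane_results
                if not passed and policy is Policy.WARNING]
    return (not blocking, blocking, warnings)


results = [
    ("vast-deep-differential", Policy.BLOCKER, True),
    ("vast-stateful", Policy.WARNING, False),
    ("vast-continuous", Policy.INFO, False),
]
print(nightly_outcome(results))
```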

Search slice rotation: discovery lanes rotate their search parameters (program counts, stress modes, profile weights) across nights via deterministic hash-based selection. The lanes themselves always run; only the parameters vary.

Lane tagging

Every VAST invocation in the nightly workflow passes --lane <name> so metrics are tagged per-lane:

./bin/vary vast --mode explore --profile stateful --stateful --lane vast-stateful

Acceptance verification

The nightly growth job verifies the escaped-bug acceptance set and generates a closeout report:

./bin/vary vast --check-acceptance --corpus-dir tests/vast/corpus
./bin/vary vast --closeout-report --corpus-dir tests/vast/corpus

The acceptance set contains 6 real escaped bugs. Coverage is determined by executing programs through the differential pipeline with exact AST-level trigger predicates, not by pattern matching on source text. Evidence tiers:

Tier	Label	Meaning
Strongest	`GENERATOR_REDISCOVERED`	Generated programs matched exact bug-class trigger predicate + executed through pipeline
Strong	`REPLAY_VERIFIED`	Exact reproducer executed through pipeline successfully
Weak	`GENERATOR_EXECUTED`	Profile executed through pipeline but no trigger match
None	`NOT_VERIFIED`	No execution evidence

Only GENERATOR_REDISCOVERED counts toward the closeout bar.
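As a rough illustration of the closeout rule, the Python sketch below counts how many bugs in an acceptance set meet the bar. The tier labels are taken from the table above. The `Evidence` enum, the function name, and the bug identifiers are hypothetical, and the tier assignments are invented for the example.

```python
from enum import Enum


class Evidence(Enum):
    GENERATOR_REDISCOVERED = "strongest"
    REPLAY_VERIFIED = "strong"
    GENERATOR_EXECUTED = "weak"
    NOT_VERIFIED = "none"


def closeout_progress(evidence_by_bug):
    """Return (met, total, bugs_met) where only the strongest tier counts."""
    met = sorted(bug for bug, tier in evidence_by_bug.items()
                 if tier is Evidence.GENERATOR_REDISCOVERED)
    return len(met), len(evidence_by_bug), met


# Hypothetical bug ids and tiers, for illustration only.
example = {
    "bug-a": Evidence.GENERATOR_REDISCOVERED,
    "bug-b": Evidence.REPLAY_VERIFIED,
    "bug-c": Evidence.GENERATOR_EXECUTED,
}
print(closeout_progress(example))
```

Note that a replay-verified bug is strong evidence but still does not count toward the bar in this model.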

Mode discipline

Specialized generator flags (--stateful, --aliasing, --ir-check, etc.) require --mode explore. Using --mode deep silently ignores these flags because deep mode runs its own fixed profile list. The nightly workflow uses --mode explore for all specialized lanes.
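Since the flags are ignored silently, a wrapper script may want to catch the mistake before invoking VAST. The Python sketch below is one hypothetical way to do that. It is not part of the VAST CLI, it checks only the three flags named above (the source's "etc." is left unexpanded), and it assumes the mode is passed as `--mode <value>`, as in the examples on this page.

```python
# Only the flags named in this section; the full list is not given here.
SPECIALIZED_FLAGS = {"--stateful", "--aliasing", "--ir-check"}


def ignored_flags(argv):
    """Return the specialized flags that would be ignored for this argv."""
    mode = "explore"  # explore is the default when --mode is absent
    for i, arg in enumerate(argv):
        if arg == "--mode" and i + 1 < len(argv):
            mode = argv[i + 1]
    if mode == "explore":
        return []
    return sorted(flag for flag in argv if flag in SPECIALIZED_FLAGS)


print(ignored_flags(["vast", "--mode", "deep", "--stateful"]))
```

A CI script could fail fast when the returned list is non-empty instead of running a deep pass that does not exercise the intended generator.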

CLI reference

See CLI reference for the complete flag list covering core options, CI modes, specialized generators, acceptance/closeout, lane tagging, and coverage reporting.
