Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.
CI integration
VAST runs automatically in the nightly CI workflow and as part of release candidate validation. A deep check (~9,000 programs) runs nightly with 4-path validation, metamorphic testing, mutation expansion, and negative validation. A fast check (~100 programs) can be run locally before PRs. A continuous mode supports long-running exploration.
CI modes
VAST has four execution modes that control how many programs are generated, which profiles run, and what extra checks are performed.
| Mode | Flag | Programs | Use case |
|---|---|---|---|
| Explore | --mode explore | Configurable | Manual investigation, single profile |
| Fast | --mode fast | ~100 | Quick smoke test, under 2 minutes |
| Deep | --mode deep | ~9,000 | Nightly CI, broad coverage across all profiles |
| Continuous | --mode continuous | Time-bounded | Long-running exploration, adaptive profile selection |
Explore is the default when no --mode is specified. It runs a single profile with whatever flags you pass. Fast and deep are multi-profile modes that run several profiles in sequence, aggregate results, and print a dashboard. Continuous mode is time-bounded and adaptively selects profiles based on cumulative coverage gaps (see coverage and confidence).
Fast mode
Fast mode is designed as a quick smoke test. It runs four profiles with small counts:
| Profile | Programs |
|---|---|
| core | 40 |
| control | 20 |
| types | 20 |
| complete | 20 |
vary vast --mode fast
This covers four areas: straight-line programs (core); control flow with functions and loops (control); the expanded core feature set, including enums, data types, collections, and nullable values (types); and the full feature set with match, exceptions, and generics (complete).
Exit code 0 means no mismatches were found. Exit code 1 means at least one disagreement occurred.
Deep mode
Deep mode is the nightly run. It tests all 12 profiles:
| Profile | Programs |
|---|---|
| core | 2,000 |
| control | 1,000 |
| text through generics | 500 each |
| types | 1,000 |
| complete | 1,000 |
vary vast --mode deep --verbose
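As a quick arithmetic check, the per-profile counts in the table add up to the ~9,000 programs quoted for deep mode. The sketch below assumes that the "text through generics" row covers the eight profiles that are not listed individually (12 profiles in total, minus core, control, types, and complete). The variable names are illustrative only.

```python
# Per-profile program counts for deep mode, taken from the table above.
NAMED_COUNTS = {"core": 2000, "control": 1000, "types": 1000, "complete": 1000}

TOTAL_PROFILES = 12       # "It tests all 12 profiles"
PER_RANGE_PROFILE = 500   # "text through generics: 500 each"

# Assumption: the range row covers every profile not named individually.
range_profiles = TOTAL_PROFILES - len(NAMED_COUNTS)

deep_total = sum(NAMED_COUNTS.values()) + range_profiles * PER_RANGE_PROFILE
print(range_profiles, deep_total)  # prints: 8 9000
```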
Deep mode also generates regression artifacts when mismatches have reduced source available. These are written to the directory specified by --regression-dir.
Dashboard
Both fast and deep modes print a dashboard after all profiles complete:
VAST CI Dashboard
------------------------------------------------------------------------------------------------------------------------
Profile Programs Passed Mismatches Feature Semantic Interaction Confidence RoundTrip Duration
------------------------------------------------------------------------------------------------------------------------
core 40 40 0 100% 85% 62% HIGH OK 0.2s
control 20 20 0 100% 90% 58% HIGH OK 0.7s
types 20 20 0 100% 78% 71% MODERATE OK 0.2s
complete 20 20 0 100% 82% 65% HIGH OK 0.1s
------------------------------------------------------------------------------------------------------------------------
core: seed=1773538761264
control: seed=1773538761264
types: seed=1773538761264
complete: seed=1773538761264
Each row shows the profile name, programs executed, programs passed, mismatches found, feature/semantic/interaction coverage percentages, confidence level, round-trip status, and duration. See coverage and confidence for details on the coverage dimensions and confidence scoring.
The seed lines below the table show the exact seed used for each profile, so any failure can be replayed:
vary vast --profile core --seed 1773538761264 --count 1
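If you capture the dashboard output in a log, the replay commands can be derived mechanically from the seed lines. The following is a minimal sketch, assuming the seed lines keep the `profile: seed=N` shape shown in the example dashboard. The helper name `replay_commands` is hypothetical and not part of the `vary` CLI.

```python
import re

# Seed lines as printed under the dashboard table (copied from the example).
DASHBOARD_TAIL = """\
core: seed=1773538761264
control: seed=1773538761264
types: seed=1773538761264
complete: seed=1773538761264
"""

# Assumption: each seed line has the form "<profile>: seed=<digits>".
SEED_LINE = re.compile(r"^(?P<profile>[\w-]+): seed=(?P<seed>\d+)$")


def replay_commands(text: str, count: int = 1) -> list[str]:
    """Build one replay command per 'profile: seed=N' line in the text."""
    commands = []
    for line in text.splitlines():
        match = SEED_LINE.match(line.strip())
        if match:
            commands.append(
                f"vary vast --profile {match['profile']} "
                f"--seed {match['seed']} --count {count}"
            )
    return commands


for command in replay_commands(DASHBOARD_TAIL):
    print(command)
```

Lines that do not match the pattern (table rows, separators) are skipped, so the whole dashboard can be passed in unchanged.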
Seed rotation
Seeds are computed deterministically from the current date and git commit hash using SHA-256(date + commit). Each day and each commit therefore tests a different set of programs, while results remain fully reproducible given the same date and commit.
In CI modes (fast and deep), the --rotate-seed flag enables this automatically. The commit hash is auto-detected from the git repository, or can be specified explicitly with --commit-hash:
vary vast --mode fast --rotate-seed
vary vast --mode deep --rotate-seed --commit-hash abc1234
The --rotate-seed flag also works in explore mode, combining the base seed with the current date:
vary vast --profile core --count 100 --rotate-seed
Seeds are always printed in the dashboard and metrics output, so you can replay any run regardless of when it happened.
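The seed derivation described in this section can be sketched in a few lines. This is not VAST's actual implementation: the date format, the way the two inputs are joined, and the way the digest is reduced to an integer are all assumptions made for illustration.

```python
import hashlib
from datetime import date


def rotated_seed(day: date, commit_hash: str) -> int:
    """Derive a seed from SHA-256(date + commit).

    Assumptions (not confirmed by the documentation): the date is rendered
    as ISO 8601 (YYYY-MM-DD), the two strings are concatenated directly, and
    the first 8 bytes of the digest are read as a non-negative 63-bit integer.
    """
    material = (day.isoformat() + commit_hash).encode("utf-8")
    digest = hashlib.sha256(material).digest()
    return int.from_bytes(digest[:8], "big") & 0x7FFF_FFFF_FFFF_FFFF


today = rotated_seed(date(2024, 3, 9), "abc1234")
tomorrow = rotated_seed(date(2024, 3, 10), "abc1234")
other_commit = rotated_seed(date(2024, 3, 9), "def5678")
print(today, tomorrow, other_commit)
```

The same date and commit always yield the same seed, while changing either input yields a different one, which matches the reproducibility property described above.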
Feature coverage
VAST tracks which language constructs appear in generated programs. There are 22 tracked features:
| Feature | Constructs |
|---|---|
| Basics | variable declarations, assignments, returns, int/bool/string/float literals |
| Expressions | binary operators, unary operators, if-expressions, function calls |
| Control flow | if statements, while loops |
| Functions | function definitions, generic functions |
| Types | enum definitions, data definitions, list literals, none literals |
| Patterns | match statements, try/except, raise |
The coverage percentage shows how many of the profile's enabled features were actually generated. A profile that enables enums but never generates one has a coverage gap.
vary vast --profile complete --count 50 --seed 42 --show-coverage
Feature coverage: 22/22 (100%)
+ variable_decl: 342
+ assignment: 87
+ if_stmt: 156
+ while_loop: 23
...
In CI modes, coverage is tracked automatically and reported in the dashboard.
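The coverage percentage can be read as "enabled features that were generated at least once, divided by enabled features". Here is a minimal sketch of that calculation, assuming per-feature counts like those in the sample output. The `enum_def` feature name and the `gap:` output line are hypothetical and only illustrate a coverage gap.

```python
def feature_coverage(
    enabled: set[str], counts: dict[str, int]
) -> tuple[int, int, float, list[str]]:
    """Return (hit, total, percent, gaps) for one profile."""
    hit = {name for name in enabled if counts.get(name, 0) > 0}
    gaps = sorted(enabled - hit)
    # An empty profile has nothing to miss, so treat it as fully covered.
    percent = 100.0 * len(hit) / len(enabled) if enabled else 100.0
    return len(hit), len(enabled), percent, gaps


# Counts copied from the sample output; "enum_def" is a hypothetical feature
# that the profile enables but the generator never produced.
enabled = {"variable_decl", "assignment", "if_stmt", "while_loop", "enum_def"}
counts = {"variable_decl": 342, "assignment": 87, "if_stmt": 156, "while_loop": 23}

hit, total, percent, gaps = feature_coverage(enabled, counts)
print(f"Feature coverage: {hit}/{total} ({percent:.0f}%)")  # 4/5 (80%)
for name in gaps:
    print(f"gap: {name}")
```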
Round-trip validation
Round-trip validation checks that VAST-generated programs survive a format-parse cycle. Each program is:
| Step | Description |
|---|---|
| 1 | Formatted to source text using the Vary formatter |
| 2 | Lexed and parsed back into an AST |
| 3 | Formatted again |
| 4 | Compared structurally against the original |
This catches formatter bugs, parser regressions, and AST round-trip failures that would not be visible to the differential test alone.
vary vast --profile core --count 100 --seed 42 --round-trip
In CI modes, round-trip validation runs automatically for every generated program.
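The four steps can be illustrated with Python's own `ast` module standing in for the Vary formatter and parser. This is an analogy, not VAST's implementation. It also assumes that "compared structurally" means comparing the reparsed AST with the original, and it adds a check that the second formatting pass reproduces the first.

```python
import ast


def round_trip_ok(tree: ast.AST) -> bool:
    """Format, parse, format again, then compare (steps 1 to 4)."""
    first = ast.unparse(tree)        # step 1: format the AST to source text
    reparsed = ast.parse(first)      # step 2: lex and parse it back
    second = ast.unparse(reparsed)   # step 3: format again
    # Step 4: structural comparison of the ASTs (ast.dump ignores line and
    # column attributes by default), plus a formatter stability check.
    return ast.dump(reparsed) == ast.dump(tree) and first == second


program = ast.parse("x = 1\nwhile x < 10:\n    x = x * 2\n")
print(round_trip_ok(program))  # prints: True
```

A formatter that dropped parentheses or a parser that mis-associated an operator would produce a different dump in step 4, even if both versions of the program happened to compute the same result.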
Metrics
The CI modes write per-profile metrics in JSONL format to .vast-logs/vast-metrics.jsonl by default:
{"timestamp":1710000000000,"profile":"core","programsExecuted":40,"passed":40,"mismatches":0,"pathFailures":0,"invalidCount":0,"featureCoveragePercent":100.0,"roundTripFailures":0,"durationMs":468,"seed":42,"mode":"fast","optimizerMismatches":0,"jvmUnoptPathMismatches":0,"semanticCoveragePercent":85.0,"confidenceScore":78.0,"interactionCoveragePercent":62.0}
Each line records the profile, counts, feature/semantic/interaction coverage, confidence score, optimizer mismatch counts, duration, seed, and mode. This file is append-only, so it accumulates a history across runs. The mode field distinguishes between fast, deep, and continuous runs.
Override the path with --metrics-file:
vary vast --mode fast --metrics-file /tmp/vast-metrics.jsonl
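Because the metrics file is append-only JSONL, a history summary needs only a line-by-line parse. The sketch below aggregates executed programs and mismatches per mode and profile. It relies only on the `mode`, `profile`, `programsExecuted`, and `mismatches` fields from the sample record. The two sample lines are trimmed, and the second one is invented for illustration.

```python
import json
from collections import defaultdict


def summarize(lines):
    """Aggregate programs and mismatches per (mode, profile)."""
    totals = defaultdict(lambda: {"programs": 0, "mismatches": 0})
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the log
        record = json.loads(raw)
        key = (record["mode"], record["profile"])
        totals[key]["programs"] += record["programsExecuted"]
        totals[key]["mismatches"] += record["mismatches"]
    return dict(totals)


# Illustrative records only; real lines carry many more fields.
SAMPLE = [
    '{"profile":"core","programsExecuted":40,"mismatches":0,"mode":"fast"}',
    '{"profile":"core","programsExecuted":40,"mismatches":1,"mode":"fast"}',
]

print(summarize(SAMPLE))
# prints: {('fast', 'core'): {'programs': 80, 'mismatches': 1}}
```

To run it against a real log, pass an open file handle, for example `summarize(open(".vast-logs/vast-metrics.jsonl"))`.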
Regression artifacts
When deep mode finds a mismatch that has a reduced source (from the shrinking pass), it writes regression artifacts to the --regression-dir directory:
| File | Contents |
|---|---|
| .vary | Minimized source with a header comment (seed, profile, verdict, date) |
| .json | Machine-readable metadata (seed, profile, verdict, blame, per-path outcomes) |
vary vast --mode deep --regression-dir .vast-regressions/
These artifacts are useful for tracking known bugs and verifying fixes.
Nightly CI
The nightly GitHub Actions workflow runs VAST across 13 lanes organized into four parallel jobs. Each lane has a classification (regression, discovery, corpus growth) and a policy (blocker, warning, info) that determines whether failures block the nightly result.
Lane model
| Lane | Type | Policy | What it does |
|---|---|---|---|
| vast-deep-differential | Regression | Blocker | ~9,000 programs, 4-path validation, seed rotation, reduction |
| vast-metamorphic | Regression | Blocker | Metamorphic + round-trip + coverage validation |
| vast-coverage | Regression | Blocker | Semantic + interaction coverage reporting |
| vast-negative | Regression | Blocker | Sabotage-based negative validation probes |
| vast-mutation | Discovery | Blocker | Mutation expansion testing |
| vast-ir-check | Discovery | Blocker | IR translation equivalence + pass verification |
| vast-stateful | Discovery | Warning | Stateful program generation; search slices rotate |
| vast-aliasing | Discovery | Warning | Heap aliasing verification; search slices rotate |
| vast-exception | Discovery | Warning | Exception propagation verification |
| vast-concurrency | Discovery | Warning | Concurrency semantics (scheduler simulation) |
| vast-symbolic | Discovery | Warning | Symbolic input guidance |
| vast-large-programs | Discovery | Warning | Large program (500-10000 AST nodes) stress testing |
| vast-continuous | Corpus Growth | Info | Time-bounded adaptive exploration |
Policy enforcement: blocker lane failures fail the nightly job. Warning lane failures are reported but don't block. Info lanes are informational only.
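The policy rule amounts to a small decision function over lane results. The following sketch shows one way to express it. The `LaneResult` type and `nightly_verdict` function are hypothetical names, and the failure data is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneResult:
    name: str
    policy: str   # "blocker", "warning", or "info"
    failed: bool


def nightly_verdict(results):
    """Return (passed, blocking_failures, warning_failures)."""
    blockers = [r.name for r in results if r.failed and r.policy == "blocker"]
    warnings = [r.name for r in results if r.failed and r.policy == "warning"]
    # Only blocker lanes decide the outcome; info lanes never affect it.
    return len(blockers) == 0, blockers, warnings


results = [
    LaneResult("vast-deep-differential", "blocker", failed=False),
    LaneResult("vast-stateful", "warning", failed=True),
    LaneResult("vast-continuous", "info", failed=True),
]
ok, blockers, warnings = nightly_verdict(results)
print(ok, blockers, warnings)  # prints: True [] ['vast-stateful']
```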
Search slice rotation: discovery lanes rotate their search parameters (program counts, stress modes, profile weights) across nights via deterministic hash-based selection. The lanes themselves always run; only the parameters vary.
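Deterministic hash-based selection of this kind can be sketched as follows. The slice contents, the hash input (lane name plus date), and the use of SHA-256 are all assumptions made for illustration. The documentation does not specify how the nightly workflow computes the selection.

```python
import hashlib
from datetime import date

# Hypothetical parameter slices for one discovery lane.
SLICES = [
    {"count": 200, "stress": "none"},
    {"count": 500, "stress": "deep-nesting"},
    {"count": 1000, "stress": "wide-state"},
]


def pick_slice(lane: str, night: date, slices: list[dict]) -> dict:
    """Pick one slice per lane and night, reproducibly."""
    key = f"{lane}:{night.isoformat()}".encode("utf-8")
    # Reduce the digest to an index; the same lane and night always map
    # to the same slice, so a nightly run can be reconstructed later.
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(slices)
    return slices[index]


chosen = pick_slice("vast-stateful", date(2024, 3, 9), SLICES)
print(chosen["count"], chosen["stress"])
```

Including the lane name in the hash input means different lanes can land on different slices on the same night, while each lane still runs every night.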
Lane tagging
Every VAST invocation in the nightly workflow passes --lane <name> so metrics are tagged per-lane:
./bin/vary vast --mode explore --profile stateful --stateful --lane vast-stateful
Acceptance verification
The nightly growth job verifies the escaped-bug acceptance set and generates a closeout report:
./bin/vary vast --check-acceptance --corpus-dir tests/vast/corpus
./bin/vary vast --closeout-report --corpus-dir tests/vast/corpus
The acceptance set contains 6 real escaped bugs. Coverage is determined by executing programs through the differential pipeline with exact AST-level trigger predicates, not by pattern matching on source text. Evidence tiers:
| Tier | Label | Meaning |
|---|---|---|
| Strongest | GENERATOR_REDISCOVERED | Generated programs matched exact bug-class trigger predicate + executed through pipeline |
| Strong | REPLAY_VERIFIED | Exact reproducer executed through pipeline successfully |
| Weak | GENERATOR_EXECUTED | Profile executed through pipeline but no trigger match |
| None | NOT_VERIFIED | No execution evidence |
Only GENERATOR_REDISCOVERED counts toward the closeout bar.
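The closeout rule can be expressed as a filter over the evidence tier recorded for each bug. In the sketch below, the tier labels come from the table above, while the bug identifiers and their assigned tiers are hypothetical.

```python
from enum import Enum


class Evidence(Enum):
    GENERATOR_REDISCOVERED = "strongest"
    REPLAY_VERIFIED = "strong"
    GENERATOR_EXECUTED = "weak"
    NOT_VERIFIED = "none"


def closeout_progress(evidence_by_bug: dict[str, Evidence]) -> tuple[int, int]:
    """Return (bugs counting toward closeout, total bugs)."""
    counted = [
        bug
        for bug, tier in evidence_by_bug.items()
        if tier is Evidence.GENERATOR_REDISCOVERED
    ]
    return len(counted), len(evidence_by_bug)


# Hypothetical acceptance set of 6 bugs with invented evidence tiers.
acceptance = {
    "bug-1": Evidence.GENERATOR_REDISCOVERED,
    "bug-2": Evidence.GENERATOR_REDISCOVERED,
    "bug-3": Evidence.REPLAY_VERIFIED,
    "bug-4": Evidence.GENERATOR_EXECUTED,
    "bug-5": Evidence.NOT_VERIFIED,
    "bug-6": Evidence.GENERATOR_REDISCOVERED,
}
done, total = closeout_progress(acceptance)
print(f"{done}/{total} bugs count toward closeout")  # 3/6
```

Note that `REPLAY_VERIFIED` does not count here even though it is strong evidence: replaying a known reproducer shows the pipeline detects the bug, but not that the generator can find it unaided.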
Mode discipline
Specialized generator flags (--stateful, --aliasing, --ir-check, etc.) require --mode explore. Using --mode deep silently ignores these flags because deep mode runs its own fixed profile list. The nightly workflow uses --mode explore for all specialized lanes.
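Since the flags are ignored silently, a small pre-flight check in a wrapper script can catch the mistake before a long run starts. This is a hypothetical helper, not part of the `vary` CLI. It checks only the three flags named above and treats a missing `--mode` as explore, which is the documented default.

```python
# Only the flags named in the text; the full list is in the CLI reference.
SPECIALIZED_FLAGS = {"--stateful", "--aliasing", "--ir-check"}


def ignored_flags(argv: list[str]) -> list[str]:
    """Return specialized flags that would be ignored for the given arguments."""
    mode = "explore"  # documented default when --mode is absent
    for i, arg in enumerate(argv):
        if arg == "--mode" and i + 1 < len(argv):
            mode = argv[i + 1]
    if mode == "explore":
        return []
    return [arg for arg in argv if arg in SPECIALIZED_FLAGS]


print(ignored_flags(["vast", "--mode", "deep", "--stateful"]))
# prints: ['--stateful']
print(ignored_flags(["vast", "--profile", "stateful", "--stateful"]))
# prints: []
```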
CLI reference
See CLI reference for the complete flag list covering core options, CI modes, specialized generators, acceptance/closeout, lane tagging, and coverage reporting.