Alpha. Vary is under active development and not ready for production use. Syntax, APIs, performance, and behaviour may change between releases.

CI integration

VAST runs automatically in the nightly CI workflow and as part of release candidate validation. A deep check (~9,000 programs) runs nightly with 4-path validation, metamorphic testing, mutation expansion, and negative validation. A fast check (~100 programs) can be run locally before opening a PR. A continuous mode supports long-running exploration.

CI modes

VAST has four execution modes that control how many programs are generated, which profiles run, and what extra checks are performed.

Mode        Flag               Programs      Use case
Explore     --mode explore     Configurable  Manual investigation, single profile
Fast        --mode fast        ~100          Quick smoke test, under 2 minutes
Deep        --mode deep        ~9,000        Nightly CI, broad coverage across all profiles
Continuous  --mode continuous  Time-bounded  Long-running exploration, adaptive profile selection

Explore is the default when no --mode is specified. It runs a single profile with whatever flags you pass. Fast and deep are multi-profile modes that run several profiles in sequence, aggregate results, and print a dashboard. Continuous mode is time-bounded and adaptively selects profiles based on cumulative coverage gaps (see coverage and confidence).

Fast mode

Fast mode is designed as a quick smoke test. It runs four profiles with small counts:

Profile   Programs
core      40
control   20
types     20
complete  20

vary vast --mode fast

This covers straight-line programs (core), control flow with functions and loops (control), core feature expansion including enums, data types, collections, and nullable (types), and the full feature set with match, exceptions, and generics (complete).

Exit code 0 means no mismatches were found. Exit code 1 means at least one disagreement occurred.
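
For a local pre-PR hook, the exit-code contract above can be wrapped in a few lines. This is only a sketch: the helper name vast_passed is hypothetical, and the only facts taken from this page are the command line and the meaning of exit codes 0 and 1. Treating any other exit code as a failure is an assumption.

```python
import subprocess
import sys

def vast_passed(cmd):
    """Run a check command and report whether it exited with code 0.

    Per the text above, 0 means no mismatches and 1 means at least one
    disagreement. Any other code is also treated as a failure (assumption).
    """
    result = subprocess.run(cmd)
    return result.returncode == 0

# Usage sketch (requires the vary binary on PATH):
#   ok = vast_passed(["vary", "vast", "--mode", "fast"])
#   sys.exit(0 if ok else 1)
```

Keeping the command as a parameter makes the helper easy to try with any other program before wiring it to the real check.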

Deep mode

Deep mode is the nightly run. It tests all 12 profiles:

Profile                Programs
core                   2,000
control                1,000
text through generics  500 each
types                  1,000
complete               1,000

vary vast --mode deep --verbose

Deep mode also generates regression artifacts when mismatches have reduced source available. These are written to the directory specified by --regression-dir.

Dashboard

Both fast and deep modes print a dashboard after all profiles complete:

VAST CI Dashboard
------------------------------------------------------------------------------------------------------------------------
Profile        Programs   Passed Mismatches   Feature  Semantic Interaction Confidence  RoundTrip Duration
------------------------------------------------------------------------------------------------------------------------
core                 40       40          0      100%      85%         62%       HIGH         OK     0.2s
control              20       20          0      100%      90%         58%       HIGH         OK     0.7s
types                20       20          0      100%      78%         71%   MODERATE         OK     0.2s
complete             20       20          0      100%      82%         65%       HIGH         OK     0.1s
------------------------------------------------------------------------------------------------------------------------
  core: seed=1773538761264
  control: seed=1773538761264
  types: seed=1773538761264
  complete: seed=1773538761264

Each row shows the profile name, programs executed, programs passed, mismatches found, feature/semantic/interaction coverage percentages, confidence level, round-trip status, and duration. See coverage and confidence for details on the coverage dimensions and confidence scoring.

The seed line below the table shows the exact seed used for each profile, so any failure can be replayed:

vary vast --profile core --seed 1773538761264 --count 1
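
When several profiles need replaying, the seed lines can be turned into replay commands mechanically. The sketch below assumes the seed lines look exactly like the sample dashboard above ("profile: seed=N"). The function name replay_commands is hypothetical and not part of VAST.

```python
import re

# Matches dashboard seed lines such as "  core: seed=1773538761264".
SEED_LINE = re.compile(r"^\s*([\w-]+): seed=(\d+)\s*$")

def replay_commands(dashboard_text, count=1):
    """Build one replay command per 'profile: seed=N' line."""
    commands = []
    for line in dashboard_text.splitlines():
        match = SEED_LINE.match(line)
        if match:
            profile, seed = match.groups()
            commands.append(
                f"vary vast --profile {profile} --seed {seed} --count {count}"
            )
    return commands

sample = """\
  core: seed=1773538761264
  control: seed=1773538761264
"""
for command in replay_commands(sample):
    print(command)
```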

Seed rotation

Seeds are computed deterministically from the current date and git commit hash using SHA-256(date + commit). This means every day tests different programs, every commit tests different programs, and results are fully reproducible given the same date and commit.

In CI modes (fast and deep), the --rotate-seed flag enables this automatically. The commit hash is auto-detected from the git repository, or can be specified explicitly with --commit-hash:

vary vast --mode fast --rotate-seed
vary vast --mode deep --rotate-seed --commit-hash abc1234

The --rotate-seed flag also works in explore mode, combining the base seed with the current date:

vary vast --profile core --count 100 --rotate-seed

Seeds are always printed in the dashboard and metrics output, so you can replay any run regardless of when it happened.
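
The SHA-256(date + commit) derivation described above can be illustrated as follows. The source only states the hash and its two inputs. The ISO date format, plain string concatenation, UTF-8 encoding, and taking the first 8 digest bytes as the seed are all assumptions made for this sketch, and rotated_seed is a hypothetical name.

```python
import hashlib
from datetime import date

def rotated_seed(day, commit_hash):
    """Derive a deterministic seed from SHA-256(date + commit).

    Assumed details: ISO date string, string concatenation, UTF-8,
    and the first 8 bytes of the digest interpreted as an unsigned integer.
    """
    material = (day.isoformat() + commit_hash).encode("utf-8")
    digest = hashlib.sha256(material).digest()
    return int.from_bytes(digest[:8], "big")

# Same date and commit give the same seed; changing either changes it.
print(rotated_seed(date(2024, 3, 9), "abc1234"))
```

Whatever the real encoding is, the property that matters is the one shown here: the seed is a pure function of the date and the commit, so a run can be reproduced from those two values alone.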

Feature coverage

VAST tracks which language constructs appear in generated programs. There are 22 tracked features:

Feature       Constructs
Basics        variable declarations, assignments, returns, int/bool/string/float literals
Expressions   binary operators, unary operators, if-expressions, function calls
Control flow  if statements, while loops
Functions     function definitions, generic functions
Types         enum definitions, data definitions, list literals, none literals
Patterns      match statements, try/except, raise

The coverage percentage shows how many of the profile's enabled features were actually generated. A profile that enables enums but never generates one has a coverage gap.

vary vast --profile complete --count 50 --seed 42 --show-coverage

Feature coverage: 22/22 (100%)
  + variable_decl: 342
  + assignment: 87
  + if_stmt: 156
  + while_loop: 23
  ...

In CI modes, coverage is tracked automatically and reported in the dashboard.
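
The coverage percentage described above is the share of a profile's enabled features that were generated at least once. A minimal sketch of that calculation follows. The function name and the enum_def identifier are hypothetical, and the counts reuse the sample output above.

```python
def feature_coverage(enabled, counts):
    """Return (hit, total, percent, gaps) for a profile's enabled features."""
    gaps = [name for name in enabled if counts.get(name, 0) == 0]
    total = len(enabled)
    hit = total - len(gaps)
    percent = 100.0 * hit / total if total else 0.0
    return hit, total, percent, gaps

# A profile that enables enums (hypothetical id "enum_def") but never generates one.
enabled = ["variable_decl", "assignment", "if_stmt", "while_loop", "enum_def"]
counts = {"variable_decl": 342, "assignment": 87, "if_stmt": 156, "while_loop": 23}

hit, total, percent, gaps = feature_coverage(enabled, counts)
print(f"{hit}/{total} ({percent:.0f}%), gaps: {gaps}")
```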

Round-trip validation

Round-trip validation checks that VAST-generated programs survive a format-parse cycle. Each program is:

Step  Description
1     Formatted to source text using the Vary formatter
2     Lexed and parsed back into an AST
3     Formatted again
4     Compared structurally against the original

This catches formatter bugs, parser regressions, and AST round-trip failures that would not be visible to the differential test alone.

vary vast --profile core --count 100 --seed 42 --round-trip

In CI modes, round-trip validation runs automatically for every generated program.
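
The four steps can be illustrated with an analogy in Python, using the standard ast module as a stand-in for the Vary formatter and parser. This is not how VAST is implemented. It only shows the shape of a format-parse-format check, and it needs Python 3.9 or later for ast.unparse.

```python
import ast

def round_trips(source):
    """Format -> parse -> format, then compare structure and text."""
    original = ast.parse(source)
    formatted = ast.unparse(original)      # step 1: format to source text
    reparsed = ast.parse(formatted)        # step 2: lex and parse back
    reformatted = ast.unparse(reparsed)    # step 3: format again
    # Step 4: structural comparison. ast.dump omits line/column info by default.
    same_tree = ast.dump(original) == ast.dump(reparsed)
    return same_tree and formatted == reformatted

print(round_trips("x = 1\nwhile x < 10:\n    x = x + 1\n"))
```

Comparing both the trees and the two formatted texts catches a formatter that changes meaning as well as one that is not stable under repeated formatting.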

Metrics

CI modes write per-profile metrics in JSONL format to .vast-logs/vast-metrics.jsonl by default:

{"timestamp":1710000000000,"profile":"core","programsExecuted":40,"passed":40,"mismatches":0,"pathFailures":0,"invalidCount":0,"featureCoveragePercent":100.0,"roundTripFailures":0,"durationMs":468,"seed":42,"mode":"fast","optimizerMismatches":0,"jvmUnoptPathMismatches":0,"semanticCoveragePercent":85.0,"confidenceScore":78.0,"interactionCoveragePercent":62.0}

Each line records the profile, counts, feature/semantic/interaction coverage, confidence score, optimizer mismatch counts, duration, seed, and mode. This file is append-only, so it accumulates a history across runs. The mode field distinguishes between fast, deep, and continuous runs.

Override the path with --metrics-file:

vary vast --mode fast --metrics-file /tmp/vast-metrics.jsonl
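
Because the metrics file is append-only JSONL, history can be summarized with a few lines of scripting. The sketch below sums mismatches per profile, optionally filtered by mode. It relies only on the profile, mismatches, and mode fields shown in the sample record above. The function name is hypothetical.

```python
import json
from collections import defaultdict

def mismatches_by_profile(lines, mode=None):
    """Sum the 'mismatches' field per profile across JSONL records."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if mode is not None and record.get("mode") != mode:
            continue
        totals[record["profile"]] += record["mismatches"]
    return dict(totals)

# Usage sketch with the documented default path:
#   with open(".vast-logs/vast-metrics.jsonl") as handle:
#       print(mismatches_by_profile(handle, mode="deep"))
```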

Regression artifacts

When deep mode finds a mismatch that has a reduced source (from the shrinking pass), it writes regression artifacts to the --regression-dir directory:

File   Contents
.vary  Minimized source with a header comment (seed, profile, verdict, date)
.json  Machine-readable metadata (seed, profile, verdict, blame, per-path outcomes)

vary vast --mode deep --regression-dir .vast-regressions/

These artifacts are useful for tracking known bugs and verifying fixes.

Nightly CI

The nightly GitHub Actions workflow runs VAST across 13 lanes organized into four parallel jobs. Each lane has a classification (regression, discovery, corpus growth) and a policy (blocker, warning, info) that determines whether failures block the nightly result.

Lane model

Lane                    Type           Policy   What it does
vast-deep-differential  Regression     Blocker  ~9,000 programs, 4-path validation, seed rotation, reduction
vast-metamorphic        Regression     Blocker  Metamorphic + round-trip + coverage validation
vast-coverage           Regression     Blocker  Semantic + interaction coverage reporting
vast-negative           Regression     Blocker  Sabotage-based negative validation probes
vast-mutation           Discovery      Blocker  Mutation expansion testing
vast-ir-check           Discovery      Blocker  IR translation equivalence + pass verification
vast-stateful           Discovery      Warning  Stateful program generation; search slices rotate
vast-aliasing           Discovery      Warning  Heap aliasing verification; search slices rotate
vast-exception          Discovery      Warning  Exception propagation verification
vast-concurrency        Discovery      Warning  Concurrency semantics (scheduler simulation)
vast-symbolic           Discovery      Warning  Symbolic input guidance
vast-large-programs     Discovery      Warning  Large program (500-10000 AST nodes) stress testing
vast-continuous         Corpus Growth  Info     Time-bounded adaptive exploration

Policy enforcement: blocker lane failures fail the nightly job. Warning lane failures are reported but don't block. Info lanes are informational only.
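
The policy rule can be expressed as a small decision function. This is an illustrative sketch, not the workflow's actual logic: the function name and result shape are hypothetical, and the policy map lists only a subset of the lanes from the table above.

```python
# Subset of the lane table: lane name -> policy.
LANE_POLICY = {
    "vast-deep-differential": "blocker",
    "vast-negative": "blocker",
    "vast-stateful": "warning",
    "vast-exception": "warning",
    "vast-continuous": "info",
}

def nightly_result(failed_lanes, policy):
    """Classify failed lanes; only blocker failures fail the nightly."""
    blockers = sorted(l for l in failed_lanes if policy[l] == "blocker")
    warnings = sorted(l for l in failed_lanes if policy[l] == "warning")
    return {"failed": bool(blockers), "blockers": blockers, "warnings": warnings}

print(nightly_result({"vast-stateful", "vast-continuous"}, LANE_POLICY))
```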

Search slice rotation: discovery lanes rotate their search parameters (program counts, stress modes, profile weights) across nights via deterministic hash-based selection. The lanes themselves always run; only the parameters vary.
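
One way to picture deterministic hash-based selection is shown below. The source does not say which hash, which key, or which slices are used, so everything here is an assumption for illustration: SHA-256 over "lane:date", the first 4 digest bytes as an index, and made-up slice names.

```python
import hashlib

def pick_slice(lane, night, slices):
    """Pick one slice deterministically from the lane name and the night's date.

    Hypothetical scheme: SHA-256 of "lane:night", reduced modulo len(slices).
    """
    digest = hashlib.sha256(f"{lane}:{night}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(slices)
    return slices[index]

slices = ["small-programs", "stress-mode", "weighted-profiles"]  # illustrative names
print(pick_slice("vast-stateful", "2024-03-09", slices))
```

The lane always runs. Only the returned slice, and therefore the parameters, changes from night to night, and the same lane and date always select the same slice.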

Lane tagging

Every VAST invocation in the nightly workflow passes --lane <name> so metrics are tagged per-lane:

./bin/vary vast --mode explore --profile stateful --stateful --lane vast-stateful

Acceptance verification

The nightly growth job verifies the escaped-bug acceptance set and generates a closeout report:

./bin/vary vast --check-acceptance --corpus-dir tests/vast/corpus
./bin/vary vast --closeout-report --corpus-dir tests/vast/corpus

The acceptance set contains 6 real escaped bugs. Coverage is determined by executing programs through the differential pipeline with exact AST-level trigger predicates, not by pattern matching on source text. Evidence tiers:

Tier       Label                   Meaning
Strongest  GENERATOR_REDISCOVERED  Generated programs matched exact bug-class trigger predicate + executed through pipeline
Strong     REPLAY_VERIFIED         Exact reproducer executed through pipeline successfully
Weak       GENERATOR_EXECUTED      Profile executed through pipeline but no trigger match
None       NOT_VERIFIED            No execution evidence

Only GENERATOR_REDISCOVERED counts toward the closeout bar.
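
The counting rule can be sketched as follows. The tier labels come from the table above. The bug identifiers and the closeout_count helper are hypothetical and only illustrate that weaker tiers do not contribute.

```python
from enum import Enum

class Evidence(Enum):
    GENERATOR_REDISCOVERED = "strongest"
    REPLAY_VERIFIED = "strong"
    GENERATOR_EXECUTED = "weak"
    NOT_VERIFIED = "none"

def closeout_count(evidence_by_bug):
    """Count bugs whose evidence tier is GENERATOR_REDISCOVERED."""
    return sum(
        1 for tier in evidence_by_bug.values()
        if tier is Evidence.GENERATOR_REDISCOVERED
    )

# Hypothetical bug identifiers.
evidence = {
    "bug-a": Evidence.GENERATOR_REDISCOVERED,
    "bug-b": Evidence.REPLAY_VERIFIED,
    "bug-c": Evidence.NOT_VERIFIED,
}
print(f"{closeout_count(evidence)}/{len(evidence)} count toward the closeout bar")
```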

Mode discipline

Specialized generator flags (--stateful, --aliasing, --ir-check, etc.) require --mode explore. With --mode deep these flags are silently ignored, because deep mode runs its own fixed profile list. The nightly workflow therefore uses --mode explore for all specialized lanes.
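
Since the flags are dropped without a warning, a wrapper script may want to check for the combination before invoking VAST. The guard below is a hypothetical sketch. It knows only the three flags named above, assumes the space-separated "--mode value" form, and treats every non-explore mode as unsuitable, which the source confirms only for deep.

```python
# Flags named in the text above; the real list is longer ("etc.").
SPECIALIZED_FLAGS = {"--stateful", "--aliasing", "--ir-check"}

def misused_flags(argv):
    """Return specialized flags passed together with a non-explore mode."""
    mode = "explore"  # explore is the default when --mode is absent
    for i, arg in enumerate(argv):
        if arg == "--mode" and i + 1 < len(argv):
            mode = argv[i + 1]
    if mode == "explore":
        return []
    return sorted(SPECIALIZED_FLAGS.intersection(argv))

print(misused_flags(["vast", "--mode", "deep", "--stateful"]))
```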

CLI reference

See CLI reference for the complete flag list covering core options, CI modes, specialized generators, acceptance/closeout, lane tagging, and coverage reporting.