
> VAST measures what it has tested, not just whether anything failed. Coverage tracking and confidence scoring tell you how thoroughly the compiler has been exercised.

## Three coverage dimensions

VAST tracks coverage along three dimensions: feature, semantic, and interaction.

### Feature coverage

Tracks which of the 22 language constructs appear in generated programs, including variable declarations, assignments, binary/unary operators, if statements, while loops, function definitions, literals (int, bool, string, float, none), enum/data definitions, list literals, match statements, try/except, raise, and generic functions.

```bash
vary vast --profile complete --count 50 --seed 42 --show-coverage
```

```text
Feature coverage: 22/22 (100%)
  + variable_decl: 342
  + assignment: 87
  + if_stmt: 156
  ...
```

This tells you whether the generator actually used all the constructs the profile enables.
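As a rough illustration of the kind of tally behind this report, the sketch below counts construct names over a small batch of programs. The nested-tuple AST shape, the `ENABLED` set, and the helper names are hypothetical stand-ins for illustration, not VAST's internal representation.

```python
from collections import Counter

# Hypothetical subset of the constructs a profile enables.
ENABLED = {"variable_decl", "assignment", "if_stmt", "while_loop"}

def iter_nodes(node):
    """Yield every node in a (kind, *children) tuple tree."""
    yield node
    for child in node[1:]:
        if isinstance(child, tuple):
            yield from iter_nodes(child)

def count_features(programs):
    """Count how often each enabled construct appears across all programs."""
    counts = Counter()
    for program in programs:
        for node in iter_nodes(program):
            if node[0] in ENABLED:
                counts[node[0]] += 1
    return counts

# Two toy programs; real generated programs would be far larger.
programs = [
    ("block", ("variable_decl", "x"), ("if_stmt", ("assignment", "x"))),
    ("block", ("variable_decl", "y"), ("assignment", "y")),
]
counts = count_features(programs)
hit = sorted(name for name in ENABLED if counts[name] > 0)

print(f"Feature coverage: {len(hit)}/{len(ENABLED)}")
for name in sorted(ENABLED):
    mark = "+" if counts[name] else "-"
    print(f"  {mark} {name}: {counts[name]}")
```

A construct that is enabled but never generated (here `while_loop`) shows up with a count of zero, which is the signal to raise `--count` or try another seed.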

### Semantic coverage

Tracks 27 semantic properties across six categories. These measure the behaviors that tend to trigger compiler bugs, not just whether a construct appeared.

| Category | Properties |
|----------|-----------|
| Value ranges | zero, negative, large, boundary values |
| Control flow | branch taken/not taken, loop zero/multi iteration, nested control, early return |
| Type interactions | mixed type expressions, nullable access, collection index, enum dispatch, data field access |
| Error paths | division risk, exception thrown/caught, overflow risk |
| Expression complexity | nested binary ops, chained calls, conditional expressions, multi-arg calls |
| Stress testing | identity ops, complementary ops, deep arithmetic chains, float precision |

Each profile defines which properties are enabled. The semantic tracker walks each generated AST and records which properties occur.

```bash
vary vast --profile types --count 100 --seed 42 --semantic-coverage
```

```text
Semantic coverage: 18/20 (90%)
  Value ranges:
    + zero_value: 42
    + negative_value: 31
    + large_value: 8
    + boundary_value: 3
  Control flow:
    + branch_taken: 89
    + branch_not_taken: 67
    - loop_zero_iter: 0
    + loop_multi_iter: 23
  ...
```

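A `+` marks a property that was hit at least once; a `-` marks an enabled property that was never hit. The number after each name is the hit count.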
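To make the idea of a semantic property concrete, here is a minimal sketch of a tracker for the "Value ranges" category only. The predicates, the `large_value` threshold, and the tuple-based AST are assumptions made for this example; VAST's actual definitions may differ.

```python
from collections import Counter

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

# Hypothetical predicates; the thresholds are illustrative guesses.
PROPERTIES = {
    "zero_value": lambda v: v == 0,
    "negative_value": lambda v: v < 0,
    "large_value": lambda v: abs(v) >= 1_000_000,
    "boundary_value": lambda v: v in (INT64_MAX, INT64_MIN),
}

def int_literals(node):
    """Yield every integer literal in a (kind, *children) tuple tree."""
    if node[0] == "int_lit":
        yield node[1]
        return
    for child in node[1:]:
        if isinstance(child, tuple):
            yield from int_literals(child)

def track(programs):
    """Record how many literals satisfy each property predicate."""
    hits = Counter()
    for program in programs:
        for value in int_literals(program):
            for name, predicate in PROPERTIES.items():
                if predicate(value):
                    hits[name] += 1
    return hits

programs = [
    ("binary", "+", ("int_lit", 0), ("int_lit", -7)),
    ("binary", "*", ("int_lit", 2_000_000), ("int_lit", 3)),
]
hits = track(programs)
covered = sum(1 for name in PROPERTIES if hits[name])

print(f"Semantic coverage: {covered}/{len(PROPERTIES)}")
for name in PROPERTIES:
    print(f"    {'+' if hits[name] else '-'} {name}: {hits[name]}")
```

Properties such as `loop_zero_iter` depend on runtime behavior rather than on literals, so a real tracker would need more than this static walk.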

### Interaction coverage

Tracks pairwise co-occurrence of language features. Many compiler bugs appear only in feature combinations, such as generics with collections, nullable values with pattern matching, or exceptions with loops.

```bash
vary vast --profile complete --count 100 --seed 42 --interaction-coverage
```

This catches gaps where individual features work but their combination does not.
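Pairwise co-occurrence can be sketched as counting unordered feature pairs per program and comparing them against all possible pairs. The feature names and the way the denominator is chosen (every pair of features seen in the run) are assumptions for this example.

```python
from collections import Counter
from itertools import combinations

def interaction_counts(feature_sets):
    """Count how many programs contain each unordered feature pair."""
    pairs = Counter()
    for features in feature_sets:
        for pair in combinations(sorted(features), 2):
            pairs[pair] += 1
    return pairs

# Hypothetical per-program feature sets.
feature_sets = [
    {"generic_fn", "list_literal", "while_loop"},
    {"match_stmt", "none_literal"},
    {"try_except", "while_loop"},
]

all_features = sorted(set().union(*feature_sets))
possible = list(combinations(all_features, 2))
pairs = interaction_counts(feature_sets)
missing = [pair for pair in possible if pairs[pair] == 0]

print(f"Interaction coverage: {len(possible) - len(missing)}/{len(possible)}")
for a, b in missing[:3]:
    print(f"  - never combined: {a} + {b}")
```

Because the number of pairs grows quadratically with the number of features, interaction coverage typically lags behind feature coverage at the same program count.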

## Confidence scoring

The confidence report rolls the three coverage dimensions, program volume, and mismatch rate into one score.

```bash
vary vast --profile complete --count 200 --seed 42 --confidence
```

```text
Confidence: HIGH (78%)
  Feature coverage:     100%
  Semantic coverage:    85%
  Interaction coverage: 62%
  Mismatch rate:        0.00%
  Programs tested:      200
  Gaps:
    - Low interaction coverage (62%) — run with richer profiles
```

### Scoring formula

| Component | Weight | What it measures |
|-----------|--------|-----------------|
| Feature coverage | 20% | Construct diversity |
| Semantic coverage | 25% | Behavioral diversity |
| Interaction coverage | 15% | Combinatorial depth |
| Volume | 20% | How many programs were tested |
| Cleanliness | 20% | Absence of mismatches and path failures |
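Read as a formula, the table is a weighted sum of five sub-scores. The sketch below assumes each component is already normalized to a 0-100 value; how VAST derives the volume and cleanliness sub-scores is not specified here, so the values used for them are hypothetical.

```python
# Weights from the table above; each component is assumed to be a 0-100 sub-score.
WEIGHTS = {
    "feature": 0.20,
    "semantic": 0.25,
    "interaction": 0.15,
    "volume": 0.20,
    "cleanliness": 0.20,
}

def confidence_score(components):
    """Return the weighted sum of the 0-100 component sub-scores."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

score = confidence_score({
    "feature": 100,
    "semantic": 85,
    "interaction": 62,
    "volume": 50,        # hypothetical sub-score
    "cleanliness": 100,  # hypothetical sub-score
})
print(round(score, 2))
```

With these weights, perfect coverage in all three dimensions contributes at most 60 points, so a run needs both volume and a clean result to reach the upper levels.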

### Confidence levels

| Level | Score range | Meaning |
|-------|------------|---------|
| VERY_HIGH | 90-100 | Comprehensive testing, no known gaps |
| HIGH | 70-89 | Solid coverage, minor gaps identified |
| MODERATE | 50-69 | Decent coverage, significant gaps remain |
| LOW | 0-49 | Insufficient testing, major gaps |
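The mapping from score to level is a simple threshold lookup. The table lists integer ranges, so the handling of fractional scores such as 89.5 is an assumption in this sketch: each range's lower bound is treated as the cutoff.

```python
def confidence_level(score):
    """Map a 0-100 score to the level names in the table above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "VERY_HIGH"
    if score >= 70:
        return "HIGH"
    if score >= 50:
        return "MODERATE"
    return "LOW"

for sample in (95, 78, 50, 49):
    print(sample, confidence_level(sample))
```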

### Gap identification

The confidence report flags four kinds of gap:

| Gap type | When it triggers |
|----------|-----------------|
| Uncovered semantic properties | One or more enabled behaviors have never been tested |
| Low program count | Volume is too small for the profile's complexity |
| High mismatch rate | Active bugs are inflating the failure rate |
| Low coverage area | A coverage dimension (feature, semantic, or interaction) falls below threshold |
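The four gap checks could look roughly like the following. The threshold constants, the `RunSummary` type, and the message wording are all hypothetical; the real thresholds depend on the profile and are not documented in this section.

```python
from dataclasses import dataclass

@dataclass
class RunSummary:
    coverage: dict              # dimension name -> percent covered
    uncovered_properties: list  # enabled semantic properties with zero hits
    programs: int
    mismatches: int

# Hypothetical thresholds, chosen only for this example.
MIN_COVERAGE = 70.0
MIN_PROGRAMS = 100
MAX_MISMATCH_RATE = 0.01

def find_gaps(run):
    """Return a human-readable list of gaps, one entry per triggered check."""
    gaps = []
    if run.uncovered_properties:
        names = ", ".join(run.uncovered_properties)
        gaps.append(f"Uncovered semantic properties: {names}")
    if run.programs < MIN_PROGRAMS:
        gaps.append(f"Low program count ({run.programs})")
    rate = run.mismatches / run.programs if run.programs else 0.0
    if rate > MAX_MISMATCH_RATE:
        gaps.append(f"High mismatch rate ({rate:.2%})")
    for dimension, percent in run.coverage.items():
        if percent < MIN_COVERAGE:
            gaps.append(f"Low {dimension} coverage ({percent:.0f}%)")
    return gaps

run = RunSummary(
    coverage={"feature": 100.0, "semantic": 85.0, "interaction": 62.0},
    uncovered_properties=["loop_zero_iter"],
    programs=200,
    mismatches=0,
)
gaps = find_gaps(run)
for gap in gaps:
    print("  -", gap)
```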

In CI modes (fast and deep), confidence is computed per-profile and overall. The CI dashboard shows the confidence level for each profile.

## Stress testing

Stress testing targets compiler edge cases that normal random generation rarely hits.

```bash
vary vast --profile core --count 100 --seed 42 --stress
```

### What stress mode generates

The stress generator produces difficult inputs in several categories.

**Integer boundary values:** `Long.MAX_VALUE`, `Long.MIN_VALUE`, `Int.MAX_VALUE`, `Int.MIN_VALUE`, byte/short boundaries, powers of 10.

**Float boundary values:** positive/negative zero, `Double.MAX_VALUE`, `Double.MIN_VALUE`, near-zero values, `1e15`, `1e-15`, `0.1 + 0.2`.

**Stress patterns:**

| Pattern | Example | What it catches |
|---------|---------|----------------|
| Identity ops | `x + 0`, `x * 1`, `x - 0` | Optimizer identity folding |
| Complementary ops | `(x + 5) - 5`, `(x * 3) / 3` | Optimizer inverse folding |
| Overflow chains | `MAX_VALUE * 2` | Overflow handling |
| Deep nesting | `((a + b) * c) - d` (3+ levels) | Stack depth, register allocation |
| Float precision | `0.1 + 0.1 + 0.1 + ...` | Precision accumulation |
| Boundary negation | `-Long.MIN_VALUE` | Negation overflow |
| Division edges | `x / 1`, `x / -1` | Division special cases |

When stress mode is active, approximately 30% of generated expressions use stress patterns. The semantic coverage tracker records which stress properties were exercised.
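The sketch below shows why several of these patterns are worth targeting. It assumes integers behave like 64-bit two's-complement JVM longs and emulates that wraparound in Python, whose own integers never overflow. The helpers `wrap64` and `div64` are written for this example and are not part of VAST.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

def wrap64(n):
    """Reduce a Python int to a signed 64-bit two's-complement value."""
    return (n + 2**63) % 2**64 - 2**63

def div64(a, b):
    """Truncating division, wrapped the way a 64-bit long result would be."""
    q = abs(a) // abs(b)
    return wrap64(q if (a < 0) == (b < 0) else -q)

x = 12345
checks = {
    # Identity and complementary ops must leave x unchanged.
    "identity": wrap64(x + 0) == x and wrap64(x * 1) == x and wrap64(x - 0) == x,
    "complementary": wrap64(wrap64(x + 5) - 5) == x and div64(wrap64(x * 3), 3) == x,
    # MAX_VALUE * 2 wraps around to -2.
    "overflow_chain": wrap64(INT64_MAX * 2),
    # Negating MIN_VALUE overflows and yields MIN_VALUE again.
    "negate_min": wrap64(-INT64_MIN) == INT64_MIN,
    # Negating MAX_VALUE is representable, so it does not overflow.
    "negate_max": wrap64(-INT64_MAX) == INT64_MIN + 1,
    # MIN_VALUE / -1 overflows in the same way as negation.
    "min_div_minus_one": div64(INT64_MIN, -1) == INT64_MIN,
    # 0.1 + 0.2 is not exactly 0.3 in binary floating point.
    "float_sum_exact": 0.1 + 0.2 == 0.3,
}
for name, value in checks.items():
    print(f"{name}: {value}")
```

An optimizer that folds `x / -1` into `-x`, or `(x * 3) / 3` into `x`, has to preserve exactly these wraparound results, which is where the four execution paths can diverge.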

## Optimizer validation

The `--opt-check` flag adds a fourth execution path to catch optimizer bugs.

```bash
vary vast --profile types --count 100 --seed 42 --opt-check
```

### Four execution paths

| Path | Pipeline |
|------|----------|
| AST interpreter | Direct AST evaluation (reference oracle) |
| IR interpreter | AST lowered to register-based IR, then interpreted |
| JVM optimized | ConstantFolder + DeadCodeEliminator, then bytecode |
| JVM unoptimized | Direct bytecode generation, no optimization passes |

### Blame localization

When paths disagree, the comparator identifies the suspect. In the table, A and B stand for two different observed outputs:

| AST | IR | JVM-unopt | JVM-opt | Blame |
|-----|-----|-----------|---------|-------|
| A | A | A | B | Optimizer bug |
| A | B | B | B | AST interpreter bug |
| A | A | B | B | Codegen bug |
| A | B | A | B | Mixed (multiple issues) |
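The table translates directly into a small decision function. This is a sketch of the logic rather than the comparator's actual code; disagreement patterns that the table does not list fall through to "Mixed" here, which is a simplification.

```python
def localize_blame(ast, ir, jvm_unopt, jvm_opt):
    """Classify a four-path result using the blame table above."""
    if ast == ir == jvm_unopt == jvm_opt:
        return "No mismatch"
    if ast == ir == jvm_unopt:
        # Only the optimized JVM path differs.
        return "Optimizer bug"
    if ir == jvm_unopt == jvm_opt:
        # Only the AST interpreter differs.
        return "AST interpreter bug"
    if ast == ir and jvm_unopt == jvm_opt:
        # Both interpreters agree, both JVM paths agree, but the groups differ.
        return "Codegen bug"
    return "Mixed (multiple issues)"

rows = [
    ("A", "A", "A", "B"),
    ("A", "B", "B", "B"),
    ("A", "A", "B", "B"),
    ("A", "B", "A", "B"),
]
for row in rows:
    print(" | ".join(row), "->", localize_blame(*row))
```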

In deep CI mode, `--opt-check` is enabled automatically.

## Path health monitoring

Path calibration verifies that execution paths are healthy before testing begins.

```bash
vary vast --profile core --count 100 --seed 42 --calibrate --path-health
```

The `--calibrate` flag runs a set of known-answer programs and checks that all paths produce expected results. If calibration fails, VAST exits with an error rather than producing unreliable results.

The `--path-health` flag tracks path agreement across all programs in the run and prints a reliability matrix showing how often each pair of paths agrees.
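A pairwise reliability matrix can be computed as the fraction of programs on which each pair of paths produced the same output. The path names, the result records, and the four-path setup below are assumptions for this sketch; the actual report format may differ.

```python
from itertools import combinations

# Hypothetical path identifiers; a run without --opt-check would have fewer.
PATHS = ("ast", "ir", "jvm_unopt", "jvm_opt")

def agreement_matrix(results):
    """Return {(path_a, path_b): fraction of programs where both outputs match}."""
    if not results:
        raise ValueError("no results to compare")
    matrix = {}
    for a, b in combinations(PATHS, 2):
        agree = sum(1 for result in results if result[a] == result[b])
        matrix[(a, b)] = agree / len(results)
    return matrix

# One record per program: path name -> captured output.
results = [
    {"ast": "3", "ir": "3", "jvm_unopt": "3", "jvm_opt": "3"},
    {"ast": "7", "ir": "7", "jvm_unopt": "7", "jvm_opt": "8"},
    {"ast": "0", "ir": "0", "jvm_unopt": "0", "jvm_opt": "0"},
    {"ast": "5", "ir": "5", "jvm_unopt": "5", "jvm_opt": "5"},
]
matrix = agreement_matrix(results)
for (a, b), rate in matrix.items():
    print(f"{a:>9} vs {b:<9} {rate:.0%}")
```

A path whose agreement rate is low against every other path is the likely unhealthy one, which mirrors the blame table in the optimizer validation section.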

## Continuous exploration

Continuous mode runs time-bounded exploration with adaptive profile selection.

```bash
vary vast --mode continuous --duration 300
```

Continuous mode picks profiles and batch sizes based on what has been tested so far.

| Behavior | How it works |
|----------|-------------|
| Profile selection | Profiles with lower semantic coverage get higher selection probability. Untried profiles get an exploration bonus. |
| Batch sizing | Early iterations use small batches (20 programs) for breadth. Later iterations use larger batches (100 programs) for depth. |
| Cumulative state | Coverage maps merge across iterations, so the system tracks overall progress across profiles. |
| Convergence | When cumulative semantic coverage exceeds 95%, exploration stops early. |
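The behaviors in the table can be sketched as a weighted random choice plus two small helper rules. The weight formula, the `EXPLORATION_BONUS` and `WEIGHT_FLOOR` constants, and the `warmup` cutoff are hypothetical; only the batch sizes (20 and 100) and the 95% convergence threshold come from the table.

```python
import random

EXPLORATION_BONUS = 0.5  # hypothetical bonus for profiles not yet tried
WEIGHT_FLOOR = 0.05      # hypothetical floor so no profile is starved

def selection_weights(coverage, tried):
    """coverage maps profile -> semantic coverage in [0, 1]; tried is a set."""
    weights = {}
    for profile, covered in coverage.items():
        weight = (1.0 - covered) + WEIGHT_FLOOR  # lower coverage, higher weight
        if profile not in tried:
            weight += EXPLORATION_BONUS
        weights[profile] = weight
    return weights

def pick_profile(coverage, tried, rng):
    """Draw one profile with probability proportional to its weight."""
    weights = selection_weights(coverage, tried)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def batch_size(iteration, warmup=5):
    """Small batches early for breadth, larger batches later for depth."""
    return 20 if iteration < warmup else 100

def converged(cumulative_semantic_coverage):
    """Stop early once cumulative semantic coverage exceeds 95%."""
    return cumulative_semantic_coverage > 0.95

coverage = {"core": 0.9, "types": 0.5, "complete": 0.0}
tried = {"core", "types"}
weights = selection_weights(coverage, tried)
rng = random.Random(42)
choice = pick_profile(coverage, tried, rng)
print(weights, choice, batch_size(0), batch_size(10))
```

Whether the real batch size steps from 20 to 100 or ramps up gradually is not stated, so the step function here is only one plausible reading.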

The report includes a per-profile summary, confidence score, and coverage gap analysis identifying which profiles and properties need more work.

```text
VAST Continuous Exploration Report
============================================================
Duration: 300.2s | Iterations: 47
Programs: 2340 total, 2340 passed, 0 mismatches
Status: TIME_BUDGET_EXHAUSTED

Per-Profile Summary:
  Profile        Programs   Passed Mismatches    Iters
  ------------------------------------------------------
  collections          90       90          0        3
  complete            450      450          0       10
  control             180      180          0        5
  core                160      160          0        4
  ...

Confidence: HIGH (82%)
  ...

Coverage Gap Analysis:
  Overall semantic coverage: 88%
  ...
```

See [CI integration](/docs/vast/ci-integration/) for how continuous mode fits into the CI pipeline.
