VAST

Value pools

Instead of generating a fresh random literal every time it needs a value, the generator can reuse values it has already created. The resulting programs have values that flow through assignments and expressions, which is closer to how real code behaves and more effective at exposing bugs.

Why random constants are not enough

Early VAST phases generated fresh literals for every expression. Need an integer? Pick a random number. Need a boolean? Flip a coin.

This produces syntactically diverse programs, but the data flow is shallow. Every value is independent. There is no reuse, no composition, and no dependency between variables. Real programs are different: values flow through assignments, get passed to functions, and combine in expressions that reference earlier results.

Value pools address this by tracking values as they are generated and making them available for reuse.

What a value pool is

A value pool is a collection of expressions available at a given point in the program. As the generator creates variables, loop counters, and branch results, each one is added to the pool. When the generator needs a new expression, it can choose to reuse a value from the pool instead of generating a fresh literal.

Generator needs an Int expression
       |
       +-- 30% chance: reuse a value from the pool
       |
       +-- 20% chance: compose two pool values (e.g., x + y)
       |
       +-- 50% chance: generate a fresh literal or expression

The probabilities are configurable. Higher reuse bias produces programs with more data dependencies. Lower reuse bias produces programs closer to the original random generation.
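The three-way choice in the diagram can be sketched in Python as a single roll against the two biases. This is only an illustration: `choose_strategy` and its parameter names are hypothetical, and the fallback to fresh generation when the pool holds too few entries is an assumption, not documented VAST behaviour.

```python
import random

def choose_strategy(rng, pool_size, reuse_bias=30, composition_bias=20):
    """Return "reuse", "compose", or "fresh" for the next expression.

    Biases are percentages (0-100). Whatever is left after reuse and
    composition goes to fresh generation (50 with the defaults).
    """
    if reuse_bias < 0 or composition_bias < 0 or reuse_bias + composition_bias > 100:
        raise ValueError("biases must be non-negative and sum to at most 100")

    roll = rng.randrange(100)  # uniform integer in 0..99

    if roll < reuse_bias:
        # Assumption: reuse needs at least one pool entry, else fall back.
        return "reuse" if pool_size >= 1 else "fresh"
    if roll < reuse_bias + composition_bias:
        # Assumption: composition needs two entries, else fall back.
        return "compose" if pool_size >= 2 else "fresh"
    return "fresh"
```

Passing a seeded `random.Random` instance keeps generation reproducible, which matters when a failing program has to be regenerated.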

Pool entries

Each entry in the pool tracks metadata about the value:

| Field | What it records |
|---|---|
| Type | The Vary type (Int, Bool, Str, etc.) |
| Expression | The AST expression that produces the value |
| Origin | Where the value came from (local binding, loop iteration, branch join, argument, temporary) |
| Scope depth | How deeply nested the value is (for visibility checking) |
| Complexity | Expression depth (simpler values are preferred for reuse) |
| Dependency depth | How many other pool entries this value transitively depends on |
| Use count | How many times the value has been reused |
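As a rough illustration, the fields above map naturally onto a small record type. The Python sketch below is hypothetical: the class name, field names, and the use of a string in place of a real AST node are all assumptions, and the visibility rule is deliberately simplified to a depth comparison.

```python
from dataclasses import dataclass

@dataclass
class PoolEntry:
    type_name: str         # the Vary type, e.g. "Int", "Bool", "Str"
    expression: str        # stand-in for the AST expression (assumption)
    origin: str            # e.g. "local binding", "argument", "temporary"
    scope_depth: int       # nesting level at which the value was defined
    complexity: int        # expression depth
    dependency_depth: int  # transitive dependencies on other entries
    use_count: int = 0     # incremented each time the value is reused

    def is_visible_at(self, current_depth: int) -> bool:
        # Simplified assumption: a value is visible from its own nesting
        # level and from anything nested more deeply. A real generator
        # would also have to rule out sibling scopes.
        return self.scope_depth <= current_depth

# A named local defined one level deep.
x = PoolEntry("Int", "x", "local binding", scope_depth=1,
              complexity=1, dependency_depth=0)
```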

Pool origins

Values enter the pool from different sources:

| Origin | Example | Priority |
|---|---|---|
| Local binding | let x = 42 | High (named, easy to reference) |
| Loop carried | x = x + 1 inside a while loop | Medium |
| Branch join | Value computed in an if/else branch | Medium |
| Argument | Function parameter | High |
| Temporary | Intermediate expression result | Low |

Named locals are preferred for reuse because they produce cleaner, more readable generated programs.

How reuse works

When the generator decides to reuse a pool value, it selects from entries that match the required type and are visible at the current scope depth. Selection is biased toward:

| Priority | Preference |
|---|---|
| 1 | Named locals over temporaries |
| 2 | Lower complexity over higher complexity |
| 3 | Less frequently used values over heavily used ones |
| 4 | Deeper scope (closer to the current position) |

This bias produces realistic data flow patterns: values tend to be used near where they are defined, simpler values get reused more than complex expressions, and the generator avoids over-referencing any single variable.
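One way to express these preferences is as a weight per candidate followed by a weighted random draw. The Python sketch below is an assumption about how such a bias could be implemented: the `Entry` type, the `weight` formula, and the specific multipliers are invented for illustration and are not taken from VAST.

```python
import random
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    type_name: str
    named: bool        # True for named locals, False for temporaries
    scope_depth: int
    complexity: int
    use_count: int = 0

def weight(entry, diversity_penalty=1):
    """Hypothetical scoring: larger means more likely to be picked."""
    w = 2.0 if entry.named else 1.0                    # named over temporary
    w /= 1 + entry.complexity                          # simpler is better
    w /= 1 + diversity_penalty * entry.use_count       # favour less-used
    w *= 1 + entry.scope_depth                         # favour deeper scope
    return w

def pick(pool, type_name, current_depth, rng):
    """Pick a visible entry of the required type, or None if there is none."""
    candidates = [e for e in pool
                  if e.type_name == type_name and e.scope_depth <= current_depth]
    if not candidates:
        return None
    chosen = rng.choices(candidates, weights=[weight(e) for e in candidates], k=1)[0]
    chosen.use_count += 1  # makes the same entry less likely next time
    return chosen
```

Because the draw is weighted rather than strictly ranked, low-priority entries are still selected occasionally, which keeps generated programs varied.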

Composition

Instead of reusing a single pool value, the generator can compose two values:

let x = 10
let y = 3
# composed value: x + y (combines two pool entries)
let z = x + y

Composition creates expressions that reference multiple earlier values, producing richer dependency graphs. The maximum composition complexity is configurable to prevent deeply nested expressions.
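The complexity limit can be illustrated with a short Python sketch. Everything here is hypothetical: the `Entry` type, the `compose` helper, and the rule that a composed value is one level deeper than its deepest operand are assumptions made for the example. Dependency depth is also treated as a chain length here for simplicity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entry:
    expression: str        # stand-in for an AST expression
    complexity: int        # expression depth (a bare variable is 1)
    dependency_depth: int  # length of the longest dependency chain

def compose(left: Entry, right: Entry, op: str = "+",
            max_complexity: int = 3,
            max_dependency_depth: int = 4) -> Optional[Entry]:
    """Combine two pool entries, or return None if a limit is exceeded."""
    complexity = 1 + max(left.complexity, right.complexity)
    dependency_depth = 1 + max(left.dependency_depth, right.dependency_depth)
    if complexity > max_complexity or dependency_depth > max_dependency_depth:
        return None  # caller would fall back to reuse or fresh generation
    return Entry(f"({left.expression} {op} {right.expression})",
                 complexity, dependency_depth)

x = Entry("x", 1, 0)
y = Entry("y", 1, 0)
z = compose(x, y)           # (x + y), complexity 2
w = compose(z, x, op="*")   # ((x + y) * x), complexity 3
too_deep = compose(w, y)    # would be complexity 4, so rejected
```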

Edge cases that pools enable

Value pools improve VAST's ability to find edge-case bugs because they create programs with specific value patterns:

| Pattern | How pools produce it | What it tests |
|---|---|---|
| Zero interactions | Reuse a variable that holds 0 in arithmetic | Division by zero, multiply by zero |
| Identity operations | Compose x + 0, x * 1 with pool values | Optimizer correctness for identities |
| Aliasing | Reuse the same variable in multiple contexts | Variable scoping and register allocation |
| Accumulation | Loop-carried values that grow over iterations | Loop variable overflow, accumulator correctness |

Without pools, these patterns only appear by coincidence. With pools, they appear regularly because the generator deliberately reuses values.

Configuration

Pool behaviour is controlled by five parameters:

| Parameter | Default | Description |
|---|---|---|
| Reuse bias | 30 | Probability (0-100) of reusing a pool value |
| Composition bias | 20 | Probability (0-100) of composing two pool values |
| Diversity penalty | 1 | Bonus for selecting less-used entries |
| Max dependency depth | 4 | Maximum transitive dependencies for a pool entry |
| Max composition complexity | 3 | Maximum expression depth when composing |

These defaults produce a balance between fresh generation and value reuse. Higher reuse and composition biases produce programs with deeper data flow at the cost of less syntactic variety.
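For illustration, the five parameters could be grouped into one validated configuration object. The Python sketch below uses the defaults from the table, but the class name, field names, and validation rules are assumptions. In particular, requiring the two biases to sum to at most 100 is inferred from the 30/20/50 split shown earlier, not from documented behaviour.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PoolConfig:
    reuse_bias: int = 30                 # percent chance of reusing a value
    composition_bias: int = 20           # percent chance of composing two
    diversity_penalty: int = 1           # weight favouring less-used entries
    max_dependency_depth: int = 4
    max_composition_complexity: int = 3

    def __post_init__(self):
        for name in ("reuse_bias", "composition_bias"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be in 0..100, got {value}")
        # Assumption: the remainder is the share left for fresh generation.
        if self.reuse_bias + self.composition_bias > 100:
            raise ValueError("reuse_bias + composition_bias must not exceed 100")

    @property
    def fresh_bias(self) -> int:
        """Percent chance left over for generating a fresh expression."""
        return 100 - self.reuse_bias - self.composition_bias

default_config = PoolConfig()
heavy_reuse = PoolConfig(reuse_bias=60, composition_bias=30)
```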

Metrics

VAST tracks pool statistics during generation:

| Metric | What it measures |
|---|---|
| Pool size over time | How many entries are available at each generation step |
| Reuse and composition ratios | How often each strategy is selected |
| Dependency depth distribution | How deep the transitive chains get |
| Named local prevalence | What fraction of reused values are named locals |

These metrics help tune pool configuration and verify that value pools are actually producing the intended data flow patterns.
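A minimal Python sketch of how such statistics might be collected is shown below. The `PoolMetrics` class and its method names are hypothetical and do not reflect VAST's actual reporting interface. The sketch assumes one `record` call per generation step.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class PoolMetrics:
    pool_sizes: list = field(default_factory=list)             # size per step
    strategies: Counter = field(default_factory=Counter)       # strategy counts
    dependency_depths: Counter = field(default_factory=Counter)
    named_reuses: int = 0

    def record(self, strategy, pool_size, dependency_depth=0, named_local=False):
        """Log one generation step."""
        self.pool_sizes.append(pool_size)
        self.strategies[strategy] += 1
        if strategy in ("reuse", "compose"):
            self.dependency_depths[dependency_depth] += 1
        if strategy == "reuse" and named_local:
            self.named_reuses += 1

    def ratio(self, strategy):
        """Fraction of all steps that used the given strategy."""
        total = sum(self.strategies.values())
        return self.strategies[strategy] / total if total else 0.0

    def named_local_prevalence(self):
        """Fraction of reused values that were named locals."""
        reuses = self.strategies["reuse"]
        return self.named_reuses / reuses if reuses else 0.0

metrics = PoolMetrics()
metrics.record("fresh", pool_size=0)
metrics.record("reuse", pool_size=1, dependency_depth=0, named_local=True)
metrics.record("reuse", pool_size=2, dependency_depth=1)
metrics.record("compose", pool_size=2, dependency_depth=2)
```

Comparing `ratio("reuse")` and `ratio("compose")` against the configured biases is a quick check that the pool is large enough for the requested strategies to be used.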
