Performance

Vary compiles to JVM bytecode, so at runtime it is optimized by the same JIT compiler as Java. We expect Vary to be considerably faster than Python, and we hope it reaches roughly 85% of Java's speed or better on most workloads.

We wrote a small benchmark suite to see where we stand. The results are not scientific: nine workloads on one machine with a fixed heap do not make a performance study. But the suite does tell us whether the bytecode we generate is in the right ballpark, and so far it is.

Benchmarks vs Java

Eclipse Temurin 25 on Linux, 2 GB heap. Each benchmark runs 750 ms of warmup, then 9 trials of 2500 ms. Numbers are medians.
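Here is a rough sketch of that warmup-then-median procedure. It is written in Java because Vary and Java run on the same JVM. The text does not say how a 2500 ms trial becomes one number. This sketch assumes each trial reports the mean time per workload call, and the final figure is the median of those per-trial means. `MedianHarness`, `runFor`, and `measure` are hypothetical names, not the suite's real harness.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

/**
 * Hypothetical harness (not the suite's actual code): warm up for a fixed
 * wall-clock budget, then run timed trials and report the median of the
 * per-trial mean call times, in milliseconds.
 */
public final class MedianHarness {

    // Volatile sink so the JIT cannot treat the workload's result as dead code.
    static volatile long sink;

    /** Calls the workload until budgetNanos has elapsed; returns mean ms per call. */
    static double runFor(LongSupplier workload, long budgetNanos) {
        long start = System.nanoTime();
        long calls = 0;
        long now;
        do {
            sink = workload.getAsLong();
            calls++;
            now = System.nanoTime();
        } while (now - start < budgetNanos);
        return (now - start) / 1e6 / calls;
    }

    /** Median of the samples (mean of the two middle values for an even count). */
    public static double median(double[] samples) {
        double[] s = samples.clone();
        Arrays.sort(s);
        int mid = s.length / 2;
        return s.length % 2 == 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2.0;
    }

    /** Warmup (results discarded), then `trials` timed trials, then the median. */
    public static double measure(LongSupplier workload, long warmupMs, int trials, long trialMs) {
        runFor(workload, warmupMs * 1_000_000L);
        double[] perCallMs = new double[trials];
        for (int i = 0; i < trials; i++) {
            perCallMs[i] = runFor(workload, trialMs * 1_000_000L);
        }
        return median(perCallMs);
    }

    public static void main(String[] args) {
        // Example workload: iterative Fibonacci (F(90) still fits in a long).
        LongSupplier fibIterative = () -> {
            long a = 0, b = 1;
            for (int i = 0; i < 90; i++) {
                long t = a + b;
                a = b;
                b = t;
            }
            return a;
        };
        // Same settings as the table: 750 ms warmup, then 9 trials of 2500 ms.
        System.out.printf("median: %.6f ms/call%n", measure(fibIterative, 750, 9, 2500));
    }
}
```

Writing to a volatile field is only a crude guard against dead-code elimination. A careful JVM benchmark would more often use JMH, which handles warmup, forking, and result sinks (blackholes) for you.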

Benchmark	Java (ms)	Vary (ms)	vs Java
Fib Iterative	49.3	49.3	≈
Int Arith	26.6	25.8	≈
Mandelbrot	111.8	110.6	≈
List Ops	6.2	4.3	-31%
Map Ops	124.7	133.5	+7%
Map Ops (defaults)	174.3	177.3	≈
Alloc	29.9	30.2	≈
String Concat	56.5	42.7	-24%
Primes Sieve	33.3	28.1	-16%

≈ means within 5%. Other entries give Vary's time relative to Java's, so negative percentages mean Vary was faster.
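The "vs Java" column is a relative difference in median time. The sketch below recomputes it from the two timing columns. It assumes "within 5%" means an absolute difference strictly below 5% and that other values are rounded to the nearest whole percent. `VsJava` and `label` are hypothetical names.

```java
/** Hypothetical helper that reproduces the "vs Java" column from two median timings. */
public final class VsJava {

    /** Percent change of Vary's time relative to Java's; negative means Vary was faster. */
    public static double percentDelta(double javaMs, double varyMs) {
        return (varyMs - javaMs) / javaMs * 100.0;
    }

    /** "\u2248" (≈) when |delta| < 5% (assumed threshold), else a signed, rounded percent. */
    public static String label(double javaMs, double varyMs) {
        double pct = percentDelta(javaMs, varyMs);
        if (Math.abs(pct) < 5.0) {
            return "\u2248";
        }
        long rounded = Math.round(pct);
        return (rounded > 0 ? "+" : "") + rounded + "%";
    }

    public static void main(String[] args) {
        System.out.println(label(6.2, 4.3));     // List Ops: prints -31%
        System.out.println(label(124.7, 133.5)); // Map Ops: prints +7%
        System.out.println(label(56.5, 42.7));   // String Concat: prints -24%
    }
}
```

Applied to every row, this yields the same labels as the table. For example, Map Ops (defaults) comes out at about +1.7%, which falls under ≈.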

Most of the table is a wash, which is what we hoped for. The JIT does not care that the bytecode came from Vary instead of javac.

String concat is faster because Vary uses invokedynamic-based concatenation, and the pattern in this particular benchmark JITs well. List ops and primes sieve benefit from Vary's IntArray and BoolArray types, which skip boxing. Map ops is 7% slower, likely due to how Vary's codegen handles map default values compared to hand-written Java.
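To show why an unboxed type helps the sieve, here is the same prime sieve in plain Java over a primitive `boolean[]` and over an `ArrayList<Boolean>`. The text says only that BoolArray skips boxing. Treating a `boolean[]` as the kind of storage it compiles to is an assumption, and `SieveSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical comparison: the same sieve over primitive and boxed storage. */
public final class SieveSketch {

    /** Counts primes below n using a boolean[]: one flag per slot, no boxing. */
    public static int countPrimesPrimitive(int n) {
        if (n < 2) return 0;
        boolean[] composite = new boolean[n];
        int count = 0;
        for (int i = 2; i < n; i++) {
            if (composite[i]) continue;
            count++;
            for (long j = (long) i * i; j < n; j += i) {
                composite[(int) j] = true;
            }
        }
        return count;
    }

    /** Same algorithm over ArrayList<Boolean>: every access goes through a Boolean reference. */
    public static int countPrimesBoxed(int n) {
        if (n < 2) return 0;
        List<Boolean> composite = new ArrayList<>(Collections.nCopies(n, Boolean.FALSE));
        int count = 0;
        for (int i = 2; i < n; i++) {
            if (composite.get(i)) continue;       // unboxes Boolean -> boolean
            count++;
            for (long j = (long) i * i; j < n; j += i) {
                composite.set((int) j, true);     // autoboxes to the cached Boolean.TRUE
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPrimesPrimitive(100)); // prints 25
        System.out.println(countPrimesBoxed(100));     // prints 25
    }
}
```

Boxing a `boolean` reuses the cached `Boolean.TRUE` and `Boolean.FALSE` instances. The boxed version therefore pays in indirection and a larger per-element footprint rather than in allocations. For `Long`, most values also cost an allocation.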

Where Vary loses, it is usually boxing. List[Int] in Vary is an ArrayList<Long> under the hood, while Java can use long[] directly. IntArray and BoolArray exist to close that gap for hot paths, but they only help when you use them.

Note: Boxing means wrapping a primitive value (like a 64-bit integer) in a heap-allocated object so it can be stored in a generic collection. Each box usually costs an allocation (the JVM caches only a small range of boxed values, such as Long values from -128 to 127) plus an extra pointer dereference on every read. In tight loops over large lists, that adds up.
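In plain Java, the two representations contrasted above look like this: an `ArrayList<Long>`, which the text says is what `List[Int]` becomes, and a `long[]`. This is a hand-written Java illustration, not code generated by Vary. `BoxingSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of boxed vs primitive storage for 64-bit integers. */
public final class BoxingSketch {

    /** Sums a List<Long>: each element is a separate Long object that must be unboxed. */
    public static long sumBoxed(List<Long> values) {
        long total = 0;
        for (Long v : values) {
            total += v;            // implicit v.longValue(): one pointer dereference per element
        }
        return total;
    }

    /** Sums a long[]: values are stored inline in the array, with no per-element object. */
    public static long sumPrimitive(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        List<Long> boxed = new ArrayList<>(n);
        long[] primitive = new long[n];
        for (int i = 0; i < n; i++) {
            boxed.add((long) i);   // autoboxing via Long.valueOf: allocates outside -128..127
            primitive[i] = i;
        }
        System.out.println(sumBoxed(boxed) + " " + sumPrimitive(primitive)); // prints 499999500000 499999500000
    }
}
```

Both loops compute the same sum. The difference is that the boxed list holds a million references to separately allocated `Long` objects, while the array holds a million contiguous `long` values.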

Toolchain performance benchmarks

Beyond runtime throughput, we now measure the compiler toolchain itself. Five benchmark suites run in CI to catch regressions:

Suite	What it measures
Startup	Cold-start time for `vary run`, `vary check`, `vary test`
Workflow	End-to-end latency of check and test workflows
Memory	Peak heap, GC pause time, allocation rate
Verification	Throughput of mutation testing and VAST differential testing
Trend	Historical score tracking across runs

Each suite has a stored baseline. Regressions above 25% fail the nightly build; regressions above 10% produce warnings. The trend suite generates historical reports for tracking performance over time.
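The threshold rule can be sketched as a small classifier. This assumes lower is better for the metric being checked, as with times, heap size, and pause length, and that "regression" means the percentage increase over the stored baseline. A throughput metric, such as the verification suite's, would need the comparison inverted. `RegressionGate` and `Verdict` are hypothetical names, not the project's CI code.

```java
/** Hypothetical gate mirroring the thresholds above: >25% fails, >10% warns. */
public final class RegressionGate {

    public enum Verdict { PASS, WARN, FAIL }

    static final double FAIL_THRESHOLD_PCT = 25.0;
    static final double WARN_THRESHOLD_PCT = 10.0;

    /**
     * Compares a measurement with its stored baseline, assuming lower is better.
     * The regression is the percentage increase of current over baseline.
     */
    public static Verdict check(double baseline, double current) {
        if (baseline <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        double regressionPct = (current - baseline) / baseline * 100.0;
        if (regressionPct > FAIL_THRESHOLD_PCT) return Verdict.FAIL;
        if (regressionPct > WARN_THRESHOLD_PCT) return Verdict.WARN;
        return Verdict.PASS;
    }

    public static void main(String[] args) {
        System.out.println(check(200.0, 215.0)); // 7.5% slower: prints PASS
        System.out.println(check(200.0, 230.0)); // 15% slower: prints WARN
        System.out.println(check(200.0, 260.0)); // 30% slower: prints FAIL
    }
}
```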

What we have not measured

Concurrent throughput with spawn/join under realistic allocation patterns.