Binary data without the footguns

There is a class of bug that shows up in nearly every language when you work with binary data. You read a file as a string, pass it to a hashing function that wants bytes, and get the wrong answer because the string went through a UTF-8 encode that you did not ask for. Or you receive an HTTP response body as bytes, try to concatenate it with a string, and get a type error at runtime. Or you read bytes from disk, write them to a network socket through a layer that silently converts to a string, and the data arrives corrupted.

These bugs are hard to find because they often look like they work. The hash comes back as a valid hex string. The HTTP response parses fine in your test environment. The corruption only shows up when the input contains bytes that are not valid UTF-8, which might be never in your test data and always in production.
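To make that failure mode concrete, here is a short Python 3 sketch. It uses Python rather than Vary so it can run anywhere. The hypothetical `roundtrip_through_text` stands in for a layer that quietly decodes bytes to text and back. With `errors="replace"`, invalid UTF-8 becomes U+FFFD instead of raising, so ASCII input survives and binary input does not.

```python
import hashlib


def roundtrip_through_text(data: bytes) -> bytes:
    """Hypothetical layer that converts bytes to text and back without being asked."""
    # errors="replace" substitutes U+FFFD for invalid sequences instead of raising,
    # which is exactly what makes the corruption silent.
    return data.decode("utf-8", errors="replace").encode("utf-8")


ascii_payload = b"hello, world"                    # typical test data
binary_payload = bytes([0x89, 0x50, 0x4E, 0x47])   # first four bytes of a PNG file

for payload in (ascii_payload, binary_payload):
    before = hashlib.sha256(payload).hexdigest()
    after = hashlib.sha256(roundtrip_through_text(payload)).hexdigest()
    # Both hashes are valid hex strings; only the comparison reveals the damage.
    print(payload, "unchanged" if before == after else "corrupted")
```

The ASCII payload comes through unchanged. The PNG header comes through corrupted, because 0x89 is not a valid UTF-8 lead byte.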

The root cause is always the same: the language does not distinguish between "a sequence of bytes" and "a string of text" at the type level, or it distinguishes them but makes conversion implicit.

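In Vary, that boundary is enforced by the type system and respected throughout the standard library: the Bytes type, the filesystem API, and the HTTP module all keep text and binary data apart.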

How Vary handles it

Vary has two distinct types: Str for text and Bytes for binary data. They are not interchangeable. You cannot pass a Str where a Bytes is expected, or vice versa. Converting between them requires an explicit function call that names the encoding.

This distinction runs through every module that does I/O.
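Python 3 draws a similar two-type line. It enforces that line at runtime rather than in a type checker, and `str.encode()` falls back to UTF-8 if you omit the encoding. The sketch below wraps the conversions in two hypothetical helpers, `to_bytes` and `to_text`, whose encoding argument is required. That is closer to the Vary rule of naming the encoding at every crossing.

```python
def to_bytes(text: str, encoding: str) -> bytes:
    """Str -> Bytes; the caller must name the encoding (no default)."""
    return text.encode(encoding)


def to_text(data: bytes, encoding: str) -> str:
    """Bytes -> Str; decoding can fail, so this may raise UnicodeDecodeError."""
    return data.decode(encoding)


raw = to_bytes("café", "utf-8")   # 5 bytes: 'é' encodes to two bytes in UTF-8
text = to_text(raw, "utf-8")      # back to the original 4-character string

try:
    raw + "!"                     # mixing bytes and str
except TypeError as exc:
    print("rejected:", exc)
```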

The filesystem module has two pairs of read/write functions:

import fs

# Text I/O (reads and writes Str)
let text_path = fs.read_path("config.json").unwrap()
let text = fs.read_text(text_path).unwrap()
let out_path = fs.write_path("output.txt").unwrap()
fs.write_text(out_path, text).unwrap()

# Binary I/O (reads and writes Bytes)
let image_path = fs.read_path("image.png").unwrap()
let data = fs.read_bytes(image_path).unwrap()
let copy_path = fs.write_path("copy.png").unwrap()
fs.write_bytes(copy_path, data).unwrap()

There is no catch-all file reader that returns "whatever the file contains." You decide up front whether you are working with text or binary data.
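Python's `pathlib` has the same split: `read_bytes`/`write_bytes` versus `read_text`/`write_text`. Its text functions, however, fall back to the locale's encoding unless you pass one. The sketch below copies a file each way inside a temporary directory, and the helper names are hypothetical. The sample bytes are the real 8-byte PNG signature. That signature deliberately includes CR LF and LF, so a transfer in text mode with newline translation is detectable.

```python
import tempfile
from pathlib import Path

# The 8-byte PNG signature; its \r\n and \n bytes exist to catch text-mode mangling.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def copy_binary(src: Path, dst: Path) -> bytes:
    """Bytes in, bytes out: no decoding and no newline translation."""
    data = src.read_bytes()
    dst.write_bytes(data)
    return data


def copy_text(src: Path, dst: Path, encoding: str) -> str:
    """Text I/O with the encoding named explicitly instead of the locale default."""
    text = src.read_text(encoding=encoding)
    dst.write_text(text, encoding=encoding)
    return text


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "image.png").write_bytes(PNG_SIGNATURE)
    (root / "config.json").write_text('{"name": "vary"}', encoding="utf-8")

    copied_bytes = copy_binary(root / "image.png", root / "copy.png")
    copied_text = copy_text(root / "config.json", root / "output.txt", "utf-8")
    copy_matches = (root / "copy.png").read_bytes() == PNG_SIGNATURE
```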

The HTTP module follows the same split. Response bodies come back as Str by default, which suits JSON APIs, but you can also get the raw Bytes:

import http

let response = http.get("https://example.com/api/data")
let json_text: Str = response.body       # text
let raw: Bytes = response.body_bytes()    # binary
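Underneath a "body as Str" convenience, the wire format is always bytes, and turning it into text means choosing a charset. The Python sketch below shows one way such a default could work. It is an assumption, not a description of Vary's `http` module. The sketch reads the `charset` parameter from a `Content-Type` header using the standard library's `email.message.Message`, then falls back to UTF-8, which is the encoding JSON exchanged between systems is required to use. The helper `body_text` is hypothetical.

```python
from email.message import Message


def body_text(body: bytes, content_type: str) -> str:
    """Hypothetical helper: decode a raw response body using its declared charset."""
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    charset = header.get_content_charset() or "utf-8"
    return body.decode(charset)  # raises UnicodeDecodeError if the bytes don't match


json_body = '{"city": "Zürich"}'.encode("utf-8")
latin_body = "Zürich".encode("latin-1")

print(body_text(json_body, "application/json; charset=utf-8"))
print(body_text(latin_body, "text/plain; charset=ISO-8859-1"))
print(body_text(b"{}", "application/json"))  # no charset declared: UTF-8 fallback
```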

The crypto module goes further. Its primary operations accept only Bytes:

import crypto
import fs

let pdf_path = fs.read_path("document.pdf").unwrap()
let data: Bytes = fs.read_bytes(pdf_path).unwrap()
let hash = crypto.sha256(data)           # Bytes in, CryptoHash out
print(hash.hex())                        # explicit conversion to Str
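Python's `hashlib` makes the same choice. Hash constructors take bytes-like objects and reject `str`. The result offers both the raw digest (`digest()`, bytes) and a text rendering (`hexdigest()`, str):

```python
import hashlib

data = b"%PDF-1.7 example"          # stand-in for bytes read from a file
h = hashlib.sha256(data)

raw_digest: bytes = h.digest()      # 32 raw bytes
hex_digest: str = h.hexdigest()     # 64 hex characters: the explicit text form

try:
    hashlib.sha256("hello")         # str is rejected outright
    str_rejected = False
except TypeError:
    str_rejected = True
```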

If you want to hash a string, you use the _str suffix variant:

let hash = crypto.sha256_str("hello")    # Str convenience

The suffix makes the type boundary visible. You always know whether you are in string-land or bytes-land.
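A `_str` convenience can be a thin wrapper that performs the encoding in one visible place. The post does not say which encoding Vary's `sha256_str` uses, so the Python sketch below assumes UTF-8. Both function names are hypothetical mirrors of the Vary ones.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Bytes in, raw 32-byte digest out."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("sha256 expects bytes; use sha256_str for text")
    return hashlib.sha256(data).digest()


def sha256_str(text: str) -> bytes:
    """Str convenience: the encoding step (assumed UTF-8) is written out here."""
    return sha256(text.encode("utf-8"))


print(sha256_str("hello").hex())
```

Because `sha256_str` is defined in terms of `sha256`, the two cannot disagree about how text becomes bytes.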

Encoding is explicit

Converting between Bytes and Str goes through encoding functions that name the encoding:

import crypto

let raw: Bytes = crypto.random_bytes(32)

# Bytes to Str (explicit encoding)
let hex: Str = crypto.hex_encode(raw)
let b64: Str = crypto.base64_encode(raw)

# Str to Bytes (explicit decoding)
let decoded: Bytes = crypto.hex_decode(hex)
let also_decoded: Bytes = crypto.base64_decode(b64)
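For comparison, here is the same round trip in Python, with `os.urandom` standing in for `crypto.random_bytes`. One wrinkle is worth noticing: `base64.b64encode` returns bytes, not str. Getting text out of it therefore takes one more explicit `decode("ascii")`.

```python
import base64
import os

raw: bytes = os.urandom(32)

# Bytes to Str (explicit encoding)
hex_text: str = raw.hex()
b64_text: str = base64.b64encode(raw).decode("ascii")  # b64encode returns bytes

# Str to Bytes (explicit decoding)
decoded: bytes = bytes.fromhex(hex_text)
also_decoded: bytes = base64.b64decode(b64_text)       # accepts an ASCII str
```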

There is no implicit toString() on Bytes. If you try to print a Bytes value or concatenate it with a string, the type checker stops you. You have to pick an encoding.

This is annoying the first time you hit it. It is a relief the fiftieth time, when you realize you have never had to debug a "wrong encoding" issue.
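Python 3 shows what happens without that rule. Concatenating bytes and str raises `TypeError`. However, `str()` and f-strings accept a bytes value and quietly produce its repr, so a log line or a filename can end up containing a literal `b'...'`. Running Python with the `-b` flag turns this into a `BytesWarning`, but the flag is off by default.

```python
payload = b"\x00\xffdata"

# No error: str() on bytes returns the repr, not decoded text.
label = "payload=" + str(payload)
formatted = f"payload={payload}"

# The explicit alternative names the representation you want.
as_hex = "payload=" + payload.hex()
```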

A real pipeline

Here is what a realistic binary workflow looks like in Vary. Read a file, hash it, upload it, verify the server's response:

import fs
import crypto
import http
import json

# Read binary file
let artifact_path = fs.read_path("artifact.tar.gz").unwrap()
let payload: Bytes = fs.read_bytes(artifact_path).unwrap()

# Hash it locally
let local_hash = crypto.sha256(payload)

# Upload (HTTP module accepts Bytes body)
let response = http.post("https://storage.example.com/upload", payload)

# Server returns the hash it computed
let server_hash = json.parse(response.body).get("sha256").as_str()

# Compare
if local_hash.hex() == server_hash {
    print("Upload verified")
}

Every variable has a clear type. payload is Bytes, local_hash is CryptoHash, response.body is Str (JSON text), server_hash is Str. The only type boundary crossing is local_hash.hex(), which converts the hash to a hex string for comparison. At no point did binary data silently become a string or vice versa.

If you have spent time debugging a corrupted file upload where everything worked in tests but production data came through mangled, this kind of explicitness starts to feel less like ceremony and more like insurance.
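The verification step of that pipeline can be sketched in Python without any network I/O. The hypothetical `verify_upload` takes the payload bytes and the server's JSON response text. It assumes, as in the Vary example, that the server reports a hex-encoded SHA-256 under a `sha256` key. Rather than comparing hex strings, it decodes the server's hex back into bytes and compares digests, so an uppercase hex reply still matches.

```python
import hashlib
import json


def verify_upload(payload: bytes, response_body: str) -> bool:
    """Hypothetical check: does the server's reported SHA-256 match our bytes?"""
    local_digest: bytes = hashlib.sha256(payload).digest()   # bytes in, bytes out
    server_hex: str = json.loads(response_body)["sha256"]     # text out of JSON text
    try:
        server_digest = bytes.fromhex(server_hex)              # the one boundary crossing
    except ValueError:
        return False                                           # not valid hex at all
    return local_digest == server_digest


payload = b"artifact bytes"
reply = json.dumps({"sha256": hashlib.sha256(payload).hexdigest().upper()})
print(verify_upload(payload, reply))
```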

What we avoided

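Python famously struggled with this. In Python 2, str was bytes and unicode was text. Mixing them triggered an implicit ASCII decode: it worked on ASCII test data and raised UnicodeDecodeError on anything else, or produced mojibake when code guessed the wrong encoding. Python 3 fixed the core problem by making str always text and adding a separate bytes type. Even so, the ecosystem still has libraries that accept either, and mixing the two still fails only at runtime, often far from the code that introduced the wrong type.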

Go takes a different approach. string and []byte are freely convertible with a plain type conversion, and a Go string can hold arbitrary bytes, whether or not they are valid UTF-8. Conversions generally copy the data, and the compiler does not warn you when you convert back and forth unnecessarily.

Vary is stricter than both. No implicit conversion, no cheap cast, and every I/O function picks a side. The cost is a few extra characters when you need to cross the boundary. The benefit is that "why is my binary data corrupted" is a bug you never have to debug.

The design principle

The idea behind the Bytes type is the same one behind Decimal vs Float and Money vs Decimal: when two things look similar but have different semantics, give them different types.

Text and binary data look similar in memory: both are stored as sequences of bytes. But text must be valid UTF-8, while bytes have no encoding constraint. You can uppercase text but not bytes. You can XOR bytes but not text. Decoding bytes into text can fail, whereas a Bytes value never needs decoding. These are not edge cases. They are fundamental differences in how the data behaves, and collapsing them into one type hides that.
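Two of those differences are easy to see in Python 3, which also keeps the types apart. XOR works element-wise on the integers 0 to 255 that iterating over bytes yields. A byte sequence that is not valid UTF-8, by contrast, cannot become text. Python's bytes does offer an ASCII-only `upper()`, so the uppercase example does not carry over. The `xor_bytes` helper is hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Element-wise XOR: meaningful for binary data, meaningless for text."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


masked = xor_bytes(b"\x0f\xf0", b"\xff\xff")

invalid = b"\xff\xfe"            # perfectly fine as bytes...
try:
    invalid.decode("utf-8")      # ...but 0xff can never start a UTF-8 sequence
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```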

Fewer keystrokes now, more 3am debugging sessions later. We picked the other tradeoff.