go-toml rewritten by Fable 5

Report written by Claude Opus 4.8 · June 16, 2026

TOML is a configuration-file format (the .toml files you'll recognize from Rust, Python and Go tooling), and go-toml is a widely used Go library for reading and writing it. PR #1067 rewrote its three core pieces from scratch: the parser (reads the text), the decoder (fills your Go values), and the encoder (writes them back out). The rewrite kept three hard rules: an identical public API, every existing test passing unchanged, and no use of the unsafe package (Go's escape hatch that trades memory safety for raw speed). This page walks through the 20 optimizations that took the library from the shipping version to the final design, and shows what each one did to performance on both macOS (arm64) and Linux (amd64).

Every milestone runs the same frozen benchmark suite, rebuilt at that commit with one fixed Go toolchain and run on the same machine, so the library is the only thing that changes between measurements. Read each platform against itself, never across the two. The charts are interactive: switch metric, view and scale, step through the guided walkthrough, or open the per-benchmark detail below.
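A minimal sketch of how one benchmark yields the two numbers tracked per step (time per operation and allocations per operation), assuming the suite is made of ordinary Go benchmarks from the standard testing package. The function parseLines and the variable sink are hypothetical stand-ins, not go-toml code. testing.Benchmark runs a benchmark outside of go test, which is why testing.Init is called first.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps each result reachable so the compiler cannot discard the work.
var sink []string

// parseLines is a hypothetical stand-in for the code under test (not go-toml).
// strings.Split allocates a new slice on every call.
func parseLines(doc string) []string {
	return strings.Split(doc, "\n")
}

func main() {
	// testing.Init registers the testing flags; it is only needed when
	// calling testing.Benchmark without the "go test" command.
	testing.Init()

	doc := strings.Repeat("key = \"value\"\n", 100)
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = parseLines(doc)
		}
	})

	// NsPerOp and AllocsPerOp are the per-benchmark figures that a
	// whole-suite geometric mean would be built from.
	fmt.Printf("%d iterations, %d ns/op, %d allocs/op\n",
		res.N, res.NsPerOp(), res.AllocsPerOp())
}
```

Rebuilding this at each commit with a pinned toolchain, as the text describes, keeps the harness constant so differences come from the code under test.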

Headline figure: speedup from v2 to the final design, as the geometric mean over all benchmarks. The large number is Linux; macOS is shown beneath it. Lower is faster.


Overall arc: whole-suite geomean across all benchmarks

Figure: whole-suite speedup vs v2 (bold line = geomean, faint lines = individual benchmarks, ★ = phase start). Phase A, rewrite from scratch, covers steps 1-12; Phase B, faster decoding, covers steps 13-20. Steps 3-4 are correct but un-tuned, an intentional spike.

Per-benchmark detail: every benchmark, grouped

Optimization-by-optimization

Δ = whole-suite geomean change at this step

step	optimization	what it does	Δ time (Linux · macOS)	Δ allocs (Linux · macOS)

The Δ columns give the change in the geometric mean of all benchmarks at that step relative to the previous step (negative = faster / fewer allocations). A narrow optimization that touches only one benchmark shows a small whole-suite Δ even when its own benchmark moves a lot; the prose names the benchmark it targets. Changes under ~1-2% are within measurement noise, especially on the shared macOS machine.