Long-form analysis

Deep dives on the data behind the dashboard. Numbers come from the same probe; the prose is what we'd say over coffee about what the numbers actually mean.

Apr 25, 2026 · 12 min

vLLM TP=4 on 4×RTX 3090: 76.9 tok/s, no marketing spin

A month of single-stream decode benchmarking on the HYDRA rig — vLLM tensor parallelism on four used 3090s, what NVLink doesn't buy you, why FP8 KV cache fails on Ampere, and where this build actually beats Ollama Cloud.

More posts are coming as the v15/v16 benchmark series gets broken out. The data updates hourly regardless; bookmark /live for the current numbers.