Race TITAN against Ollama Cloud, live
Two endpoints, one model (gemma4-31B), one prompt, with both token streams running side by side. The dashboard reports that TITAN is roughly 3× faster on average; here you can see what "3× faster" actually feels like.
Pick a prompt. We send it at the same moment to TITAN (4×RTX 3090, vLLM, on-prem) and to Ollama Cloud (managed), both running gemma4-31B. Both endpoints are warmed up before the race, so the comparison reflects steady-state speed, not cold-start.
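The warm-up-then-race flow described above can be sketched roughly as follows. This is a minimal illustration, not the playground's actual code: `fake_endpoint`, `timed_run`, and `race` are hypothetical names, and the fake streamers stand in for real streaming clients that would call the TITAN and Ollama Cloud endpoints.

```python
import asyncio
import time
from typing import AsyncIterator, Callable

# Hypothetical type for a streaming client: takes a prompt, yields tokens.
Streamer = Callable[[str], AsyncIterator[str]]


def fake_endpoint(delay_per_token: float) -> Streamer:
    """Stand-in for a real endpoint; echoes the prompt word by word."""
    async def stream(prompt: str) -> AsyncIterator[str]:
        for token in prompt.split():
            await asyncio.sleep(delay_per_token)  # simulated decode time
            yield token
    return stream


async def timed_run(name: str, stream: Streamer, prompt: str) -> dict:
    """Consume one stream and report tokens, seconds, and tokens/second."""
    start = time.perf_counter()
    count = 0
    async for _ in stream(prompt):
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else 0.0
    return {"name": name, "tokens": count, "seconds": elapsed,
            "tokens_per_second": rate}


async def race(endpoints: dict[str, Streamer], prompt: str,
               warmup_prompt: str = "hi") -> list[dict]:
    """Warm every endpoint with a throwaway request, then race them."""
    # Warm-up pass: results are discarded on purpose.
    await asyncio.gather(
        *(timed_run(n, s, warmup_prompt) for n, s in endpoints.items())
    )
    # Visible race: all streams start at the same moment.
    return await asyncio.gather(
        *(timed_run(n, s, prompt) for n, s in endpoints.items())
    )
```

Running `asyncio.run(race({"titan": fake_endpoint(0.001), "cloud": fake_endpoint(0.03)}, "some prompt text"))` returns one result per endpoint, in the order the endpoints were given. The per-token delays here are made up for the simulation and say nothing about the real systems.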
How this differs from the dashboard numbers
The hourly bench probe deliberately exposes cold-start variability: it sends each prompt without a separate warmup, so a freshly-routed cloud worker can produce a slow run, and that slow run shows up in the trend chart.
The playground does the opposite: a quick throwaway request hits both endpoints first, so by the time the visible race starts, the model is already loaded in GPU memory on both sides. What you see here is steady-state decode speed, not cold-start. Both numbers are real; they just answer different questions.
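To make the difference concrete, here is a toy simulation of the two measurement styles. Everything in it is an assumption for illustration: `SimulatedEndpoint`, `probe_style`, and `playground_style` are hypothetical names, and the load time and per-token cost are invented numbers, not measurements of TITAN or Ollama Cloud.

```python
from dataclasses import dataclass


@dataclass
class SimulatedEndpoint:
    """Toy endpoint: the first request pays a one-off model-load cost."""
    load_seconds: float        # illustrative cold-start penalty
    seconds_per_token: float   # illustrative steady-state decode cost
    loaded: bool = False

    def request(self, n_tokens: int) -> float:
        """Return the simulated wall-clock seconds for one request."""
        elapsed = n_tokens * self.seconds_per_token
        if not self.loaded:
            elapsed += self.load_seconds
            self.loaded = True
        return elapsed


def probe_style(endpoint: SimulatedEndpoint, n_tokens: int) -> float:
    """No warm-up: the measured request may include the cold start."""
    return n_tokens / endpoint.request(n_tokens)


def playground_style(endpoint: SimulatedEndpoint, n_tokens: int) -> float:
    """Throwaway request first, then measure the steady-state request."""
    endpoint.request(1)  # warm-up, result discarded
    return n_tokens / endpoint.request(n_tokens)


cold = probe_style(SimulatedEndpoint(8.0, 0.02), 100)
warm = playground_style(SimulatedEndpoint(8.0, 0.02), 100)
print(f"probe-style: {cold:.1f} tok/s, playground-style: {warm:.1f} tok/s")
```

With these invented numbers, the same endpoint reports 10 tokens per second when the measured request absorbs the load time and 50 once it is warm. Neither figure is wrong: the first includes the cost a user can hit on a freshly routed worker, the second isolates decode speed.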