Full live dashboard
Per-endpoint detail, reliability heatmaps, and per-sample variance plots will live here. Below: the live tiles and trend chart from the home page.
| Endpoint | Details | Status | tok/s |
| --- | --- | --- | --- |
| FORGE · gemma4-26b | RTX 5090 · vLLM · 42m ago · vllm_metrics | OK | 238.4 |
| HYDRA-R · GPU 0 | AMD Radeon Pro R9700 · llama.cpp · 42m ago · llamacpp_timings | OK | 75.2 |
| HYDRA-R · GPU 1 | AMD Radeon Pro R9700 · llama.cpp · 42m ago · llamacpp_timings | OK | 74.9 |
| SCOUT · llama3.1:8b | RTX 5090 + 2× 5070 Ti · 41m ago | TIMEOUT | — |
| SCOUT · qwen3-8b-honcho | RTX 5090 + 2× 5070 Ti · 40m ago · vllm_metrics | OK | 52.2 |
| SCOUT · qwen3-vl-8b | RTX 5090 + 2× 5070 Ti · 41m ago · vllm_metrics | OK | 33.6 |
| TITAN · Engine A | 4× RTX 3090 · vLLM TP=2 · 40m ago · vllm_metrics | OK | 139.5 |
| TITAN · Engine B | 4× RTX 3090 · vLLM TP=2 · 40m ago · vllm_metrics | OK | 143.5 |
Decode tok/s · 24H trend
Each point is one sample, taken at the top of the hour: one warmup run is discarded, then one timed run is recorded, using the same prompt every time. When an hour has no successful run, the line drops to the floor and a red dot marks the incident (timeout, rate limit, or another non-OK status). We don't smooth incidents into the curve. Full methodology.
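For readers who want to see the sampling rule spelled out, here is a minimal Python sketch of it. It is not the dashboard's actual code: the names `Sample`, `take_sample`, `trend_points`, and `FLOOR` are hypothetical, `run_once` stands in for whatever sends the fixed prompt and returns decode tok/s, and it assumes a failure during either the warmup or the timed run counts as an incident for that hour.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

FLOOR = 0.0  # assumed y-value the trend line drops to on an incident


@dataclass
class Sample:
    hour: int                 # hour index within the 24h window
    status: str               # "OK", "TIMEOUT", or "ERROR"
    tok_s: Optional[float]    # decode tok/s, None when the run failed


def take_sample(hour: int, run_once: Callable[[], float]) -> Sample:
    """Run the same prompt twice: discard the warmup, record the timed run."""
    try:
        run_once()            # warmup run, result discarded
        tok_s = run_once()    # timed run, this is the recorded value
        return Sample(hour, "OK", tok_s)
    except TimeoutError:
        return Sample(hour, "TIMEOUT", None)
    except Exception:
        # Any other failure (rate limit, HTTP error, ...) is a non-OK status.
        return Sample(hour, "ERROR", None)


def trend_points(samples: List[Sample]) -> List[Tuple[int, float, bool]]:
    """Return (hour, y, is_incident) per sample, with no smoothing applied."""
    points = []
    for s in samples:
        ok = s.status == "OK" and s.tok_s is not None
        points.append((s.hour, s.tok_s if ok else FLOOR, not ok))
    return points
```

A chart layer could then draw a red dot wherever `is_incident` is true and plot `y` unchanged everywhere else, so a failed hour stays visible as a drop instead of being interpolated away.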
More charts coming as we add features. Reliability heatmap and per-sample variance scatter are next.