Full live dashboard
Per-endpoint detail, reliability heatmaps, and per-sample variance plots will live here. Below: the live tiles and trend chart from the home page.
Endpoint | Status | tok/s
FORGE · gemma4-26b (RTX 5090 · vLLM · 10m ago · vllm_metrics) | OK | 237.8
HYDRA-R · GPU 0 (AMD Radeon Pro R9700 · llama.cpp · 10m ago · llamacpp_timings) | OK | 75.4
HYDRA-R · GPU 1 (AMD Radeon Pro R9700 · llama.cpp · 10m ago · llamacpp_timings) | OK | 74.6
SCOUT · llama3.1:8b (RTX 5090 + 2× 5070 Ti · 9m ago) | TIMEOUT | —
SCOUT · qwen3-8b-honcho (RTX 5090 + 2× 5070 Ti · 9m ago · vllm_metrics) | OK | 49.9
SCOUT · qwen3-vl-8b (RTX 5090 + 2× 5070 Ti · 9m ago · vllm_metrics) | OK | 33.7
TITAN · Engine A (4× RTX 3090 · vLLM TP=2 · 9m ago · vllm_metrics) | OK | 139.9
TITAN · Engine B (4× RTX 3090 · vLLM TP=2 · 9m ago · vllm_metrics) | OK | 141.2
Decode tok/s · 24H trend
Each point is one sample, taken at the top of the hour: one warmup run discarded, one timed run recorded. Same prompt every time. When an hour has no successful run, the line dives to the floor and a red dot marks the incident — timeout, rate-limit, or other non-OK status. We don't smooth incidents into the curve. Full methodology.
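The sampling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the collector we actually run: `Sample`, `take_sample`, `chart_point`, the `run_once` callable, and the status strings other than OK and TIMEOUT are hypothetical names chosen for this sketch. It also assumes that a failure during the warmup run counts as an incident for that hour, and that the chart floor is 0.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Sample:
    status: str             # "OK", "TIMEOUT", or another non-OK label
    tok_s: Optional[float]  # decode tokens per second; None when the run failed


def take_sample(run_once: Callable[[], Tuple[int, float]]) -> Sample:
    """Take one hourly sample.

    run_once() is a hypothetical callable that sends the fixed prompt and
    returns (decoded_tokens, decode_seconds), raising on failure.
    """
    try:
        run_once()                    # warmup run: result discarded
        tokens, seconds = run_once()  # timed run: the only one recorded
    except TimeoutError:
        return Sample("TIMEOUT", None)
    except Exception:
        # Assumption: any other failure is recorded as a generic non-OK status.
        return Sample("ERROR", None)
    if seconds <= 0:
        return Sample("ERROR", None)
    return Sample("OK", tokens / seconds)


def chart_point(sample: Sample, floor: float = 0.0) -> Tuple[float, bool]:
    """Map a sample to (y_value, is_incident) with no smoothing.

    A non-OK sample drops to the floor and is flagged so the chart can
    draw the red incident dot.
    """
    if sample.status != "OK" or sample.tok_s is None:
        return floor, True
    return sample.tok_s, False
```

Because each hour contributes exactly one timed run, a single slow or failed run is visible as-is on the chart rather than being averaged away.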
More charts coming as we add features. Reliability heatmap and per-sample variance scatter are next.