Full live dashboard

Per-endpoint detail, reliability heatmaps, and per-sample variance plots will live here. Below: the live tiles and trend chart from the home page.

Live · Endpoint Status · Updated 39m ago · refresh hourly

Endpoint | Model | Status | tok/s | 24h avg | Var | Uptime | Trend · 24h

FORGE · gemma4-26b

RTX 5090 · vLLM · 41m ago · vllm_metrics

Gemma 4 · 26B MoE

HYDRA-R · GPU 0

AMD Radeon Pro R9700 · llama.cpp · 41m ago · llamacpp_timings

Gemma 4 · 26B MoE

HYDRA-R · GPU 1

AMD Radeon Pro R9700 · llama.cpp · 41m ago · llamacpp_timings

Gemma 4 · 26B MoE

SCOUT · llama3.1:8b

RTX 5090 + 2× 5070 Ti · 40m ago

Llama 3.1 · 8B dense

SCOUT · qwen3-8b-honcho

RTX 5090 + 2× 5070 Ti · 40m ago · vllm_metrics

qwen3-8b-honcho

Qwen3 · 8B · Honcho memory

SCOUT · qwen3-vl-8b

RTX 5090 + 2× 5070 Ti · 40m ago · vllm_metrics

Qwen3-VL · 8B · vision

TITAN · Engine A

4× RTX 3090 · vLLM TP=2 · 40m ago · vllm_metrics

Gemma 4 · 26B MoE

TITAN · Engine B

4× RTX 3090 · vLLM TP=2 · 39m ago · vllm_metrics

Gemma 4 · 26B MoE

Decode tok/s · 24H trend

Legend: FORGE · gemma4-26b | HYDRA-R · GPU 0 | HYDRA-R · GPU 1 | SCOUT · llama3.1:8b | SCOUT · qwen3-8b-honcho | SCOUT · qwen3-vl-8b | TITAN · Engine A | TITAN · Engine B | incident

Each point is one sample, taken at the top of the hour: one warmup run is discarded, then one timed run is recorded. The prompt is the same every time. When an hour has no successful run, the line dives to the floor and a red dot marks the incident (a timeout, rate limit, or other non-OK status). We don't smooth incidents into the curve. Full methodology.
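As a rough illustration of the sampling rule described above, here is a minimal Python sketch. Everything in it is hypothetical: `Sample`, `take_sample`, `trend_points`, and the `run_prompt` callable are names invented for this example, not the dashboard's actual code. It also assumes wall-clock timing, a failed warmup counting as an incident, and a floor value of 0.0, whereas the tiles above suggest the real numbers come from server-side sources such as `vllm_metrics` or `llamacpp_timings`.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Sample:
    hour: str                   # label for the hourly slot, e.g. "14:00"
    tok_per_s: Optional[float]  # None means no successful run this hour
    status: str                 # "ok", or the name of the failure


def take_sample(
    hour: str,
    run_prompt: Callable[[], int],
    clock: Callable[[], float] = time.perf_counter,
) -> Sample:
    """Take one hourly sample: one warmup run discarded, one timed run kept.

    `run_prompt` is a hypothetical callable that sends the fixed prompt and
    returns the number of generated tokens, raising on any non-OK outcome.
    """
    try:
        run_prompt()                 # warmup run, result discarded
        start = clock()
        tokens = run_prompt()        # the single timed run
        elapsed = clock() - start
    except Exception as exc:         # timeout, rate limit, other failure
        # Assumption: a failure in either run makes the whole hour an incident.
        return Sample(hour, None, type(exc).__name__)
    return Sample(hour, tokens / elapsed, "ok")


def trend_points(
    samples: List[Sample], floor: float = 0.0
) -> List[Tuple[str, float, bool]]:
    """Turn samples into (hour, y, is_incident) points without smoothing.

    Incident hours are drawn at `floor` and flagged so a chart can mark them.
    """
    return [
        (s.hour, floor if s.tok_per_s is None else s.tok_per_s, s.tok_per_s is None)
        for s in samples
    ]
```

A caller would invoke `take_sample` once per endpoint at the top of each hour and pass the accumulated list to `trend_points`. Keeping incidents as explicit flagged points, rather than interpolating across them, matches the stated choice not to smooth failures into the curve.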

More charts coming as we add features. Reliability heatmap and per-sample variance scatter are next.