ollama·watch


Full live dashboard

Per-endpoint detail, reliability heatmaps, and per-sample variance plots will live here. For now, this page repeats the live tiles and the trend chart from the home page.

Live · Endpoint Status
Updated 11m ago · refreshed hourly
gemma4-26b @ http://192.168.8.230:8002
OK · 11m ago
142.6 tok/s · last run
142.6 tok/s · 24h avg
±9.4% (±13.0 tok/s) · variance · 24h
66.7% uptime · 24h · vllm_metrics
4 incidents · 24h
gemma4-26b @ http://192.168.8.230:8001
OK · 11m ago
139.7 tok/s · last run
139.6 tok/s · 24h avg
±0.7% (±1.0 tok/s) · variance · 24h
36.4% uptime · 24h · vllm_metrics
14 incidents · 24h
whisper-large-v3 @ http://192.168.8.234:8002
DEGRADED · 11m ago
n/a tok/s · last run
n/a tok/s · 24h avg
n/a · variance · 24h
0.0% uptime · 24h
qwen3-vl-8b @ http://192.168.8.234:8001
OK · 11m ago
138.5 tok/s · last run
138.7 tok/s · 24h avg
±0.2% (±0.2 tok/s) · variance · 24h
100.0% uptime · 24h · llamacpp_timings
gpt-oss:120b-cloud @ ollama-cloud (ollama)
OK · 11m ago
124.6 tok/s · last run
117.7 tok/s · 24h avg
±13.6% (±15.0 tok/s) · variance · 24h
86.4% uptime · 24h · server_wall
3 incidents · 24h
gemma4:31b-cloud @ ollama-cloud (ollama)
OK · 12m ago
113.2 tok/s · last run
87.6 tok/s · 24h avg
±41.1% (±34.9 tok/s) · variance · 24h
86.4% uptime · 24h · server_wall
3 incidents · 24h
glm-5.1:cloud @ ollama-cloud (ollama)
OK · 12m ago
69.0 tok/s · last run
88.7 tok/s · 24h avg
±37.4% (±35.4 tok/s) · variance · 24h
86.4% uptime · 24h · server_wall
3 incidents · 24h
deepseek-v4-flash:cloud @ ollama-cloud (ollama)
OK · 12m ago
101.0 tok/s · last run
64.1 tok/s · 24h avg
±31.6% (±22.7 tok/s) · variance · 24h
86.4% uptime · 24h · server_wall
3 incidents · 24h
kimi-k2.6:cloud @ ollama-cloud (ollama)
OK · 12m ago
110.9 tok/s · last run
51.5 tok/s · 24h avg
±45.0% (±28.2 tok/s) · variance · 24h
86.4% uptime · 24h · server_wall
3 incidents · 24h
gemma4-26b @ http://192.168.8.231:8001
OK · 12m ago
74.9 tok/s · last run
75.2 tok/s · 24h avg
±0.4% (±0.3 tok/s) · variance · 24h
100.0% uptime · 24h · llamacpp_timings
gemma4-26b @ http://192.168.8.231:8000
OK · 12m ago
75.1 tok/s · last run
75.5 tok/s · 24h avg
±0.2% (±0.2 tok/s) · variance · 24h
100.0% uptime · 24h · llamacpp_timings
gemma4-26b @ http://192.168.8.233:8001
OK · 12m ago
232.8 tok/s · last run
232.7 tok/s · 24h avg
±0.1% (±0.1 tok/s) · variance · 24h
100.0% uptime · 24h · vllm_metrics
llama3.2:3b @ apps-server1 (ollama)
OK · 12m ago
148.4 tok/s · last run
148.0 tok/s · 24h avg
±1.4% (±2.0 tok/s) · variance · 24h
100.0% uptime · 24h · engine
llava:7b @ apps-server1 (ollama)
OK · 13m ago
46.0 tok/s · last run
46.0 tok/s · 24h avg
±2.3% (±1.1 tok/s) · variance · 24h
100.0% uptime · 24h · engine
qwen2.5vl:7b @ apps-server1 (ollama)
OK · 13m ago
22.7 tok/s · last run
22.3 tok/s · 24h avg
±3.7% (±0.8 tok/s) · variance · 24h
100.0% uptime · 24h · engine
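The tile numbers can be derived from the hourly samples. Below is a minimal sketch, assuming (none of this is stated on the page) a hypothetical `Sample` record per hour, averages and spread computed over OK samples only, population standard deviation as the "±" spread, and uptime as OK samples divided by all samples in the window. Under that uptime rule, and reading each incident marker as one failed sample, the tiles are self-consistent: 8 OK samples out of 22 (14 incidents) gives the 36.4% shown for 192.168.8.230:8001, and 19 of 22 (3 incidents) gives the 86.4% on the cloud endpoints. The page's ± percentages do not exactly equal the ± tok/s divided by the 24h average (13.0 / 142.6 is about 9.1%, not 9.4%), so its actual spread formula probably differs; treat that part as illustrative.

```python
# Minimal sketch of the 24h tile statistics. Assumptions (not from the page):
# the Sample record, OK-only averaging, population std dev as the "±" spread,
# and uptime = OK samples / all samples in the window.
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional, Sequence


@dataclass
class Sample:
    status: str             # "OK" or a failure status such as "TIMEOUT"
    tok_s: Optional[float]  # decode tok/s of the timed run; None on failure


@dataclass
class TileStats:
    last_run: Optional[float]      # tok/s of the newest sample, if it succeeded
    avg: Optional[float]           # mean tok/s over OK samples
    spread_tok_s: Optional[float]  # population std dev over OK samples
    spread_pct: Optional[float]    # spread as a percentage of the mean
    uptime_pct: float              # share of samples that succeeded
    incidents: int                 # failed samples (the red dots)


def tile_stats(samples: Sequence[Sample]) -> TileStats:
    """Summarize one endpoint's samples, ordered oldest to newest."""
    ok = [s.tok_s for s in samples if s.status == "OK" and s.tok_s is not None]
    incidents = len(samples) - len(ok)
    uptime = 100.0 * len(ok) / len(samples) if samples else 0.0
    if not ok:
        # No successful run in the window: nothing to average.
        return TileStats(None, None, None, None, uptime, incidents)
    avg = mean(ok)
    spread = pstdev(ok)
    spread_pct = 100.0 * spread / avg if avg else None
    newest = samples[-1]
    last = newest.tok_s if newest.status == "OK" else None
    return TileStats(last, avg, spread, spread_pct, uptime, incidents)


# Example: 14 failures followed by 8 OK samples, like the 192.168.8.230:8001 tile.
window = [Sample("TIMEOUT", None)] * 14 + [Sample("OK", 139.6)] * 8
stats = tile_stats(window)
print(f"{stats.uptime_pct:.1f}% uptime, {stats.incidents} incidents")
```

Counting every non-OK sample as an incident keeps uptime and the red-dot count tied to the same definition, so the two numbers on a tile can never disagree.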
Decode tok/s · 24h trend

Each point is one sample taken at the top of the hour: one warmup run is discarded, then one timed run is recorded, using the same prompt every time. When an hour has no successful run, the line drops to the floor and a red dot marks the incident (a timeout, a rate limit, or any other non-OK status). We don't smooth incidents into the curve. See the full methodology for details.
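The hourly procedure and the chart's incident rule can be sketched as follows. This is a minimal illustration, not the site's code: the `RunResult` shape, the status names, the `PROMPT` text, and the injected `run_once` callable (which would perform one request against an endpoint) are all hypothetical. For Ollama endpoints, decode tok/s can be computed from the `eval_count` and `eval_duration` (nanoseconds) fields of an `/api/generate` response, as Ollama's API documentation describes.

```python
# Minimal sketch of one hourly probe and of the chart's incident rule.
# Assumptions (not from the page): the RunResult shape, the status names,
# the PROMPT text, and an injected run_once() that performs one request.
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

PROMPT = "hypothetical fixed prompt"  # the real benchmark prompt is not given here


@dataclass
class RunResult:
    status: str                    # "OK", "TIMEOUT", "RATE_LIMITED", ...
    tok_s: Optional[float] = None  # decode tok/s when status == "OK"


def ollama_decode_tok_s(resp: dict) -> float:
    """Decode tok/s from a non-streaming Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    return resp["eval_count"] * 1e9 / resp["eval_duration"]


def hourly_sample(run_once: Callable[[str], RunResult]) -> RunResult:
    """Run the same prompt twice; only the second (timed) run is recorded."""
    run_once(PROMPT)         # warmup run: result discarded
    return run_once(PROMPT)  # timed run: this is the hour's sample


def chart_points(samples: Sequence[RunResult],
                 floor: float = 0.0) -> List[Tuple[float, bool]]:
    """Map each hourly sample to (y, is_incident) without smoothing.

    A failed hour drops to the floor and is flagged so the chart can draw a red dot.
    """
    points: List[Tuple[float, bool]] = []
    for s in samples:
        if s.status == "OK" and s.tok_s is not None:
            points.append((s.tok_s, False))
        else:
            points.append((floor, True))
    return points
```

Injecting `run_once` keeps the sampling logic independent of the backend. The other measurement sources named on the tiles (`vllm_metrics`, `llamacpp_timings`, `server_wall`) would each need their own way of extracting decode tok/s, which this sketch does not attempt.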

More charts are coming as we add features; a reliability heatmap and a per-sample variance scatter plot are next.