ollama·watch


HYDRA: a 4×RTX 3090 inference rig that runs Gemma-4-31B at 76.9 tok/s

Built on the used market for ~$6,600 in capex. It serves a production agentic workload while doubling as the reference benchmark target on this site. Below is the bill of materials, with current Amazon and Newegg links.

Capex: ~$6,600
Power: ~1,500 W under load · ~120 W idle
Performance: 76.9 tok/s decode · Gemma-4-31B-AWQ · single-stream
Serves: Maple agentic workflows + Ollama Watch bench rig

Bill of materials

| Component | Part | Notes | Buy |
| --- | --- | --- | --- |
| GPU | NVIDIA RTX 3090 24 GB (used market) | Used market only. Look for Founders Edition or EVGA FTW3 for cleaner thermals. | Amazon |
| NVLink | PNY NVLink 3-slot bridge (RTXA6000NVLINK3S-KIT) | Removed from production. Single-stream decode does not benefit from NVLink; empirically ~4% slower with the bridge than without at TP=4. | Amazon |
| Motherboard | ASRock X299 Sage / WS X299 PRO | 7 PCIe x16 slots, so all four 3090s run at x8 minimum. Avoid consumer Z690/Z790 boards. | Amazon |
| CPU | Intel Xeon W-2295 (or i9-10980XE) | The CPU barely matters for inference, but you need its PCIe lanes. | used market |
| RAM | 128 GB DDR4 ECC (4×32 GB) | ECC, because the rig runs unattended. | used market |
| PSU | Corsair AX1600i + secondary 1200 W (dual-PSU) | Four 3090s pull ~1,500 W under inference; the dual-PSU split keeps each unit out of its derate range. | Amazon |
| Cooling | Noctua NH-U14S TR4-SP3 + 8× NF-A12x25 | The GPU stack runs hot when packed; case airflow matters more than the CPU cooler. | used market |
| Storage | Samsung 980 Pro 2TB NVMe | Model weights cache + OS. | Amazon |
| Software | NVIDIA driver 580 + CUDA 13.0 + vLLM 0.17 | Stack as of v15. Pin versions until benched. | used market |
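A launch command for this stack might look like the sketch below. The model path is a placeholder, and the flag set should be re-checked against the pinned vLLM 0.17 release before relying on it:

```shell
# Sketch of a vLLM launch for a 4×3090 rig (model path is a placeholder).
# --tensor-parallel-size 4 shards the model across all four GPUs;
# --quantization awq matches the AWQ checkpoint;
# --gpu-memory-utilization leaves a little headroom on each 24 GB card.
vllm serve ./models/gemma-4-31b-awq \
  --tensor-parallel-size 4 \
  --quantization awq \
  --gpu-memory-utilization 0.90
```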
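The NVLink row above is worth a quick arithmetic check: in tensor-parallel decode, each generated token triggers an activation all-reduce per layer, and that traffic is tiny. The model shape below is an assumed 31B-class config, not Gemma's published one:

```python
# Rough estimate of per-token tensor-parallel interconnect traffic
# during decode. Model shape is an assumption for a ~31B-class model.
HIDDEN = 6144
LAYERS = 48
BYTES = 2                      # fp16 activations
ALLREDUCES_PER_LAYER = 2       # one after attention, one after the MLP

bytes_per_token = LAYERS * ALLREDUCES_PER_LAYER * HIDDEN * BYTES
# At ~77 tok/s this is well under 0.1 GB/s, far below even PCIe 3.0 x8
# (~8 GB/s per direction), so NVLink's extra bandwidth buys nothing
# for single-stream decode.
gbps = bytes_per_token * 77 / 1e9
print(f"~{bytes_per_token/1e6:.1f} MB per token, ~{gbps:.3f} GB/s at 77 tok/s")
```

This is why the bridge was pulled: single-stream decode is bound by per-GPU memory bandwidth, not by the interconnect.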
Not ready to commit $6,600?

Rent the equivalent on cloud GPUs first

Test your workload on a single A100 or H100 before deciding whether a 4×3090 build pays back. Both platforms below offer per-second billing.