Llama-bench on some consumer-grade AI hardware
I have been on a bender this weekend experimenting with the various LLM-capable
machines in my home lab, especially with the very capable yet fast
Qwen3-Coder-30B-A3B-Instruct. I haven't found good comparative benchmarks,
though, so I ran the small Gemma 3n E4B Q4_K model (4.62 GiB, 7.52B params) on
each machine using llama-bench. (llama-bench reports prompt-processing
throughput over a 512-token prompt as pp512 and token-generation throughput
over 128 tokens as tg128, both in tokens per second.)
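For reference, a run of this shape can be reproduced with something like the following. This is a sketch, not the exact command from the post; the model path is a placeholder.

```shell
# -p 512  -> prompt-processing test over a 512-token prompt (pp512 column)
# -n 128  -> token-generation test over 128 tokens (tg128 column)
# -ngl 99 -> offload all layers to the GPU where a backend is available
# Model path below is an assumption, not the file actually used here.
llama-bench -m ./models/gemma-3n-E4B-Q4_K_M.gguf -p 512 -n 128 -ngl 99
```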
| Machine | Backend | pp512 (t/s) | tg128 (t/s) | Notes |
|---|---|---|---|---|
| xhystos | ROCm | 291.48 | 6.65 | AMD Ryzen AI 7 350 "Krackan Point", 32GB |
| utumno | Metal,BLAS | 1172.93 | 69.73 | Mac Studio M1 Ultra, 128GB |
| ai-x1-pro | ROCm | 568.54 | 21.16 | AMD Ryzen AI 9 HX 370 "Strix Point", 96GB |
| dgx1 | CUDA | 3633.84 | 59.42 | NVIDIA DGX Spark, 128GB |
| zanzibar | CUDA | 1831.78 | 51.92 | NVIDIA RTX A2000, 12GB |
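The interesting split in these numbers: the M1 Ultra wins token generation while the DGX Spark dominates prompt processing. A few lines of Python can slice the table either way; the table string below is copied from this post, so only the parsing code is new.

```python
# Parse the markdown results table and rank machines by tg128
# (token-generation speed). Values copied verbatim from the post above.
TABLE = """\
| Machine | Backend | pp512 t/s | tg128 t/s | Notes |
|---|---|---|---|---|
| xhystos | ROCm | 291.48 | 6.65 | AMD Ryzen AI 7 350 Krackan Point 32GB |
| utumno | Metal,BLAS | 1172.93 | 69.73 | Mac Studio M1 Ultra 128GB |
| ai-x1-pro | ROCm | 568.54 | 21.16 | AMD Ryzen AI 9 HX 370 Strix Point 96GB |
| dgx1 | CUDA | 3633.84 | 59.42 | NVIDIA DGX Spark 128GB |
| zanzibar | CUDA | 1831.78 | 51.92 | NVIDIA RTX A2000 12GB |
"""

def parse_rows(table: str) -> list[dict]:
    """Turn the markdown table into a list of row dicts with numeric speeds."""
    rows = []
    for line in table.strip().splitlines()[2:]:  # skip header + separator rows
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append({"machine": cells[0], "backend": cells[1],
                     "pp512": float(cells[2]), "tg128": float(cells[3]),
                     "notes": cells[4]})
    return rows

# Rank by generation speed, fastest first.
for r in sorted(parse_rows(TABLE), key=lambda r: r["tg128"], reverse=True):
    print(f"{r['machine']:10s} tg128={r['tg128']:6.2f} t/s  pp512={r['pp512']:8.2f} t/s")
```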