I have been on a bender this weekend experimenting with the various LLM-capable machines in my home lab, specifically with the very capable yet fast Qwen3-Coder-30B-A3B-Instruct. I haven't found good benchmarks for these machines, though, so I ran the small Gemma 3n E4B Q4_K model (4.62 GiB, 7.52B params) through llama-bench.

| Machine   | Backend    | pp512 (t/s) | tg128 (t/s) | Notes                                      |
|-----------|------------|-------------|-------------|--------------------------------------------|
| xhystos   | ROCm       | 291.48      | 6.65        | AMD Ryzen AI 7 350 Krackan Point, 32 GB    |
| utumno    | Metal,BLAS | 1172.93     | 69.73       | Mac Studio M1 Ultra, 128 GB                |
| ai-x1-pro | ROCm       | 568.54      | 21.16       | AMD Ryzen AI 9 HX 370 Strix Point, 96 GB   |
| dgx1      | CUDA       | 3633.84     | 59.42       | NVIDIA DGX Spark, 128 GB                   |
| zanzibar  | CUDA       | 1831.78     | 51.92       | NVIDIA RTX A2000, 12 GB                    |
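For anyone wanting to reproduce these numbers, the invocation looks roughly like the sketch below. This uses llama.cpp's llama-bench tool; the model filename and path are assumptions, and `-p 512 -n 128` simply makes the prompt-processing and token-generation sizes (the pp512 and tg128 columns above) explicit, since they are also llama-bench's defaults.

```shell
# Sketch: benchmark a quantized GGUF model with llama.cpp's llama-bench.
# The model path/filename is hypothetical -- substitute your own download.
llama-bench \
  -m ~/models/gemma-3n-E4B-it-Q4_K_M.gguf \
  -p 512 \
  -n 128
```

llama-bench picks the backend (CUDA, ROCm, Metal, or CPU/BLAS) based on how the llama.cpp binaries were built, which is why the Backend column differs per machine rather than being a flag in the command.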