I have been on a bender this weekend experimenting with the various LLM-capable machines in my home lab, specifically with the very capable yet fast Qwen3-Coder-30B-A3B-Instruct. I haven't found good benchmarks for these machines, though, so I ran the small Gemma 3n E4B Q4_K model (4.62 GiB, 7.52B params) through llama-bench.

| Machine   | Backend    | pp512 (t/s) | tg128 (t/s) | Notes                                      |
|-----------|------------|-------------|-------------|--------------------------------------------|
| xhystos   | ROCm       | 291.48      | 6.65        | AMD Ryzen AI 7 350 Krackan Point, 32 GB    |
| utumno    | Metal,BLAS | 1172.93     | 69.73       | Mac Studio M1 Ultra, 128 GB                |
| ai-x1-pro | ROCm       | 568.54      | 21.16       | AMD Ryzen AI 9 HX 370 Strix Point, 96 GB   |
| dgx1      | CUDA       | 3633.84     | 59.42       | NVIDIA DGX Spark, 128 GB                   |
| zanzibar  | CUDA       | 1831.78     | 51.92       | NVIDIA RTX A2000, 12 GB                    |
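For anyone wanting to reproduce these numbers, the invocation looks roughly like the sketch below. This uses llama.cpp's llama-bench tool; the model filename and path are assumptions, and `-p 512 -n 128` simply makes the prompt-processing and token-generation sizes (the pp512 and tg128 columns above) explicit, since they are also llama-bench's defaults.

```shell
# Sketch: benchmark a quantized GGUF model with llama.cpp's llama-bench.
# The model path/filename is hypothetical -- substitute your own download.
llama-bench \
  -m ~/models/gemma-3n-E4B-it-Q4_K_M.gguf \
  -p 512 \
  -n 128
```

llama-bench picks the backend (CUDA, ROCm, Metal, or CPU/BLAS) based on how the llama.cpp binaries were built, which is why the Backend column differs per machine rather than being a flag in the command.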