Benchmarks
Mac Studio M3 Ultra Speeds for Qwen3 235b, GPT-OSS-120b, GLM 4.5, and Deepseek V3.1
M3 Ultra Mac Studio 512GB Speeds Qwen3 235b a22b Instruct Q8 in Llama.cpp server (~15k tokens) prompt eval time 4.60 ms per token, 217.29 tokens per second eval time 67.59 ms per token, 14.80 tokens per second total time 146863.82 ms / 15763 tokens (~5k