M3 Ultra Mac Studio 512GB Speeds
Qwen3 235B A22B Instruct Q8 in llama.cpp server
(~15k tokens)
prompt eval time: 4.60 ms per token, 217.29 tokens per second
eval time: 67.59 ms per token, 14.80 tokens per second
total time: 146863.82 ms / 15763 tokens
(~5k tokens)
prompt eval time: 4.90 ms per token, 204.24 tokens per second
eval time: 57.18 ms per token, 17.49 tokens per second
total time: 65510.45 ms / 5649 tokens
GPT-OSS-120B Unsloth fp16 GGUF in llama.cpp server
(~5k tokens)
prompt eval time: 1.37 ms per token, 732.57 tokens per second
eval time: 15.90 ms per token, 62.90 tokens per second
total time: 8526.55 ms / 4447 tokens
GLM 4.5 Q8 in llama.cpp server
(~20k tokens)
prompt eval time: 7.26 ms per token, 137.82 tokens per second
eval time: 103.33 ms per token, 9.68 tokens per second
total time: 202089.84 ms / 21377 tokens
(~15k tokens)
prompt eval time: 7.16 ms per token, 139.64 tokens per second
eval time: 96.64 ms per token, 10.35 tokens per second
total time: 200516.47 ms / 16505 tokens
(~10k tokens)
prompt eval time: 6.64 ms per token, 150.55 tokens per second
eval time: 88.75 ms per token, 11.27 tokens per second
total time: 108213.31 ms / 10927 tokens
(~5k tokens)
prompt eval time: 6.86 ms per token, 145.70 tokens per second
eval time: 81.31 ms per token, 12.30 tokens per second
total time: 64483.49 ms / 6000 tokens
DeepSeek V3.1 Q5_K_M in llama.cpp server
(~13k tokens)
prompt eval time: 14.22 ms per token, 70.30 tokens per second
eval time: 264.86 ms per token, 3.78 tokens per second
total time: 253415.56 ms / 13217 tokens
(~5k tokens)
prompt eval time: 9.68 ms per token, 103.30 tokens per second
eval time: 144.04 ms per token, 6.94 tokens per second
total time: 119343.67 ms / 5763 tokens
(~3k tokens)
prompt eval time: 11.92 ms per token, 83.86 tokens per second
eval time: 107.64 ms per token, 9.29 tokens per second
total time: 65396.57 ms / 3269 tokens
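The total-time lines lump prompt processing and generation together, so the numbers don't directly say how many tokens each run generated or how long the prompt took. Assuming the server's total time is just prompt time plus generation time, and the token count is prompt tokens plus generated tokens, you can solve those two equations for the two unknowns. This is a rough Python sketch that does that for every run in this post. `Run`, `split_tokens` and `RUNS` are made-up names, not anything from llama.cpp, and since the ms-per-token figures are rounded to two decimals the results are approximate (off by a few tokens).

```python
# Rough sketch: estimate how each "total time" line splits into prompt
# tokens and generated tokens. ASSUMPTION (not verified against llama.cpp):
#   total_ms     = n_prompt * prompt_ms_per_tok + n_gen * gen_ms_per_tok
#   total_tokens = n_prompt + n_gen
# Figures are copied from the runs above; ms-per-token values are rounded
# to two decimals, so estimates are only good to within a few tokens.
from typing import NamedTuple, Tuple


class Run(NamedTuple):
    label: str
    prompt_ms_per_tok: float
    gen_ms_per_tok: float
    total_ms: float
    total_tokens: int


def split_tokens(run: Run) -> Tuple[float, float]:
    """Solve the 2x2 linear system above for (n_prompt, n_gen)."""
    # Substitute n_prompt = total_tokens - n_gen into the time equation
    # and solve for n_gen.
    n_gen = (run.total_ms - run.total_tokens * run.prompt_ms_per_tok) / (
        run.gen_ms_per_tok - run.prompt_ms_per_tok
    )
    return run.total_tokens - n_gen, n_gen


RUNS = [
    Run("Qwen3 235B Q8, ~15k", 4.60, 67.59, 146863.82, 15763),
    Run("Qwen3 235B Q8, ~5k", 4.90, 57.18, 65510.45, 5649),
    Run("GPT-OSS-120B fp16, ~5k", 1.37, 15.90, 8526.55, 4447),
    Run("GLM 4.5 Q8, ~20k", 7.26, 103.33, 202089.84, 21377),
    Run("GLM 4.5 Q8, ~15k", 7.16, 96.64, 200516.47, 16505),
    Run("GLM 4.5 Q8, ~10k", 6.64, 88.75, 108213.31, 10927),
    Run("GLM 4.5 Q8, ~5k", 6.86, 81.31, 64483.49, 6000),
    Run("DeepSeek V3.1 Q5_K_M, ~13k", 14.22, 264.86, 253415.56, 13217),
    Run("DeepSeek V3.1 Q5_K_M, ~5k", 9.68, 144.04, 119343.67, 5763),
    Run("DeepSeek V3.1 Q5_K_M, ~3k", 11.92, 107.64, 65396.57, 3269),
]

if __name__ == "__main__":
    for run in RUNS:
        n_prompt, n_gen = split_tokens(run)
        prompt_s = n_prompt * run.prompt_ms_per_tok / 1000
        gen_s = n_gen * run.gen_ms_per_tok / 1000
        print(
            f"{run.label:<28} ~{n_prompt:6.0f} prompt tok ({prompt_s:5.1f} s), "
            f"~{n_gen:5.0f} generated tok ({gen_s:5.1f} s)"
        )
```

Under those assumptions the Qwen3 ~15k run works out to roughly 1,180 generated tokens. That means about 67 s of its ~147 s total went to processing the ~14.6k-token prompt, and the rest went to generation.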