A New Toy...
The M5 Max MacBook Pro just arrived. First thing I did was fling llama.cpp, Wilmer and Open WebUI on it.
Honestly, the speeds are really impressive, even considering that llama.cpp hasn't fully integrated the hardware changes yet (at least, that's my understanding). Here's a comparison of Qwen3.5 35b a3b between the M5 Max MacBook and the M3 Ultra Mac Studio.
M5 Max MacBook Pro:
1450 t/s processing, 68 t/s generation

prompt eval time = 3202.80 ms / 4654 tokens (0.69 ms per token, 1453.10 tokens per second)
eval time = 7098.19 ms / 483 tokens (14.70 ms per token, 68.05 tokens per second)
total time = 10300.99 ms / 5137 tokens
M3 Ultra Mac Studio:
1647 t/s processing, 48 t/s generation

prompt eval time = 3810.74 ms / 6280 tokens (0.61 ms per token, 1647.97 tokens per second)
eval time = 14695.00 ms / 704 tokens (20.87 ms per token, 47.91 tokens per second)
total time = 18505.75 ms / 6984 tokens
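If you want to double-check how llama.cpp arrives at those tokens-per-second figures, it's just tokens divided by elapsed seconds. A quick sketch using the raw timings above:

```python
def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Throughput as reported by llama.cpp: tokens / elapsed seconds."""
    return tokens / (elapsed_ms / 1000.0)

# M5 Max MacBook Pro (numbers from the run above)
m5_prompt = tokens_per_second(4654, 3202.80)   # ~1453 t/s prompt processing
m5_gen    = tokens_per_second(483, 7098.19)    # ~68 t/s generation

# M3 Ultra Mac Studio
m3_prompt = tokens_per_second(6280, 3810.74)   # ~1648 t/s prompt processing
m3_gen    = tokens_per_second(704, 14695.00)   # ~48 t/s generation

print(f"M5 Max:   {m5_prompt:.1f} t/s prompt, {m5_gen:.2f} t/s gen")
print(f"M3 Ultra: {m3_prompt:.1f} t/s prompt, {m3_gen:.2f} t/s gen")
```

The per-token milliseconds llama.cpp prints are just the reciprocal of the same ratio (elapsed ms / tokens).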
So yeah, the Studio processes prompts faster (at this model size and this prompt length, though I suspect the M5 Max actually saturates better at larger prompts), but it generates tokens slower than the M5 Max.
Super excited to play with this. I got rid of the M2 Max MacBook, so this is my main travel machine now.