Benchmarks

Low Context Speed Comparison: MacBook, Mac Studios, and RTX 4090

It's been a while since my last Mac speed post, so I figured it was about time for a new one. I've noticed a lot of the old "I get 500 tokens per second!" kind of talk reappearing, so I figured some cold, hard numbers would help anyone who isn't sure which machines can run at what speeds.

I apologize for not doing this deterministically. I should have, but I only realized that halfway through and didn't have time to go back and redo it. As a result, each machine generated a different number of tokens for the same model.

Test Setup

Today we're comparing the RTX 4090, the M2 Max MacBook Pro, the M1 Ultra Mac Studio, and the M2 Ultra Mac Studio. Each machine ran the following models:

Llama 3.1 8b q8
Mistral Nemo 12b q8
Mistral Small 22b q6_K

NOTE: The tests were run with a freshly loaded model, so each result is the first prompt on that machine and nothing is cached. Additionally, I did NOT enable flash attention, as there has been back-and-forth in the past about it behaving differently on different machines.

Results

TL;DR

Here's a table with the Total T/s for each machine if you just want to peek at that and compare real quick.

Model	RTX 4090	MacBook Pro M2 Max	M1 Ultra Mac Studio	M2 Ultra Mac Studio
Llama 3.1 8b q8	52.99 T/s	28.92 T/s	37.92 T/s	42.47 T/s
Mistral Nemo 12b q8	39.41 T/s	19.18 T/s	27.45 T/s	29.44 T/s
Mistral Small 22b q6_K	26.72 T/s	10.13 T/s	15.51 T/s	15.84 T/s
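
For anyone who would rather see the gap as a ratio, here is a minimal Python sketch that expresses each machine's Total T/s as a fraction of the RTX 4090's. The numbers are copied from the table above; the function name `relative_to` is just an illustrative choice, not anything from the benchmark tool.

```python
# Total T/s values copied from the TL;DR table.
total_tps = {
    "Llama 3.1 8b q8": {
        "RTX 4090": 52.99,
        "MacBook Pro M2 Max": 28.92,
        "M1 Ultra Mac Studio": 37.92,
        "M2 Ultra Mac Studio": 42.47,
    },
    "Mistral Nemo 12b q8": {
        "RTX 4090": 39.41,
        "MacBook Pro M2 Max": 19.18,
        "M1 Ultra Mac Studio": 27.45,
        "M2 Ultra Mac Studio": 29.44,
    },
    "Mistral Small 22b q6_K": {
        "RTX 4090": 26.72,
        "MacBook Pro M2 Max": 10.13,
        "M1 Ultra Mac Studio": 15.51,
        "M2 Ultra Mac Studio": 15.84,
    },
}


def relative_to(baseline, rows):
    """Return each machine's Total T/s divided by the baseline machine's."""
    return {
        model: {
            machine: round(tps / speeds[baseline], 2)
            for machine, tps in speeds.items()
        }
        for model, speeds in rows.items()
    }


ratios = relative_to("RTX 4090", total_tps)
for model, row in ratios.items():
    print(model, row)
```

Keep in mind that these ratios inherit the caveat from the intro: the runs were not deterministic, so the response lengths behind each number differ.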

Llama 3.1 8b q8

RTX 4090:

Context Limit: 1243/16384
Tokens Processed: 349/1000
Initialization: 0.03s
Processing: 0.27s (0.3ms/T = 3286.76T/s)
Generation: 6.31s (18.1ms/T = 55.27T/s)
Total: 6.59s (52.99T/s)
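
If it helps to see how the fields in a block like the one above relate to each other, here is a rough Python sketch. It assumes that the first number in "Context Limit" is prompt plus response tokens and that the first number in "Tokens Processed" is the number of generated tokens; that reading is my assumption, but it reproduces the logged rates to within rounding. The `Run` class and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    context_tokens: int    # first number in "Context Limit" (assumed: prompt + response)
    generated_tokens: int  # first number in "Tokens Processed" (assumed: response length)
    processing_s: float    # "Processing" time in seconds
    generation_s: float    # "Generation" time in seconds
    total_s: float         # "Total" time in seconds

    @property
    def prompt_tokens(self) -> int:
        # Whatever is in context but was not generated must be the prompt.
        return self.context_tokens - self.generated_tokens

    @property
    def prompt_tps(self) -> float:
        return self.prompt_tokens / self.processing_s

    @property
    def generation_tps(self) -> float:
        return self.generated_tokens / self.generation_s

    @property
    def total_tps(self) -> float:
        # Total T/s counts only generated tokens, divided by the whole wall time.
        return self.generated_tokens / self.total_s


# RTX 4090, Llama 3.1 8b q8, values copied from the log block above.
rtx = Run(context_tokens=1243, generated_tokens=349,
          processing_s=0.27, generation_s=6.31, total_s=6.59)

print(rtx.prompt_tokens)
print(f"{rtx.prompt_tps:.1f} {rtx.generation_tps:.2f} {rtx.total_tps:.2f}")
```

The recomputed rates land close to the logged ones but not exactly on them, because the logged times are rounded to two decimals. The main takeaway is that Total T/s includes prompt processing time, which is why it is always lower than the Generation figure.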

MacBook Pro M2 Max:

Context Limit: 1285/16384
Tokens Processed: 387/1000
Initialization: 0.04s
Processing: 1.76s (2.0ms/T = 508.78T/s)
Generation: 11.62s (30.0ms/T = 33.32T/s)
Total: 13.38s (28.92T/s)

M1 Ultra Mac Studio:

Context Limit: 1206/16384
Tokens Processed: 308/1000
Initialization: 0.04s
Processing: 1.53s (1.7ms/T = 587.70T/s)
Generation: 6.59s (21.4ms/T = 46.70T/s)
Total: 8.12s (37.92T/s)

M2 Ultra Mac Studio:

Context Limit: 1216/16384
Tokens Processed: 318/1000
Initialization: 0.03s
Processing: 1.29s (1.4ms/T = 696.12T/s)
Generation: 6.20s (19.5ms/T = 51.32T/s)
Total: 7.49s (42.47T/s)

Mistral Nemo 12b q8

RTX 4090:

Context Limit: 1169/16384
Tokens Processed: 252/1000
Initialization: 0.04s
Processing: 0.32s (0.3ms/T = 2874.61T/s)
Generation: 6.08s (24.1ms/T = 41.47T/s)
Total: 6.39s (39.41T/s)

MacBook Pro M2 Max:

Context Limit: 1218/16384
Tokens Processed: 301/1000
Initialization: 0.05s
Processing: 2.71s (2.9ms/T = 339.00T/s)
Generation: 12.99s (43.1ms/T = 23.18T/s)
Total: 15.69s (19.18T/s)

M1 Ultra Mac Studio:

Context Limit: 1272/16384
Tokens Processed: 355/1000
Initialization: 0.04s
Processing: 2.34s (2.5ms/T = 392.38T/s)
Generation: 10.59s (29.8ms/T = 33.51T/s)
Total: 12.93s (27.45T/s)

M2 Ultra Mac Studio:

Context Limit: 1234/16384
Tokens Processed: 317/1000
Initialization: 0.04s
Processing: 1.94s (2.1ms/T = 473.41T/s)
Generation: 8.83s (27.9ms/T = 35.89T/s)
Total: 10.77s (29.44T/s)

Mistral Small 22b q6_K

RTX 4090:

Context Limit: 1481/16384
Tokens Processed: 435/1000
Initialization: 0.01s
Processing: 1.47s (1.4ms/T = 713.51T/s)
Generation: 14.81s (34.0ms/T = 29.37T/s)
Total: 16.28s (26.72T/s)

MacBook Pro M2 Max:

Context Limit: 1378/16384
Tokens Processed: 332/1000
Initialization: 0.01s
Processing: 5.92s (5.7ms/T = 176.63T/s)
Generation: 26.84s (80.8ms/T = 12.37T/s)
Total: 32.76s (10.13T/s)

M1 Ultra Mac Studio:

Context Limit: 1502/16384
Tokens Processed: 456/1000
Initialization: 0.01s
Processing: 5.47s (5.2ms/T = 191.33T/s)
Generation: 23.94s (52.5ms/T = 19.05T/s)
Total: 29.41s (15.51T/s)

M2 Ultra Mac Studio:

Context Limit: 1360/16384
Tokens Processed: 314/1000
Initialization: 0.01s
Processing: 4.38s (4.2ms/T = 238.92T/s)
Generation: 15.44s (49.2ms/T = 20.34T/s)
Total: 19.82s (15.84T/s)
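
Since the runs weren't deterministic, each machine generated a different number of tokens, which nudges the Total T/s figures around a little. One rough way to put the machines on equal footing is to project the total speed for a fixed response length from the logged processing and generation rates. The Python sketch below does that for Mistral Small 22b q6_K. It assumes a 1046-token prompt (Context Limit minus Tokens Processed, which comes out the same on all four machines), a hypothetical 400-token response, and a generation rate that stays constant over the response; treat the output as an estimate, not a measurement.

```python
PROMPT_TOKENS = 1046    # assumed prompt size: e.g. 1481 - 435 from the RTX 4090 log
RESPONSE_TOKENS = 400   # hypothetical fixed response length

# (prompt processing T/s, generation T/s) copied from the Mistral Small logs above.
rates = {
    "RTX 4090": (713.51, 29.37),
    "MacBook Pro M2 Max": (176.63, 12.37),
    "M1 Ultra Mac Studio": (191.33, 19.05),
    "M2 Ultra Mac Studio": (238.92, 20.34),
}


def projected_total_tps(prompt_tps, gen_tps, prompt_tokens, response_tokens):
    """Estimate Total T/s for a fixed prompt and response length."""
    seconds = prompt_tokens / prompt_tps + response_tokens / gen_tps
    return response_tokens / seconds


projected = {
    machine: projected_total_tps(p, g, PROMPT_TOKENS, RESPONSE_TOKENS)
    for machine, (p, g) in rates.items()
}

for machine, tps in projected.items():
    print(f"{machine}: {tps:.2f} T/s")
```

Longer responses spread the prompt processing cost over more tokens, so machines that happened to generate shorter responses look slightly worse in the raw Total T/s numbers than they would at equal length.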
