Benchmarks
M3 Ultra Mac Studio 512GB prompt and write speeds for Deepseek V3 0324 671b gguf q4_K_M, for those curious
UPDATE 2025-04-13: llama.cpp has had an update that GREATLY improved the prompt processing speed. Please see the new speeds below.

Deepseek V3 0324 Q4_K_M w/Flash Attention

4800 token context, responding 552 tokens

CtxLimit:4744/8192, Amt:552/4000, Init:0.07s, Process:65.46s
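For anyone who wants a tokens-per-second figure out of that log line, here is a rough sketch of the arithmetic. It assumes the first number in CtxLimit is prompt plus generated tokens, the first number in Amt is the generated tokens, and Process is the prompt processing time; those readings of the fields are my assumption, and `prompt_speed` is just a made-up helper name, not part of any tool.

```python
import re

# Log line copied from the run above.
LOG_LINE = "CtxLimit:4744/8192, Amt:552/4000, Init:0.07s, Process:65.46s"


def prompt_speed(line: str) -> tuple[int, float]:
    """Return (prompt_tokens, prompt_tokens_per_second) for one log line.

    Assumption: CtxLimit's first number counts prompt + generated tokens,
    Amt's first number counts generated tokens, and Process is the time
    spent on prompt processing only.
    """
    ctx_used = int(re.search(r"CtxLimit:(\d+)/", line).group(1))
    generated = int(re.search(r"Amt:(\d+)/", line).group(1))
    process_s = float(re.search(r"Process:([\d.]+)s", line).group(1))

    prompt_tokens = ctx_used - generated
    return prompt_tokens, prompt_tokens / process_s


if __name__ == "__main__":
    tokens, speed = prompt_speed(LOG_LINE)
    print(f"{tokens} prompt tokens, {speed:.2f} T/s prompt processing")
```

Under those assumptions this works out to 4192 prompt tokens at roughly 64 T/s for prompt processing.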