Benchmarks
Running Llama 3.1 405b q6 and Command-A 111b Q8 on M3 Ultra Mac Studio
Below are benchmarks of running Llama 3.1 405b q6 and Command-A 111b Q8 on an M3 Ultra 512GB using KoboldCpp. The 405b was so miserable to run that I didn't even try flash attention, and flash attention was completely broken with Command-A.

M3 Ultra Llama 3.