SomeOddCodeGuy's Ramblings (Page 3)

Latest

Links

A Quick List of LLM Benchmarks

A quick dump of the benchmarks that I look at and use personally; I've dropped a few that no longer appear to be kept up to date, and grabbed a few newer ones.
Code Specific
* https://www.swebench.com/
* https://swe-rebench.com/
* https://aider.chat/docs/leaderboards/
Coding

Generative-AI

RAG Really Is More of a Software Problem Than An AI Problem

RAG is really 90% a software development problem, 10% an AI problem. People overcomplicate it on the AI side a lot, but it's a $5 term for a $0.05 concept: give the LLM the answer before it responds to you. On its face, that's simple
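To make the "give the LLM the answer before it responds" idea concrete, here is a minimal sketch in Python. It is not the approach from the post itself: the keyword-overlap scoring, the function names (`tokenize`, `retrieve`, `build_prompt`), and the prompt wording are all assumptions made up for illustration. A real setup would likely swap the scoring for a proper search index or embeddings, which is where most of the software work lives.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question.

    Hypothetical stand-in for a real retriever (search index, embeddings, etc.).
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, documents: list[str], top_k: int = 1) -> str:
    """Put the retrieved text in front of the question before the LLM sees it."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


docs = [
    "The office closes at 6pm on Fridays.",
    "Llamas are native to South America.",
]
prompt = build_prompt("When does the office close on Fridays?", docs)
```

The resulting `prompt` string is what would be sent to whatever model is in use; the model call is left out on purpose, since the retrieval and prompt assembly are plain software.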

Generative-AI

That LinkedIn '95% of AI Ventures Fail' Stat That's Going Around...

So over the past week I'm suddenly seeing folks posting on LinkedIn about this number that 95% of corps fail to generate significant revenue with AI projects. Honestly, I'd believe it. Personally? I think that a big reason so many AI projects die is that folks

Benchmarks

Mac Studio M3 Ultra Speeds for Qwen3 235b, GPT-OSS-120b, GLM 4.5, and Deepseek V3.1

M3 Ultra Mac Studio 512GB Speeds
Qwen3 235b a22b Instruct Q8 in Llama.cpp server (~15k tokens)
prompt eval time 4.60 ms per token, 217.29 tokens per second
eval time 67.59 ms per token, 14.80 tokens per second
total time 146863.82 ms / 15763 tokens
(~5k
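For anyone reading timing lines like the ones above, the two figures on each line are the same measurement expressed two ways: tokens per second is 1000 divided by milliseconds per token. A small Python sketch of that arithmetic follows; the helper names are made up for illustration and are not part of llama.cpp.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds into a throughput."""
    return 1000.0 / ms_per_token


def seconds_for_tokens(token_count: int, ms_per_token: float) -> float:
    """Estimate how long a given number of tokens takes at that latency."""
    return token_count * ms_per_token / 1000.0


# Figures quoted in the excerpt above.
prompt_speed = tokens_per_second(4.60)   # prompt processing
write_speed = tokens_per_second(67.59)   # generation
```

The converted values can differ from the quoted ones in the last digits, because the log prints the milliseconds figure already rounded to two decimals.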

Reddit Shadowbans- A Deep Dive Into What Little I Could Find

So back in July, while using a fairly popular commercial VPN, I made a comment on the LocalLlama sub answering someone's question by linking one of my own posts with some benchmarks; something I would do often. After a few minutes, I decided to edit the post to

I'll Always Have A Soft Spot for the Old Text-Generation-WebUI Chat Bubbles

Found this screenshot from back in 2023; if I remember right, CodeLlama had just come out and I was trying to see how it would do in a coding interview. But look at that old interface for text-gen. The below picture is from the internet, and shows the new interface.

Benchmarks

Running Deepseek R1 0528 q4_K_M and mlx 4-bit on a Mac Studio M3

Mac Model: M3 Ultra Mac Studio 512GB, 80 core GPU
First- this model has a shockingly small KV Cache. If any of you saw my post about running Deepseek V3 q4_K_M, you'd have seen that the KV cache buffer in llama.cpp/koboldcpp was 157GB for

Benchmarks

M3 Ultra Mac Studio 512GB prompt and write speeds for Deepseek V3 0 671b gguf q4_K_M, for those curious

UPDATE 2025-04-13: llama.cpp has had an update that GREATLY improved the prompt processing speed. Please see the new speeds below.
Deepseek V3 0324 Q4_K_M w/Flash Attention
4800 token context, responding 552 tokens
CtxLimit:4744/8192, Amt:552/4000, Init:0.07s, Process:65.46s (64.02T/

Benchmarks

Running Llama 3.1 405b q6 and Command-A 111b Q8 on M3 Ultra Mac Studio

Below are benchmarks of running Llama 3.1 405b q6 and Command A 111b Q8 on an M3 Ultra 512GB using KoboldCpp.
The 405b was so miserable to run that I didn't even try flash attention, and flash attention was completely broken with Command-A.
M3 Ultra Llama 3.

Benchmarks

Mac Speed Comparison: M2 Ultra vs M3 Ultra using KoboldCpp

tl;dr: Running ggufs in Koboldcpp, the M3 is marginally... slower? Slightly faster prompt processing, but slower prompt writing across all models. I added a comparison Llama.cpp run at the bottom; same speed as Kobold, give or take.
Setup:
* Inference engine: Koboldcpp 1.85.1
* Text: Same text on

Benchmarks

Low Context Speed Comparison: Macbook, Mac Studios, and RTX 4090

It's been a while since my last Mac speed post, so I figured it was about time to post a new one. I've noticed a lot of the old "I get 500 tokens per second!" kind of talk re-appearing, so I figured some cold-hard

My Personal Guide for Developing Software with AI Assistance: Part 2

A quick introduction before I begin. If you haven't had an opportunity to read it yet, please check out the first post: My personal guide for developing software with AI Assistance. This will not rehash that information, but is rather an addendum to it with new things that
