SomeOddCodeGuy's Ramblings (Page 5)

Latest

Benchmarks

M3 Ultra Mac Studio 512GB prompt and write speeds for Deepseek V3 0 671b gguf q4_K_M, for those curious

UPDATE 2025-04-13: llama.cpp has had an update that GREATLY improved the prompt processing speed. Please see the new speeds below. Deepseek V3 0324 Q4_K_M w/Flash Attention, 4800 token context, responding 552 tokens: CtxLimit:4744/8192, Amt:552/4000, Init:0.07s, Process:65.46s
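For anyone who wants to turn a log line like the one above into a prompt processing rate, here is a minimal sketch. It assumes that `CtxLimit` counts prompt plus generated tokens and that `Amt` is the generated count, so the prompt size is the difference; the function name and dictionary keys are made up for illustration, and the figure KoboldCpp prints itself may differ slightly.

```python
import re

# Log fragment copied from the excerpt above.
LOG_LINE = "CtxLimit:4744/8192, Amt:552/4000, Init:0.07s, Process:65.46s"


def parse_kobold_line(line: str) -> dict:
    """Pull token counts and timing out of a KoboldCpp-style stats line.

    Assumption: CtxLimit = prompt tokens + generated tokens, Amt = generated tokens.
    """
    ctx_used, ctx_max = map(int, re.search(r"CtxLimit:(\d+)/(\d+)", line).groups())
    generated, gen_max = map(int, re.search(r"Amt:(\d+)/(\d+)", line).groups())
    process_s = float(re.search(r"Process:([\d.]+)s", line).group(1))

    # Prompt size is inferred, not read directly from the log.
    prompt_tokens = ctx_used - generated
    return {
        "prompt_tokens": prompt_tokens,
        "generated_tokens": generated,
        "process_seconds": process_s,
        "prompt_tokens_per_s": prompt_tokens / process_s,
    }


stats = parse_kobold_line(LOG_LINE)
print(f"{stats['prompt_tokens']} prompt tokens in {stats['process_seconds']}s "
      f"= {stats['prompt_tokens_per_s']:.1f} tokens/s")
```

Under those assumptions the run above works out to roughly 64 prompt tokens per second.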

Benchmarks

Running Llama 3.1 405b q6 and Command-A 111b Q8 on M3 Ultra Mac Studio

Below are benchmarks of running Llama 3.1 405b q6 and Command-A 111b Q8 on an M3 Ultra 512GB using KoboldCpp. The 405b was so miserable to run that I didn't even try flash attention, and flash attention was completely broken with Command-A. M3 Ultra Llama

Benchmarks

Mac Speed Comparison: M2 Ultra vs M3 Ultra using KoboldCpp

tl;dr: Running ggufs in Koboldcpp, the M3 is marginally... slower? Slightly faster prompt processing, but slower prompt writing across all models. I added a comparison Llama.cpp run at the bottom; same speed as Kobold, give or take.

Setup:
* Inference engine: Koboldcpp 1.85.1
* Text: Same text on

Benchmarks

Low Context Speed Comparison: Macbook, Mac Studios, and RTX 4090

It's been a while since my last Mac speed post, so I figured it was about time to post a new one. I've noticed a lot of the old "I get 500 tokens per second!" kind of talk re-appearing, so I figured some

My Personal Guide for Developing Software with AI Assistance: Part 2

A quick introduction before I begin. If you haven't had an opportunity to read it yet, please check out the first post: My personal guide for developing software with AI Assistance. This will not rehash that information, but is rather an addendum to it with new things that

Offline-Wiki-Api

Offline Wikipedia API- An easy to use offline API that serves up full text Wikipedia articles.

Cross-Posting from Reddit. This project is an answer to a previous question that I had about the easiest route to offline Wikipedia RAG. After mulling over the responses, txtai jumped out to me as the most straightforward. Since by default that dataset only returns the first paragraph of

MMLU-Pro Combined Results - Model Quantization Comparison

This post is a combination of some new results, old results, and reddit.com/u/invectorgator's results (with permission) to help give a clear picture of all testing so far. Links to the relevant posts can be found below. This was a lot of fun, and has lit

WilmerAI

Meet WilmerAI- my open source project to maximize the potential of Local LLMs via prompt routing and multi-model workflows

Cross-Posting from Reddit. IMPORTANT: This is an early development, barely even Alpha, release. Wilmer is a passion project for myself, but it felt stingy not to share it given how interested everyone was in it, so I released early. It's still months from what I'd

My Personal Guide for Developing Software with AI Assistance

So, in the past I've mentioned that I use AI to assist in writing code for my personal projects, especially for things I use to automate stuff for myself, and I've gotten pretty mixed responses. Some folks say they do the same, others say AI can

WilmerAI

Almost a year later, I can finally do this. A small teaser of a project I'm working on

Ever since I first saw the group chat feature in SillyTavern, I've always wanted to have a team of AI to help me work on things. But I never liked the result of using 1 LLM to do it; it never really felt like it was doing me

Playing with Agents

So I've been digging into agents a bit. Specifically, I've been looking into AutoGen and CrewAI. These are super cool. I'm not sure what I'd actually use them for, but between the two I think I like CrewAI the best. Next I

Frankenmerges are actually kind of great...

For the past few months I've been working on a quiet little project on the weekends, whenever I can scrounge up time, and part of that project involves looking for the best models of each domain. Of course, there are some great coding, medical, math, etc finetunes, but
