Understanding MoE Offloading
I was trying to answer someone's question about how Llama.cpp handles offloading with Mixture of Experts models on a regular gaming PC with a 24GB GPU, and ended up spending a few hours in a deep dive.
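For context, the usual trick on a 24GB card is to keep attention and shared weights in VRAM while pinning the huge MoE expert tensors to system RAM. A minimal sketch of the flags involved, assuming a recent llama.cpp build with tensor overrides; the model path and the tensor-name regex are illustrative (check your GGUF's actual tensor names before relying on the pattern):

```shell
# Ask for all layers on GPU (-ngl 99), then override the MoE expert
# tensors (ffn_*_exps) back to CPU so only the dense parts use VRAM.
# Path and regex are examples, not a known-good config.
./llama-server \
  -m ./models/my-moe-model-q4_k_m.gguf \
  -ngl 99 \
  -ot "ffn_.*_exps.=CPU" \
  -c 16384
```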
So my old Reddit post about my "unorthodox setup" went down with the reddit ship, and figured it was time for an update anyway, so I'm bringing it back. My setup has gotten more complex than I originally planned, built out piecemeal over the past 2.
RAG is really 90% a software development problem and 10% an AI problem. People overcomplicate it a lot on the AI side, but it's a $5 term for a $0.05 concept: give the LLM the answer before it responds to you. On its face, that's simple
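A toy sketch of that "give it the answer first" idea. The retriever here is naive keyword overlap standing in for embeddings, and the function names are mine rather than from any particular framework; the point is just the shape of the pipeline:

```python
def retrieve(question, docs, k=2):
    # Naive keyword-overlap retriever; real systems use embedding
    # similarity, but the pipeline shape is identical.
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, docs):
    # The whole "AI side" of RAG: paste the retrieved answer into
    # the prompt before the model ever responds.
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Everything hard about RAG lives in `retrieve` (chunking, indexing, ranking), which is why it's mostly a software problem.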
M3 Ultra Mac Studio 512GB Speeds
Qwen3 235b a22b Instruct Q8 in Llama.cpp server (~15k tokens)
prompt eval time: 4.60 ms per token, 217.29 tokens per second
eval time: 67.59 ms per token, 14.80 tokens per second
total time: 146863.82 ms / 15763 tokens (~5k
When I first started Wilmer, it was for a very specific reason: I wanted a semantic router, and one didn't yet exist. The routers that were available were all specifically designed to take the last message, categorize that, and route you that way. I needed more, though; what
So this looks like it could actually be a really fun model: https://huggingface.co/microsoft/UserLM-8b I like these little specific-purpose LLMs the most because they open up some neat doors. They likely made this to act as the user-proxy in autogen, and they point out on their
After 3 months, /u/reddit finally messaged me to tell me the account was permanently banned. However, the section that should contain the reason for the ban is empty. It just says: "Your account has been permanently banned for breaking the rules. This account has been permanently closed." To continue
Every weekend for a while I've put out a release of Wilmer on Sunday; generally a few features I was able to knock out on Saturday and test on Sunday. Almost always using either some combination of local models with Wilmer via Open WebUI, or using Gemini 2.
Someone asked me to run the mxfp4 gguf vs q8, so I figured I'd post the results here too for anyone to see. As expected, mxfp4 comes out to a little over half the size of the q8, and the speed is just a bit faster. I expect
Anyone who knows of me knows that I don't like using coding agents. I have nothing particularly against them; I just don't prefer them. I like the quality and control of workflows in a direct chat window. You can see as much in my
One thing I've always wanted to do is have Wilmer workflows call themselves, so I can create a form of recursion within the workflows. This allows for a sort of semi-agentic behavior: repeated iterations on a problem with some breakout criteria. Now that may sound like an agent,
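The iterate-until-breakout idea can be sketched in a few lines. This is not Wilmer's actual API; `improve` and `is_good_enough` are hypothetical stand-ins (in a real workflow they might be an LLM refinement pass and an LLM judge), with a toy numeric task so the loop is visible:

```python
MAX_STEPS = 5  # hard cap so recursion always terminates

def improve(value):
    # Stand-in for one pass through a workflow node.
    return value * 2

def is_good_enough(value):
    # Breakout criterion — the thing that stops the recursion early.
    return value >= 100

def run_workflow(task, step=0):
    """A workflow node that re-enters itself until the breakout
    criterion or the step cap is hit: semi-agentic iteration."""
    result = improve(task)
    if is_good_enough(result) or step >= MAX_STEPS:
        return result
    return run_workflow(result, step + 1)
```

The step cap matters as much as the breakout criterion: without it, a judge that never says "good enough" would recurse forever.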
A few days ago Stanford dumped a whole pile of AI/ML lectures up on their youtube. They're a pretty good watch if you get bored and want to dive more into this stuff. Stanford OnlineYou can gain access to a world of education through Stanford Online, the
Don't put off unit tests. When I first started building Wilmer, I barely knew any Python, and of course I didn't have Wilmer to help me build it lol. So the early code was nothing shy of a disaster; coming from a C# background, I first
Somehow I've made it this far in life without ever actually using the site. But while I wait for one of my tickets with Reddit to finally reach a human so that I can get my account back, outside of Discord it's one of the few places I can