Wilmer Tool Calling
So, some year and a half after the request was made for me to put tool calling into Wilmer, I've finally got it in there. First off- it was a huge pain to implement; if I didn't have Wilmer itself and agentic coders to help, I…
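For anyone who hasn't touched tool calling before, here's a minimal sketch of the standard OpenAI-style tools schema, which is what most OpenAI-compatible servers and frontends trade in. The URL, port, model name, and get_weather tool are all placeholders; none of this is lifted from Wilmer's code.

```python
import json
import requests

# Standard OpenAI-style tool-calling request; URL and model are placeholders
# for whatever OpenAI-compatible server you're pointing at.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "What's the weather in Tampa?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

message = requests.post(URL, json=payload, timeout=120).json()["choices"][0]["message"]

# If the model chose to call a tool, the arguments come back as a JSON string.
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```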
In my last post, I mentioned using --image-min-tokens to increase the quality of image responses from Qwen3.5. I went to load Gemma 4 the same way, and hit an error:
[58175] srv process_chun: processing image...
[58175] encoding image slice...
[58175] image slice encoded in 7490 ms
[58175] decoding
Just a couple of quick tips. I am using the Unsloth Qwen3.5 27b gguf, and also tried the 122b gguf. First: the difference between the bf16 and fp32 mmproj is night and day. I was getting multiple hallucinations, errors, etc. with the bf16. I swapped to the fp32 mmproj…
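In case the mechanics aren't obvious: the mmproj is a separate gguf you pass alongside the main model, so swapping bf16 for fp32 is just pointing llama-server at the other file. A minimal launch sketch with placeholder filenames (the --image-min-tokens value is arbitrary; check llama-server --help on your build):

```python
import subprocess

# Placeholder filenames; the fp32 mmproj is its own download next to the model gguf.
cmd = [
    "llama-server",
    "-m", "Qwen3.5-27b-Q8_0.gguf",              # main model weights
    "--mmproj", "mmproj-Qwen3.5-27b-f32.gguf",  # vision projector: the fp32 one
    "--image-min-tokens", "1024",               # the flag from the earlier post
    "--port", "8080",
]
subprocess.run(cmd, check=True)
```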
So I've been running Qwen3.5 122b a10b lately on the M2 Ultra (currently GLM 5 is sitting on the M3), and if you've used any of the Qwen3.5 family, you've probably seen or heard about the overthinking issue. The models are great…
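One generic mitigation, independent of whatever this post landed on (the excerpt cuts off here): don't let old reasoning accumulate in the chat history. A sketch, assuming the model wraps its reasoning in `<think>` tags the way the Qwen reasoning family generally does:

```python
import re

# Assumption: reasoning arrives inside <think>...</think>, as with the Qwen
# reasoning models. Strip it before a turn goes back into history, so prior
# thinking doesn't inflate the context or encourage more of the same.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(text: str) -> str:
    return THINK_RE.sub("", text)

print(strip_reasoning("<think>hmm, let me ponder...</think>The answer is 42."))
# -> "The answer is 42."
```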
The M5 Max MacBook Pro just arrived. First thing I did was fling llama.cpp, Wilmer and Open WebUI on it. Honestly, the speeds are really impressive, even considering that llama.cpp hasn't fully integrated the hardware changes yet (at least, that's my understanding). Here's…
So my homelab setup post from a while back is already outdated. Not so much on the hardware front; rather, the software side has consolidated dramatically. The original setup had somewhere around 20 to 30 separate WilmerAI instances running across my network. Each one was configured for a specific purpose:
It is shocking how difficult it is to find a 34" curved ultrawide that is either 2560x1080 or 5120x2160. Back in 2020 or 2021, Sceptre made one; it's been discontinued now, though. The big issue for me is twofold, because I have a triple monitor setup:
So I decided to make use of one of the mini PCs I had gotten for the homelab to build a little web browsing box. My first iteration of the web browsing box was a Windows 11 machine, which is the same machine that got me banned from reddit for VPN…
One of the big keys to running LLMs on a Mac is token management. That's what a lot of Wilmer is built around. Wilmer started out because I wanted to make the most of Llama 2 finetunes, but eventually its workflows became a way for me to keep…
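To make "token management" concrete, here's the kind of thing I mean, as a rough sketch only: cap how much history ever reaches the model, since prompt processing is the slow part on Apple Silicon. The 4-chars-per-token estimate and the budget number are placeholders, not Wilmer internals.

```python
# Rough sketch of trimming chat history to a token budget. The ~4 chars/token
# estimate and the default budget are illustrative, not anything out of Wilmer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int = 4096) -> list[dict]:
    # Walk backwards from the newest turn, keeping whole messages until the
    # budget is spent; older turns fall off instead of bloating prefill time.
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```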
If you've never messed with open source LLMs and you jumped on the ClawdBot/OpenClaw hype train: take some time to learn more about how local models work. You likely went through the trouble of getting a Mac Mini, so you now have a nice little test box…
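If you want a first rung on that ladder: most local servers (llama.cpp's llama-server, Ollama, LM Studio) expose an OpenAI-compatible API, so your first experiment can be this small. URL and model name are placeholders for whatever you're actually running:

```python
import requests

# llama-server defaults to port 8080 and serves an OpenAI-compatible route;
# swap in your own host, port, and model name.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local-model",
        "messages": [{"role": "user", "content": "Say hi in five words."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```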
If you're having a hard time understanding MoE strength vs dense models, and roughly where they might land when comparing them, think about this super oversimplified analogy. I'm hoping it makes sense:
The Scenario
Imagine a paid trivia competition, but all the questions are about carpentry…
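For anyone who'd rather have a number than an analogy: one rough, contested rule of thumb floating around the community estimates an MoE's dense-equivalent as the geometric mean of total and active parameters. Using the 122b-a10b model from earlier as the example:

```python
import math

# Community rule of thumb (rough and contested, not a law): an MoE "feels"
# roughly like a dense model of sqrt(total_params * active_params).
total, active = 122e9, 10e9
dense_equiv = math.sqrt(total * active)
print(f"~{dense_equiv / 1e9:.0f}B dense-equivalent")  # ~35B
```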
This has nothing to do with technology, but just so you know- I'm a tropical beastie, and I absolutely will not miss the 22 degree weather this past weekend. I am no longer built for this. That is all.
What's Changed Since 2024
So back in May of 2024 I wrote the first version of this little guide, at a time when agents were absolute crap and Wilmer was still in a state that couldn't even be called v0.01. Back then it got a…
Everyone and their brother is talking about Clawdbot, but as several others have pointed out- an agent with that many connections could be a security nightmare if it can be prompt injected. But since it supports OpenAI and Ollama endpoints... I wonder how well it would work if I stuck…
Honestly, I've been waiting for next week for a while. Even if the wait time is 2 months or longer on actually getting the order, having an M5 Max for the hardware matmul is going to be amazing and worth the wait. One nice thing about getting a new machine now…
So I currently run GLM 4.7 Q8 on my M3 Ultra, and after wrestling to find a solid model that would work well on the M2 Ultra 192GB, I finally decided to give the older GLM 4.6 UD_Q3_K_XL a try on it, seeing how much…
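For a back-of-envelope sense of why a Q3-class quant is the one that fits in 192GB: gguf size is roughly params × bits-per-weight / 8. The ~355B parameter count is my assumption for GLM 4.6, and the effective bits-per-weight figures are ballparks, since UD quants mix tensor types:

```python
# Back-of-envelope gguf sizing; parameter count and bits/weight are assumptions.
params = 355e9  # assumed total params for GLM 4.6
for name, bpw in [("Q8_0", 8.5), ("UD_Q3_K_XL", 3.5)]:
    gb = params * bpw / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB")
# Q8_0:       ~377 GB -> nowhere near fitting in 192GB
# UD_Q3_K_XL: ~155 GB -> fits, with headroom left for context
```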
Go back in time a year and a half- it's mid-2024, LinkedIn has discovered AI and now the buzzword of the year is "agentic". Everyone and their brother was trying to convert every single task to be doable by an "agent" and, to be…
When folks say "I can't find a use for AI", I think far too many of them are overthinking the use-cases, or expecting a much grander difference in their lives. Without actually relying on LLMs to do the thinking for me, I can say without…
It's been a few weeks since I've posted or made any changes on Wilmer; I haven't stopped or lost interest, but rather I'm about to change jobs and I've been heads down on transition stuff before I leave my current…
I was trying to answer someone's question about how llama.cpp handles offloading with Mixture of Experts models on a regular gaming PC with a 24GB GPU, and ended up spending a few hours in a deep dive.
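The short version of the trick that deep dive revolves around, sketched with placeholder paths: keep attention and shared weights on the GPU while pinning the huge per-expert FFN tensors to system RAM. The flags are real llama.cpp options on recent builds, but verify against llama-server --help, and adjust the regex to your model's tensor names:

```python
import subprocess

# MoE offload pattern for a 24GB card: -ngl offloads all layers, then the
# --override-tensor (-ot) regex vetoes the expert tensors back onto CPU RAM.
cmd = [
    "llama-server",
    "-m", "big-moe-model.gguf",    # placeholder path
    "-ngl", "99",                  # offload every layer...
    "-ot", r".ffn_.*_exps.=CPU",   # ...except the per-expert FFN weights
    # Recent builds also offer --cpu-moe / --n-cpu-moe as simpler switches.
]
subprocess.run(cmd, check=True)
```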
When I first started Wilmer, it was for a very specific reason: I wanted a semantic router, and one didn't yet exist. The routers that were available were all specifically designed to take the last message, categorize that, and route you that way. I needed more, though; what…
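To illustrate the distinction (a toy sketch, not Wilmer's actual routing code; the route names, prompt, and `llm` callable are all made up): a last-message router classifies one string, while a conversation-aware router hands the classifier a window of recent turns, so a reply like "yes, do that" routes by what "that" was.

```python
# Toy conversation-aware router; everything here is illustrative.
ROUTES = ["coding", "factual", "creative", "general"]

def route_conversation(messages: list[dict], llm, window: int = 6) -> str:
    # Classify the recent conversation, not just the final message.
    recent = "\n".join(f'{m["role"]}: {m["content"]}' for m in messages[-window:])
    prompt = (
        "Pick the single best category for where this conversation is headed.\n"
        f"Categories: {', '.join(ROUTES)}\n\n{recent}\n\nCategory:"
    )
    answer = llm(prompt).strip().lower()
    return answer if answer in ROUTES else "general"
```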
So this looks like it could actually be a really fun model: https://huggingface.co/microsoft/UserLM-8b I like these little specific-purpose LLMs the most, because they open up some neat doors. They likely made this to act as the user-proxy in autogen, and they point out on their…
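The obvious experiment, sketched loosely: let a user-simulator model play the human against whatever assistant model you're testing. Role-flipping is the core trick; the `user_llm` and `assistant_llm` callables and everything else here are placeholders, not taken from the model card.

```python
# Toy user-simulator loop: one model plays the user, one plays the assistant.
def simulate(user_llm, assistant_llm, intent: str, turns: int = 4) -> list[dict]:
    history = []  # transcript from the assistant's point of view
    for _ in range(turns):
        # Flip roles so the user model sees assistant turns as "user" input
        # and naturally answers as the human side of the conversation.
        flipped = [{"role": "assistant" if m["role"] == "user" else "user",
                    "content": m["content"]} for m in history]
        history.append({"role": "user", "content": user_llm(intent, flipped)})
        history.append({"role": "assistant", "content": assistant_llm(history)})
    return history
```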
After 3 months, /u/reddit finally messaged me to tell me the account was permanently banned. However, the section that should contain the reason for the ban is empty. It just says: "Your account has been permanently banned for breaking the rules. This account has been permanently closed. To continue…"
Every weekend for a while I've put out a Wilmer release on Sunday; generally a few features I was able to knock out on Saturday and test on Sunday. Almost always using either some combination of local models with Wilmer via Open WebUI, or using Gemini 2.