Wilmer Tool Calling
So, some year and a half after the request was made for me to put tool calling into Wilmer, I've finally got it in there. First off- it was a huge pain to implement; if I didn't have Wilmer itself and agentic coders to help, I…
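For anyone who hasn't touched tool calling before, here's a minimal sketch of the standard OpenAI-style tools schema, which is what most OpenAI-compatible servers and frontends trade in. The URL, port, model name, and get_weather tool are all placeholders; none of this is lifted from Wilmer's code.

```python
import json
import requests

# Standard OpenAI-style tool-calling request; URL and model are placeholders
# for whatever OpenAI-compatible server you're pointing at.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "What's the weather in Tampa?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

message = requests.post(URL, json=payload, timeout=120).json()["choices"][0]["message"]

# If the model chose to call a tool, the arguments come back as a JSON string.
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```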
In my last post, I mentioned using --image-min-tokens to increase the quality of image responses from Qwen3.5. I went to load Gemma 4 the same way, and hit an error:
[58175] srv process_chun: processing image...
[58175] encoding image slice...
[58175] image slice encoded in 7490 ms
[58175] decoding
Just a couple of quick tips. I am using the Unsloth Qwen3.5 27b gguf, and also tried the 122b gguf. First: the difference between the bf16 and fp32 mmproj is night and day. I was getting multiple hallucinations, errors, etc. with the bf16. I swapped to the fp32 mmproj…
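In case the mechanics aren't obvious: the mmproj is a separate gguf you pass alongside the main model, so swapping bf16 for fp32 is just pointing llama-server at the other file. A minimal launch sketch with placeholder filenames (the --image-min-tokens value is arbitrary; check llama-server --help on your build):

```python
import subprocess

# Placeholder filenames; the fp32 mmproj is its own download next to the model gguf.
cmd = [
    "llama-server",
    "-m", "Qwen3.5-27b-Q8_0.gguf",              # main model weights
    "--mmproj", "mmproj-Qwen3.5-27b-f32.gguf",  # vision projector: the fp32 one
    "--image-min-tokens", "1024",               # the flag from the earlier post
    "--port", "8080",
]
subprocess.run(cmd, check=True)
```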
So I've been running Qwen3.5 122b a10b lately on the M2 Ultra (currently GLM 5 is sitting on the M3), and if you've used any of the Qwen3.5 family, you've probably seen or heard about the overthinking issue. The models are great…
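One generic mitigation, independent of whatever this post landed on (the excerpt cuts off here): don't let old reasoning accumulate in the chat history. A sketch, assuming the model wraps its reasoning in `<think>` tags the way the Qwen reasoning family generally does:

```python
import re

# Assumption: reasoning arrives inside <think>...</think>, as with the Qwen
# reasoning models. Strip it before a turn goes back into history, so prior
# thinking doesn't inflate the context or encourage more of the same.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(text: str) -> str:
    return THINK_RE.sub("", text)

print(strip_reasoning("<think>hmm, let me ponder...</think>The answer is 42."))
# -> "The answer is 42."
```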
The M5 Max MacBook Pro just arrived. First thing I did was fling llama.cpp, Wilmer and Open WebUI on it. Honestly, the speeds are really impressive, even considering that llama.cpp hasn't fully integrated the hardware changes yet (at least, that's my understanding). Here's…
So my homelab setup post from a while back is already outdated. Not so much on the hardware front; rather, the software side has consolidated dramatically. The original setup had somewhere around 20 to 30 separate WilmerAI instances running across my network. Each one was configured for a specific purpose:
It is shocking how difficult it is to find a 34" curved ultrawide that is either 2560x1080 or 5120x2160. Back in 2020 or 2021, Sceptre made one; it's been discontinued now, though. The big issue for me is twofold, because I have a triple monitor setup:
So I decided to make use of one of the mini PCs I had gotten for the homelab to build a little web browsing box. My first iteration of the web browsing box was a Windows 11 machine, which is the same machine that got me banned from reddit for VPN…
One of the big keys to running LLMs on a Mac is token management. That's what a lot of Wilmer is built around. Wilmer started out because I wanted to make the most of Llama 2 finetunes, but eventually its workflows became a way for me to keep…
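To make "token management" concrete, here's the kind of thing I mean, as a rough sketch only: cap how much history ever reaches the model, since prompt processing is the slow part on Apple Silicon. The 4-chars-per-token estimate and the budget number are placeholders, not Wilmer internals.

```python
# Rough sketch of trimming chat history to a token budget. The ~4 chars/token
# estimate and the default budget are illustrative, not anything out of Wilmer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int = 4096) -> list[dict]:
    # Walk backwards from the newest turn, keeping whole messages until the
    # budget is spent; older turns fall off instead of bloating prefill time.
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```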
If you've never messed with open source LLMs and you jumped on the ClawdBot/OpenClaw hype train: take some time to learn more about how local models work. You likely went through the trouble of getting a Mac Mini, so you now have a nice little test box…
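If you want a first rung on that ladder: most local servers (llama.cpp's llama-server, Ollama, LM Studio) expose an OpenAI-compatible API, so your first experiment can be this small. URL and model name are placeholders for whatever you're actually running:

```python
import requests

# llama-server defaults to port 8080 and serves an OpenAI-compatible route;
# swap in your own host, port, and model name.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local-model",
        "messages": [{"role": "user", "content": "Say hi in five words."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```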
If you're having a hard time understanding MoE strength vs dense models, and roughly where they might land when comparing them, think about this super oversimplified analogy. I'm hoping it makes sense:
The Scenario
Imagine a paid trivia competition, but all the questions are about carpentry…
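For anyone who'd rather have a number than an analogy: one rough, contested rule of thumb floating around the community estimates an MoE's dense-equivalent as the geometric mean of total and active parameters. Using the 122b-a10b model from earlier as the example:

```python
import math

# Community rule of thumb (rough and contested, not a law): an MoE "feels"
# roughly like a dense model of sqrt(total_params * active_params).
total, active = 122e9, 10e9
dense_equiv = math.sqrt(total * active)
print(f"~{dense_equiv / 1e9:.0f}B dense-equivalent")  # ~35B
```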
This has nothing to do with technology, but just so you know- I'm a tropical beastie, and I absolutely will not miss the 22 degree weather this past weekend. I am no longer built for this. That is all.
What's Changed Since 2024
So back in May of 2024 I wrote the first version of this little guide, at a time when agents were absolute crap and Wilmer was still in a state that couldn't even be called v0.01. Back then it got a…
Everyone and their brother is talking about Clawdbot, but as several others have pointed out- an agent with that many connections could be a security nightmare if it can be prompt injected. But since it supports OpenAI and Ollama endpoints... I wonder how well it would work if I stuck…
Honestly, I've been waiting for next week for a while. Even if the wait time is 2 months or longer on actually getting the order, having an M5 Max for the hardware matmul is going to be amazing and worth the wait. One nice thing about getting a new machine now…
So I currently run GLM 4.7 Q8 on my M3 Ultra, and after wrestling to find a solid model that would work well on the M2 Ultra 192GB, I finally decided to give the older GLM 4.6 UD_Q3_K_XL a try on it, seeing how much…
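For a back-of-envelope sense of why a Q3-class quant is the one that fits in 192GB: gguf size is roughly params × bits-per-weight / 8. The ~355B parameter count is my assumption for GLM 4.6, and the effective bits-per-weight figures are ballparks, since UD quants mix tensor types:

```python
# Back-of-envelope gguf sizing; parameter count and bits/weight are assumptions.
params = 355e9  # assumed total params for GLM 4.6
for name, bpw in [("Q8_0", 8.5), ("UD_Q3_K_XL", 3.5)]:
    gb = params * bpw / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB")
# Q8_0:       ~377 GB -> nowhere near fitting in 192GB
# UD_Q3_K_XL: ~155 GB -> fits, with headroom left for context
```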
Go back in time a year and a half- it's mid-2024, LinkedIn has discovered AI and now the buzzword of the year is "agentic". Everyone and their brother was trying to convert every single task to be doable by an "agent" and, to be…
When folks say "I can't find a use for AI", I think far too many of them are overthinking the use-cases, or expecting a much grander difference in their lives. Without actually relying on LLMs to do the thinking for me, I can say without…
It's been a few weeks since I've posted or made any changes on Wilmer; I haven't stopped or lost interest, but rather I'm about to change jobs and I've been heads down on transition stuff before I leave my current…
I was trying to answer someone's question about how llama.cpp handles offloading with Mixture of Experts models on a regular gaming PC with a 24GB GPU, and ended up spending a few hours in a deep dive.
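The short version of the trick that deep dive revolves around, sketched with placeholder paths: keep attention and shared weights on the GPU while pinning the huge per-expert FFN tensors to system RAM. The flags are real llama.cpp options on recent builds, but verify against llama-server --help, and adjust the regex to your model's tensor names:

```python
import subprocess

# MoE offload pattern for a 24GB card: -ngl offloads all layers, then the
# --override-tensor (-ot) regex vetoes the expert tensors back onto CPU RAM.
cmd = [
    "llama-server",
    "-m", "big-moe-model.gguf",    # placeholder path
    "-ngl", "99",                  # offload every layer...
    "-ot", r".ffn_.*_exps.=CPU",   # ...except the per-expert FFN weights
    # Recent builds also offer --cpu-moe / --n-cpu-moe as simpler switches.
]
subprocess.run(cmd, check=True)
```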
When I first started Wilmer, it was for a very specific reason: I wanted a semantic router, and one didn't yet exist. The routers that were available were all specifically designed to take the last message, categorize that, and route you that way. I needed more, though; what…
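To illustrate the distinction (a toy sketch, not Wilmer's actual routing code; the route names, prompt, and `llm` callable are all made up): a last-message router classifies one string, while a conversation-aware router hands the classifier a window of recent turns, so a reply like "yes, do that" routes by what "that" was.

```python
# Toy conversation-aware router; everything here is illustrative.
ROUTES = ["coding", "factual", "creative", "general"]

def route_conversation(messages: list[dict], llm, window: int = 6) -> str:
    # Classify the recent conversation, not just the final message.
    recent = "\n".join(f'{m["role"]}: {m["content"]}' for m in messages[-window:])
    prompt = (
        "Pick the single best category for where this conversation is headed.\n"
        f"Categories: {', '.join(ROUTES)}\n\n{recent}\n\nCategory:"
    )
    answer = llm(prompt).strip().lower()
    return answer if answer in ROUTES else "general"
```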
So this looks like it could actually be a really fun model: https://huggingface.co/microsoft/UserLM-8b I like these little specific-purpose LLMs the most, because they open up some neat doors. They likely made this to act as the user-proxy in autogen, and they point out on their…
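The obvious experiment, sketched loosely: let a user-simulator model play the human against whatever assistant model you're testing. Role-flipping is the core trick; the `user_llm` and `assistant_llm` callables and everything else here are placeholders, not taken from the model card.

```python
# Toy user-simulator loop: one model plays the user, one plays the assistant.
def simulate(user_llm, assistant_llm, intent: str, turns: int = 4) -> list[dict]:
    history = []  # transcript from the assistant's point of view
    for _ in range(turns):
        # Flip roles so the user model sees assistant turns as "user" input
        # and naturally answers as the human side of the conversation.
        flipped = [{"role": "assistant" if m["role"] == "user" else "user",
                    "content": m["content"]} for m in history]
        history.append({"role": "user", "content": user_llm(intent, flipped)})
        history.append({"role": "assistant", "content": assistant_llm(history)})
    return history
```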
After 3 months, /u/reddit finally messaged me to tell me the account was permanently banned. However, the section that should contain the reason for the ban is empty. It just says: "Your account has been permanently banned for breaking the rules. This account has been permanently closed. To continue…"
Every weekend for a while I've put out a Wilmer release on Sunday; generally a few features I was able to knock out on Saturday and test on Sunday. Almost always using either some combination of local models with Wilmer via Open WebUI, or using Gemini 2.