A Few Tips for OCR With Qwen3.5 through Llama.cpp

Just a couple of quick tips. I'm using the Unsloth Qwen3.5 27B GGUF, and I also tried the 122B GGUF.

First: the difference between the bf16 and fp32 mmproj is night and day. I was getting multiple hallucinations, errors, etc. with the bf16 version. Swapping to the fp32 mmproj cleared up most of that almost instantly; a drastic improvement. The vision projector may have components that benefit from fp32's additional mantissa bits (23 bits vs. bf16's 7).

Second: force the model to use a higher minimum number of visual tokens. For example, I was trying to run OCR on an old image of a Japanese newspaper article from 1957 that I found. It was something like 733x1024, and the model was really struggling to read the body text: tons of hallucinations, entire sections just made up. Raising --image-min-tokens to 2048 forced the model to spend roughly 3x the visual processing on the image, and the quality went up MASSIVELY. All of a sudden it could read the paper, with only a handful of small issues.

This is what I added to the llama-server command: --image-min-tokens 2048 --image-max-tokens 8192
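Putting the first two tips together, a launch command might look something like this. The model and mmproj filenames below are placeholders, not the exact files I used; substitute your own downloads:

```shell
# Placeholder filenames -- point these at your actual GGUF and mmproj files.
# fp32 projector plus a raised visual-token floor for dense text:
llama-server \
  -m Qwen3.5-27B-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F32.gguf \
  --image-min-tokens 2048 \
  --image-max-tokens 8192
```

Note that raising the token floor costs extra prompt-processing time and memory per image, so it's worth reserving for dense documents like this one.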

I did have to toss a 1.1 repetition penalty in there as well, since it was having a hard time getting through the Japanese transcription without failing, but otherwise it is doing a great job now.
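For completeness, that penalty can be set at launch with llama.cpp's standard sampling flag (again, the filenames here are placeholders):

```shell
# Mild repetition penalty to keep the model from looping on repeated runs of text
llama-server -m your-model.gguf --mmproj your-mmproj-f32.gguf --repeat-penalty 1.1
```

1.1 is a light touch; pushing it much higher risks the model avoiding legitimately repeated characters, which Japanese text has plenty of.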