A Few Tips for OCR With Qwen3.5 through Llama.cpp

Just a couple of quick tips. I'm using the Unsloth Qwen3.5 27B GGUF, and I also tried the 122B GGUF.

First: the difference between the bf16 and fp32 mmproj is night and day. I was getting multiple hallucinations, errors, etc. with the bf16 version. Swapping to the fp32 mmproj cleared up most of that almost instantly; a drastic improvement. The vision projector may have components that benefit from fp32's additional mantissa bits (23 bits vs. bf16's 7).

Second: force the model to use a higher minimum number of visual tokens. For example, I was trying to run OCR on an old image of a Japanese newspaper article from 1957 that I found. It was something like 733x1024, and the model was really struggling to read the body text: tons of hallucinations, entire sections just made up. Raising --image-min-tokens to 2048 forced the model to spend roughly 3x the visual processing on the image, and the quality went up MASSIVELY. All of a sudden it could read the paper, with only a handful of small issues.

This is what I added to the llama-server command: --image-min-tokens 2048 --image-max-tokens 8192
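Putting the first two tips together, a launch command might look something like this. The model and mmproj filenames below are placeholders, not the exact files I used; substitute your own downloads:

```shell
# Placeholder filenames -- point these at your actual GGUF and mmproj files.
# fp32 projector plus a raised visual-token floor for dense text:
llama-server \
  -m Qwen3.5-27B-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F32.gguf \
  --image-min-tokens 2048 \
  --image-max-tokens 8192
```

Note that raising the token floor costs extra prompt-processing time and memory per image, so it's worth reserving for dense documents like this one.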

I did have to toss a 1.1 repetition penalty in there as well, since it was having a hard time getting through the Japanese transcription without failing, but otherwise it is doing a great job now.
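For completeness, that penalty can be set at launch with llama.cpp's standard sampling flag (again, the filenames here are placeholders):

```shell
# Mild repetition penalty to keep the model from looping on repeated runs of text
llama-server -m your-model.gguf --mmproj your-mmproj-f32.gguf --repeat-penalty 1.1
```

1.1 is a light touch; pushing it much higher risks the model avoiding legitimately repeated characters, which Japanese text has plenty of.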