Wilmer Tool Calling
So, about a year and a half after the request was made for me to put tool calling into Wilmer, I've finally got it in there.
First off: it was a huge pain to implement; if I didn't have Wilmer itself and agentic coders to help, I'm not sure I'd have done it. The way streaming works with tool calling is a bit odd, too, so that was interesting to navigate. Really, this was something I couldn't have pulled off without the earlier workflow engine refactor for the Execution Context.
The idea is straightforward: Wilmer sits in between the frontend and the LLM, so it just needs to pass tool definitions from the frontend through to the model, and pass tool call responses from the model back to the frontend. Wilmer itself doesn't need to understand or execute the tools. The tricky part was that Wilmer has a whole pipeline of nodes doing different things (memory lookups, categorization, summarization, context gathering), and you really don't want tool calls accidentally hitting nodes that are just doing internal processing. So I had to put per-node controls in place. Only the nodes you explicitly flag will pass tools through; the rest strip them out and do their jobs as usual, with one exception: some internal nodes that use chat_user_prompt_* still get just the tool call outputs pulled out of the conversation for them.
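The per-node gating can be sketched roughly like this. To be clear, this is a hypothetical illustration, not Wilmer's actual code: the node config shape and the `passTools` flag name are my own inventions for the example.

```python
# Hypothetical sketch of per-node tool passthrough. The real Wilmer
# node configuration and field names will differ.

def build_llm_request(node_config: dict, request: dict) -> dict:
    """Prepare the request a single workflow node sends to its backend LLM."""
    outgoing = dict(request)  # shallow copy; don't mutate the frontend request
    if not node_config.get("passTools", False):
        # Internal nodes (memory lookups, categorization, summarization...)
        # never see the tool definitions, so the model can't emit tool
        # calls while it's just doing internal processing.
        outgoing.pop("tools", None)
        outgoing.pop("tool_choice", None)
    # Flagged nodes forward the frontend's tool definitions untouched.
    return outgoing
```

The point of the copy is that stripping tools for one internal node must not affect the request that a later, tool-enabled node forwards to the model.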
Format conversion between OpenAI, Claude, and Ollama backends was also a headache since they all handle tool calling differently, and streaming tool calls needed their own handling to keep the structured data from getting mangled by the normal text processing pipeline.
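To give a feel for both problems, here's a minimal sketch (mine, not Wilmer's code): converting an OpenAI-style tool definition to the Claude shape, and reassembling OpenAI-style streamed tool-call deltas, which arrive as fragments keyed by index with the JSON arguments split across chunks.

```python
def openai_tool_to_claude(tool: dict) -> dict:
    """Convert an OpenAI "function" tool definition to Claude's format:
    OpenAI nests name/description/parameters under "function", while
    Claude uses a flat object with "input_schema" for the parameters."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }

def accumulate_tool_calls(deltas: list) -> list:
    """Merge OpenAI-style streamed tool_call deltas into complete calls.
    Each fragment carries an "index" plus a partial id, name, or slice of
    the arguments JSON string; concatenating naively through the normal
    text pipeline would mangle the structured data."""
    calls = {}
    for delta in deltas:
        for frag in delta.get("tool_calls", []):
            call = calls.setdefault(frag["index"],
                                    {"id": "", "name": "", "arguments": ""})
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function", {})
            call["name"] += fn.get("name", "")
            call["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]
```

Ollama's format is close enough to OpenAI's that the same accumulator idea carries over, but the field layouts still differ enough that each backend needs its own adapter.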
But the reason I finally sat down and did this is that I've been using OpenCode more lately. Up until summer of last year I had pretty much written off agentic coding, but once Claude Code got good I found myself sucked in like everyone else. Even though I'm usually a very local-first kind of guy, I've stuck with it ever since because the quality is just that good.
A month or so ago I started dabbling in OpenCode, to have something for when the net goes out, and I have to say that Qwen3.5 27b combined with it is pretty nice... but nowhere near the quality of Claude (obviously). My goal hasn't changed since 2023: finding ways to raise the quality of local tools to match proprietary ones, even if it means sacrificing speed for quality. So, as with all things, after trying OpenCode for a while, my answer is: shove Wilmer into the flow.
Now that tool calling works end to end, I can do just that. The OpenCode calls pass through Wilmer, hit my workflows, and the tool calls get forwarded through to one of N models in llama.cpp and back without Wilmer needing to know anything about what the tools actually do. It slows everything down a lot, but the result is far less engagement from me, because it gets things right in far fewer tries, especially when combined with things like the earlier Qwen improvements of manually applying CoT.
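On the way back out, the proxy has to relay a completed tool call to the client in the streaming format it expects. Here's a minimal sketch of emitting OpenAI-style SSE chunks for one tool call; again, this is an illustration of the wire format, not Wilmer's actual implementation, and the `model` default is a placeholder.

```python
import json

def tool_call_chunks(call_id: str, name: str, arguments: str, model: str = "local"):
    """Yield OpenAI-style SSE chunks relaying one completed tool call
    back to the client: first the tool_calls delta, then an empty delta
    with finish_reason "tool_calls", then the [DONE] sentinel."""
    delta = {"tool_calls": [{
        "index": 0,
        "id": call_id,
        "type": "function",
        "function": {"name": name, "arguments": arguments},
    }]}
    for body, finish in ((delta, None), ({}, "tool_calls")):
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": body, "finish_reason": finish}],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```

Setting `finish_reason` to `"tool_calls"` is what tells an OpenAI-compatible client like OpenCode to go execute the tool rather than treat the turn as finished text.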
I've had really great luck getting Qwen3.5 122b to give much better results than stock this way, but Qwen3.5 27b has been harder to wrangle. Getting it to play nice with my decision trees has been fairly challenging so far.
I'm going to tinker with these OpenCode workflows for a month or so and then start putting them out for folks. Updating the example workflows in the repo is next on the list.