Frankenmerges are actually kind of great...

For the past few months I've been working on a quiet little project on the weekends, whenever I can scrounge up time, and part of that project involves looking for the best model for each domain.

Of course, there are some great coding, medical, math, etc. finetunes, but one thing that local models REALLY struggle with is context understanding. Reading between the lines.

What I need, essentially, is a model that actually understands what I'm saying when I give it a request. If I ask it to help me with a task, I don't want the LLM tripping all over itself trying to parse my implied meaning and coming up with the wrong requirements altogether. Unfortunately, most of the coding models aren't exactly a chatty bunch, so they're the worst about this.

Well, interestingly enough, the roleplayers appear to have solved this by throwing MORE model at the problem lol

Enter Frankenmerges: smooshing the same model, or multiple models, together like play-dough to see what you get. Jam two 70bs together and somehow get a 120b; the model loses some coherence in terms of raw knowledge and problem-solving ability, but its general understanding? WAY better. Its ability to actually figure out what you're asking for is amazing.
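
For anyone wondering what "jamming two 70bs together" actually means mechanically: as I understand it, tools like mergekit do a "passthrough" merge, literally copying ranges of transformer layers from each parent and stacking them in order, with some ranges overlapping. Here's a tiny Python sketch of that idea; the model names and slice plan are hypothetical, not the actual recipe behind Goliath or any other released merge.

```python
# Toy sketch of a passthrough / layer-interleave "frankenmerge".
# The slice plan and model names below are made up for illustration.

def build_layer_plan(slices):
    """Flatten (model, start, end) slices into the merged model's layer stack."""
    plan = []
    for model, start, end in slices:
        # Layers are copied verbatim from the source model, in order.
        plan.extend((model, layer) for layer in range(start, end))
    return plan

# Two hypothetical 80-layer 70b parents, stacked with overlapping ranges.
slices = [
    ("model_a", 0, 40),
    ("model_b", 20, 60),
    ("model_a", 40, 80),
    ("model_b", 60, 80),
]

plan = build_layer_plan(slices)
print(f"merged depth: {len(plan)} layers")  # 140 layers, vs. 80 in each parent
```

No weights get averaged or retrained here, which is the wild part: you just end up with a much deeper model that somehow still mostly works.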

Goliath-120b is the best so far, for sure. Miqu-1-120b is also really solid. I tried some others, like Miquliz-120b, but that one didn't work out as well.

I wonder if it's the extra layers? Maybe more layers == better comprehension?