Redownloading All The Old Models
So the new gguf format looks cool in that it supposedly handles the rope-freq settings for you automatically. However, I do have data caps on my internet, so rather than redownload everything I really want to try my hand at converting my existing ggmls.
It looks like you can do it with a llama.cpp script:
python convert-llama-ggmlv3-to-gguf.py -i ~/models/llama-2-7b.ggmlv3.q5_1.bin -o ~/models/llama-2-7b.gguf.q5_1.bin
but it only works with ggmlv3 models. I think a lot of my older ones aren't v3, so I may not get away with just converting them. I'll probably hang on to an old copy of gpt4all and oobabooga just so I can keep running them a bit longer.
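If you're not sure which of your files are actually v3, you can peek at the header instead of trial-and-erroring the converter. As far as I can tell from llama.cpp's loader code, ggml files start with a little-endian magic uint32 ('ggml' for the ancient unversioned format, 'ggmf' or 'ggjt' for versioned ones, with the version in the next 4 bytes) — treat the exact values below as my reading of that code, not gospel. A quick sketch:

```python
import struct

def ggml_version(path):
    """Guess a model file's ggml variant from its first 8 bytes.
    Magic values are taken from my reading of llama.cpp's loader."""
    with open(path, "rb") as f:
        header = f.read(8)
    magic, = struct.unpack("<I", header[:4])
    if magic == 0x67676D6C:              # 'ggml' - unversioned, very old
        return "ggml (unversioned)"
    if magic == 0x67676D66:              # 'ggmf' - versioned
        version, = struct.unpack("<I", header[4:8])
        return f"ggmf v{version}"
    if magic == 0x67676A74:              # 'ggjt' - v3 is what the converter wants
        version, = struct.unpack("<I", header[4:8])
        return f"ggjt v{version}"
    return "unknown"
```

Anything that doesn't come back as "ggjt v3" probably won't make it through the converter.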
Anyhow, if you do find yourself trying to convert your ggmls, that script seems to be the magic tool for the job.
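If you've got a whole directory of them, running the converter by hand gets old fast. Here's a rough batch sketch — the ~/models path, the "ggmlv3" naming convention, and the script sitting in the current directory are all assumptions from my own setup:

```python
# Sketch: convert every ggmlv3 model in ~/models to gguf.
# Assumes filenames contain "ggmlv3" and the converter script is in cwd.
from pathlib import Path
import subprocess

models = Path.home() / "models"
for src in models.glob("*.ggmlv3.*.bin"):
    # derive the output name by swapping "ggmlv3" for "gguf"
    dst = src.with_name(src.name.replace("ggmlv3", "gguf"))
    subprocess.run(
        ["python", "convert-llama-ggmlv3-to-gguf.py",
         "-i", str(src), "-o", str(dst)],
        check=True,
    )
```

Nothing fancy, but it keeps the q5_1-style quant suffix in place the same way the single-file command above does.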