If You Buy A Mac for LLMs, Don't Skimp on RAM
The MacBook Pro with the M1 Pro and 16GB of RAM is an alright machine for running smaller AI models, but I'm really feeling the squeeze on the RAM side.
This thing can allocate 10.7GB of its unified memory as VRAM, which is fine for smaller models like gpt4all-j-v1.3-groovy, but 13B models are completely out of the question. For example, ggml-vic13b-q4_0.bin, which is only about 7GB on disk, won't run; I kept getting random-looking crashes until I realized that's what an out-of-memory failure looks like. There must be a fair amount of overhead beyond the weights themselves (the context's KV cache and compute buffers need VRAM too), because I'd have thought a 7GB model would fit comfortably in a 10.7GB budget.
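If you're curious where the 10.7GB figure comes from, you can ask Metal directly: the GPU's working-set budget is exposed as recommendedMaxWorkingSetSize, which on smaller Apple Silicon configs seems to be roughly two thirds of unified memory. Here's a minimal Swift sketch (the GB conversion and the overhead comment are my own back-of-envelope reasoning, not anything official):

```swift
import Foundation
import Metal

// Query the GPU's recommended working-set size: the slice of unified
// memory that Metal will happily let you treat as "VRAM".
// On a 16GB M1 Pro this reports roughly 10.7GB.
if let device = MTLCreateSystemDefaultDevice() {
    let budgetBytes = device.recommendedMaxWorkingSetSize
    let budgetGB = Double(budgetBytes) / 1_073_741_824
    print(String(format: "Metal working-set budget: %.1f GB", budgetGB))
    // A ~7GB q4_0 model leaves under 4GB of that budget for the KV
    // cache and compute buffers, which is presumably where my
    // out-of-memory crashes were coming from.
}
```

(I've also seen people claim you can raise the cap with sudo sysctl iogpu.wired_limit_mb on newer macOS versions, though I haven't tried it myself, and it just steals memory from the rest of the system.)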
I regret not just forking over a little more for the extra RAM when I got this machine. Plus I see folks using models like GPT4-x-Vicuna-30b and get a little jealous =D
I'll probably invest in an Nvidia machine for this at some point, but I'm concerned about the power draw. Would prefer not to turn my house into a bonfire by accident.