Building and Running Llama.cpp on an Air-Gapped Mac
If you've ever tried to run llama.cpp on a macOS device that has no internet connection, you've probably hit the annoying Gatekeeper errors saying the file was downloaded from the internet and you should delete it. Generally I just build from source to avoid that, but I ran into something interesting that I thought I'd share.
Last night I noticed that llama.cpp's newly added WebUI feature now downloads assets from Hugging Face and/or npm when you run cmake, so if you are trying to build it on a computer that has no network connection, you'll hit an error:
UI: failed to download index.html from version: "Could not resolve hostname"
-- UI: downloading assets from latest: https://huggingface.co/buckets/ggml-org/llama-ui/resolve/latest
-- UI: failed to download index.html from latest: "Could not resolve hostname"
CMake Warning at /home/user/llama.cpp-b9181/scripts/ui-download.cmake:209 (message):
UI: failed to download assets from HF Bucket (llama-ui)
There was a note that setting LLAMA_BUILD_UI=OFF would disable that download and let you build offline. However, that didn't work and the build kept crashing. A fix for that is in progress, but in the meantime the workaround is to set both that AND LLAMA_BUILD_WEBUI=OFF.
Steps to Build Llama.cpp from Source on macOS
NOTE: You have to have cmake installed on your machine for this to work. It's an installer you can grab and run yourself.
- Go to the repo, go to releases, go to the latest release (or the one you want), head to the bottom and download the source zip (named Source code (zip)).
- Unzip it somewhere.
- In Terminal, navigate into the llama.cpp folder. For example, if you dropped it in your user folder -> llama.cpp-b9196, then you'd do:
cd ~/llama.cpp-b9196
- Now you can run this to build it:
cmake -B build -DLLAMA_BUILD_UI=OFF -DLLAMA_BUILD_WEBUI=OFF
cmake --build build --config Release
NOTE: There is a PR to fix the need for both. Once it's merged and tested, just -DLLAMA_BUILD_UI=OFF will work. https://github.com/ggml-org/llama.cpp/pull/23190
NOTE: You can add -j after "Release" to have it use more cores. Be careful with that, though: a bare -j without a value will just use all cores, which can be pretty performance hungry.
Once it's done, you will find the executables within the build/bin folder of that directory, so in our example ~/llama.cpp-b9196/build/bin.
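If you want the speedup from -j without giving the build every core, here is a minimal sketch of one way to cap it. The function names half_cores and build_llama are made up for this example, and "half the cores" is just an assumption about what counts as a reasonable limit; adjust to taste.

```shell
#!/bin/sh
# Hypothetical helper: print half of the available cores, never less than 1.
half_cores() {
  # getconf _NPROCESSORS_ONLN works on macOS and Linux; fall back to 2 if it fails.
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
  jobs=$((cores / 2))
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  echo "$jobs"
}

# Hypothetical wrapper around the two cmake commands from the steps above.
# $1 is the unzipped source folder, e.g. ~/llama.cpp-b9196.
build_llama() {
  src_dir=$1
  # Configure with both UI options off (the offline workaround), then build
  # in Release mode with a bounded number of parallel jobs.
  cmake -S "$src_dir" -B "$src_dir/build" -DLLAMA_BUILD_UI=OFF -DLLAMA_BUILD_WEBUI=OFF &&
    cmake --build "$src_dir/build" --config Release -j "$(half_cores)"
}
```

With those two functions loaded in your shell, running build_llama ~/llama.cpp-b9196 should do the same thing as the manual steps, just with the job count filled in for you.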
Using the Pre-built Assemblies on macOS
If you decide to download one of the pre-built assemblies, like macOS Apple Silicon (arm64), then you may hit an issue where macOS complains that the application was downloaded from the internet and only gives you the option to stop or delete the file. This is the fault of Gatekeeper. You can press Cmd + Space, type Gatekeeper, and it should open that in Settings. You'll see a spot to tell it to let you run the app anyway; if you select that and then try to re-run the program, it'll prompt you for your password. Unfortunately, it will do that not only for llama-server but for all the child processes, too. Sometimes it can take as many as 7-9 password entries.
It's also possible to strip the com.apple.quarantine extended attribute that macOS adds to files downloaded from the internet, which is what causes Gatekeeper to be annoying. Removing it skips the prompts, so I usually just do that if I can't build the source code myself. The command that I use is:
xattr -dr com.apple.quarantine ~/replace-with-llama-folder-path
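If you end up doing this often, a small wrapper can save you from typos in the path. This is only a sketch: strip_quarantine and is_quarantined are names I made up for illustration, and the only real commands involved are the xattr call from above plus xattr -p, which prints an attribute's value when the attribute is present.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given file still carries the quarantine attribute.
is_quarantined() {
  xattr -p com.apple.quarantine "$1" >/dev/null 2>&1
}

# Hypothetical wrapper: recursively remove the quarantine attribute from a folder.
# $1 is the folder holding the pre-built llama.cpp files.
strip_quarantine() {
  dir=$1
  # Refuse to run on anything that is not an existing directory.
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # xattr ships with macOS; bail out clearly if it is missing.
  if ! command -v xattr >/dev/null 2>&1; then
    echo "xattr not found (is this macOS?)" >&2
    return 1
  fi
  xattr -dr com.apple.quarantine "$dir"
}
```

After calling strip_quarantine on your llama folder, you could run is_quarantined against llama-server inside it to confirm the attribute is gone before launching it.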