Running Llama2 on Apple Silicon with llama.cpp
Recently, I was curious to see how easy it would be to run run Llama2 on my MacBook Pro M2, given the impressive amount of memory it makes available to both CPU and GPU. This led me to the excellent llama.cpp, a project focused on running simplified versions of the Llama models on both CPU and GPU. The process felt quite straightforward except for some instability in the llama.cpp repo just as I decided to try it out, and which has been fixed in the meantime....