How to setup a local coding agent on macOS
A local coding agent on macOS can be built using llama.cpp with Metal acceleration, Gemma 4 26B-A4B in Q4 GGUF format, and a Q8 MTP draft model for speculative decoding, achieving 72.2 tokens/second on an M1 Max with 64 GB RAM—a 24% speedup over the base model alone. The setup uses Pi as the terminal agent and supports OpenAI-compatible APIs and multimodal input, outperforming MLX-based alternatives (45.8 tok/s) in this configuration.