# llama.cpp + Ollama: GGUF Serving + Modelfile + System Prompt Versioning

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-llama-cpp-ollama-gguf-modelfile
> Updated: 2026-05-14T14:43:01.309Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XV — Serving Engineering
**TLDR:** llama.cpp + Ollama are the gold standard for CPU, Apple Silicon, and edge serving. This recipe covers the GGUF format, Ollama's Modelfile (for versioning system prompts and tool definitions), the Ollama API, and the OpenAI-compatible endpoint. A Q4_K_M-quantized Llama 8B served via Ollama on an RTX 4090: 95 tok/s.
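As a minimal sketch of the Modelfile approach mentioned above: a Modelfile pins a base GGUF weight file, sampling parameters, and a system prompt in one text file that can be versioned in git. The file path, parameter values, and prompt text below are illustrative, not from the source.

```
# Modelfile — path, parameters, and prompt are illustrative examples
FROM ./llama-8b-q4_k_m.gguf

# Sampling parameters baked into the served model
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

# System prompt — versioned alongside this Modelfile; bump the tag on change
SYSTEM """You are a concise assistant. Answer in plain English."""
```

Building and tagging a versioned model from it would look like `ollama create my-llama:v1.2 -f Modelfile`, after which `ollama run my-llama:v1.2` serves that exact prompt/parameter combination; Ollama also exposes an OpenAI-compatible API under `http://localhost:11434/v1`, so existing OpenAI client code can point at the local model by changing the base URL.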

