LLM Brain Selection (Claude vs GPT vs Gemini vs OSS)
Model choice is critical for agents — each vendor has different strengths in agent contexts.
Claude (Anthropic): strong on tool use, high XML semantic understanding, "reasoning"+"thinking" modes. Leader in long-context (200K-1M). Reference for agentic IDE / computer use. Expensive (Opus) but production-ready.
GPT (OpenAI): mature function calling, strong structured-output enforcement (response_format). o1/o3 reasoning models. Realtime API for voice agents.
Gemini (Google): native multimodal (video, audio), 2M context, cheap. Tool use OK but less fine-tuned than Claude/GPT.
Llama / Mistral / Qwen / DeepSeek (open-source): on-premise + privacy. Function calling limited (Llama 3.1+ supports). Self-host with vLLM.
Decision matrix: complex multi-step tool use → Claude / o1. Speed-critical structured output → GPT-4o / Gemini Flash. Privacy → Llama 3.1 70B. Edge → Llama 3.2 3B / Phi-4.