Topic · Foundation
Cost Economics
Input vs output pricing, model mix, batch API, caching — per-token cost math.
2 hours · 3 resources
LLM API cost = (input_tokens × input_rate) + (output_tokens × output_rate), where each rate is the price per token. Output tokens are typically 3-5× more expensive than input tokens.
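As a minimal sketch of the formula above, the function below computes the cost of one request. The function name and the rates are illustrative assumptions, not any provider's real prices. Rates are expressed per million tokens, which is how providers usually quote them.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_mtok: float, output_rate_per_mtok: float) -> float:
    """Return the dollar cost of one request.

    Rates are in dollars per million tokens.
    """
    input_cost = input_tokens * input_rate_per_mtok / 1_000_000
    output_cost = output_tokens * output_rate_per_mtok / 1_000_000
    return input_cost + output_cost


# Hypothetical rates (assumption): output costs 5x as much as input.
INPUT_RATE = 3.00    # $ per million input tokens
OUTPUT_RATE = 15.00  # $ per million output tokens

cost = request_cost(2_000, 500, INPUT_RATE, OUTPUT_RATE)
print(f"${cost:.4f} per request")  # $0.0135 per request
```

With these assumed rates, the 500 output tokens cost more than the 2,000 input tokens ($0.0075 vs $0.0060), which is why output length matters so much in the total.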
Optimization levers:
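- Model selection: route simple tasks to a small model (Haiku, 4o-mini) and complex tasks to a large one (Opus, 4o)
- Prompt caching: 50-90% off input tokens for static context that repeats across requests
- Batch API: 50% off when a turnaround of up to 24h is tolerable
- Streaming: a UX win, but the cost is the same
- Output truncation: hard cap on output length with max_tokens
- Retrieval over context: use RAG to send only the relevant passages instead of a long context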
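The two discount levers in the list (prompt caching and the batch API) can be sketched as adjustments to the per-request formula. This is a rough model under stated assumptions: the function name, the rates, and the default discounts are hypothetical. It also assumes the batch discount applies to the whole request and stacks with the cache discount, and it ignores any surcharge for writing to the cache. Real providers differ on all of these, so check the pricing page before relying on the numbers.

```python
def cost_with_levers(input_tokens: int, cached_tokens: int, output_tokens: int,
                     input_rate: float, output_rate: float,
                     cache_discount: float = 0.9,
                     batch: bool = False,
                     batch_discount: float = 0.5) -> float:
    """Dollar cost of one request; rates are dollars per million tokens.

    cached_tokens is the part of the input served from the prompt cache.
    Assumption: discounts stack and cache writes carry no surcharge.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")

    uncached_cost = (input_tokens - cached_tokens) * input_rate / 1_000_000
    cached_cost = cached_tokens * input_rate * (1 - cache_discount) / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000

    total = uncached_cost + cached_cost + output_cost
    if batch:
        total *= 1 - batch_discount
    return total


# Hypothetical rates: $3 / $15 per million input / output tokens.
baseline = cost_with_levers(10_000, 0, 500, 3.00, 15.00)
cached = cost_with_levers(10_000, 8_000, 500, 3.00, 15.00)
cached_batch = cost_with_levers(10_000, 8_000, 500, 3.00, 15.00, batch=True)

print(f"baseline:       ${baseline:.5f}")      # $0.03750
print(f"with caching:   ${cached:.5f}")        # $0.01590
print(f"caching+batch:  ${cached_batch:.5f}")  # $0.00795
```

In this example caching only touches the input side, so the output cost of $0.0075 stays fixed until the batch discount is applied. Capping max_tokens is the lever that bounds that remaining output term.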
Practical: 100K users/month × 5 prompts each = 500K requests/month. At that volume, sending every request to a single model can cost roughly 10× more than an optimal model mix.
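To make the arithmetic concrete, here is a small sketch of the 500K-request scenario. The per-request costs and the 80/20 traffic split are hypothetical numbers chosen so that the two models sit 10× apart. They are not measured or published figures.

```python
def monthly_cost(requests: int, mix: dict[str, float],
                 cost_per_request: dict[str, float]) -> float:
    """Total monthly cost for a traffic mix.

    mix maps a model name to its share of requests; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(requests * share * cost_per_request[model]
               for model, share in mix.items())


REQUESTS = 100_000 * 5  # 100K users/month x 5 prompts = 500K requests

# Hypothetical average cost per request for each tier (assumption).
COST_PER_REQUEST = {"small": 0.001, "large": 0.010}

all_large = monthly_cost(REQUESTS, {"large": 1.0}, COST_PER_REQUEST)
all_small = monthly_cost(REQUESTS, {"small": 1.0}, COST_PER_REQUEST)
mixed = monthly_cost(REQUESTS, {"small": 0.8, "large": 0.2}, COST_PER_REQUEST)

print(f"all large: ${all_large:,.2f}")  # $5,000.00
print(f"all small: ${all_small:,.2f}")  # $500.00
print(f"80/20 mix: ${mixed:,.2f}")      # $1,400.00
```

Under these assumptions, the all-large bill is 10× the all-small one, and an 80/20 mix lands at $1,400. How close a real deployment gets to the cheap end depends on how many requests the small model can handle acceptably.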