
Cost Economics

Input vs output pricing, model mix, batch API, caching — per-token cost math.

2 hours · 3 resources

LLM API cost = (input_tokens × input_rate) + (output_tokens × output_rate), with rates usually quoted per million tokens. Output tokens are typically priced 3-5× higher than input tokens.
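A minimal sketch of the formula above. It assumes rates are given in dollars per million tokens and uses hypothetical prices ($3/M input, $15/M output, so output is 5× input) rather than any specific provider's list price.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Dollar cost of one request; rates are in $ per million tokens (assumed convention)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical rates: $3/M input, $15/M output (output priced 5x input).
cost = request_cost(2_000, 500, input_rate=3.0, output_rate=15.0)
output_share = (500 * 15.0 / 1_000_000) / cost  # fraction of the bill from output tokens
print(f"${cost:.4f} per request, {output_share:.0%} of it from output")  # $0.0135 per request, 56% of it from output
```

Output is only a fifth of the tokens in this request but over half of its cost, which is why output-side levers such as max_tokens pay off.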

Optimization levers:

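  • Model selection: route simple tasks to Haiku/4o-mini and complex ones to Opus/4o
  • Prompt caching: 50-90% off input tokens for a repeated static prefix (system prompt, reference documents)
  • Batch API: 50% off for jobs that can tolerate up to 24h turnaround
  • Streaming: better perceived latency (a UX win), same cost
  • Output truncation: hard-cap output length with max_tokens
  • Retrieval over context: use RAG to send only the relevant chunks instead of a long context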
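A rough sketch of how the caching, batch, and max_tokens levers change the per-request formula. The billing model here is a simplifying assumption, not any provider's exact rules: cached input tokens are billed at (1 - cache_discount) of the input rate, cache-write premiums are ignored, and the 50% batch discount is applied on top of everything else. `lever_cost` and its parameters are hypothetical names.

```python
from typing import Optional

BATCH_DISCOUNT = 0.5  # the 50% batch discount from the list above


def lever_cost(input_tokens: int, output_tokens: int,
               in_rate: float, out_rate: float, *,
               cached_input_tokens: int = 0, cache_discount: float = 0.0,
               batch: bool = False, max_tokens: Optional[int] = None) -> float:
    """Dollar cost of one request; rates are $ per million tokens.

    Assumed billing model: cached input is billed at (1 - cache_discount)
    of in_rate, cache writes cost nothing extra, and the batch discount
    stacks on top of caching.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be within [0, input_tokens]")
    if max_tokens is not None:
        output_tokens = min(output_tokens, max_tokens)  # max_tokens caps billed output
    uncached = input_tokens - cached_input_tokens
    input_cost = (uncached + cached_input_tokens * (1 - cache_discount)) * in_rate
    total = (input_cost + output_tokens * out_rate) / 1_000_000
    return total * (1 - BATCH_DISCOUNT) if batch else total


# 10K-token prompt whose first 9K tokens are a static, cacheable prefix (hypothetical $3/$15 rates).
base = lever_cost(10_000, 500, 3.0, 15.0)
cached = lever_cost(10_000, 500, 3.0, 15.0, cached_input_tokens=9_000, cache_discount=0.9)
batched = lever_cost(10_000, 500, 3.0, 15.0, batch=True)
capped = lever_cost(10_000, 500, 3.0, 15.0, max_tokens=200)
print(f"base {base:.5f}  cached {cached:.5f}  batch {batched:.5f}  capped {capped:.5f}")
```

Under these assumptions a 90% cache discount on the static prefix cuts this request's cost by roughly 65%. Streaming is left out because it does not change the bill.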

Practical: 100K users/month × 5 prompts = 500K requests/month. Sending every request to a single model instead of using a model mix can cost roughly 10× more than the optimal mix.
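A worked version of the 500K-requests example, as a sketch only. The token counts per request (1,000 in, 300 out), the 80% routing share, and both models' rates are hypothetical assumptions chosen for illustration, not measured or published figures; `monthly_cost` is an illustrative helper.

```python
def per_request_cost(in_tok: int, out_tok: int, rates: tuple) -> float:
    """rates = (input $/M tokens, output $/M tokens)."""
    in_rate, out_rate = rates
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000


def monthly_cost(requests: int, small_share: float, in_tok: int, out_tok: int,
                 small_rates: tuple, large_rates: tuple) -> float:
    """Monthly spend when `small_share` of requests are routed to the small model."""
    small = per_request_cost(in_tok, out_tok, small_rates)
    large = per_request_cost(in_tok, out_tok, large_rates)
    return requests * (small_share * small + (1 - small_share) * large)


REQUESTS = 100_000 * 5        # 100K users x 5 prompts = 500K requests/month
IN_TOK, OUT_TOK = 1_000, 300  # assumed average tokens per request
SMALL = (0.25, 1.25)          # hypothetical small-model rates, $/M tokens
LARGE = (3.00, 15.00)         # hypothetical large-model rates, $/M tokens

single = monthly_cost(REQUESTS, 0.0, IN_TOK, OUT_TOK, SMALL, LARGE)  # all traffic on the large model
mixed = monthly_cost(REQUESTS, 0.8, IN_TOK, OUT_TOK, SMALL, LARGE)   # 80% routed to the small model
print(f"single-model ${single:,.0f}  vs  mix ${mixed:,.0f}  ({single / mixed:.2f}x)")  # single-model $3,750  vs  mix $1,000  (3.75x)
```

With these assumed numbers the mix is 3.75× cheaper. The ceiling is the per-request price gap between the two models (12× here), approached only as nearly all traffic moves to the small model, so a gap on the order of 10× requires both a wide price spread and a high share of routable requests.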
