Back to full roadmap
topiccore
Rate Limiting & Queue Management
Vendor rate limits + concurrent-agent limits + queue with backpressure.
2 hours
Production agent system has 3 layers:
- Vendor rate limit (OpenAI/Anthropic) — RPM + TPM quotas. Exceeding → 429.
- Concurrent agent limit — how many agents can run at once (CPU/RAM/cost).
- Per-user rate limit — abuse prevention.
Pattern:
- Token-bucket algorithm (per user)
- Redis-based distributed rate limiter
- Queue (BullMQ / Sidekiq / Temporal) with backpressure
- 429 retry with exponential backoff
- Priority queue (paid > free, real-time > async)