
Rate Limiting & Queue Management

Vendor rate limits + concurrent-agent limits + queue with backpressure.

2 hours

A production agent system enforces limits at three layers:

  1. Vendor rate limit (OpenAI/Anthropic): requests-per-minute (RPM) and tokens-per-minute (TPM) quotas. Exceeding either returns HTTP 429.
  2. Concurrent agent limit: how many agents can run at once, bounded by CPU, RAM, and cost.
  3. Per-user rate limit: abuse prevention.
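
As a rough illustration of the second layer, the sketch below caps how many agents run at once with an `asyncio.Semaphore`. The names `MAX_CONCURRENT_AGENTS`, `run_agent`, and `run_all` are hypothetical, the cap of 2 is arbitrary, and the `sleep` call stands in for real agent work. It assumes a single process; a multi-worker deployment would need a shared counter instead.

```python
import asyncio

# Hypothetical cap; in practice it would be sized from CPU, RAM, and cost budgets.
MAX_CONCURRENT_AGENTS = 2


async def run_agent(task_id: int, slots: asyncio.Semaphore, stats: dict) -> int:
    """Run one simulated agent while holding a concurrency slot."""
    async with slots:  # waits here when all slots are taken
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # stand-in for the real agent work
        stats["running"] -= 1
    return task_id


async def run_all(n_tasks: int) -> dict:
    """Start n_tasks agents and report the peak number that ran concurrently."""
    slots = asyncio.Semaphore(MAX_CONCURRENT_AGENTS)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_agent(i, slots, stats) for i in range(n_tasks))
    )
    stats["results"] = results
    return stats


if __name__ == "__main__":
    outcome = asyncio.run(run_all(5))
    print("peak concurrency:", outcome["peak"])
```

All five tasks are accepted immediately, but at most `MAX_CONCURRENT_AGENTS` of them hold a slot at any moment; the rest wait at the `async with` line.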

Patterns:

  • Token-bucket algorithm (per user)
  • Redis-based distributed rate limiter
  • Queue (BullMQ / Sidekiq / Temporal) with backpressure
  • 429 retry with exponential backoff
  • Priority queue (paid > free, real-time > async)
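
Three of the patterns above can be sketched in-process as follows. This is a minimal illustration, not a production design: `TokenBucket`, `backoff_delay`, and `BoundedPriorityQueue` are hypothetical names, the numeric defaults are arbitrary, and lower priority numbers are assumed to mean "more urgent". A distributed version would keep the bucket state in Redis and update it atomically, and a real queue would normally come from BullMQ, Sidekiq, or Temporal rather than a local heap.

```python
import heapq
import itertools
import random
import time
from typing import Callable, Optional


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with full jitter for retrying after a 429.

    Returns a delay drawn uniformly from [0, min(cap, base * 2**attempt)].
    """
    source = rng if rng is not None else random
    return source.uniform(0.0, min(cap, base * (2 ** attempt)))


class BoundedPriorityQueue:
    """Priority queue that rejects new jobs when full (simple backpressure)."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority

    def put(self, priority: int, job: str) -> bool:
        if len(self._heap) >= self.max_size:
            return False  # caller sheds load or asks the client to retry later
        heapq.heappush(self._heap, (priority, next(self._seq), job))
        return True

    def get(self) -> str:
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    bucket = TokenBucket(rate=1.0, capacity=2.0)  # one bucket per user
    print([bucket.allow() for _ in range(3)])

    queue = BoundedPriorityQueue(max_size=2)
    queue.put(1, "free-tier job")
    queue.put(0, "paid-tier job")
    print(queue.put(0, "rejected job"), queue.get())
```

Rejecting at `put` is only one way to apply backpressure; blocking the producer or returning a retry-after hint are alternatives, depending on whether the caller is a real-time request or an async job.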