
Voice Agents

An STT + LLM + TTS pipeline or OpenAI Realtime, for telephony, call centers, and hands-free use.

4 hours · 3 resources

Two architectures:

Pipeline (classic): microphone → Whisper/Deepgram (STT) → LLM agent (tool use) → ElevenLabs/Cartesia (TTS) → speaker. Latency is 1-3 s; the design is modular and cheap.
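The pipeline's three stages can be sketched as swappable functions. This is a minimal sketch, not any provider's API: `VoicePipeline`, `handle_turn`, and the `fake_*` stubs are hypothetical names, and a real system would replace the stubs with calls to an STT, LLM, and TTS service.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real providers expose their own client APIs.
SpeechToText = Callable[[bytes], str]
Agent = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    stt: SpeechToText
    agent: Agent
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one conversational turn: audio in, audio out."""
        transcript = self.stt(audio_in)      # microphone audio -> text
        reply_text = self.agent(transcript)  # text -> agent reply (tool use would happen here)
        return self.tts(reply_text)          # reply text -> speaker audio


# Stub stages so the sketch runs without any provider account.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, agent=fake_agent, tts=fake_tts)
audio_out = pipeline.handle_turn(b"book a table for two")
print(audio_out)
```

Because each stage is just a callable, one vendor can be swapped for another without touching the rest of the turn logic, which is the modularity the pipeline design is credited with.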

Realtime (modern): OpenAI Realtime API / Gemini Live. Audio goes in and out through a single model, with ~300 ms latency. Premium UX, but expensive.
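The latency gap between the two architectures follows from how many stages run back to back. The sketch below adds up per-stage latencies; the stage names and millisecond values are illustrative placeholders chosen to fall inside the ranges quoted above, not measurements of any provider.

```python
from typing import Mapping


def turn_latency_ms(stage_latencies_ms: Mapping[str, int]) -> int:
    """Total latency of one turn when the stages run one after another."""
    return sum(stage_latencies_ms.values())


# Illustrative placeholder numbers, not benchmarks.
pipeline_stages = {"stt": 400, "llm": 800, "tts": 300}
realtime_stages = {"speech_to_speech_model": 300}

pipeline_total = turn_latency_ms(pipeline_stages)
realtime_total = turn_latency_ms(realtime_stages)
print(f"pipeline: {pipeline_total} ms, realtime: {realtime_total} ms")
```

In practice, pipeline implementations often stream partial results between stages to reduce the total, so a plain sum is best read as a rough upper bound for a non-streaming setup.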

Stack:

  • Telephony: Twilio Voice, Vapi.ai, Retell, Bland.ai
  • STT: Deepgram, AssemblyAI, Whisper API
  • TTS: ElevenLabs, Cartesia, OpenAI TTS
  • Voice agent framework: Pipecat (Daily), LiveKit Agents, Vapi
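One way to keep a stack choice explicit is a small validated config object. This is a hedged sketch: `VoiceStack` and `STACK_OPTIONS` are hypothetical names, and the allowed values are simply copied from the list above rather than taken from any library.

```python
from dataclasses import dataclass

# Options copied from the stack list; the lowercase layer keys are this sketch's own convention.
STACK_OPTIONS = {
    "telephony": {"Twilio Voice", "Vapi.ai", "Retell", "Bland.ai"},
    "stt": {"Deepgram", "AssemblyAI", "Whisper API"},
    "tts": {"ElevenLabs", "Cartesia", "OpenAI TTS"},
    "framework": {"Pipecat", "LiveKit Agents", "Vapi"},
}


@dataclass(frozen=True)
class VoiceStack:
    telephony: str
    stt: str
    tts: str
    framework: str

    def __post_init__(self) -> None:
        # Reject any choice that is not in the listed options for its layer.
        for layer, allowed in STACK_OPTIONS.items():
            choice = getattr(self, layer)
            if choice not in allowed:
                raise ValueError(f"{choice!r} is not a listed {layer} option")


stack = VoiceStack(
    telephony="Twilio Voice",
    stt="Deepgram",
    tts="Cartesia",
    framework="Pipecat",
)
print(stack)
```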

Use cases: restaurant reservations, doctor appointments, lead qualification, customer support phone lines, IVR replacement.

Resources (3)