Topic · Advanced
Voice Agents
An STT + LLM + TTS pipeline or OpenAI Realtime, for phone, call center, and hands-free use.
4 hours · 3 resources
Two architectures:
Pipeline (classic): microphone → Whisper/Deepgram (STT) → LLM agent (tool use) → ElevenLabs/Cartesia (TTS) → speaker. Latency is 1-3 s; the design is modular and cheap.
Realtime (modern): OpenAI Realtime API / Gemini Live. A single model handles audio in and audio out, with ~300 ms latency. Premium UX, but expensive.
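The classic pipeline can be sketched as three pluggable stages called in sequence, with per-stage timing so you can see where the 1-3 s goes. This is a minimal sketch, not any vendor's API: `VoicePipeline`, `handle_turn`, and the `fake_*` stages are hypothetical names, and the stubs stand in for real STT, LLM, and TTS clients.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stage signatures; real STT/LLM/TTS clients would be wrapped to match.
SpeechToText = Callable[[bytes], str]
Agent = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    stt: SpeechToText
    agent: Agent
    tts: TextToSpeech
    timings: Dict[str, float] = field(default_factory=dict)

    def _timed(self, name, fn, arg):
        # Record wall-clock seconds spent in one stage.
        start = time.perf_counter()
        result = fn(arg)
        self.timings[name] = time.perf_counter() - start
        return result

    def handle_turn(self, audio_in: bytes) -> bytes:
        # One conversational turn: audio -> text -> reply text -> audio.
        text = self._timed("stt", self.stt, audio_in)
        reply = self._timed("llm", self.agent, text)
        return self._timed("tts", self.tts, reply)

    def total_latency(self) -> float:
        # Stages run one after another, so turn latency is the sum.
        return sum(self.timings.values())


# Stub stages for illustration only: they treat the audio bytes as UTF-8 text.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_agent, fake_tts)
audio_out = pipeline.handle_turn(b"book a table for two")
```

Because the stages run sequentially, their latencies add up, which is why this design is slower than a single speech-to-speech model. The upside is that each stage can be swapped independently.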
Stack:
- Telephony: Twilio Voice, Vapi.ai, Retell, Bland.ai
- STT: Deepgram, AssemblyAI, Whisper API
- TTS: ElevenLabs, Cartesia, OpenAI TTS
- Voice agent frameworks: Pipecat (Daily), LiveKit Agents, Vapi
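One way to keep these choices explicit in a project is a small config object with one provider per layer. The sketch below is illustrative: the provider names come from the list above, while `STACK_OPTIONS`, `VoiceStack`, and the layer keys are hypothetical. It only checks that a name is on the list and says nothing about whether a given combination integrates cleanly.

```python
from dataclasses import dataclass

# Provider names copied from the list above; the layer keys are illustrative.
STACK_OPTIONS = {
    "telephony": {"Twilio Voice", "Vapi.ai", "Retell", "Bland.ai"},
    "stt": {"Deepgram", "AssemblyAI", "Whisper API"},
    "tts": {"ElevenLabs", "Cartesia", "OpenAI TTS"},
    "framework": {"Pipecat", "LiveKit Agents", "Vapi"},
}


@dataclass(frozen=True)
class VoiceStack:
    telephony: str
    stt: str
    tts: str
    framework: str

    def __post_init__(self):
        # Reject any provider that is not on the list for its layer.
        for layer, allowed in STACK_OPTIONS.items():
            choice = getattr(self, layer)
            if choice not in allowed:
                raise ValueError(f"{choice!r} is not a listed {layer} option")


stack = VoiceStack(
    telephony="Twilio Voice",
    stt="Deepgram",
    tts="Cartesia",
    framework="Pipecat",
)
```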
Use cases: restaurant reservations, doctor appointments, lead qualification, customer support phone lines, IVR replacement.