Audio Processing (STT + TTS)
Whisper (transcription) + ElevenLabs/OpenAI TTS → voice assistant pipeline.
3 hours · 3 resources
STT (Speech-to-Text): OpenAI Whisper API, Deepgram, AssemblyAI. For Turkish, Whisper-large-v3 is recommended.
The LLM sits in between: user voice → text (STT) → LLM → text → voice (TTS) → user.
TTS: ElevenLabs (most natural), OpenAI TTS (a small set of preset voices, fast), Cartesia (real-time, low latency).
Realtime (OpenAI): skips the intermediate text steps, going voice in → voice out at ~300 ms latency. Still pricey, but a major UX improvement.
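The cascaded STT → LLM → TTS flow described above can be sketched as three pluggable stages. This is a minimal illustration, not any provider's API: `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names, and the stubs stand in for real calls to a service such as Whisper or ElevenLabs. Per-stage timings are recorded because the cascade's total latency is the sum of its stages, which is the cost the Realtime approach tries to avoid.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class VoicePipeline:
    """Cascaded voice assistant: speech -> text -> LLM -> text -> speech."""

    stt: Callable[[bytes], str]  # audio bytes -> transcript
    llm: Callable[[str], str]    # transcript -> reply text
    tts: Callable[[str], bytes]  # reply text -> audio bytes

    def run(self, audio_in: bytes) -> Tuple[bytes, Dict[str, float]]:
        """Run one turn and return (reply audio, seconds spent per stage)."""
        timings: Dict[str, float] = {}

        start = time.perf_counter()
        transcript = self.stt(audio_in)
        timings["stt"] = time.perf_counter() - start

        start = time.perf_counter()
        reply = self.llm(transcript)
        timings["llm"] = time.perf_counter() - start

        start = time.perf_counter()
        audio_out = self.tts(reply)
        timings["tts"] = time.perf_counter() - start

        # The stages run one after another, so their latencies add up.
        timings["total"] = sum(timings.values())
        return audio_out, timings


# Hypothetical stand-ins so the sketch runs offline; each would be replaced
# by a wrapper around a real STT, LLM, or TTS provider call.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out, timings = pipeline.run("merhaba".encode("utf-8"))
print(audio_out.decode("utf-8"))
print({stage: round(seconds, 6) for stage, seconds in timings.items()})
```

Keeping each stage behind a plain callable makes it easy to swap providers (for example, a different STT engine for Turkish) without touching the rest of the pipeline, and the timing dictionary shows which stage dominates latency.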