
Audio Processing (STT + TTS)

Whisper (transcription) + ElevenLabs/OpenAI TTS → voice assistant pipeline.

3 hours · 3 resources

STT (Speech-to-Text): OpenAI Whisper API, Deepgram, AssemblyAI. For Turkish, Whisper-large-v3 is recommended.
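A minimal sketch of a Turkish transcription call, assuming the official `openai` Python SDK (v1-style client) and its hosted `whisper-1` model. The helper name `transcribe_file` and the file name in the usage note are hypothetical. The client is passed in as an argument so the function can be exercised with a stand-in object and no network access. To my knowledge the hosted API exposes `whisper-1` rather than a selectable large-v3 checkpoint, so using large-v3 specifically would likely mean a self-hosted or third-party deployment.

```python
from pathlib import Path


def transcribe_file(client, audio_path, language="tr", model="whisper-1"):
    """Send one audio file to a Whisper-style transcription endpoint.

    `client` is assumed to expose `audio.transcriptions.create(...)`,
    as the OpenAI Python SDK (v1+) does. `language` is an ISO-639-1
    hint ("tr" for Turkish); passing it avoids relying on auto-detection.
    """
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            language=language,
        )
    # The default JSON response carries the transcript in `.text`.
    return result.text
```

With the real SDK this might be called as `transcribe_file(OpenAI(), "question.wav")`, after `from openai import OpenAI` and with an API key set in the environment.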

The LLM sits in between: user voice → text → LLM → text → voice → user.
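The chain above can be sketched as three pluggable stages. This is an illustration under stated assumptions, not a reference implementation: `VoicePipeline` and the `fake_*` functions are hypothetical names, and the stand-in stages only move bytes and strings around so the flow runs without any provider. In a real assistant each stage would wrap an STT, LLM, or TTS provider call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chained voice assistant: audio -> text -> LLM -> text -> audio."""

    stt: Callable[[bytes], str]   # user audio -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> reply audio

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply_text = self.llm(transcript)
        return self.tts(reply_text)


# Hypothetical stand-ins so the sketch runs offline.
def fake_stt(audio: bytes) -> str:
    # Pretend the "audio" is just UTF-8 text.
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    # Pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply_audio = pipeline.respond(b"hello")
```

Keeping the stages as plain callables makes it easy to swap one provider for another (for example Deepgram for Whisper) without touching the rest of the chain.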

TTS: ElevenLabs (most natural), OpenAI TTS (a small set of preset voices, fast), Cartesia (real-time, low latency).
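A minimal sketch of the text-to-speech step, assuming the official `openai` Python SDK (v1-style client), its `tts-1` model, and the `alloy` preset voice. The helper name `synthesize_to_file` and the output path in the usage note are hypothetical. As with any provider wrapper, the client is injected so the function can be tried against a stand-in object.

```python
from pathlib import Path


def synthesize_to_file(client, text, out_path, voice="alloy", model="tts-1"):
    """Turn reply text into audio and write it to `out_path`.

    `client` is assumed to expose `audio.speech.create(...)`, as the
    OpenAI Python SDK (v1+) does; the response is read fully into memory.
    Returns the number of audio bytes written.
    """
    response = client.audio.speech.create(
        model=model,
        voice=voice,
        input=text,
    )
    audio_bytes = response.read()
    Path(out_path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

With the real SDK this might be called as `synthesize_to_file(OpenAI(), "Merhaba!", "reply.mp3")`. Reading the whole response before writing is the simplest option; a latency-sensitive assistant would more likely stream audio chunks to the speaker as they arrive.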

Realtime (OpenAI): skips the intermediate text steps, going voice in → voice out with ~300 ms latency. Still pricey, but a big step up in UX.

Resources (3)