Vocoder

A core synthesis component that generates an audible waveform from acoustic representations or spectral features.

In a TTS pipeline, the vocoder is the stage that produces the final waveform. It does not work from text directly but from intermediate acoustic representations, such as mel spectrograms, which it converts into audible audio. The quality of this stage largely determines whether the synthesized speech sounds natural. Modern neural vocoders were one of the key breakthroughs toward human-like speech quality.
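
The input/output contract described above (frame-level acoustic features in, waveform samples out) can be illustrated with a deliberately tiny sketch. It is not a neural vocoder and does not consume mel spectrograms. As a simplifying assumption, each frame carries only a pitch value and an amplitude, and the function name `toy_vocoder`, the sample rate, and the hop size are all hypothetical choices made for illustration.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)
HOP = 160            # samples rendered per feature frame, i.e. 10 ms (assumed)


def toy_vocoder(frames, sample_rate=SAMPLE_RATE, hop=HOP):
    """Turn per-frame (f0_hz, amplitude) features into waveform samples.

    Each frame is rendered as `hop` samples of a sine wave. The phase is
    carried across frame boundaries so the waveform stays continuous when
    the pitch changes from one frame to the next.
    """
    samples = []
    phase = 0.0
    for f0_hz, amplitude in frames:
        # Phase advance per output sample for this frame's pitch.
        step = 2.0 * math.pi * f0_hz / sample_rate
        for _ in range(hop):
            samples.append(amplitude * math.sin(phase))
            phase = (phase + step) % (2.0 * math.pi)
    return samples


# Hypothetical input: 50 frames (0.5 s) gliding from 120 Hz to 180 Hz.
frames = [(120.0 + 60.0 * i / 49, 0.5) for i in range(50)]
waveform = toy_vocoder(frames)
```

The resulting `waveform` is a list of floats that could be scaled and written to a WAV file. A real vocoder replaces the single sine wave with a far richer model, typically a neural network conditioned on the spectrogram, but the overall shape of the computation is the same: a short sequence of feature frames is expanded into a much longer sequence of audio samples.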