
Mel Spectrogram

A time-frequency representation of audio whose frequency axis is warped onto the mel scale, which approximates how humans perceive pitch.

The mel spectrogram is one of the most widely used intermediate representations in modern speech and audio AI systems. It converts a raw waveform into a compact two-dimensional feature map (mel frequency bands over time frames) that is easier for models to learn from and more acoustically meaningful than the raw samples. Tasks such as ASR, TTS, emotion analysis, and audio classification are commonly built on top of this representation. It offers a practical balance between time resolution and frequency resolution.
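To make the definition concrete, here is a minimal NumPy-only sketch of how a log-mel spectrogram can be computed: frame the signal, take a short-time Fourier transform, pool the power spectrum with triangular mel filters, and convert to decibels. The function names, the Hz-to-mel formula (the common 2595 * log10(1 + f/700) convention), and the parameter values (n_fft=512, hop=128, n_mels=40) are illustrative assumptions, not something specified by this glossary entry; production libraries differ in normalization and padding details.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale convention (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(sr, n_mels):
    # n_mels + 2 points evenly spaced on the mel scale from 0 Hz to Nyquist.
    return mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters: filter i rises from edge i to edge i+1, falls to edge i+2.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_band_edges(sr, n_mels)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(y, sr, n_fft=512, hop=128, n_mels=40):
    # Slice the waveform into overlapping Hann-windowed frames (no padding).
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    # Power spectrum per frame: shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Pool FFT bins into mel bands, then convert to decibels.
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(np.maximum(mel_power, 1e-10)).T  # (n_mels, n_frames)

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
S = mel_spectrogram(y, sr)
peak_band = int(np.argmax(S.mean(axis=1)))  # mel band with the most energy
print(S.shape)  # (40, 122)
```

Each column of `S` describes one short time frame and each row one mel band, so the pure tone shows up as a single bright horizontal stripe. A larger `n_fft` sharpens frequency detail at the cost of time detail, which is the trade-off the entry refers to.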