Technical Glossary: Speech, Voice and Audio AI
Mel Spectrogram
A time-frequency representation of audio whose frequency axis is warped onto the mel scale, which approximates how humans perceive pitch.
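For reference, one widely used form of the mel scale is the HTK formula shown below. Other variants exist; for example, the Slaney scale is linear below 1 kHz. The formula converts a frequency f in hertz to m mels, and back:

```latex
m = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right),
\qquad
f = 700\left(10^{m/2595} - 1\right)
```

With these constants, 1000 Hz maps to approximately 1000 mel. Equal steps in mel correspond to progressively wider steps in hertz as frequency rises, which is why a mel spectrogram keeps more detail at low frequencies.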
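The mel spectrogram is one of the most widely used intermediate representations in modern speech and audio AI systems. It is typically computed by splitting the waveform into short overlapping frames, taking the short-time Fourier transform (STFT) of each frame, and pooling the resulting power spectrum through a bank of triangular filters spaced evenly on the mel scale, usually followed by log compression. The result is a compact, acoustically meaningful input that models learn from more easily than raw waveforms. Many tasks build on this representation, including automatic speech recognition (ASR), text-to-speech (TTS), emotion recognition, and audio classification. It also offers a practical balance between temporal and spectral detail: the frame hop sets the time resolution, while a modest number of mel bands (commonly somewhere between 40 and 128) summarizes the frequency content, with finer resolution at low frequencies where human hearing discriminates pitch best.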
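The framing, STFT, mel filterbank, and log steps can be sketched in plain Python. This is a minimal, illustrative sketch under stated assumptions, not a reference implementation: it assumes the HTK mel formula, a periodic Hann window, no edge padding, and a naive O(n²) DFT in place of an FFT so that only the standard library is needed. The function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `power_spectrum`, `log_mel_spectrogram`) and the defaults `n_fft=256`, `hop_length=128`, `n_mels=20` are hypothetical choices for this example. Production libraries compute the same idea with an FFT and may differ in mel-scale variant, padding, and filter normalization.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (assumed variant; others, e.g. Slaney, exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of the HTK-style mel scale."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters evenly spaced in mel, defined over the n_fft // 2 + 1 DFT bins."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # n_mels + 2 equally spaced mel points give the left/center/right edges of each triangle.
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    mel_points = [mel_lo + i * (mel_hi - mel_lo) / (n_mels + 1) for i in range(n_mels + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    filters = []
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        row = []
        for f in bin_freqs:
            if left < f <= center:    # rising edge of the triangle
                row.append((f - left) / (center - left))
            elif center < f < right:  # falling edge of the triangle
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def power_spectrum(frame):
    """|DFT|^2 for bins 0..n//2 via a naive O(n^2) DFT (a stand-in for an FFT)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def log_mel_spectrogram(signal, sample_rate, n_fft=256, hop_length=128, n_mels=20, eps=1e-10):
    """Returns a list of frames, each a list of n_mels log mel energies."""
    # Periodic Hann window to taper each frame and reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n_fft) for t in range(n_fft)]
    fbank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    # No padding: only frames that fit entirely inside the signal are kept.
    for start in range(0, len(signal) - n_fft + 1, hop_length):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        power = power_spectrum(frame)
        # Each mel band is a weighted sum of DFT-bin powers under its triangle.
        mel_energies = [sum(w * p for w, p in zip(row, power)) for row in fbank]
        # Log compression; eps avoids log(0) for silent bands.
        frames.append([math.log(e + eps) for e in mel_energies])
    return frames


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(2048)]  # 1 kHz sine
    spec = log_mel_spectrogram(tone, sr)
    print(len(spec), "frames x", len(spec[0]), "mel bands")  # prints: 15 frames x 20 mel bands
```

Running it on a steady 1 kHz tone yields the same dominant mel band in every frame, namely the band whose triangle spans 1000 Hz. The log step compresses the large dynamic range of the power values, loosely mirroring how perceived loudness grows with signal energy.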
