Wav2Vec 2.0 Pretraining

A self-supervised approach that learns strong speech representations from unlabeled audio, improving automatic speech recognition (ASR) and other downstream speech tasks.

Wav2Vec 2.0 pretraining made it possible to learn high-quality speech representations from large volumes of unlabeled audio: the model masks spans of latent speech representations and is trained with a contrastive objective to identify the correct quantized latent among distractors. This is especially valuable in languages and domains where annotation costs are high, because strong ASR performance can then be achieved with relatively little labeled data during fine-tuning. It is one of the methods that fundamentally changed self-supervised learning in speech.
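As a minimal sketch of the pretrain-then-fine-tune workflow described above, the snippet below loads a pretrained wav2vec 2.0 encoder and extracts contextual representations from raw audio. The Hugging Face transformers library and the facebook/wav2vec2-base checkpoint are assumptions for illustration; any compatible pretrained checkpoint would be used the same way, and fine-tuning for ASR would add a task head (e.g., CTC) on top of these representations.

```python
# Sketch: extracting self-supervised wav2vec 2.0 representations.
# Assumes the Hugging Face transformers library and the public
# facebook/wav2vec2-base checkpoint (illustrative choices, not
# prescribed by this entry).
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

# One second of random 16 kHz audio stands in for a real recording.
waveform = torch.randn(16000)

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextualized speech representations, shape (batch, frames, hidden_size);
# a fine-tuning stage would train a small labeled head on top of these.
print(outputs.last_hidden_state.shape)
```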