Zero-Shot TTS

In One Line

A text-to-speech (TTS) approach that synthesizes speech in a new speaker’s voice from short reference samples, without any speaker-specific training or fine-tuning.

Zero-shot TTS is a synthesis paradigm that makes personalized voice generation possible without per-speaker training. Instead of training or fine-tuning a separate model for each voice, the system infers a new speaker’s characteristics from short reference samples at inference time. This is powerful for accessibility and creative production, but the same capability makes it easy to imitate a real person’s voice, so it carries serious responsibility around identity and consent. It is one of the areas where technical progress and ethical design must be considered together.
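
The data flow described above can be illustrated with a toy sketch. This is not a real TTS system: `encode_speaker` and `synthesize` are hypothetical stand-ins for a pretrained speaker encoder and a synthesizer, and the arithmetic inside them is invented purely for illustration. The only point it makes is structural: the new speaker enters as an input computed from reference audio, and no model weights are updated.

```python
import math
from typing import List

EMBED_DIM = 4  # toy embedding size (assumption; real systems use far larger vectors)


def encode_speaker(reference_audio: List[float]) -> List[float]:
    """Hypothetical stand-in for a pretrained speaker encoder.

    Maps a waveform of any length to a fixed-size, unit-norm vector.
    """
    if not reference_audio:
        raise ValueError("reference audio must not be empty")
    chunk = max(1, len(reference_audio) // EMBED_DIM)
    feats = []
    for i in range(EMBED_DIM):
        seg = reference_audio[i * chunk:(i + 1) * chunk] or [0.0]
        feats.append(sum(abs(x) for x in seg) / len(seg))
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]


def synthesize(text: str, speaker_embedding: List[float]) -> List[float]:
    """Hypothetical stand-in for a synthesizer with frozen weights.

    The output depends only on the text and the speaker embedding;
    nothing is trained or fine-tuned for the new speaker.
    """
    frames: List[float] = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0
        frames.extend(base * w for w in speaker_embedding)
    return frames


# Two unseen "speakers", each represented only by a short reference sample.
speaker_a = encode_speaker([0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8])
speaker_b = encode_speaker([0.8, -0.7, 0.6, -0.5, 0.4, -0.3, 0.2, -0.1])

# Same text, same frozen functions, different conditioning vector.
out_a = synthesize("hi", speaker_a)
out_b = synthesize("hi", speaker_b)
```

Here the same text yields different outputs for the two speakers solely because the conditioning vector differs, which is the sense in which the approach is "zero-shot".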