Technical Glossary: Speech, Voice and Audio AI

RNN-Transducer

In One Line

An end-to-end ASR architecture that balances low latency and accuracy in streaming speech recognition.

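RNN-Transducer (RNN-T) is an end-to-end architecture that is widely adopted in real-time speech recognition systems. It jointly models acoustic time and output sequence structure: each prediction is conditioned on both the audio seen so far and the labels already emitted. This gives it more flexible alignment behavior than CTC, which, given the audio, predicts each frame's label independently of the other labels. Because output can be produced while audio is still arriving, it is well suited to low-latency assistants, call-center solutions, and on-device speech interfaces. It is one of the core reference designs in modern streaming ASR.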
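To make the alignment behavior concrete, here is a minimal sketch of greedy transducer decoding in plain Python. It is an illustration under stated assumptions, not a reference implementation: `greedy_decode`, `toy_joint`, the blank index `0`, and the `max_symbols_per_frame` cap are all hypothetical names and choices introduced for this example. A real system would replace `toy_joint` with trained encoder, prediction, and joint networks.

```python
from typing import Callable, List, Sequence

BLANK = 0  # index reserved for the blank symbol (an assumption of this sketch)


def greedy_decode(
    frames: Sequence[Sequence[float]],
    joint: Callable[[Sequence[float], List[int]], List[float]],
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Greedy transducer decoding over frames processed in arrival order."""
    hypothesis: List[int] = []
    for frame in frames:  # advance along acoustic time
        for _ in range(max_symbols_per_frame):
            # Scores depend on the current audio frame AND the label history.
            scores = joint(frame, hypothesis)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == BLANK:  # blank: consume the frame, move to the next one
                break
            hypothesis.append(best)  # non-blank: emit and stay on this frame
    return hypothesis


def toy_joint(frame: Sequence[float], history: List[int]) -> List[float]:
    """Hypothetical stand-in for a trained joint network.

    Treats the frame as raw per-label scores and rules out any label
    that has already been emitted, so the output depends on the history.
    """
    scores = list(frame)
    for label in history:
        scores[label] = float("-inf")
    return scores


# Three toy frames over a vocabulary of {blank, 1, 2, 3}.
frames = [
    [0.9, 0.1, 0.0, 0.0],  # blank wins: nothing is emitted
    [0.2, 0.5, 0.8, 0.0],  # two labels are emitted on this single frame
    [0.3, 0.0, 0.0, 0.7],  # one label is emitted
]
print(greedy_decode(frames, toy_joint))  # [2, 1, 3]
```

The inner loop is what distinguishes this from CTC-style decoding: a frame may yield zero, one, or several labels, and every decision sees the labels emitted so far. Since frames are consumed strictly in order, partial results are available before the utterance ends.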