# Whisper Architecture: Log-Mel Spectrogram + Encoder-Decoder + Language Tokens

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-whisper-architecture-log-mel
> Updated: 2026-05-14T14:42:54.688Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part VII — Speech & Audio Fine-Tuning

**TLDR:** Whisper (OpenAI, 2022) remains speech recognition's gold standard. Anatomy: an 80-bin log-mel spectrogram input (128 bins in large-v3), a Transformer encoder-decoder with 4–32 layers per side depending on variant, and a byte-level BPE tokenizer (~50K base vocabulary) extended with language tokens, task tokens (transcribe / translate), and timestamp tokens. Model variants: tiny (39M params) → large-v3 (1.5B) → turbo (809M).
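To make the input side concrete, here is a simplified NumPy sketch of the log-mel front end using Whisper's published constants (16 kHz audio, 400-sample / 25 ms Hann window, 160-sample / 10 ms hop, 80 mel bins, log10 with an 8-unit dynamic-range clamp and rescale). This is an illustrative approximation, not OpenAI's exact implementation, which pads audio to 30 seconds and uses a precomputed Slaney-normalized filterbank:

```python
import numpy as np

SR, N_FFT, HOP, N_MELS = 16000, 400, 160, 80  # Whisper front-end constants

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    # Triangular filters spaced evenly on the mel scale up to Nyquist.
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(audio):
    # Hann-windowed frames -> power spectrum -> mel projection -> log10.
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    frames = np.stack([audio[i * HOP : i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = mel_filterbank() @ power.T              # (80, n_frames)
    log_mel = np.log10(np.maximum(mel, 1e-10))
    # Whisper-style clamp to 8 log10-units below the max, then rescale.
    log_mel = np.maximum(log_mel, log_mel.max() - 8.0)
    return (log_mel + 4.0) / 4.0

audio = np.random.randn(SR)                       # one second of noise
mel = log_mel_spectrogram(audio)
print(mel.shape)                                  # (80, 98)
```

With a 10 ms hop, one second of 16 kHz audio yields 98 frames, so the encoder sees an (80, 98) feature matrix; the real model always pads or trims to 30 seconds (3000 frames) first.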


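The language, task, and timestamp tokens mentioned above steer the decoder: generation starts from a special-token prefix rather than plain text. A minimal sketch of how that prefix is assembled (the token strings are the real ones from Whisper's vocabulary; the helper function itself is hypothetical):

```python
def build_decoder_prompt(language="en", task="transcribe", timestamps=False):
    """Assemble Whisper's decoder prefix as token strings (illustrative helper).

    `task` is "transcribe" or "translate"; omitting <|notimestamps|>
    asks the model to emit timestamp tokens between segments.
    """
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens

prompt = build_decoder_prompt()
print(prompt)
# ['<|startoftranscript|>', '<|en|>', '<|transcribe|>', '<|notimestamps|>']
```

Swapping `<|transcribe|>` for `<|translate|>` turns the same checkpoint into an any-to-English translator, which is why the task token is part of the prompt rather than a separate model head.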