# Audio LLM: Qwen2-Audio + Phi-4-Multimodal Audio Branch — Audio Understanding + Reply

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-audio-llm-qwen2-audio-phi-4-mm
> Updated: 2026-05-14T14:42:55.038Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part VII — Speech & Audio Fine-Tuning

**TLDR:** An audio LLM goes beyond Whisper: instead of only transcribing speech, it **understands** the audio content and replies to it. Covered here: Qwen2-Audio (Alibaba, 7B) and the audio branch of Phi-4-Multimodal; audio-specific tasks such as emotion recognition, music understanding, and environmental-sound Q&A; and a Qwen2-Audio fine-tuning recipe that runs on a single RTX 4090.
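
Whether a 7B model fits on an RTX 4090 (24 GiB of VRAM) starts with simple arithmetic on weight memory. The following is a rough sketch under stated assumptions. It uses the nominal 7B parameter count from the heading, although the released checkpoint bundles an audio encoder with the language model and may be somewhat larger. It counts weights only. The helper `weight_memory_gib` and the precision table are hypothetical illustrations, not part of any library.

```python
# Rough weight-only memory estimate for a model of a given parameter count.
# Assumptions (hypothetical, for illustration): nominal 7B parameters; ignores
# activations, gradients, optimizer state, LoRA adapters, and CUDA context.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "nf4": 0.5,  # 4-bit weights; quantization constants are ignored here
}

RTX_4090_VRAM_GIB = 24.0  # RTX 4090 ships with 24 GiB of VRAM


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n_params = 7e9  # nominal size; the real checkpoint may be larger
    for dtype in BYTES_PER_PARAM:
        gib = weight_memory_gib(n_params, dtype)
        headroom = RTX_4090_VRAM_GIB - gib
        print(f"{dtype:>5}: {gib:5.1f} GiB weights, {headroom:+5.1f} GiB left on the card")
```

Under these assumptions, bf16 weights alone take about 13 GiB, while fp32 weights (about 26 GiB) would not fit at all. This is why fine-tuning on this card generally keeps the base weights frozen in half precision or 4-bit and trains only small adapters.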

