# InternVL2.5 + Idefics3 + Phi-4-Multimodal: Comparative Architecture Tour

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-internvl-idefics-phi-multimodal
> Updated: 2026-05-14T14:42:54.259Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part VI — Vision-Language Multimodal FT

**TLDR:** Three less popular but important VLMs: InternVL2.5 (Shanghai AI Lab, 8B-78B), Idefics3 (Hugging Face), and Phi-4-Multimodal (Microsoft, 5.6B, vision+text). This part compares their architectures and fine-tuning (FT) patterns, and notes which model shines for niche use cases (medical, document, scientific).

