# Grounding FT: Bounding-Box Token Format + RefCOCO-Style Task

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-grounding-fine-tuning-bbox
> Updated: 2026-05-14T14:42:54.517Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
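> Module: Part VI — Vision-Language Multimodal FT

**TLDR:** Grounding is a VLM's "pointing" capability: given a prompt such as "point to the dog", the model returns a bounding box like [0.32, 0.45, 0.58, 0.71]. The box is emitted as text tokens, either as `<bbox>x1,y1,x2,y2</bbox>` or as coordinates normalized to the 0-1000 range. This lesson covers the RefCOCO dataset, grounding evaluation with IoU (intersection over union), and Qwen 2.5-VL's native grounding support.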

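To make the token format and the IoU metric from the TLDR concrete, here is a minimal sketch. It assumes boxes start as normalized `(x1, y1, x2, y2)` floats in [0, 1] and are written onto an integer 0-1000 grid inside the `<bbox>` tag. The helper names `format_bbox`, `parse_bbox`, and `iou` are hypothetical and not part of any library, and combining the tag with the 0-1000 grid is one possible choice. Check the tokenizer and chat template of the model you fine-tune for the format it actually expects.

```python
import re

SCALE = 1000  # assumption: integer grid for the 0-1000 coordinate convention

# Matches "<bbox>x1,y1,x2,y2</bbox>" with optional whitespace around the numbers.
_BBOX_RE = re.compile(
    r"<bbox>\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*</bbox>"
)


def format_bbox(box, scale=SCALE):
    """Serialize a normalized (x1, y1, x2, y2) box as a bbox token string."""
    # Round to the integer grid and clamp so out-of-range inputs stay valid.
    coords = [min(scale, max(0, round(v * scale))) for v in box]
    return "<bbox>{},{},{},{}</bbox>".format(*coords)


def parse_bbox(text, scale=SCALE):
    """Return the first box found in text as normalized floats, or None."""
    match = _BBOX_RE.search(text)
    if match is None:
        return None
    return tuple(int(g) / scale for g in match.groups())


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = (0.32, 0.45, 0.58, 0.71)
    print(format_bbox(ground_truth))  # <bbox>320,450,580,710</bbox>

    # A hypothetical model response whose box is slightly too wide on the left.
    predicted = parse_bbox("The dog is at <bbox>300,450,580,710</bbox>.")
    print(iou(predicted, ground_truth))
```

A prediction is usually counted as correct when its IoU with the ground-truth box clears a threshold; 0.5 is a common choice for RefCOCO-style accuracy, but treat the exact value as an assumption to confirm against the benchmark you report on.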
