# Grounding FT: Bounding-Box Token Format + RefCOCO-Style Task

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-grounding-fine-tuning-bbox
> Updated: 2026-06-26T05:13:17.897Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part VI: Vision-Language Multimodal FT

**TLDR:** Grounding is a VLM's "pointing" capability: given the prompt "point to the dog", the model answers with a bounding box such as [0.32, 0.45, 0.58, 0.71]. The bbox token format is `x1,y1,x2,y2`, or coordinates normalized to a 0-1000 range. This part covers the RefCOCO dataset, grounding evaluation with IoU, and Qwen 2.5-VL's native grounding support.
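
To make the coordinate formats and the IoU metric concrete, here is a minimal sketch in plain Python. It assumes boxes are `(x1, y1, x2, y2)` corner tuples with `x1 <= x2` and `y1 <= y2`; the helper names `to_0_1000`, `iou`, and `is_hit` are hypothetical and belong to no library. The 0.5 threshold in `is_hit` is the convention commonly used for RefCOCO-style accuracy, stated here as an assumption rather than something the source specifies.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def to_0_1000(box: Box) -> Tuple[int, int, int, int]:
    """Map fractional 0-1 coordinates to integers on a 0-1000 grid."""
    x1, y1, x2, y2 = (round(v * 1000) for v in box)
    return (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred: Box, gold: Box, threshold: float = 0.5) -> bool:
    """Count a prediction as correct when IoU reaches the threshold (assumed 0.5)."""
    return iou(pred, gold) >= threshold


if __name__ == "__main__":
    pred = (0.32, 0.45, 0.58, 0.71)  # the example box from the TLDR
    print(to_0_1000(pred))
    print(is_hit(pred, (0.30, 0.44, 0.60, 0.70)))
```

Because IoU is a ratio of areas, it gives the same value whether both boxes are expressed as fractions, pixels, or 0-1000 integers, as long as prediction and ground truth share one coordinate system and the image is scaled uniformly per axis.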