# Storage I/O Engineering: The Art of Letting Your Dataset Slow Down Training (and How to Prevent It)

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-storage-io-engineering-dataset-bottleneck
> Updated: 2026-05-14T14:42:49.909Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part I — Hardware & Memory Engineering

**TLDR:** The dataset bottleneck: the GPU sits 30% idle waiting for disk. Covers NVMe Gen3/Gen4/Gen5 throughput, dataset format choice (Parquet vs. Arrow vs. WebDataset), Hugging Face `datasets` caching, `num_workers` tuning, `prefetch_factor`, `persistent_workers`, pinned memory, and FSx vs. S3 vs. local storage, ending with a recipe for running an RTX 4090 on a 50K Turkish dataset with zero idle time.
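A figure like "30% idle waiting for disk" can be estimated by timing how long the training loop blocks on the batch iterator versus how long it spends in the training step. The sketch below is a minimal, framework-free illustration; `data_wait_fraction` and `slow_batches` are hypothetical names introduced here, not part of PyTorch or Hugging Face `datasets`. It assumes `train_step` returns only once the work is done; on a GPU, where kernel launches are asynchronous, the step would need to synchronize before returning for the split to be meaningful.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def data_wait_fraction(batches: Iterable[T], train_step: Callable[[T], None]) -> float:
    """Return the fraction of loop wall time spent blocked on the batch iterator.

    Hypothetical helper for illustration: `batches` can be any iterable
    (for example a data loader) and `train_step` any blocking callable.
    """
    wait = 0.0     # seconds spent waiting for the next batch
    compute = 0.0  # seconds spent inside train_step
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    total = wait + compute
    return wait / total if total > 0 else 0.0


def slow_batches(n: int, delay_s: float) -> Iterator[int]:
    """Simulated loader: each batch takes `delay_s` seconds to arrive."""
    for i in range(n):
        time.sleep(delay_s)
        yield i


if __name__ == "__main__":
    # Simulate 30 ms of loading and 70 ms of compute per step.
    frac = data_wait_fraction(slow_batches(10, 0.03), lambda b: time.sleep(0.07))
    print(f"data-wait fraction: {frac:.0%}")
```

A fraction near zero suggests the loader keeps up with the GPU; a large fraction points at storage, decoding, or worker configuration as the place to look first.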

