Capstone Module 12: Plan Your Own LLM Training Compute Budget — Chinchilla-Aware Calculator
Module 12 capstone: plan your own LLM training budget. From a target model size (1B-70B), available compute (single GPU or a cluster), and available data, derive a Chinchilla-aware optimal allocation, plus a cost estimate ($ per training run), a time estimate, and a quality projection.
Şükrü Yusuf KAYA
65 min read
Advanced 💰 Capstone: build your own LLM training plan
We spent the last two lessons on scaling laws. Now apply them: plan your own LLM training project. Starting from a target model size, the compute you have, and the data you have, work out the optimal allocation, then the cost ($), the time, and the expected quality. We'll also weigh the production trade-off (Llama-3-style overtraining vs. Chinchilla-optimal). After these 65 minutes you'll be ready to plan real-world LLM training.
```python
# Chinchilla-aware LLM training budget planner
import math

def chinchilla_optimal(target_params=None, target_tokens=None, target_compute=None):
    """Given any one, compute the other two assuming Chinchilla-optimal."""
    # Approximate Chinchilla recipe: D ≈ 20 × N (tokens to params)
    # Compute: C ≈ 6 × N × D ≈ 120 × N²
    if target_params is not None:
        N = target_params
        D = 20 * N
        C = 6 * N * D
    elif target_tokens is not None:
        D = target_tokens
        N = D / 20
        C = 6 * N * D
    elif target_compute is not None:
        C = target_compute
        N = math.sqrt(C / (6 * 20))
        D = 20 * N
    else:
        raise ValueError("Need one input")
    return {"params": N, "tokens": D, "compute_flops": C}

def estimate_cost(compute_flops, gpu='H100', price_per_hour=4.0):
    """Estimate training cost in USD."""
    # H100 BF16 throughput: ~989 TFLOPS theoretical, ~50% utilization
    if gpu == 'H100':
        throughput = 989e12 * 0.5
    elif gpu == 'A100':
        throughput = 312e12 * 0.4
    else:
        raise ValueError(f"Unknown GPU: {gpu}")
    gpu_seconds = compute_flops / throughput
    gpu_hours = gpu_seconds / 3600
    cost = gpu_hours * price_per_hour
    return {"gpu_hours": gpu_hours, "cost_usd": cost, "gpu_days": gpu_hours / 24}

def plan_training(target_params, num_gpus, gpu='H100'):
    """Full training plan."""
    plan = chinchilla_optimal(target_params=target_params)
    cost = estimate_cost(plan["compute_flops"], gpu)
    # Wall-clock time (parallel GPUs)
    wall_hours = cost["gpu_hours"] / num_gpus
    wall_days = wall_hours / 24
    print(f"=== Training Plan: {target_params/1e9:.1f}B params ===")
    print(f"Optimal tokens: {plan['tokens']/1e12:.1f}T")
    print(f"Compute: {plan['compute_flops']:.2e} FLOPs")
    print(f"GPU-hours ({gpu}): {cost['gpu_hours']:,.0f}")
    print(f"GPU-days: {cost['gpu_days']:.1f}")
    print(f"Cost: ${cost['cost_usd']:,.0f}")
    print(f"Wall-clock: {wall_days:.1f} days on {num_gpus} {gpu}s")

# Examples
print("\n--- 1B model (educational) ---")
plan_training(1e9, num_gpus=4)

print("\n--- 7B model (production-grade) ---")
plan_training(7e9, num_gpus=32)

print("\n--- 70B model (industry frontier) ---")
plan_training(70e9, num_gpus=1024)

print("\n--- 405B model (Llama-3) ---")
plan_training(405e9, num_gpus=8192)
```

Compute budget planner (Chinchilla-aware)
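The intro promises a quality projection, which the planner above doesn't yet produce. Below is a minimal sketch using the parametric loss from Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β. The constants are the paper's published fit; the `projected_loss` helper and the example numbers are our own illustration, so treat the output as rough guidance rather than a guarantee:

```python
# Quality projection via the Chinchilla parametric loss (Hoffmann et al. 2022):
#   L(N, D) = E + A / N^alpha + B / D^beta
# Constants below are the paper's fitted values; real runs will deviate.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def projected_loss(params, tokens):
    """Predicted pretraining loss for N params trained on D tokens."""
    return E + A / params**ALPHA + B / tokens**BETA

# Example: a 7B model at Chinchilla-optimal tokens vs. the same 7B overtrained 4x
opt_tokens = 20 * 7e9  # Chinchilla-optimal budget for a 7B model
print(f"7B @ {opt_tokens/1e12:.2f}T tokens: loss ≈ {projected_loss(7e9, opt_tokens):.3f}")
print(f"7B @ {4*opt_tokens/1e12:.2f}T tokens: loss ≈ {projected_loss(7e9, 4*opt_tokens):.3f}")
```

Lower projected loss from extra tokens at a fixed size is exactly the lever Llama-3-style overtraining pulls: you pay more training compute for a cheaper-to-serve model.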
🎉 Module 12 Complete: Scaling Laws
Across 3 lessons: Kaplan 2020 (the power-law foundation), Chinchilla 2022 (the 1:1 param-to-data scaling revolution), and this capstone budget planner. Together they cover modern LLM training economics: Chinchilla-optimal allocation vs. the Llama-3 overtraining strategy (a short sketch after the inventory table puts numbers on that gap). Module 12 inventory: 3 lessons, 200 min. Full curriculum: 13 modules, 77 lessons, ~71 hours. Up next: Module 13, Distributed Training (a deep dive into DDP/FSDP/ZeRO).

Module 12 Inventory (Complete)
| # | Lesson | Duration |
|---|---|---|
| 12.1 | Kaplan Scaling Laws (2020) | 65 min |
| 12.2 | Chinchilla Scaling Laws (Hoffmann 2022) | 70 min |
| 12.3 | Capstone: Compute Budget Planner | 65 min |
| Total | 3 lessons | 200 min (~3.3 hr) |
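To put a number on the Chinchilla-optimal vs. Llama-3 debate, here's the sketch promised above. It assumes the module's 20-tokens-per-parameter rule of thumb and Meta's reported ~15T-token pretraining corpus for Llama 3; both figures are approximations:

```python
# How far past Chinchilla-optimal does Llama-3-style training go?
# Assumes ~15T pretraining tokens (Meta's reported figure) at every size.
def overtraining_ratio(params, actual_tokens, tokens_per_param=20):
    """Ratio of actual training tokens to the Chinchilla-optimal budget."""
    return actual_tokens / (tokens_per_param * params)

for name, n in [("Llama-3 8B", 8e9), ("Llama-3 70B", 70e9), ("Llama-3 405B", 405e9)]:
    print(f"{name}: ~{overtraining_ratio(n, 15e12):.0f}x Chinchilla-optimal tokens")
```

The smaller the model, the more extreme the overtraining: roughly 94x for 8B vs. roughly 2x for 405B, which is why the small Llama models punch far above their Chinchilla-predicted weight at inference time.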
Frequently Asked Questions
What would a Turkish LLM cost to train on the corpora actually available?
A 1B Turkish model on a 5 GB corpus (~1B tokens): roughly $1,000 of compute (a single H100, about a week). A 7B model on a 30 GB corpus (~6B tokens): roughly $30K (8 H100s, about a month). Both are Chinchilla-undertrained for Turkish; the corpus, not compute, is the limiting factor.
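To make the "corpus limited" point concrete, a quick check against the 20-tokens-per-parameter rule (corpus sizes are the rough figures from the answer above):

```python
# Chinchilla-optimal wants ~20 tokens per parameter; compare to the corpus.
for params, corpus_tokens in [(1e9, 1e9), (7e9, 6e9)]:
    need = 20 * params
    print(f"{params/1e9:.0f}B model: corpus covers {corpus_tokens/need:.0%} "
          f"of the ~{need/1e9:.0f}B Chinchilla-optimal tokens")
```

At ~5% and ~4% of the optimal token budget respectively, both configurations are data-bound: more GPUs won't help until the corpus grows.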