Capstone Module 12: Plan Your Own LLM Training Compute Budget — Chinchilla-Aware Calculator

Module 12 capstone: plan your own LLM training budget. Target model size (1B–70B), available compute (single GPU or cluster), available data — a Chinchilla-aware optimal allocation computed from these inputs, plus a cost estimator ($ per training run), a time estimator, and a quality projection.

Şükrü Yusuf KAYA
65 min read
Advanced
💰 Capstone — build your own LLM training plan
We spent two lessons learning scaling laws. Now apply them: plan your own LLM training project. Target model size, available compute, available data — from these, compute the optimal allocation: cost ($), time, expected quality. We also weigh the production trade-off (Llama-3 overtraining vs Chinchilla-optimal). After 65 minutes you'll be ready to plan real-world LLM training.
```python
# Chinchilla-aware LLM training budget planner
import math


def chinchilla_optimal(target_params=None, target_tokens=None, target_compute=None):
    """Given any one, compute the other two assuming Chinchilla-optimal."""
    # Approximate Chinchilla recipe: D ≈ 20 × N (tokens per parameter)
    # Compute: C ≈ 6 × N × D ≈ 120 × N²
    if target_params is not None:
        N = target_params
        D = 20 * N
        C = 6 * N * D
    elif target_tokens is not None:
        D = target_tokens
        N = D / 20
        C = 6 * N * D
    elif target_compute is not None:
        C = target_compute
        N = math.sqrt(C / (6 * 20))
        D = 20 * N
    else:
        raise ValueError("Need one input")
    return {"params": N, "tokens": D, "compute_flops": C}


def estimate_cost(compute_flops, gpu='H100', price_per_hour=4.0):
    """Estimate training cost in USD."""
    # H100 BF16 throughput: ~989 TFLOPS theoretical, ~50% utilization
    if gpu == 'H100':
        throughput = 989e12 * 0.5
    elif gpu == 'A100':
        throughput = 312e12 * 0.4
    else:
        raise ValueError(f"Unknown GPU: {gpu}")
    gpu_seconds = compute_flops / throughput
    gpu_hours = gpu_seconds / 3600
    cost = gpu_hours * price_per_hour
    return {"gpu_hours": gpu_hours, "cost_usd": cost, "gpu_days": gpu_hours / 24}


def plan_training(target_params, num_gpus, gpu='H100'):
    """Full training plan."""
    plan = chinchilla_optimal(target_params=target_params)
    cost = estimate_cost(plan["compute_flops"], gpu)
    # Wall-clock time (parallel GPUs)
    wall_hours = cost["gpu_hours"] / num_gpus
    wall_days = wall_hours / 24
    print(f"=== Training Plan: {target_params/1e9:.1f}B params ===")
    print(f"Optimal tokens: {plan['tokens']/1e12:.1f}T")
    print(f"Compute: {plan['compute_flops']:.2e} FLOPs")
    print(f"GPU-hours ({gpu}): {cost['gpu_hours']:,.0f}")
    print(f"GPU-days: {cost['gpu_days']:.1f}")
    print(f"Cost: ${cost['cost_usd']:,.0f}")
    print(f"Wall-clock: {wall_days:.1f} days on {num_gpus} {gpu}s")


# Examples
print("\n--- 1B model (educational) ---")
plan_training(1e9, num_gpus=4)

print("\n--- 7B model (production-grade) ---")
plan_training(7e9, num_gpus=32)

print("\n--- 70B model (industry frontier) ---")
plan_training(70e9, num_gpus=1024)

print("\n--- 405B model (Llama-3) ---")
plan_training(405e9, num_gpus=8192)
```
Compute budget planner — Chinchilla-aware
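
The planner also runs in reverse: fix the compute budget and ask what model Chinchilla recommends. A minimal sketch reusing the functions above, assuming a hypothetical budget of 1,000 H100-days at the same ~50% utilization baked into `estimate_cost`:

```python
# Inverse planning: from a fixed compute budget to (params, tokens).
# Hypothetical budget: 1,000 H100-days at ~50% utilization.
budget_flops = 1000 * 24 * 3600 * 989e12 * 0.5  # GPU-days -> effective FLOPs
plan = chinchilla_optimal(target_compute=budget_flops)
print(f"Budget {budget_flops:.2e} FLOPs -> "
      f"{plan['params']/1e9:.1f}B params, {plan['tokens']/1e12:.2f}T tokens")
```

For this budget the recipe lands at roughly a 19B-parameter model trained on ~0.38T tokens, exercising the `target_compute` branch that the examples above never touch.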
🎉 Module 12 Complete — Scaling Laws
Across 3 lessons: Kaplan 2020 (the power-law foundation), Chinchilla 2022 (the 1:1 param:token scaling revolution), and this capstone budget planner. Modern LLM training economics: Chinchilla-optimal vs the Llama-3 overtraining strategy. Module 12 inventory: 3 lessons, 200 min. Overall curriculum: 13 modules, 77 lessons, ~71 hours. Up next: Module 13 — Distributed Training (a deep dive into DDP/FSDP/ZeRO).
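
To make the Chinchilla-vs-overtraining trade-off concrete, here is a small sketch reusing the functions above. It contrasts a Chinchilla-optimal 8B run with a Llama-3-style overtrained run; Llama 3 8B was reportedly trained on over 15T tokens, nearly 100× the Chinchilla-optimal ~0.16T:

```python
# Chinchilla-optimal vs Llama-3-style overtraining for an 8B model.
# Overtraining spends extra training compute to get a stronger small
# model, which is much cheaper to serve at inference time.
optimal = chinchilla_optimal(target_params=8e9)
overtrained_flops = 6 * 8e9 * 15e12  # C ≈ 6 × N × D with D = 15T tokens

for label, flops in [
    ("Chinchilla-optimal (0.16T tokens)", optimal["compute_flops"]),
    ("Llama-3-style (15T tokens)", overtrained_flops),
]:
    cost = estimate_cost(flops)
    print(f"{label}: {flops:.2e} FLOPs, "
          f"{cost['gpu_hours']:,.0f} H100-hours, ${cost['cost_usd']:,.0f}")
```

The overtrained run burns roughly 94× the training compute; a Chinchilla-optimal allocation of that same compute would instead suggest a ~77B model. Llama 3 accepts the larger training bill in exchange for a far cheaper model to serve.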

Module 12 Inventory (Completed)

| # | Lesson | Duration |
| --- | --- | --- |
| 12.1 | Kaplan Scaling Laws (2020) | 65 min |
| 12.2 | Chinchilla Scaling Laws (Hoffmann 2022) | 70 min |
| 12.3 | Capstone — Compute Budget Planner | 65 min |
| Total | 3 lessons | 200 min (~3.3 hours) |

Frequently Asked Questions

How much would it cost to train a Turkish LLM on a realistic corpus?
A 1B model with a 5GB corpus (~1B tokens): roughly $1,000 of compute (a single H100, about one week). A 7B model with a 30GB corpus (~6B tokens): roughly $30K of compute (8× H100, about one month). Both are Chinchilla-undertrained for Turkish; the corpus, not compute, is the limiting factor.
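
As a rough sketch of this data-constrained case, reusing the planner above (the ~1B-token corpus size is taken from the answer; the per-epoch cost is compute-only):

```python
# Data-constrained planning: the corpus fixes D, so Chinchilla suggests
# N ≈ D / 20. With ~1B Turkish tokens the "optimal" model is only ~50M
# params; a 1B model on that corpus is Chinchilla-undertrained.
available_tokens = 1e9  # assumed ~5GB Turkish corpus
plan = chinchilla_optimal(target_tokens=available_tokens)
print(f"Chinchilla-optimal size for 1B tokens: {plan['params']/1e6:.0f}M params")

# Compute-only cost of one epoch of a 1B model over this corpus:
cost = estimate_cost(6 * 1e9 * available_tokens)
print(f"1B model, one epoch: {cost['gpu_hours']:,.1f} H100-hours, "
      f"${cost['cost_usd']:,.0f}")
```

A single pass is only a few H100-hours of pure compute; a week-long, ~$1,000 budget plausibly corresponds to repeating the limited corpus for many epochs plus experimentation and data-pipeline overhead.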
