Yi-1.5 / InternLM2.5 / Aya Expanse: Underdog Comparative TR-MMLU
Llama / Qwen / Gemma are popular but not the only options. Yi-1.5 (01.AI), InternLM2.5 (Shanghai AI Lab), Aya Expanse (Cohere) — which shines in TR? Same recipe comparison on RTX 4090.
Şükrü Yusuf KAYA
28 min read
Advanced1. 4-Model Karşılaştırma Tablosu#
| Model | Vocab | Pre-train | TR-MMLU base | Lisans |
|---|---|---|---|---|
| Yi-1.5 6B/9B/34B | 64,000 | 3.6T (CN+EN heavy) | 25.4 / 28.7 / 38.2 | Apache 2.0 |
| InternLM2.5 7B/20B | 92,544 | 2T multilingual | 30.1 / 35.6 | Apache 2.0 |
| Aya Expanse 8B/32B | 256,000 | 200K hours synthetic (101 lang) | 42.3 / 47.1 | CC-BY-NC (research) |
| Llama 3.1 8B (ref) | 128,256 | 15T multilingual | 32.4 | Llama license |
| Qwen 2.5 7B (ref) | 151,936 | 18T multilingual | 38.1 | Apache 2.0 |
Aya Expanse 8B TR-MMLU 42.3 (!) — popüler modellerden iyi. Ama:
- Lisans: CC-BY-NC — commercial use yasak
- Cohere Research License
- Production'da kullanılamaz, sadece research
Karar matrisi:
- Commercial + TR → Qwen 2.5 7B (38.1)
- Research + TR → Aya Expanse 8B (42.3)
- Math/Code → Phi-4 (English) veya Qwen 2.5 Coder
- Edge → SmolLM3 1.7B
2. Aya Expanse — Cohere'in 101-Language Specialist'ı#
Aya Expanse 8B (Cohere, Kasım 2024):
- 256K vocab (Gemma seviyesinde)
- 101 dil pre-train + SFT
- Aya datasetler family (Cohere Aya Initiative — community translations)
- TR specifically high quality (Türkçe data %2.3 — büyük ratio)
Reçete: Aya Expanse 8B + custom TR domain SFT → cookbook'un Part IX'unda detaylı.
✅ Teslim
- 4 modeli aynı 1000 TR Alpaca subset ile FT et. 2) TR-MMLU + MT-Bench-TR ölç, tablo çıkar. 3) Sonraki ders: 3.11 — Comparative Lab: Same Recipe 10 Models.
Yorumlar & Soru-Cevap
(0)Yorum yazmak için giriş yap.
Yorumlar yükleniyor...
Related Content
Part 0 — Engineering Foundations
Welcome to the Fine-Tuning Cookbook: System, Stage Taxonomy, and the Reproducibility Contract
Start LearningPart 0 — Engineering Foundations
Reproducibility Stack: Seeds, cuDNN Flags, and Deterministic CUDA — End the 'Works on My Machine' Problem
Start LearningPart 0 — Engineering Foundations