DeepSeek-Coder-V2 16B / 236B: MoE Code Model + Multi-File Context
DeepSeek-Coder-V2 (DeepSeek, 2024) is an MoE-architecture code model (16B / 236B) and one of the strongest open code LLMs under Apache 2.0. It supports 338 programming languages, a 128K context window, and multi-file repo understanding. The 16B Lite (2.4B active) can be QLoRA-tuned on an RTX 4090; the 236B variant is cloud-only.
Şükrü Yusuf KAYA
24 min read
1. DeepSeek-Coder-V2 Specs
| Model | Total Params | Active Params | Context | HumanEval | License |
|---|---|---|---|---|---|
| DeepSeek-Coder-V2-Lite 16B | 16B | 2.4B | 128K | 90.2% | Apache 2.0 |
| DeepSeek-Coder-V2 236B | 236B | 21B | 128K | 96.3% | Apache 2.0 |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128K | 92.1% | Apache 2.0 |
The Lite 16B's advantage: only 2.4B active parameters, so you pay roughly 7B-class compute for 16B-class quality. QLoRA fits comfortably on an RTX 4090; a loading sketch follows below.
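A minimal loading sketch under these assumptions: Hugging Face transformers + bitsandbytes + peft are installed, the public deepseek-ai/DeepSeek-Coder-V2-Lite-Base checkpoint is used, and the LoRA hyperparameters (r, alpha, dropout) are illustrative defaults, not values from this lesson:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"

# NF4 4-bit quantization: 16B weights shrink to roughly 8-9 GB, leaving
# headroom on a 24 GB RTX 4090 for LoRA adapters, optimizer state,
# and activations.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
    trust_remote_code=True,  # DeepSeek-V2 architecture may need custom code
)
model = prepare_model_for_kbit_training(model)

# "all-linear" is architecture-agnostic; to adapt only the attention
# projections, list their exact names from model.named_modules() instead
# (under DeepSeek's MLA attention they differ from the usual
# q_proj/k_proj/v_proj naming).
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```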
✅ Deliverables
1) Load DeepSeek-Coder-V2-Lite 16B on an RTX 4090 (see the QLoRA sketch above). 2) Run the HumanEval benchmark (see the sketch below). 3) Next lesson: 8.4, StarCoder 2 + CodeLlama.
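A sketch of the HumanEval step, assuming OpenAI's human-eval package (pip install human-eval) and the quantized model and tokenizer from the sketch above; the generation settings are illustrative:

```python
from human_eval.data import read_problems, write_jsonl

problems = read_problems()  # 164 HumanEval tasks

def complete(prompt: str) -> str:
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)

samples = [
    {"task_id": tid, "completion": complete(problems[tid]["prompt"])}
    for tid in problems
]
write_jsonl("samples.jsonl", samples)
# Score pass@1 from the shell:
#   evaluate_functional_correctness samples.jsonl
```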