The Turkish Open-Source LLM Landscape 2026: Trendyol-LLM, Cosmos-Llama, KanarYa, Kumru AI, TÜBİTAK BİLGEM, and T3 AI Baykar
A 2026 snapshot of the Turkish open-source LLM ecosystem: Trendyol-LLM, Cosmos-Llama, KanarYa, Kumru AI, the TÜBİTAK BİLGEM domestic model, and the T3 AI Baykar defense model. Detailed decision guide covering MMLU-TR and TUMLU benchmarks, licensing, tokenization gap, VRAM requirements, self-hosting needs, and which model to pick for which use case.
1. Introduction: Why Is Turkish Open-Source LLM a Sector Matter in 2026?
In 2023, there was not a single production-grade open-source LLM with strong Turkish capability. As of May 2026, six different organizations have shipped production-quality Turkish-capable open-source LLMs. This maturation has changed a key variable in Turkish enterprise AI strategy: companies no longer ask "do I have to use OpenAI?" — they now ask, "in which scenario is the domestic open-source enough?"
- Turkish Open-Source LLM
- A large language model specifically trained or continually pre-trained for Turkish comprehension, generation, translation, and task-following, whose weights are publicly accessible via Hugging Face or similar channels and whose license permits self-hosting and commercial use.
- Also known as: Domestic LLM, Turkish Foundation Model
- Wikidata: Q115305900
This article consolidates every important Turkish open-source LLM initiative as of May 2026 into a single reference, presenting each model's technical characteristics, benchmark performance, license constraints, self-hosting requirements, and when to choose which model for which use case.
Why Open-Source (Self-Host)?
Turkish enterprises gravitate to open-source Turkish LLMs for four reasons:
- KVKK and data residency. Sending personal-data-containing prompts to foreign APIs always creates regulatory risk, especially in finance, healthcare, and the public sector.
- BDDK + defense constraints. Banking and defense sectors have data that cannot be sent to foreign cloud services.
- Cost control. At high-volume usage (100M+ tokens/day), self-hosting cost falls below API cost.
- Turkish-specific fine-tuning. Domain-specific (legal, medical, e-commerce) fine-tuning requires an open base model.
2. The Anatomy of the Turkish LLM Ecosystem: 6 Players
The Turkish open-source LLM ecosystem in 2026 is shaped by six main groups, each with different technical philosophy, target audience, and licensing approach.
2.1. Trendyol-LLM (Practical E-commerce Choice)
Built by the Trendyol Group AI Lab, Trendyol-LLM has been the most active player in Turkish open-source LLM since early 2024. The family now includes 7B base + 7B chat + 7B base v2 + 7B chat v3 + 70B-base + 70B-Cybersecurity-v3, plus other variants — more than 8 total.
Technical foundation. Llama 2 7B (v1-v2) and Llama 3.1 / Llama 3.3 70B (v3) with Turkish continual pre-training + SFT + DPO. v3 releases shipped in late 2025; particularly strong in e-commerce dialogue, customer service, and product description.
Cybersecurity variant. 70B-Cybersecurity-v3 was fine-tuned on Turkish security logs, SOC tickets, and CTI reports. It is the only Turkish open-source LLM trained with the MITRE ATT&CK + Turkish TTP mapping dataset — the 2026 default for SOC automation.
License. Llama 3.1/3.3 community license — commercial use allowed, Meta's 700M MAU rule applies.
2.2. Cosmos-Llama (Academic DPO Pipeline)
Cosmos-Llama is the Turkish-optimized Llama 3 derivative released in late 2024 under the Cosmos AI umbrella. In early 2026, the Cosmos-1 architecture (a custom architectural approach) was announced: Cosmos-1 is Llama 3.1 70B's Turkish-optimized continuation + custom DPO pipeline.
Technical highlight. The Cosmos pipeline is unique in using a curated 40K+ DPO pair set for Turkish — substantially improving Turkish politeness, cultural reference handling, and "natural-sounding Turkish" output.
Academic benchmark. Leader in the 7B category on TUMLU; particularly strong in "Turkish History, Literature, Social Sciences" subsets, beating other 7Bs by 12-18%.
License. Llama community + CC-BY-SA (for the custom dataset).
2.3. KanarYa (BOUN NLP — Academic Foundation)
KanarYa is the first large-scale Turkish LLM initiative, developed by Boğaziçi University's NLP Group. KanarYa-2b (GPT-J 6B fork with Turkish continuation training) launched in 2023; KanarYa-7B followed in 2025 and KanarYa-Mistral-7B-tr in 2026.
Technical highlight. A custom BPE tokenizer for Turkish (50K vocab, 85% Turkish morphemes) significantly improves tokenization efficiency — a paragraph that costs 450 tokens in Llama tokenizer costs ~320 in KanarYa (30% savings).
Use case. Academic research, NLP education, base model for Turkish corpus-specific fine-tuning. Not as production-polished as Trendyol or Cosmos, but the most open and best-documented model for research.
License. Apache 2.0 — the most open license (commercial use, fine-tune, redistribute all free).
2.4. Kumru AI (VNGRS — Consumer GPU Target)
Released in early 2025 by VNGRS, Kumru AI-7.4B is the "consumer-friendly" player in Turkish open-source LLM. With 4-bit quantization, it runs on an 8GB-VRAM GPU (RTX 4060, M2 Mac) — the only Turkish model in this size class.
Technical highlight. Built on Mistral 7B architecture; optimized for zero-shot Turkish task performance — works on instruction following, code generation, and summarization without fine-tuning.
Use case. Local deployment, edge devices, Turkish agent prototypes, lightweight AI for SMB on-premise.
License. Apache 2.0.
2.5. TÜBİTAK BİLGEM Domestic Model Initiative
In late 2024, TÜBİTAK BİLGEM announced its Sensitive Data AI (HASA) project — a Turkish LLM designed for state institutions. As of 2026, bilgem-tr-llm-13b and bilgem-tr-llm-70b are offered to state institutions via on-prem deployment; a limited public release is planned.
Technical highlight. Pre-trained from scratch on TÜBİTAK ULAKBİM's Turkey-hosted GPU cluster; certified for EU GDPR + Turkish KVKK compliance. Enriched with Turkish legal texts, legislation, and defense terminology.
Use case. Public institutions, defense industry integration, national security projects.
License. Custom government license — only for Turkish state institutions + approved defense industry firms.
2.6. T3 AI Baykar + T3 Foundation Partnership
Announced in late 2025 by Baykar Technologies and the T3 Foundation, T3 AI targets the defense industry LLM ecosystem. The first models announced: t3-ai-defence-7b (general defense terminology) and t3-ai-uav-tactical-13b (unmanned aerial vehicle tactical dialogue).
Technical highlight. Llama 3.1 8B / 13B derivatives; fine-tuned on MITRE ATT&CK, NATO standards, Turkish Armed Forces terminology. Additional multimodal vision (image + text) training for defense drone telemetry.
Use case. Defense industry integrators, military training simulations, tactical decision support.
License. ITAR/EAR compatible custom license; only for Turkish defense firms and approved allied-country integrators.
3. Comparison Table: 2026 Turkish Open-Source LLM Landscape
| Model | Size | License | TUMLU | MMLU-TR | VRAM (FP16) | Target Use |
|---|---|---|---|---|---|---|
| Trendyol-LLM-7B-v3 | 7B | Llama 3.1 | 48.2 | 52.1 | 16 GB | E-commerce, customer service |
| Trendyol-LLM-70B-v3 | 70B | Llama 3.3 | 68.4 | 71.8 | 140 GB | High-quality enterprise |
| Trendyol-70B-Cybersecurity-v3 | 70B | Llama 3.3 | 65.1 | 70.2 | 140 GB | SOC, CTI, security |
| Cosmos-Llama-1-70B | 70B | Llama community | 66.7 | 69.4 | 140 GB | Academic, content |
| KanarYa-Mistral-7B-tr | 7B | Apache 2.0 | 42.8 | 47.6 | 14 GB | Research, fine-tune base |
| Kumru AI-7.4B | 7.4B | Apache 2.0 | 44.3 | 48.9 | 15 GB (4-bit: 4.5 GB) | Edge, SMB, agent |
| bilgem-tr-llm-13b | 13B | TÜBİTAK custom | 58.6 | 61.4 | 26 GB | Public sector, defense |
| t3-ai-defence-7b | 7B | ITAR custom | 51.2 | 55.0 | 16 GB | Defense industry |
Interpretation. Trendyol-LLM-70B-v3 leads the 70B class; Trendyol-7B-v3 and Cosmos-Llama compete in the 7B class. KanarYa is the most open (Apache 2.0) but scores lower. Kumru leads in edge scenarios. TÜBİTAK and T3 do not publish public benchmarks (state/defense constraint).
3.1. Tokenization: The Hidden Cost Dimension of Turkish LLMs
Turkish, being agglutinative, is represented with on average 1.7x more tokens than English in Llama-3 tokenizer for the same content.
Example. "Türkiye Cumhuriyeti'nin başkenti Ankara'dır." (Turkey's capital is Ankara.):
- Llama 3 tokenizer: 21 tokens
- GPT-4 tokenizer (cl100k_base): 22 tokens
- KanarYa Turkish BPE: 13 tokens
Two effects:
- Cost. With API, the same content uses 70% more tokens = 70% higher cost.
- Context window. A 128K context window model carries ~75K Turkish words vs ~95K English words.
3.2. License Complexity: Apache 2.0 vs Llama Community vs Custom
The most-confused topic in Turkish open-source LLM use is licensing:
- Apache 2.0 (KanarYa, Kumru): Full freedom, commercial + redistribute + fine-tune free. Safest for enterprise AI.
- Llama 3.1/3.3 Community License (Trendyol, Cosmos): Commercial allowed but above 700M MAU you need Meta permission; using model output to train another model is also prohibited.
- TÜBİTAK Custom Government License: Only for state institutions + approved contractors.
- T3 ITAR/EAR Compatible License: Turkish defense firms + NATO ally approved integrators.
3.3. OpenLLM-TR Leaderboard: Standardized Scores
The OpenLLM-TR Leaderboard on Hugging Face publishes Turkish LLM evaluation. As of May 2026, the aggregate score is the average across TUMLU + MMLU-TR + ARC-TR + HellaSwag-TR + Belebele-TR.
May 2026 Top-5 (7B/8B class):
- Trendyol-LLM-7B-v3: 51.4
- Cosmos-Llama-7B-v2: 50.8
- Kumru AI-7.4B: 47.1
- KanarYa-Mistral-7B-tr: 45.6
- Llama-3.1-8B-Instruct (vanilla): 41.8
May 2026 Top-3 (70B class):
- Trendyol-LLM-70B-v3: 69.7
- Cosmos-Llama-1-70B: 68.0
- Llama-3.3-70B-Instruct (vanilla): 64.2
4. Practical Setup: Which Model for Which Use Case?
| Use Case | Recommendation | Reason |
|---|---|---|
| E-commerce customer service | Trendyol-LLM-7B-v3 | Domain match + 16GB VRAM sufficient |
| SOC automation, CTI reporting | Trendyol-70B-Cybersecurity-v3 | Only Turkish open-source security fine-tune |
| Academic / legal documents | Cosmos-Llama-1-70B | High TUMLU + DPO politeness |
| SMB chatbot, local deploy | Kumru AI-7.4B | 4-bit quantize → 4.5GB VRAM |
| Turkish NLP research | KanarYa-Mistral-7B-tr | Apache 2.0 + Turkish tokenizer |
| Public institution, sensitive data | TÜBİTAK BİLGEM-13B | State-certified + on-prem |
| Defense industry | T3 AI Defence-7B | ITAR-compatible + military terminology |
| High-quality enterprise RAG | Trendyol-LLM-70B-v3 | Highest Turkish benchmark + commercial open |
4.1. Self-Host Setup: vLLM + Trendyol-LLM-7B-v3 (Most Common Scenario)
7B Turkish model + vLLM + single GPU is the most common production setup among Turkish mid-sized companies. Typical deployment:
huggingface-cli download Trendyol/Trendyol-LLM-7B-chat-v3.0 \
--local-dir /opt/models/trendyol-7b-v3
docker run --gpus all -p 8000:8000 \
-v /opt/models/trendyol-7b-v3:/model \
vllm/vllm-openai:latest \
--model /model \
--dtype bfloat16 \
--max-model-len 8192 \
--gpu-memory-utilization 0.90
Hardware required. Single A10 (24GB) or L4 (24GB) is sufficient. RTX 4090 24GB works for dev/POC. Throughput: ~80-120 tokens/s single request, ~600 tokens/s aggregate at batch 8.
4.2. 70B Scenario: Trendyol-LLM-70B-v3 + 4xH100 or 2xH200
For 70B class self-hosting, minimum is 4xH100 (4x80GB) or 2xH200 (2x141GB). With AWQ 4-bit quantization (with ~2-3% quality drop), 35GB VRAM is enough.
docker run --gpus all -p 8000:8000 \
-v /opt/models/trendyol-70b-awq:/model \
vllm/vllm-openai:latest \
--model /model \
--quantization awq \
--tensor-parallel-size 2 \
--max-model-len 16384
Throughput. 2xH200 + AWQ → ~50 tokens/s single request, ~300 tokens/s aggregate at batch 16. Sufficient for typical enterprise customer service RAG.
5. Performance / Benchmark Comparison
5.1. TUMLU (Turkish MMLU) Detail
TUMLU is a 57-subject academic benchmark with 14K+ multiple choice questions; the de-facto standard for Turkish LLM evaluation.
Domain performance (Trendyol-LLM-70B-v3 example):
- Turkish History: 78.2
- Turkish Literature: 71.6
- Law: 62.3
- Mathematics: 51.8
- Medicine (general): 64.1
- Engineering: 69.4
- Social Sciences: 76.5
- Computer Science: 73.2
Observation. Turkish LLMs are strongest in cultural/social domains and weakest in STEM (especially mathematics) — a result of corpus imbalance. For STEM use cases, GPT-5 / Claude Opus 4.7 API is safer.
6. Turkish-Specific Angle: KVKK, BDDK, and AI Sovereignty
The new dimension in 2026 is AI sovereignty — important at three levels.
6.1. KVKK Angle
When foreign API calls (OpenAI, Anthropic) include personal data in prompts (name, national ID, health, financial), KVKK Article 9 triggers a cross-border data transfer. This requires explicit consent or adequacy decision. Self-host Turkish LLMs eliminate this risk entirely.
6.2. BDDK Angle
In 2024, BDDK published "Banking AI and Machine Learning Management Communiqué," requiring banks to ensure their AI models have: (1) explainability, (2) data residency in Turkey or in adequate jurisdictions, (3) documented third-party dependencies. Within this framework, OpenAI API use is not directly prohibited but compliance burden is very high; self-host models like Trendyol-LLM-70B or BİLGEM-13B significantly reduce this burden.
6.3. Defense Industry (ITAR / EAR / Turkish Law)
Technical data in defense (tactical info, weapon system specs, operational planning) cannot be sent to foreign cloud services. T3 AI and BİLGEM models are strategically positioned to fill this gap.
7. Case Studies: Turkish Open-Source LLMs in Production
Case 1 — Turkish E-commerce Company: Trendyol-LLM-7B-v3 Customer Assistant
Company. One of Turkey's top-10 e-commerce platforms (anonymized, not Trendyol itself).
Problem. OpenAI GPT-4 API spend reached $48,000/month; 12M tokens/day, 85% customer service chat. KVKK compliance burden added ~$80,000/year in audit + consulting cost.
Solution. Trendyol-LLM-7B-v3 deployed on a 4xL4 (4x24GB) GPU cluster; vLLM + Redis cache + Langfuse observability. Tier-1 chats (order tracking, returns, product info) routed to open-source; tier-2 complex complaints fallback to GPT-5.
Result. Monthly AI spend $48K → $11K (cloud GPU + partial API). CSAT 7.2 → 7.4 (Turkish naturalness improvement). KVKK audit burden reduced 60%. ROI period (setup + team): 4 months.
Case 2 — Turkish Bank: Cosmos-Llama-1-70B + BDDK-Compliant RAG
Company. Top-5 Turkish private bank (anonymized).
Problem. Internal training chatbot + dealer support system requires BDDK-compliant LLM. OpenAI API use raises BDDK audit concerns; a fully domestic + Turkish-natural-output model is required.
Solution. Cosmos-Llama-1-70B + Qdrant + Turkish BGE-M3 embeddings. Full stack on 8xH100 cluster in the bank's Ankara DC. Prompt + response audit logs retained for 7 years; anonymization layer masks PII.
Result. 18,000 dealers + 28,000 internal users. Dealer support response time 4 hours → 12 minutes. BDDK audit "AI compliance" item received full score. Total investment $850K (hardware + integration); ROI positive within 24 months.
Case 3 — Healthcare Group: Kumru AI Edge Deploy + KVKK
Company. A group with 14 hospitals + 23 outpatient clinics (anonymized).
Problem. Doctors needed a system to automatically summarize patient consultation notes and send structured records to HBYS. Patient data must never leave hospital boundaries (KVKK + Health Ministry Regulation).
Solution. Each hospital received an RTX 4090 24GB workstation + Kumru AI-7.4B (4-bit, 4.5GB VRAM). Doctor's desktop app handles voice → text → summary → HBYS flow locally.
Result. Patient data does not leave the hospital network. Doctor's daily note-taking time 90 min → 25 min. Per-hospital setup cost ~$8K. Service rolled out to 14 locations in 8 months.
8. Risks and Cost
8.1. License Risks (Llama Community)
Trendyol-LLM and Cosmos-Llama are built on Llama 3.1/3.3 community license, so the Meta 700M MAU rule applies. No Turkish organization exceeds this today, but:
- Using model output to train another model (distillation) is prohibited.
- Use against Meta's Acceptable Use Policy (weapons, discrimination, etc.) is prohibited.
- License file must be redistributed with the model.
Apache 2.0 (KanarYa, Kumru) is exempt from these constraints, but the models' technical capability is more limited.
8.2. Continuity Risk (Maintainer Dependency)
Most Turkish open-source LLMs are maintained by small teams or a single company. Pivots, team dispersal, or strategic shifts can stop maintenance. Mitigation: back up the weights + tokenizer + dataset locally for any critical-system model.
9. Frequently Asked Questions
10. Next Steps
To leverage the Turkish open-source LLM ecosystem, three concrete steps:
- Use-case + token volume analysis. Log LLM usage for 1 month — extract token volume, prompt type distribution, KVKK risk profile. This grounds the "self-host vs API" decision.
- POC setup. Run a 4-6 week POC on Trendyol-LLM-7B-v3 or Cosmos-Llama-7B; single L4 GPU + vLLM is enough.
- Production architecture workshop. Designing the hybrid (API + self-host) strategy, KVKK + BDDK compliance, observability, and eval harness — structured workshop with a 12-week production roadmap as output.
Reach out via the contact form on the site.
References
- Trendyol-LLM-7B-chat-v3.0 Model Card — Trendyol AI Lab, Hugging Face ·
- Trendyol-LLM-70B-Cybersecurity-v3 — Trendyol AI Lab, Hugging Face ·
- Cosmos-LLaMa Turkish Language Model — YTU CE Cosmos, Hugging Face ·
- KanarYa: A Turkish Language Model — Boğaziçi University NLP Group, Hugging Face ·
- Kumru: Turkish LLM by VNGRS — VNGRS AI, Hugging Face ·
- TUMLU: Turkish Massive Multitask Language Understanding — Bayrak et al., arXiv ·
- OpenLLM-TR Leaderboard — OpenLLM-TR Community, Hugging Face Spaces ·
- Llama 3.1 Community License — Meta, Meta AI ·
- vLLM Documentation — vLLM Project, vLLM ·
- BDDK — Banking AI Management Communiqué — BDDK, BDDK ·
- KVKK — Law No. 6698 — Republic of Turkiye - KVKK, Republic of Turkiye ·
- TÜBİTAK BİLGEM AI Institute — TÜBİTAK BİLGEM, TÜBİTAK ·
- T3 Foundation — T3 Foundation, T3 Vakfı ·
- Baykar Technologies — Baykar, Baykar ·
- AWQ: Activation-aware Weight Quantization — Lin et al., arXiv ·
- Hugging Face Transformers — Hugging Face, Hugging Face ·
- Turkish BPE Tokenization — Toraman et al., ACL ·
- DPO: Direct Preference Optimization — Rafailov et al., NeurIPS ·
- Belebele: Multilingual Reading Comprehension — Bandarkar et al., arXiv ·
- ARC: AI2 Reasoning Challenge — Clark et al., AI2 ·
- Turkish Health Data Regulation — Turkish Ministry of Health, Official Gazette ·
- NVIDIA H100/H200/B200 — NVIDIA, NVIDIA ·
- MITRE ATT&CK Framework — MITRE, MITRE ·
- Turkish Defense Industry Presidency (SSB) — SSB, SSB ·
- LoRA: Low-Rank Adaptation — Hu et al., arXiv ·
This is a living document; the Turkish open-source LLM ecosystem shifts every quarter, so it is updated quarterly.
Consulting Pathways
Consulting pages closest to this article
For the most logical next step after this article, you can review the most relevant solution, role, and industry landing pages here.
Enterprise RAG Systems Development
Production-grade RAG systems that provide grounded, secure and auditable access to internal knowledge.
AI Evaluation, Guardrails and Observability
A comprehensive evaluation layer to measure, observe and control AI accuracy, safety and performance.
RAG and Compliance Assistants for Banking
Banking-focused AI systems that provide secure, grounded and auditable access to regulations, policies, procedures and internal knowledge.