The Turkish Open-Source LLM Landscape 2026: Trendyol-LLM

1. Introduction: Why Is Turkish Open-Source LLM a Sector Matter in 2026?

In 2023, there was not a single production-grade open-source LLM with strong Turkish capability. As of May 2026, six different organizations have shipped production-quality Turkish-capable open-source LLMs. This maturation has changed a key variable in Turkish enterprise AI strategy: companies no longer ask "do I have to use OpenAI?" — they now ask, "in which scenario is the domestic open-source enough?"

Definition

Turkish Open-Source LLM: A large language model specifically trained or continually pre-trained for Turkish comprehension, generation, translation, and task-following, whose weights are publicly accessible via Hugging Face or similar channels and whose license permits self-hosting and commercial use.; Also known as: Domestic LLM, Turkish Foundation Model; Wikidata: Q115305900

This article consolidates every important Turkish open-source LLM initiative as of May 2026 into a single reference, presenting each model's technical characteristics, benchmark performance, license constraints, self-hosting requirements, and when to choose which model for which use case.

Why Open-Source (Self-Host)?

Turkish enterprises gravitate to open-source Turkish LLMs for four reasons:

KVKK and data residency. Sending personal-data-containing prompts to foreign APIs always creates regulatory risk, especially in finance, healthcare, and the public sector.
BDDK + defense constraints. Banking and defense sectors have data that cannot be sent to foreign cloud services.
Cost control. At high-volume usage (100M+ tokens/day), self-hosting cost falls below API cost.
Turkish-specific fine-tuning. Domain-specific (legal, medical, e-commerce) fine-tuning requires an open base model.

2. The Anatomy of the Turkish LLM Ecosystem: 6 Players

The Turkish open-source LLM ecosystem in 2026 is shaped by six main groups, each with different technical philosophy, target audience, and licensing approach.

2.1. Trendyol-LLM (Practical E-commerce Choice)

Built by the Trendyol Group AI Lab, Trendyol-LLM has been the most active player in Turkish open-source LLM since early 2024. The family now includes 7B base + 7B chat + 7B base v2 + 7B chat v3 + 70B-base + 70B-Cybersecurity-v3, plus other variants — more than 8 total.

Technical foundation. Llama 2 7B (v1-v2) and Llama 3.1 / Llama 3.3 70B (v3) with Turkish continual pre-training + SFT + DPO. v3 releases shipped in late 2025; particularly strong in e-commerce dialogue, customer service, and product description.

Cybersecurity variant. 70B-Cybersecurity-v3 was fine-tuned on Turkish security logs, SOC tickets, and CTI reports. It is the only Turkish open-source LLM trained with the MITRE ATT&CK + Turkish TTP mapping dataset — the 2026 default for SOC automation.

License. Llama 3.1/3.3 community license — commercial use allowed, Meta's 700M MAU rule applies.

2.2. Cosmos-Llama (Academic DPO Pipeline)

Cosmos-Llama is the Turkish-optimized Llama 3 derivative released in late 2024 under the Cosmos AI umbrella. In early 2026, the Cosmos-1 architecture (a custom architectural approach) was announced: Cosmos-1 is Llama 3.1 70B's Turkish-optimized continuation + custom DPO pipeline.

Technical highlight. The Cosmos pipeline is unique in using a curated 40K+ DPO pair set for Turkish — substantially improving Turkish politeness, cultural reference handling, and "natural-sounding Turkish" output.

Academic benchmark. Leader in the 7B category on TUMLU; particularly strong in "Turkish History, Literature, Social Sciences" subsets, beating other 7Bs by 12-18%.

License. Llama community + CC-BY-SA (for the custom dataset).

2.3. KanarYa (BOUN NLP — Academic Foundation)

KanarYa is the first large-scale Turkish LLM initiative, developed by Boğaziçi University's NLP Group. KanarYa-2b (GPT-J 6B fork with Turkish continuation training) launched in 2023; KanarYa-7B followed in 2025 and KanarYa-Mistral-7B-tr in 2026.

Technical highlight. A custom BPE tokenizer for Turkish (50K vocab, 85% Turkish morphemes) significantly improves tokenization efficiency — a paragraph that costs 450 tokens in Llama tokenizer costs ~320 in KanarYa (30% savings).

Use case. Academic research, NLP education, base model for Turkish corpus-specific fine-tuning. Not as production-polished as Trendyol or Cosmos, but the most open and best-documented model for research.

License. Apache 2.0 — the most open license (commercial use, fine-tune, redistribute all free).

2.4. Kumru AI (VNGRS — Consumer GPU Target)

Released in early 2025 by VNGRS, Kumru AI-7.4B is the "consumer-friendly" player in Turkish open-source LLM. With 4-bit quantization, it runs on an 8GB-VRAM GPU (RTX 4060, M2 Mac) — the only Turkish model in this size class.

Technical highlight. Built on Mistral 7B architecture; optimized for zero-shot Turkish task performance — works on instruction following, code generation, and summarization without fine-tuning.

Use case. Local deployment, edge devices, Turkish agent prototypes, lightweight AI for SMB on-premise.

License. Apache 2.0.

2.5. TÜBİTAK BİLGEM Domestic Model Initiative

In late 2024, TÜBİTAK BİLGEM announced its Sensitive Data AI (HASA) project — a Turkish LLM designed for state institutions. As of 2026, bilgem-tr-llm-13b and bilgem-tr-llm-70b are offered to state institutions via on-prem deployment; a limited public release is planned.

Technical highlight. Pre-trained from scratch on TÜBİTAK ULAKBİM's Turkey-hosted GPU cluster; certified for EU GDPR + Turkish KVKK compliance. Enriched with Turkish legal texts, legislation, and defense terminology.

Use case. Public institutions, defense industry integration, national security projects.

License. Custom government license — only for Turkish state institutions + approved defense industry firms.

2.6. T3 AI Baykar + T3 Foundation Partnership

Announced in late 2025 by Baykar Technologies and the T3 Foundation, T3 AI targets the defense industry LLM ecosystem. The first models announced: t3-ai-defence-7b (general defense terminology) and t3-ai-uav-tactical-13b (unmanned aerial vehicle tactical dialogue).

Technical highlight. Llama 3.1 8B / 13B derivatives; fine-tuned on MITRE ATT&CK, NATO standards, Turkish Armed Forces terminology. Additional multimodal vision (image + text) training for defense drone telemetry.

Use case. Defense industry integrators, military training simulations, tactical decision support.

License. ITAR/EAR compatible custom license; only for Turkish defense firms and approved allied-country integrators.

3. Comparison Table: 2026 Turkish Open-Source LLM Landscape

Turkish Open-Source LLM Comparison (May 2026)
Model	Size	License	TUMLU	MMLU-TR	VRAM (FP16)	Target Use
Trendyol-LLM-7B-v3	7B	Llama 3.1	48.2	52.1	16 GB	E-commerce, customer service
Trendyol-LLM-70B-v3	70B	Llama 3.3	68.4	71.8	140 GB	High-quality enterprise
Trendyol-70B-Cybersecurity-v3	70B	Llama 3.3	65.1	70.2	140 GB	SOC, CTI, security
Cosmos-Llama-1-70B	70B	Llama community	66.7	69.4	140 GB	Academic, content
KanarYa-Mistral-7B-tr	7B	Apache 2.0	42.8	47.6	14 GB	Research, fine-tune base
Kumru AI-7.4B	7.4B	Apache 2.0	44.3	48.9	15 GB (4-bit: 4.5 GB)	Edge, SMB, agent
bilgem-tr-llm-13b	13B	TÜBİTAK custom	58.6	61.4	26 GB	Public sector, defense
t3-ai-defence-7b	7B	ITAR custom	51.2	55.0	16 GB	Defense industry

Interpretation. Trendyol-LLM-70B-v3 leads the 70B class; Trendyol-7B-v3 and Cosmos-Llama compete in the 7B class. KanarYa is the most open (Apache 2.0) but scores lower. Kumru leads in edge scenarios. TÜBİTAK and T3 do not publish public benchmarks (state/defense constraint).

3.1. Tokenization: The Hidden Cost Dimension of Turkish LLMs

Turkish, being agglutinative, is represented with on average 1.7x more tokens than English in Llama-3 tokenizer for the same content.

Example. "Türkiye Cumhuriyeti'nin başkenti Ankara'dır." (Turkey's capital is Ankara.):

Llama 3 tokenizer: 21 tokens
GPT-4 tokenizer (cl100k_base): 22 tokens
KanarYa Turkish BPE: 13 tokens

Two effects:

Cost. With API, the same content uses 70% more tokens = 70% higher cost.
Context window. A 128K context window model carries ~75K Turkish words vs ~95K English words.

3.2. License Complexity: Apache 2.0 vs Llama Community vs Custom

The most-confused topic in Turkish open-source LLM use is licensing:

Apache 2.0 (KanarYa, Kumru): Full freedom, commercial + redistribute + fine-tune free. Safest for enterprise AI.
Llama 3.1/3.3 Community License (Trendyol, Cosmos): Commercial allowed but above 700M MAU you need Meta permission; using model output to train another model is also prohibited.
TÜBİTAK Custom Government License: Only for state institutions + approved contractors.
T3 ITAR/EAR Compatible License: Turkish defense firms + NATO ally approved integrators.

3.3. OpenLLM-TR Leaderboard: Standardized Scores

The OpenLLM-TR Leaderboard on Hugging Face publishes Turkish LLM evaluation. As of May 2026, the aggregate score is the average across TUMLU + MMLU-TR + ARC-TR + HellaSwag-TR + Belebele-TR.

May 2026 Top-5 (7B/8B class):

Trendyol-LLM-7B-v3: 51.4
Cosmos-Llama-7B-v2: 50.8
Kumru AI-7.4B: 47.1
KanarYa-Mistral-7B-tr: 45.6
Llama-3.1-8B-Instruct (vanilla): 41.8

May 2026 Top-3 (70B class):

Trendyol-LLM-70B-v3: 69.7
Cosmos-Llama-1-70B: 68.0
Llama-3.3-70B-Instruct (vanilla): 64.2

4. Practical Setup: Which Model for Which Use Case?

Use-Case-Based Turkish LLM Decision Matrix
Use Case	Recommendation	Reason
E-commerce customer service	Trendyol-LLM-7B-v3	Domain match + 16GB VRAM sufficient
SOC automation, CTI reporting	Trendyol-70B-Cybersecurity-v3	Only Turkish open-source security fine-tune
Academic / legal documents	Cosmos-Llama-1-70B	High TUMLU + DPO politeness
SMB chatbot, local deploy	Kumru AI-7.4B	4-bit quantize → 4.5GB VRAM
Turkish NLP research	KanarYa-Mistral-7B-tr	Apache 2.0 + Turkish tokenizer
Public institution, sensitive data	TÜBİTAK BİLGEM-13B	State-certified + on-prem
Defense industry	T3 AI Defence-7B	ITAR-compatible + military terminology
High-quality enterprise RAG	Trendyol-LLM-70B-v3	Highest Turkish benchmark + commercial open

4.1. Self-Host Setup: vLLM + Trendyol-LLM-7B-v3 (Most Common Scenario)

7B Turkish model + vLLM + single GPU is the most common production setup among Turkish mid-sized companies. Typical deployment:

Code Snippet

huggingface-cli download Trendyol/Trendyol-LLM-7B-chat-v3.0 \
  --local-dir /opt/models/trendyol-7b-v3

docker run --gpus all -p 8000:8000 \
  -v /opt/models/trendyol-7b-v3:/model \
  vllm/vllm-openai:latest \
  --model /model \
  --dtype bfloat16 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90

Hardware required. Single A10 (24GB) or L4 (24GB) is sufficient. RTX 4090 24GB works for dev/POC. Throughput: ~80-120 tokens/s single request, ~600 tokens/s aggregate at batch 8.

4.2. 70B Scenario: Trendyol-LLM-70B-v3 + 4xH100 or 2xH200

For 70B class self-hosting, minimum is 4xH100 (4x80GB) or 2xH200 (2x141GB). With AWQ 4-bit quantization (with ~2-3% quality drop), 35GB VRAM is enough.

Code Snippet

docker run --gpus all -p 8000:8000 \
  -v /opt/models/trendyol-70b-awq:/model \
  vllm/vllm-openai:latest \
  --model /model \
  --quantization awq \
  --tensor-parallel-size 2 \
  --max-model-len 16384

Throughput. 2xH200 + AWQ → ~50 tokens/s single request, ~300 tokens/s aggregate at batch 16. Sufficient for typical enterprise customer service RAG.

5. Performance / Benchmark Comparison

5.1. TUMLU (Turkish MMLU) Detail

TUMLU is a 57-subject academic benchmark with 14K+ multiple choice questions; the de-facto standard for Turkish LLM evaluation.

Domain performance (Trendyol-LLM-70B-v3 example):

Turkish History: 78.2
Turkish Literature: 71.6
Law: 62.3
Mathematics: 51.8
Medicine (general): 64.1
Engineering: 69.4
Social Sciences: 76.5
Computer Science: 73.2

Observation. Turkish LLMs are strongest in cultural/social domains and weakest in STEM (especially mathematics) — a result of corpus imbalance. For STEM use cases, GPT-5 / Claude Opus 4.7 API is safer.

6. Turkish-Specific Angle: KVKK, BDDK, and AI Sovereignty

The new dimension in 2026 is AI sovereignty — important at three levels.

6.1. KVKK Angle

When foreign API calls (OpenAI, Anthropic) include personal data in prompts (name, national ID, health, financial), KVKK Article 9 triggers a cross-border data transfer. This requires explicit consent or adequacy decision. Self-host Turkish LLMs eliminate this risk entirely.

6.2. BDDK Angle

In 2024, BDDK published "Banking AI and Machine Learning Management Communiqué," requiring banks to ensure their AI models have: (1) explainability, (2) data residency in Turkey or in adequate jurisdictions, (3) documented third-party dependencies. Within this framework, OpenAI API use is not directly prohibited but compliance burden is very high; self-host models like Trendyol-LLM-70B or BİLGEM-13B significantly reduce this burden.

6.3. Defense Industry (ITAR / EAR / Turkish Law)

Technical data in defense (tactical info, weapon system specs, operational planning) cannot be sent to foreign cloud services. T3 AI and BİLGEM models are strategically positioned to fill this gap.

7. Case Studies: Turkish Open-Source LLMs in Production

Case 1 — Turkish E-commerce Company: Trendyol-LLM-7B-v3 Customer Assistant

Company. One of Turkey's top-10 e-commerce platforms (anonymized, not Trendyol itself).

Problem. OpenAI GPT-4 API spend reached $48,000/month; 12M tokens/day, 85% customer service chat. KVKK compliance burden added ~$80,000/year in audit + consulting cost.

Solution. Trendyol-LLM-7B-v3 deployed on a 4xL4 (4x24GB) GPU cluster; vLLM + Redis cache + Langfuse observability. Tier-1 chats (order tracking, returns, product info) routed to open-source; tier-2 complex complaints fallback to GPT-5.

Result. Monthly AI spend $48K → $11K (cloud GPU + partial API). CSAT 7.2 → 7.4 (Turkish naturalness improvement). KVKK audit burden reduced 60%. ROI period (setup + team): 4 months.

Case 2 — Turkish Bank: Cosmos-Llama-1-70B + BDDK-Compliant RAG

Company. Top-5 Turkish private bank (anonymized).

Problem. Internal training chatbot + dealer support system requires BDDK-compliant LLM. OpenAI API use raises BDDK audit concerns; a fully domestic + Turkish-natural-output model is required.

Solution. Cosmos-Llama-1-70B + Qdrant + Turkish BGE-M3 embeddings. Full stack on 8xH100 cluster in the bank's Ankara DC. Prompt + response audit logs retained for 7 years; anonymization layer masks PII.

Result. 18,000 dealers + 28,000 internal users. Dealer support response time 4 hours → 12 minutes. BDDK audit "AI compliance" item received full score. Total investment $850K (hardware + integration); ROI positive within 24 months.

Case 3 — Healthcare Group: Kumru AI Edge Deploy + KVKK

Company. A group with 14 hospitals + 23 outpatient clinics (anonymized).

Problem. Doctors needed a system to automatically summarize patient consultation notes and send structured records to HBYS. Patient data must never leave hospital boundaries (KVKK + Health Ministry Regulation).

Solution. Each hospital received an RTX 4090 24GB workstation + Kumru AI-7.4B (4-bit, 4.5GB VRAM). Doctor's desktop app handles voice → text → summary → HBYS flow locally.

Result. Patient data does not leave the hospital network. Doctor's daily note-taking time 90 min → 25 min. Per-hospital setup cost ~$8K. Service rolled out to 14 locations in 8 months.

8. Risks and Cost

8.1. License Risks (Llama Community)

Trendyol-LLM and Cosmos-Llama are built on Llama 3.1/3.3 community license, so the Meta 700M MAU rule applies. No Turkish organization exceeds this today, but:

Using model output to train another model (distillation) is prohibited.
Use against Meta's Acceptable Use Policy (weapons, discrimination, etc.) is prohibited.
License file must be redistributed with the model.

Apache 2.0 (KanarYa, Kumru) is exempt from these constraints, but the models' technical capability is more limited.

8.2. Continuity Risk (Maintainer Dependency)

Most Turkish open-source LLMs are maintained by small teams or a single company. Pivots, team dispersal, or strategic shifts can stop maintenance. Mitigation: back up the weights + tokenizer + dataset locally for any critical-system model.

9. Frequently Asked Questions

10. Next Steps

To leverage the Turkish open-source LLM ecosystem, three concrete steps:

Use-case + token volume analysis. Log LLM usage for 1 month — extract token volume, prompt type distribution, KVKK risk profile. This grounds the "self-host vs API" decision.
POC setup. Run a 4-6 week POC on Trendyol-LLM-7B-v3 or Cosmos-Llama-7B; single L4 GPU + vLLM is enough.
Production architecture workshop. Designing the hybrid (API + self-host) strategy, KVKK + BDDK compliance, observability, and eval harness — structured workshop with a 12-week production roadmap as output.

Reach out via the contact form on the site.

References

Trendyol-LLM-7B-chat-v3.0 Model Card — Trendyol AI Lab, Hugging Face · 2025-11
Trendyol-LLM-70B-Cybersecurity-v3 — Trendyol AI Lab, Hugging Face · 2026-02
Cosmos-LLaMa Turkish Language Model — YTU CE Cosmos, Hugging Face · 2024-12
KanarYa: A Turkish Language Model — Boğaziçi University NLP Group, Hugging Face · 2023-10
Kumru: Turkish LLM by VNGRS — VNGRS AI, Hugging Face · 2025-01
TUMLU: Turkish Massive Multitask Language Understanding — Bayrak et al., arXiv · 2024-07
OpenLLM-TR Leaderboard — OpenLLM-TR Community, Hugging Face Spaces · 2026-05
Llama 3.1 Community License — Meta, Meta AI · 2024-07
vLLM Documentation — vLLM Project, vLLM · 2026
BDDK — Banking AI Management Communiqué — BDDK, BDDK · 2024-09
KVKK — Law No. 6698 — Republic of Turkiye - KVKK, Republic of Turkiye · 2016-04
TÜBİTAK BİLGEM AI Institute — TÜBİTAK BİLGEM, TÜBİTAK · 2024
T3 Foundation — T3 Foundation, T3 Vakfı · 2025
Baykar Technologies — Baykar, Baykar · 2025
AWQ: Activation-aware Weight Quantization — Lin et al., arXiv · 2023-06
Hugging Face Transformers — Hugging Face, Hugging Face · 2026
Turkish BPE Tokenization — Toraman et al., ACL · 2022
DPO: Direct Preference Optimization — Rafailov et al., NeurIPS · 2023-05
Belebele: Multilingual Reading Comprehension — Bandarkar et al., arXiv · 2023-08
ARC: AI2 Reasoning Challenge — Clark et al., AI2 · 2018
Turkish Health Data Regulation — Turkish Ministry of Health, Official Gazette · 2019-06
NVIDIA H100/H200/B200 — NVIDIA, NVIDIA · 2026
MITRE ATT&CK Framework — MITRE, MITRE · 2026
Turkish Defense Industry Presidency (SSB) — SSB, SSB · 2025
LoRA: Low-Rank Adaptation — Hu et al., arXiv · 2021-06

This is a living document; the Turkish open-source LLM ecosystem shifts every quarter, so it is updated quarterly.

Consulting Pathways

Consulting pages closest to this article

For the most logical next step after this article, you can review the most relevant solution, role, and industry landing pages here.

Solution Pages

Enterprise RAG Systems Development

Production-grade RAG systems that provide grounded, secure and auditable access to internal knowledge.

enterprise rag

Open landing

Solution Pages

AI Evaluation, Guardrails and Observability

A comprehensive evaluation layer to measure, observe and control AI accuracy, safety and performance.

observability

Open landing

Industry Pages

RAG and Compliance Assistants for Banking

Banking-focused AI systems that provide secure, grounded and auditable access to regulations, policies, procedures and internal knowledge.

banking ai

Open landing

Explore All Posts

1. Introduction: Why Is Turkish Open-Source LLM a Sector Matter in 2026?

Why Open-Source (Self-Host)?

2. The Anatomy of the Turkish LLM Ecosystem: 6 Players

2.1. Trendyol-LLM (Practical E-commerce Choice)

2.2. Cosmos-Llama (Academic DPO Pipeline)

2.3. KanarYa (BOUN NLP — Academic Foundation)

2.4. Kumru AI (VNGRS — Consumer GPU Target)

2.5. TÜBİTAK BİLGEM Domestic Model Initiative

2.6. T3 AI Baykar + T3 Foundation Partnership

3. Comparison Table: 2026 Turkish Open-Source LLM Landscape

3.1. Tokenization: The Hidden Cost Dimension of Turkish LLMs

3.2. License Complexity: Apache 2.0 vs Llama Community vs Custom

3.3. OpenLLM-TR Leaderboard: Standardized Scores

4. Practical Setup: Which Model for Which Use Case?

4.1. Self-Host Setup: vLLM + Trendyol-LLM-7B-v3 (Most Common Scenario)

4.2. 70B Scenario: Trendyol-LLM-70B-v3 + 4xH100 or 2xH200

5. Performance / Benchmark Comparison

5.1. TUMLU (Turkish MMLU) Detail

6. Turkish-Specific Angle: KVKK, BDDK, and AI Sovereignty

6.1. KVKK Angle

6.2. BDDK Angle

6.3. Defense Industry (ITAR / EAR / Turkish Law)

7. Case Studies: Turkish Open-Source LLMs in Production

Case 1 — Turkish E-commerce Company: Trendyol-LLM-7B-v3 Customer Assistant

Case 2 — Turkish Bank: Cosmos-Llama-1-70B + BDDK-Compliant RAG

Case 3 — Healthcare Group: Kumru AI Edge Deploy + KVKK

8. Risks and Cost

8.1. License Risks (Llama Community)

8.2. Continuity Risk (Maintainer Dependency)

9. Frequently Asked Questions

10. Next Steps

References

Consulting pages closest to this article

Enterprise RAG Systems Development

AI Evaluation, Guardrails and Observability

RAG and Compliance Assistants for Banking

Comments

Comments

LLMOps: Production-Grade LLM Operations

AI Governance and EU AI Act Compliance