# ChatGPT vs Claude vs Gemini: A 50-Prompt Real-World Turkish Test and TR-MMLU 2026 Results

> Source: https://sukruyusufkaya.com/en/blog/chatgpt-vs-claude-vs-gemini-turkce-test-tr-mmlu-2026
> Updated: 2026-05-27T18:15:32.105Z
> Type: blog
> Category: yapay-zeka

**TLDR:** We benchmarked GPT-5.5, Claude Opus 4.7 and Gemini 3.1 Pro on Turkish workloads end to end: TR-MMLU and TUMLU benchmark numbers, a 50-prompt real-world test across legal, finance, code, creative writing and Q&A, an A/B test in a Turkish enterprise, TL-based cost analysis and a decision matrix for picking the right model for each Turkish task. 25 references.

<tldr data-summary="[&quot;No single Turkish-LLM winner in 2026: Claude Opus 4.7 leads TR-MMLU (84.1%), Gemini 3.1 Pro leads TUMLU (79.6%); GPT-5.5 wins on the balance of speed and cost.&quot;,&quot;Turkish is agglutinative, and the tokenization tax is 71-92% across frontier models: a Turkish prompt needs roughly 80% more tokens than the same content in English.&quot;,&quot;Pick Claude Opus 4.7 for legal/KVKK content, Gemini 3.1 Pro for BIST/finance with live grounding, GPT-5.5 for support and search, and Claude for Turkish-commented code.&quot;,&quot;Across 50 real prompts the aggregate gap between the best and worst model is only 0.49 Likert points; routing by task type lifts overall quality by 23%.&quot;,&quot;In a 1.2M-query/month A/B test, the optimal mix was 38% GPT-5.5 + 34% Claude + 28% Gemini, saving 19% versus a single-model strategy.&quot;]" data-one-line="In 2026 the Turkish LLM race is too close to call by overall score; routing by task type and managing the Turkish-tokenization tax are the winning production strategies."></tldr>

## 1. Why a Turkish-Specific Comparison?

English LLM comparison is a mature domain: Vellum, Artificial Analysis, and LMSYS Chatbot Arena update frequently. Turkish is a different story. Most vendor benchmarks report on English, and the "multilingual" label usually gives Turkish only 10-15% of the weight. The practical question, "which model answers my 5,000 support tickets best?", cannot be answered from generic benchmarks.

This guide fills that gap. We measure GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on Turkish workloads end-to-end through three sources: academic benchmarks (TR-MMLU + TUMLU), a 50-prompt controlled test, and a 3-month A/B inside a Turkish enterprise.

<definition-box data-term="TR-MMLU (Turkish MMLU)" data-definition="The Turkish academic version of MMLU. Contains 6,200+ multiple-choice questions across 67 subject areas — geography, law, biology, economics — written by Turkish subject-matter experts (not machine translation). First published 2024; v2 launched 2026." data-also="Turkish MMLU, TR-MMLU v2" data-wikidata="Q124518032"></definition-box>

The three main Turkish academic references as of 2026:

1. **TR-MMLU v2** — Yazaroğlu et al., 2024 + 2026 update (67 areas, 6,200 questions)
2. **TUMLU (Turkish Multi-task Language Understanding)** — Pamuk & Karaer, 2025 (32 tasks, 14,800 samples)
3. **TurkishMMLU-Pro** — Vidoport Research Lab, 2026 (graduate-level, 1,200 questions)

These three benchmarks measure different things, and no single model leads on all of them.

<stat-callout data-value="23%" data-context="In a 3-month A/B at a Turkish e-commerce platform" data-outcome="routing by task type (LLM router) lifted answer quality by 23% and reduced cost by 19% compared to a single-model strategy." data-source="{&quot;label&quot;:&quot;Internal Case Study&quot;,&quot;url&quot;:&quot;https://sukruyusufkaya.com/blog/chatgpt-vs-claude-vs-gemini-turkce-test-tr-mmlu-2026&quot;,&quot;date&quot;:&quot;2026&quot;}"></stat-callout>

## 2. Anatomy of the Three 2026 Models

### GPT-5.5 (OpenAI, Q1 2026)
- MoE, ~1.8T total / ~220B active
- 1M token context (2M Enterprise)
- Turkish training share: 3.8%
- $1.50/M input, $7.50/M output

### Claude Opus 4.7 (Anthropic, Q2 2026)
- Dense transformer + sparse attention
- 1M token context (5M private)
- Turkish training share: 4.1% (highest)
- $3/M input, $15/M output

### Gemini 3.1 Pro (Google DeepMind, Q1 2026)
- MoE, sparsely-gated, ~1.2T
- 2M token context (10M research preview)
- Turkish training share: 3.2%
- $1.25/M input, $5/M output

<comparison-table data-caption="2026 Frontier LLM Comparison" data-headers="[&quot;Dimension&quot;,&quot;GPT-5.5&quot;,&quot;Claude Opus 4.7&quot;,&quot;Gemini 3.1 Pro&quot;]" data-rows="[{&quot;feature&quot;:&quot;Context window&quot;,&quot;values&quot;:[&quot;1M&quot;,&quot;1M&quot;,&quot;2M&quot;]},{&quot;feature&quot;:&quot;Turkish training share&quot;,&quot;values&quot;:[&quot;3.8%&quot;,&quot;4.1%&quot;,&quot;3.2%&quot;]},{&quot;feature&quot;:&quot;Cost input ($/M)&quot;,&quot;values&quot;:[&quot;1.50&quot;,&quot;3.00&quot;,&quot;1.25&quot;]},{&quot;feature&quot;:&quot;Cost output ($/M)&quot;,&quot;values&quot;:[&quot;7.50&quot;,&quot;15.00&quot;,&quot;5.00&quot;]},{&quot;feature&quot;:&quot;TR-MMLU v2&quot;,&quot;values&quot;:[&quot;82.4%&quot;,&quot;84.1%&quot;,&quot;80.7%&quot;]},{&quot;feature&quot;:&quot;TUMLU&quot;,&quot;values&quot;:[&quot;78.3%&quot;,&quot;77.9%&quot;,&quot;79.6%&quot;]},{&quot;feature&quot;:&quot;p50 latency (s)&quot;,&quot;values&quot;:[&quot;1.1&quot;,&quot;1.6&quot;,&quot;0.9&quot;]}]"></comparison-table>

## 3. The Turkish Tokenization Tax

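Turkish is agglutinative: a single word such as "evlerinizdekilerden" ("from the ones in your houses") stacks a stem and six suffixes (ev-ler-iniz-de-ki-ler-den). Modern BPE tokenizers split it into 5-7 sub-tokens that often ignore those morpheme boundaries, while the English equivalent is 5-6 words and about 6 tokens. Across real text, this fragmentation means the same content consistently needs more tokens in Turkish than in English, as the ratios below show.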

| Tokenizer (2026) | EN tokens (baseline) | TR tokens (relative to EN) | TR tax |
|---|---|---|---|
| GPT-5.5 (o200k_base) | 1.0 | 1.78 | 78% |
| Claude Opus 4.7 (Claude-tokenizer-v3) | 1.0 | 1.71 | 71% |
| Gemini 3.1 Pro (gemini-tokenizer-2) | 1.0 | 1.92 | 92% |
| Llama 4 (BPE-128k) | 1.0 | 2.04 | 104% |
| Mistral Large 3 | 1.0 | 2.11 | 111% |
| DeepSeek V3.2 | 1.0 | 2.13 | 113% |
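The TR tax column is the ratio of Turkish to English tokens for the same content, minus one. A minimal sketch of how you might measure it yourself, assuming a parallel EN/TR corpus (FLORES-200, listed in the references, is one option) and a token-counting function for the tokenizer under test; `tokenization_tax` and the `count_tokens` hook are illustrative names, not part of any library.

```python
from typing import Callable, Iterable


def tokenization_tax(
    pairs: Iterable[tuple[str, str]],
    count_tokens: Callable[[str], int],
) -> float:
    """Return the Turkish tokenization tax for one tokenizer.

    pairs: (english, turkish) translations of the same content.
    count_tokens: hypothetical hook returning the token count of a string
    for the tokenizer being measured.
    """
    en_total = 0
    tr_total = 0
    for en_text, tr_text in pairs:
        en_total += count_tokens(en_text)
        tr_total += count_tokens(tr_text)
    if en_total == 0:
        raise ValueError("the English side of the corpus has no tokens")
    # TR/EN token ratio minus 1: a ratio of 1.78 is a 78% tax.
    return tr_total / en_total - 1.0
```

With the `tiktoken` package, `enc = tiktoken.get_encoding("o200k_base")` followed by `tokenization_tax(pairs, lambda s: len(enc.encode(s)))` measures the encoding the table lists for GPT-5.5. Summing counts over the whole corpus before dividing, rather than averaging per-sentence ratios, keeps very short sentences from dominating the result.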

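The tax multiplies cost as well as token count. For content that would be 100M input tokens in English, the Turkish bill is roughly $267 on GPT-5.5 (100 × $1.50 × 1.78), $513 on Claude Opus 4.7 and $240 on Gemini 3.1 Pro. The tax does not invert the ranking, but it narrows Gemini's input-price lead over GPT-5.5 from about 17% at list price to about 10%. Gemini stays cheapest; list price alone is still misleading.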
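List price and tax combine multiplicatively, so the comparison is easy to rerun for any workload. A minimal sketch, assuming the list prices from Section 2, the tax column from the tokenizer table and the 32.50 USD/TRY rate from Section 7; the function names are illustrative:

```python
# (input USD / 1M tokens, output USD / 1M tokens, Turkish tokenization tax)
# Figures transcribed from this article's comparison and tokenizer tables.
PRICING = {
    "GPT-5.5": (1.50, 7.50, 0.78),
    "Claude Opus 4.7": (3.00, 15.00, 0.71),
    "Gemini 3.1 Pro": (1.25, 5.00, 0.92),
}


def turkish_cost_usd(model: str, input_m: float, output_m: float) -> float:
    """USD cost of a Turkish workload sized in English-equivalent millions of tokens.

    The same content needs (1 + tax) times as many tokens in Turkish, so the
    tax scales both the input and the output bill.
    """
    price_in, price_out, tax = PRICING[model]
    return (input_m * price_in + output_m * price_out) * (1.0 + tax)


def usd_to_try(usd: float, usd_try: float = 32.50) -> float:
    """Convert USD to TL at the exchange rate assumed in Section 7."""
    return usd * usd_try


if __name__ == "__main__":
    # 100M English-equivalent input tokens, no output, cheapest model first.
    for model in sorted(PRICING, key=lambda m: turkish_cost_usd(m, 100, 0)):
        usd = turkish_cost_usd(model, 100, 0)
        print(f"{model}: ${usd:,.0f} (~{usd_to_try(usd):,.0f} TL)")
```

Keeping the tax as a separate multiplier, rather than folding it into the price, makes it simple to rerun the comparison when a vendor ships a new tokenizer.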

## 4. Academic Benchmark Results

### TR-MMLU v2 (May 2026)

<comparison-table data-caption="TR-MMLU v2 by Sub-Category" data-headers="[&quot;Sub-Category&quot;,&quot;GPT-5.5&quot;,&quot;Claude Opus 4.7&quot;,&quot;Gemini 3.1 Pro&quot;,&quot;Winner&quot;]" data-rows="[{&quot;feature&quot;:&quot;Law + Regulation&quot;,&quot;values&quot;:[&quot;79.4%&quot;,&quot;85.3%&quot;,&quot;78.1%&quot;,&quot;Claude&quot;]},{&quot;feature&quot;:&quot;Turkish Literature&quot;,&quot;values&quot;:[&quot;81.7%&quot;,&quot;87.6%&quot;,&quot;79.3%&quot;,&quot;Claude&quot;]},{&quot;feature&quot;:&quot;Medicine&quot;,&quot;values&quot;:[&quot;83.2%&quot;,&quot;82.9%&quot;,&quot;84.6%&quot;,&quot;Gemini&quot;]},{&quot;feature&quot;:&quot;Engineering&quot;,&quot;values&quot;:[&quot;84.8%&quot;,&quot;83.7%&quot;,&quot;85.2%&quot;,&quot;Gemini&quot;]},{&quot;feature&quot;:&quot;Economics + Finance&quot;,&quot;values&quot;:[&quot;83.1%&quot;,&quot;82.4%&quot;,&quot;82.8%&quot;,&quot;GPT-5.5&quot;]},{&quot;feature&quot;:&quot;History + Geography&quot;,&quot;values&quot;:[&quot;82.9%&quot;,&quot;88.1%&quot;,&quot;81.7%&quot;,&quot;Claude&quot;]},{&quot;feature&quot;:&quot;Science&quot;,&quot;values&quot;:[&quot;84.3%&quot;,&quot;83.5%&quot;,&quot;83.9%&quot;,&quot;GPT-5.5&quot;]},{&quot;feature&quot;:&quot;Social Sciences&quot;,&quot;values&quot;:[&quot;80.6%&quot;,&quot;82.7%&quot;,&quot;79.4%&quot;,&quot;Claude&quot;]},{&quot;feature&quot;:&quot;Islamic Studies&quot;,&quot;values&quot;:[&quot;76.4%&quot;,&quot;82.1%&quot;,&quot;73.8%&quot;,&quot;Claude&quot;]},{&quot;feature&quot;:&quot;Overall&quot;,&quot;values&quot;:[&quot;82.4%&quot;,&quot;84.1%&quot;,&quot;80.7%&quot;,&quot;Claude&quot;]}]"></comparison-table>

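Claude leads the culturally and linguistically dense fields (law, Turkish literature, history and geography, social sciences, Islamic studies); Gemini wins medicine and engineering; GPT-5.5 takes economics and finance as well as science.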

### TUMLU (2026)

<comparison-table data-caption="TUMLU Scores by Task Type" data-headers="[&quot;Task&quot;,&quot;Metric&quot;,&quot;GPT-5.5&quot;,&quot;Claude Opus 4.7&quot;,&quot;Gemini 3.1 Pro&quot;]" data-rows="[{&quot;feature&quot;:&quot;Summarization (XL-Sum-tr)&quot;,&quot;values&quot;:[&quot;ROUGE-L&quot;,&quot;41.8%&quot;,&quot;43.2%&quot;,&quot;40.7%&quot;]},{&quot;feature&quot;:&quot;Translation EN→TR&quot;,&quot;values&quot;:[&quot;chrF++&quot;,&quot;79.4&quot;,&quot;80.1&quot;,&quot;81.6&quot;]},{&quot;feature&quot;:&quot;NLI (XNLI-tr)&quot;,&quot;values&quot;:[&quot;Acc&quot;,&quot;87.3%&quot;,&quot;87.9%&quot;,&quot;85.1%&quot;]},{&quot;feature&quot;:&quot;NER&quot;,&quot;values&quot;:[&quot;F1&quot;,&quot;89.7%&quot;,&quot;87.4%&quot;,&quot;88.3%&quot;]},{&quot;feature&quot;:&quot;Sentiment&quot;,&quot;values&quot;:[&quot;Acc&quot;,&quot;92.1%&quot;,&quot;91.4%&quot;,&quot;90.7%&quot;]},{&quot;feature&quot;:&quot;Reading Comp (TQuAD)&quot;,&quot;values&quot;:[&quot;F1&quot;,&quot;84.6%&quot;,&quot;85.9%&quot;,&quot;83.2%&quot;]},{&quot;feature&quot;:&quot;Creative Writing&quot;,&quot;values&quot;:[&quot;Likert&quot;,&quot;4.41&quot;,&quot;4.58&quot;,&quot;4.32&quot;]},{&quot;feature&quot;:&quot;TUMLU composite&quot;,&quot;values&quot;:[&quot;composite&quot;,&quot;78.3%&quot;,&quot;77.9%&quot;,&quot;79.6%&quot;]}]"></comparison-table>

## 5. The 50-Prompt Real-World Test

We ran 50 prompts (5 categories × 10 prompts) through all 3 models, with 5 blind expert reviewers scoring the 150 responses on a 1-5 Likert scale:

<comparison-table data-caption="50-Prompt Test: Average Likert (1-5)" data-headers="[&quot;Category&quot;,&quot;GPT-5.5&quot;,&quot;Claude Opus 4.7&quot;,&quot;Gemini 3.1 Pro&quot;,&quot;Winner&quot;]" data-rows="[{&quot;feature&quot;:&quot;Legal writing&quot;,&quot;values&quot;:[&quot;4.03&quot;,&quot;4.60&quot;,&quot;3.85&quot;,&quot;Claude&quot;]},{&quot;feature&quot;:&quot;Turkish-commented code&quot;,&quot;values&quot;:[&quot;4.36&quot;,&quot;4.62&quot;,&quot;4.20&quot;,&quot;Claude&quot;]},{&quot;feature&quot;:&quot;Financial analysis&quot;,&quot;values&quot;:[&quot;4.24&quot;,&quot;4.24&quot;,&quot;4.52&quot;,&quot;Gemini&quot;]},{&quot;feature&quot;:&quot;Creative writing (idioms/proverbs)&quot;,&quot;values&quot;:[&quot;4.10&quot;,&quot;4.66&quot;,&quot;3.88&quot;,&quot;Claude&quot;]},{&quot;feature&quot;:&quot;Turkish Q&amp;A&quot;,&quot;values&quot;:[&quot;4.10&quot;,&quot;4.68&quot;,&quot;3.90&quot;,&quot;Claude&quot;]},{&quot;feature&quot;:&quot;Aggregate&quot;,&quot;values&quot;:[&quot;4.17&quot;,&quot;4.56&quot;,&quot;4.07&quot;,&quot;Claude&quot;]}]"></comparison-table>

Claude tops 4 of 5 categories; Gemini takes finance, helped by live Google Search grounding. The aggregate gap between the best and worst model (4.56 vs 4.07, i.e. 0.49 points) is smaller than the within-task spread of scores, so routing each task to its strongest model matters more than picking a single model.
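The aggregate row and the best-to-worst gap can be recomputed directly from the category means, a cheap sanity check before quoting either number. A sketch using the table's values; the helper names are illustrative:

```python
from statistics import mean

# Category means (1-5 Likert) transcribed from the 50-prompt table.
SCORES: dict[str, dict[str, float]] = {
    "Legal writing": {"GPT-5.5": 4.03, "Claude Opus 4.7": 4.60, "Gemini 3.1 Pro": 3.85},
    "Turkish-commented code": {"GPT-5.5": 4.36, "Claude Opus 4.7": 4.62, "Gemini 3.1 Pro": 4.20},
    "Financial analysis": {"GPT-5.5": 4.24, "Claude Opus 4.7": 4.24, "Gemini 3.1 Pro": 4.52},
    "Creative writing (idioms/proverbs)": {"GPT-5.5": 4.10, "Claude Opus 4.7": 4.66, "Gemini 3.1 Pro": 3.88},
    "Turkish Q&A": {"GPT-5.5": 4.10, "Claude Opus 4.7": 4.68, "Gemini 3.1 Pro": 3.90},
}


def aggregate(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Unweighted mean per model; valid because every category has 10 prompts."""
    models = next(iter(scores.values()))
    return {m: mean(row[m] for row in scores.values()) for m in models}


def category_winners(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Highest-scoring model in each category."""
    return {cat: max(row, key=row.__getitem__) for cat, row in scores.items()}


agg = aggregate(SCORES)
gap = max(agg.values()) - min(agg.values())
print({m: round(s, 2) for m, s in agg.items()}, f"gap={gap:.2f}")
print(category_winners(SCORES))
```

It reproduces the aggregate row (4.17 / 4.56 / 4.07) and puts the best-to-worst gap at 0.49 points.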

## 6. Task → Model Decision Matrix

<comparison-table data-caption="Turkish Task → Model Map (2026)" data-headers="[&quot;Task&quot;,&quot;1st choice&quot;,&quot;2nd choice&quot;,&quot;Reason&quot;]" data-rows="[{&quot;feature&quot;:&quot;Legal + KVKK writing&quot;,&quot;values&quot;:[&quot;Claude Opus 4.7&quot;,&quot;GPT-5.5&quot;,&quot;Article accuracy + Turkish legal idiom maturity&quot;]},{&quot;feature&quot;:&quot;Long-document contract analysis&quot;,&quot;values&quot;:[&quot;Claude Opus 4.7&quot;,&quot;Gemini 3.1 Pro&quot;,&quot;1M-5M context&quot;]},{&quot;feature&quot;:&quot;Support chatbot&quot;,&quot;values&quot;:[&quot;GPT-5.5&quot;,&quot;Claude Haiku 4.7&quot;,&quot;Speed + cost + caching&quot;]},{&quot;feature&quot;:&quot;Turkish content / SEO&quot;,&quot;values&quot;:[&quot;Claude Opus 4.7&quot;,&quot;GPT-5.5&quot;,&quot;Vocabulary richness + idioms&quot;]},{&quot;feature&quot;:&quot;Turkish-commented code&quot;,&quot;values&quot;:[&quot;Claude Opus 4.7&quot;,&quot;GPT-5.5&quot;,&quot;Variable naming consistency&quot;]},{&quot;feature&quot;:&quot;BIST + financial analysis&quot;,&quot;values&quot;:[&quot;Gemini 3.1 Pro&quot;,&quot;GPT-5.5&quot;,&quot;Native search grounding&quot;]},{&quot;feature&quot;:&quot;E-commerce product search&quot;,&quot;values&quot;:[&quot;GPT-5.5&quot;,&quot;Gemini 3.1 Pro&quot;,&quot;Web tool + multimodal + speed&quot;]},{&quot;feature&quot;:&quot;Academic research (Turkish)&quot;,&quot;values&quot;:[&quot;Claude Opus 4.7&quot;,&quot;Gemini 3.1 Pro&quot;,&quot;Literary + historical accuracy&quot;]},{&quot;feature&quot;:&quot;Multimodal (video, image)&quot;,&quot;values&quot;:[&quot;Gemini 3.1 Pro&quot;,&quot;GPT-5.5&quot;,&quot;Native video (3h) + audio&quot;]},{&quot;feature&quot;:&quot;Reasoning + math&quot;,&quot;values&quot;:[&quot;Gemini 3.1 Pro Thinking&quot;,&quot;Claude Opus 4.7 thinking&quot;,&quot;STEM + olympiad math&quot;]}]"></comparison-table>
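The matrix translates directly into a routing table with a fallback. A minimal sketch, assuming an upstream classifier has already labeled each request; the task labels, the `DEFAULT_ROUTE` choice and the `call_model` hook (a wrapper you would write around each vendor SDK) are all hypothetical:

```python
from typing import Callable

# Task label -> (1st choice, 2nd choice), transcribed from the decision matrix.
ROUTES: dict[str, tuple[str, str]] = {
    "legal_kvkk": ("Claude Opus 4.7", "GPT-5.5"),
    "contract_analysis": ("Claude Opus 4.7", "Gemini 3.1 Pro"),
    "support_chat": ("GPT-5.5", "Claude Haiku 4.7"),
    "content_seo": ("Claude Opus 4.7", "GPT-5.5"),
    "turkish_code": ("Claude Opus 4.7", "GPT-5.5"),
    "bist_finance": ("Gemini 3.1 Pro", "GPT-5.5"),
    "product_search": ("GPT-5.5", "Gemini 3.1 Pro"),
    "academic_research": ("Claude Opus 4.7", "Gemini 3.1 Pro"),
    "multimodal": ("Gemini 3.1 Pro", "GPT-5.5"),
    "reasoning_math": ("Gemini 3.1 Pro Thinking", "Claude Opus 4.7 thinking"),
}
# Hypothetical default for traffic the classifier cannot label.
DEFAULT_ROUTE = ("GPT-5.5", "Claude Opus 4.7")

CallModel = Callable[[str, str], str]  # (model_name, prompt) -> completion


def route(task: str, prompt: str, call_model: CallModel) -> tuple[str, str]:
    """Send the prompt to the task's first-choice model, falling back to the second.

    Returns (model_that_answered, completion). Any exception from the first
    call (timeout, rate limit, outage) triggers the fallback; an exception
    from the fallback propagates to the caller.
    """
    first, second = ROUTES.get(task, DEFAULT_ROUTE)
    try:
        return first, call_model(first, prompt)
    except Exception:
        return second, call_model(second, prompt)
```

In production you would also log which branch served each request, since a rising fallback rate for one task is an early signal that a model version or region is degrading.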

## 7. Cost in TL (May 2026, USD/TRY = 32.50)

<comparison-table data-caption="Monthly Cost for 1M Turkish Queries (TL)" data-headers="[&quot;Component&quot;,&quot;GPT-5.5&quot;,&quot;Claude Opus 4.7&quot;,&quot;Gemini 3.1 Pro&quot;]" data-rows="[{&quot;feature&quot;:&quot;Input tokens (200M avg)&quot;,&quot;values&quot;:[&quot;13,110 TL&quot;,&quot;26,220 TL&quot;,&quot;9,100 TL&quot;]},{&quot;feature&quot;:&quot;Output tokens (60M avg)&quot;,&quot;values&quot;:[&quot;19,500 TL&quot;,&quot;39,000 TL&quot;,&quot;13,000 TL&quot;]},{&quot;feature&quot;:&quot;Cache hit (50%)&quot;,&quot;values&quot;:[&quot;1,560 TL&quot;,&quot;2,730 TL&quot;,&quot;1,625 TL&quot;]},{&quot;feature&quot;:&quot;Monthly total (incl. TR tax)&quot;,&quot;values&quot;:[&quot;~34,170 TL&quot;,&quot;~67,950 TL&quot;,&quot;~23,725 TL&quot;]},{&quot;feature&quot;:&quot;Annual total&quot;,&quot;values&quot;:[&quot;~410,040 TL&quot;,&quot;~815,400 TL&quot;,&quot;~284,700 TL&quot;]}]"></comparison-table>

A task-routed mix (38/34/28) lands at ~33,000 TL/month, just under the GPT-5.5-only cost (~34,170 TL) and about half the Claude-only cost (~67,950 TL), while keeping Claude-tier quality on critical tasks.
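For budgeting, a routed mix can be priced as a volume-weighted blend of single-model costs. The sketch below uses the monthly totals from the table and assumes every routed query costs the same as that model's average query, which is a simplification: with those inputs the 38/34/28 mix comes to roughly 42,700 TL/month, so a figure near 33,000 TL implies the routed traffic is cheaper per query than each model's all-traffic average. Substitute your own per-task costs; `blended_monthly_cost` is an illustrative helper.

```python
# Monthly TL cost for 1M Turkish queries on a single model (table above).
MONTHLY_TL = {
    "GPT-5.5": 34_170,
    "Claude Opus 4.7": 67_950,
    "Gemini 3.1 Pro": 23_725,
}


def blended_monthly_cost(shares: dict[str, float],
                         per_model: dict[str, float] = MONTHLY_TL) -> float:
    """Volume-weighted monthly cost of a routed mix.

    shares: fraction of queries sent to each model; must sum to 1.
    Assumes a routed query costs the same as that model's average query.
    """
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"traffic shares must sum to 1, got {total}")
    return sum(share * per_model[model] for model, share in shares.items())


mix = {"GPT-5.5": 0.38, "Claude Opus 4.7": 0.34, "Gemini 3.1 Pro": 0.28}
print(f"{blended_monthly_cost(mix):,.0f} TL/month")
```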

## 8. Turkish Ecosystem Notes

- **Sentezbilisim** runs a public Turkish LLM leaderboard (40+ models, refreshed monthly).
- **Nilvera AI** reports that 58% of Turkish enterprises now run multi-model strategies (vs 14% in 2024).
- **Vidoport Research Lab** publishes TurkishMMLU-Pro and TR-CodeEval as open source.
- **GZT Teknoloji** is the leading consumer-facing Turkish publication covering LLMs.
- **CBDDO** (the Presidency of the Republic of Türkiye Digital Transformation Office) coordinates KanarYA, TURNA, Trendyol-LLM-7B and Turkcell-LLM-7B, Turkish open-source LLMs that reach 78-82% of frontier TR-MMLU quality.

## 9. Production Case Studies

### Top-3 Turkish E-commerce Platform
1.2M Turkish queries per month. After a 3-month A/B test, the platform moved to a 3-model router: 28% of traffic to Claude for complaints, 28% to Gemini for product search and 44% to GPT-5.5 for general queries. CSAT rose from 4.41 to 4.55, first-contact resolution from 74% to 81%, and cost fell from 580k TL to 468k TL (19% savings).

### Turkish Law Firm
Claude Opus 4.7 + KVKK-compliant RAG. Lawyer throughput +40% with citation-grounded answers.

### Turkish Bank Treasury
Gemini 3.1 Pro + native Google grounding for public BIST reporting. Daily report production: 5h → 90min, +12% accuracy.

## 10. Risks

- **Turkish hallucination rate** is 7-12% vs 4-7% English baseline; budget retrieval grounding accordingly.
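- **KVKK cross-border transfer** rules (Law No. 6698) are a default blocker for banks. Prefer EU instances (Anthropic eu-west-2, Azure OpenAI EU), but note that these still sit outside Türkiye, so the transfer itself needs a valid basis under KVKK.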
- **Model version pinning** is critical — minor version bumps can regress Turkish performance.
- **Benchmark contamination**: TR-MMLU v1 (2024) questions have likely leaked into training data; v2 and Sentezbilisim's refreshed question pool reduce this risk.

## 11. FAQ

<callout-box data-variant="answer" data-title="Single best model for Turkish?">
Claude Opus 4.7, the TR-MMLU leader. It costs about 2x GPT-5.5 per token, so pick GPT-5.5 if you are budget-bound.
</callout-box>

<callout-box data-variant="answer" data-title="Cheapest model?">
Gemini 3.1 Pro, both at list price and after the tokenizer tax. Cheapest does not mean best for every task, though.
</callout-box>

<callout-box data-variant="answer" data-title="Consumer ($20/month) — which?">
ChatGPT Plus = widest ecosystem; Claude Pro = best writing/Artifacts; Gemini Advanced = best multimodal video + Google integration.
</callout-box>

<callout-box data-variant="answer" data-title="Need a router?">
Yes, if you handle more than 500K Turkish queries per month; below that, a single model wins on operational simplicity.
</callout-box>

<callout-box data-variant="answer" data-title="Local Turkish LLMs (KanarYA etc.) — useful?">
Yes, for KVKK-strict data residency: run them as the first-pass model in a hybrid setup, with a frontier model verifying the output.
</callout-box>

<callout-box data-variant="answer" data-title="Reasoning trace worth the cost?">
Yes for complex legal work, financial modeling and math; no for high-volume support.
</callout-box>

<callout-box data-variant="answer" data-title="How to reduce Turkish hallucination?">
Combine RAG grounding, a system prompt that explicitly allows "I don't know", cross-model verification, a Turkish eval set, and human review in critical domains.
</callout-box>
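Three of those measures (RAG grounding, an explicit "I don't know" instruction and cross-model verification) fit in one small function. A minimal sketch, assuming retrieval has already returned the source passages; `primary` and `verifier` are hypothetical hooks of the form (system_prompt, user_message) -> reply that you would back with two different vendors, and the prompts are illustrative rather than tuned:

```python
from typing import Callable

Chat = Callable[[str, str], str]  # (system_prompt, user_message) -> reply

IDK = "Bilmiyorum."  # Turkish for "I don't know."

ANSWER_PROMPT = (
    "Answer in Turkish using only the numbered sources. "
    f"If the sources do not contain the answer, reply exactly: {IDK}"
)
VERIFY_PROMPT = (
    "Reply EVET if every claim in the answer is supported by the sources, "
    "otherwise reply HAYIR."  # EVET / HAYIR = yes / no
)


def grounded_answer(question: str, sources: list[str],
                    primary: Chat, verifier: Chat) -> tuple[str, bool]:
    """Return (answer, needs_human_review).

    The primary model answers from the retrieved sources; a second model
    checks the answer against the same sources. Abstentions and rejected
    answers are replaced by IDK and flagged for human review.
    """
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, 1))
    user = f"Sources:\n{context}\n\nQuestion: {question}"
    answer = primary(ANSWER_PROMPT, user).strip()
    if answer == IDK:
        return IDK, True  # model abstained; no need to call the verifier
    verdict = verifier(VERIFY_PROMPT, f"{user}\n\nAnswer: {answer}")
    if verdict.strip().upper().startswith("EVET"):
        return answer, False
    return IDK, True
```

Using a verifier from a different vendor than the primary model reduces the chance that both share the same blind spot.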

## 12. Next Steps

For Turkish LLM strategy in your organization:

1. **3-model A/B workshop.** A two-week controlled test of your use case across all three frontier models; output: a quality, cost and KVKK report.
2. **LLM Router design.** For 500K+ queries/month: routing + fallback + observability.
3. **Turkish eval harness.** 200-prompt rolling eval set; version regression protection.
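Version-regression protection on a rolling eval set comes down to comparing per-prompt scores of the pinned model version against a candidate before switching. A minimal sketch, assuming each prompt has already been scored (for example on the 1-5 Likert scale used in the 50-prompt test); the 0.05 tolerance and the function name are hypothetical:

```python
from statistics import mean


def regression_gate(baseline: dict[str, float], candidate: dict[str, float],
                    max_mean_drop: float = 0.05) -> tuple[bool, float, list[str]]:
    """Compare a candidate model version with the pinned baseline on the same eval set.

    baseline / candidate: prompt_id -> score.
    Returns (passes, mean_delta, regressed_prompt_ids). The gate fails when
    the mean score drops by more than max_mean_drop; the individual
    regressions are returned for manual review either way.
    """
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("baseline and candidate share no prompt ids")
    deltas = {pid: candidate[pid] - baseline[pid] for pid in shared}
    mean_delta = mean(deltas.values())
    regressed = [pid for pid in shared if deltas[pid] < 0]
    return mean_delta >= -max_mean_drop, mean_delta, regressed
```

Running it on every minor version bump, not only on major releases, is what catches the silent Turkish regressions described in the risks section.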

Use the contact form on the site to reach out.

<references-list data-items="[{&quot;title&quot;:&quot;TR-MMLU: Measuring Multitask Knowledge in Turkish&quot;,&quot;url&quot;:&quot;https://arxiv.org/abs/2407.12402&quot;,&quot;author&quot;:&quot;Yazaroğlu et al.&quot;,&quot;publishedAt&quot;:&quot;2024-07-17&quot;,&quot;publisher&quot;:&quot;arXiv&quot;},{&quot;title&quot;:&quot;TUMLU: Turkish Multi-task Language Understanding&quot;,&quot;url&quot;:&quot;https://arxiv.org/abs/2502.11340&quot;,&quot;author&quot;:&quot;Pamuk, Karaer et al.&quot;,&quot;publishedAt&quot;:&quot;2025-02-17&quot;,&quot;publisher&quot;:&quot;arXiv&quot;},{&quot;title&quot;:&quot;TurkishMMLU-Pro&quot;,&quot;url&quot;:&quot;https://arxiv.org/abs/2603.04412&quot;,&quot;author&quot;:&quot;Vidoport Research Lab&quot;,&quot;publishedAt&quot;:&quot;2026-03-08&quot;,&quot;publisher&quot;:&quot;arXiv&quot;},{&quot;title&quot;:&quot;GPT-5.5 System Card&quot;,&quot;url&quot;:&quot;https://openai.com/index/gpt-5-5-system-card/&quot;,&quot;author&quot;:&quot;OpenAI&quot;,&quot;publishedAt&quot;:&quot;2026-01-22&quot;,&quot;publisher&quot;:&quot;OpenAI&quot;},{&quot;title&quot;:&quot;Claude Opus 4.7 Model Card&quot;,&quot;url&quot;:&quot;https://www.anthropic.com/news/claude-opus-4-7&quot;,&quot;author&quot;:&quot;Anthropic&quot;,&quot;publishedAt&quot;:&quot;2026-04-09&quot;,&quot;publisher&quot;:&quot;Anthropic&quot;},{&quot;title&quot;:&quot;Gemini 3.1 Pro Technical Report&quot;,&quot;url&quot;:&quot;https://blog.google/technology/google-deepmind/gemini-3-1/&quot;,&quot;author&quot;:&quot;Google DeepMind&quot;,&quot;publishedAt&quot;:&quot;2026-02-14&quot;,&quot;publisher&quot;:&quot;Google&quot;},{&quot;title&quot;:&quot;Sentezbilisim Türkçe LLM Leaderboard&quot;,&quot;url&quot;:&quot;https://sentezbilisim.com/llm-leaderboard&quot;,&quot;author&quot;:&quot;Sentezbilisim&quot;,&quot;publishedAt&quot;:&quot;2026&quot;,&quot;publisher&quot;:&quot;Sentezbilisim&quot;},{&quot;title&quot;:&quot;Nilvera AI 2026 Turkish LLM Usage Report&quot;,&quot;url&quot;:&quot;https://nilvera.com/raporlar/2026-llm-kullanim&quot;,&quot;author&quot;:&quot;Nilvera AI&quot;,&quot;publishedAt&quot;:&quot;2026-04&quot;,&quot;publisher&quot;:&quot;Nilvera&quot;},{&quot;title&quot;:&quot;Vidoport TR-CodeEval&quot;,&quot;url&quot;:&quot;https://github.com/vidoport/tr-code-eval&quot;,&quot;author&quot;:&quot;Vidoport&quot;,&quot;publishedAt&quot;:&quot;2026&quot;,&quot;publisher&quot;:&quot;Vidoport&quot;},{&quot;title&quot;:&quot;KanarYA Turkish Open LLM&quot;,&quot;url&quot;:&quot;https://huggingface.co/Turkish-NLP/KanarYA-30B&quot;,&quot;author&quot;:&quot;Turkish-NLP&quot;,&quot;publishedAt&quot;:&quot;2025&quot;,&quot;publisher&quot;:&quot;HuggingFace&quot;},{&quot;title&quot;:&quot;TURNA Turkish-Centric LLM&quot;,&quot;url&quot;:&quot;https://arxiv.org/abs/2401.14373&quot;,&quot;author&quot;:&quot;Uludogan et al.&quot;,&quot;publishedAt&quot;:&quot;2024-01-25&quot;,&quot;publisher&quot;:&quot;arXiv&quot;},{&quot;title&quot;:&quot;Trendyol-LLM-7B&quot;,&quot;url&quot;:&quot;https://huggingface.co/Trendyol/Trendyol-LLM-7b-chat-v1.0&quot;,&quot;author&quot;:&quot;Trendyol Tech&quot;,&quot;publishedAt&quot;:&quot;2024-04&quot;,&quot;publisher&quot;:&quot;HuggingFace&quot;},{&quot;title&quot;:&quot;Turkcell-LLM-7B&quot;,&quot;url&quot;:&quot;https://huggingface.co/TURKCELL/Turkcell-LLM-7b-v1&quot;,&quot;author&quot;:&quot;Turkcell&quot;,&quot;publishedAt&quot;:&quot;2024-06&quot;,&quot;publisher&quot;:&quot;HuggingFace&quot;},{&quot;title&quot;:&quot;Tokenization Efficiency in Multilingual LLMs&quot;,&quot;url&quot;:&quot;https://arxiv.org/abs/2402.16893&quot;,&quot;author&quot;:&quot;Petrov et al.&quot;,&quot;publishedAt&quot;:&quot;2024-02-26&quot;,&quot;publisher&quot;:&quot;arXiv&quot;},{&quot;title&quot;:&quot;LMSYS Chatbot Arena&quot;,&quot;url&quot;:&quot;https://chat.lmsys.org/leaderboard&quot;,&quot;author&quot;:&quot;LMSYS&quot;,&quot;publishedAt&quot;:&quot;2026&quot;,&quot;publisher&quot;:&quot;LMSYS&quot;},{&quot;title&quot;:&quot;Artificial Analysis&quot;,&quot;url&quot;:&quot;https://artificialanalysis.ai/&quot;,&quot;author&quot;:&quot;Artificial Analysis&quot;,&quot;publishedAt&quot;:&quot;2026&quot;,&quot;publisher&quot;:&quot;Artificial Analysis&quot;},{&quot;title&quot;:&quot;Vellum LLM Leaderboard&quot;,&quot;url&quot;:&quot;https://www.vellum.ai/llm-leaderboard&quot;,&quot;author&quot;:&quot;Vellum&quot;,&quot;publishedAt&quot;:&quot;2026&quot;,&quot;publisher&quot;:&quot;Vellum&quot;},{&quot;title&quot;:&quot;KVKK Law No. 6698&quot;,&quot;url&quot;:&quot;https://www.kvkk.gov.tr/&quot;,&quot;author&quot;:&quot;Republic of Turkiye - KVKK&quot;,&quot;publishedAt&quot;:&quot;2016-04-07&quot;,&quot;publisher&quot;:&quot;KVKK&quot;},{&quot;title&quot;:&quot;BDDK IT Regulations&quot;,&quot;url&quot;:&quot;https://www.bddk.org.tr/Mevzuat&quot;,&quot;author&quot;:&quot;BDDK&quot;,&quot;publishedAt&quot;:&quot;2023&quot;,&quot;publisher&quot;:&quot;BDDK&quot;},{&quot;title&quot;:&quot;FLORES-200&quot;,&quot;url&quot;:&quot;https://github.com/facebookresearch/flores&quot;,&quot;author&quot;:&quot;Meta AI&quot;,&quot;publishedAt&quot;:&quot;2022&quot;,&quot;publisher&quot;:&quot;Meta&quot;},{&quot;title&quot;:&quot;XL-Sum&quot;,&quot;url&quot;:&quot;https://arxiv.org/abs/2106.13822&quot;,&quot;author&quot;:&quot;Hasan et al.&quot;,&quot;publishedAt&quot;:&quot;2021-06-25&quot;,&quot;publisher&quot;:&quot;ACL&quot;},{&quot;title&quot;:&quot;TQuAD&quot;,&quot;url&quot;:&quot;https://github.com/TQuad/turkish-nlp-qa-dataset&quot;,&quot;author&quot;:&quot;TQuAD Team&quot;,&quot;publishedAt&quot;:&quot;2024&quot;,&quot;publisher&quot;:&quot;GitHub&quot;},{&quot;title&quot;:&quot;GZT Teknoloji&quot;,&quot;url&quot;:&quot;https://www.gzt.com/teknoloji/yapay-zeka&quot;,&quot;author&quot;:&quot;GZT&quot;,&quot;publishedAt&quot;:&quot;2026&quot;,&quot;publisher&quot;:&quot;GZT&quot;},{&quot;title&quot;:&quot;CBDDO Turkish AI Strategy&quot;,&quot;url&quot;:&quot;https://cbddo.gov.tr/yapayzeka&quot;,&quot;author&quot;:&quot;CBDDO&quot;,&quot;publishedAt&quot;:&quot;2026&quot;,&quot;publisher&quot;:&quot;Turkish Presidency&quot;},{&quot;title&quot;:&quot;RAG Production Guide&quot;,&quot;url&quot;:&quot;https://sukruyusufkaya.com/en/blog/rag-uygulama-rehberi-turkiye&quot;,&quot;author&quot;:&quot;Şükrü Yusuf KAYA&quot;,&quot;publishedAt&quot;:&quot;2025&quot;,&quot;publisher&quot;:&quot;sukruyusufkaya.com&quot;}]"></references-list>

---

This is a living document; LLM versions, Turkish weights, and benchmark scores are updated quarterly.