Data Labeling & Quality Management
Table of Contents
Module 0: Introduction & Framework
- 1
The Data-Centric AI Manifesto: Why You Should Invest in Data More Than Models
Andrew Ng's 80/20 rule, why data quality matters more than model size in the modern LLM era, data strategies of Tesla/OpenAI/Meta, and why data labeling is an engineering discipline.
- 2
The Data Labeling Engineer Career Map: From Annotator to Head of Data Operations
Career levels in data labeling, the daily workflow, a skill matrix, salary ranges globally and in Turkey, career pivots, and which skills to develop in which order.
- 3
Turkey's Data Labeling Ecosystem: Vendors, Freelance Market, KVKK and the Scarcity of Turkish Data
Data labeling vendors in Turkey, the freelance market, price bands, the domestic advantage created by KVKK, and the scarcity of Turkish data and how to turn it into an opportunity.
- 4
[WORKSHOP] Setting Up Your Development Environment: Python, Docker, PostgreSQL and Label Studio from Scratch
The development environment we'll use throughout the Data Labeling & Quality Management course: Python 3.12 (with uv), Docker Compose, PostgreSQL, Label Studio, and your first 'Hello World' annotation project.
Module 1: The Anatomy of Data Labeling
- 1
Where Data Sits in the ML Pipeline: The Collect, Label, Train, Evaluate, Deploy Loop
Full lifecycle of an ML system: collect → label → train → eval → deploy → monitor → re-collect. Each stage's relationship with labeling, the feedback loop, continuous improvement, and why the data flywheel is modern AI's main competitive advantage.
- 2
Full Taxonomy of Label Types: From Classification to Preference, 14 Formats
The 14 main label formats: single-label, multi-label, ordinal, NER, span, BBox, polygon, segmentation, keypoint, ranking, preference, free-form, structured and hybrid, with use cases, tooling, typical metrics and pitfalls for each.
- 3
Supervised, Semi-supervised, Self-supervised: How Labeling Needs Change Across Paradigms
Modern AI's five major learning paradigms — supervised, semi-supervised, self-supervised, weakly supervised, and few-shot/in-context — each with its labeling needs, cost profile, and use cases.
- 4
[CASE STUDY] Label the Same Data with 3 Different Schemas: Binary, Multi-class, Hierarchical Comparison
Label the same dataset of 1,000 Turkish reviews with three different schemas (binary, 5-class fine-grained, hierarchical), train a model on each, and compare performance, cost and utility. A complete case study showing the practical impact of schema decisions.
- 5
The Ground Truth Illusion: Does 'Correct Label' Actually Exist?
The philosophical foundation of data labeling: does ground truth really exist, why annotator subjectivity is inevitable, the problems the 'correct answer' assumption creates in modern AI, and the new paradigm of treating disagreement as signal.