Data Labeling & Quality Management
Andrew Ng's 80/20 rule, why data quality matters more than model size in the modern LLM era, data strategies of Tesla/OpenAI/Meta, and why data labeling is an engineering discipline.
How this learning category is structured
Each category is a progressive chain of modules — from foundational concepts to production-grade architectural choices. Following the sequence is faster, but every module is self-contained.
Every module has the same shape: a short text/video lesson (10–15 minutes), a hands-on example (code + data), a 10–15 question assessment, and an anchoring real-world use case. This structure guards against the "I watched it, so I understand it" trap: because the assessment comes after the hands-on application, it tests whether the concept has actually been retained.
Each category emphasizes production-grade practice: in prompt engineering, not just prompt templates but prompt versioning and A/B testing; in RAG, not just chunk-and-embed but hybrid retrieval + reranker + evaluation; in LLMOps, not just deployment but observability and cost attribution.
Recommended path: complete the foundational modules in order first, then pick advanced modules selectively as you need them. If you prefer the cohort format, drip release keeps you on pace with your peers; in self-paced mode, you set the cadence.
- Each module: 10–15 minute lesson + hands-on example + assessment.
- Production-oriented; lessons are anchored in real vendor and tooling choices.
- Modules are independently consumable, but the sequence accelerates retention.
- Pro membership unlocks certificate exam + AI tutor + drip cohort access.
Table of Contents
Module 0: Introduction & Framework
1. The Data-Centric AI Manifesto: Why You Should Invest in Data More Than Models
Andrew Ng's 80/20 rule, why data quality matters more than model size in the modern LLM era, data strategies of Tesla/OpenAI/Meta, and why data labeling is an engineering discipline.
2. The Data Labeling Engineer Career Map: From Annotator to Head of Data Operations
Career levels in data labeling, the daily workflow, the skill matrix, global and Turkish salary ranges, career pivots, and which skills to develop in which order.
3. Turkey's Data Labeling Ecosystem: Vendors, the Freelance Market, KVKK and the Scarcity of Turkish-Language Data
Data labeling vendors in Turkey, the freelance market and its price bands, the domestic advantage created by KVKK (Turkey's Personal Data Protection Law), and the scarcity of Turkish-language data and how to turn it into an opportunity.
4. [WORKSHOP] Setting Up Your Development Environment: Python, Docker, PostgreSQL and Label Studio from Scratch
The development environment we'll use throughout the Data Labeling & Quality Management course: Python 3.12 (with uv), Docker Compose, PostgreSQL, Label Studio, and your first 'Hello World' annotation project.
Module 1: The Anatomy of Data Labeling
1. Where Data Sits in the ML Pipeline: The Collect, Label, Train, Evaluate, Deploy Loop
Full lifecycle of an ML system: collect → label → train → eval → deploy → monitor → re-collect. Each stage's relationship with labeling, the feedback loop, continuous improvement, and why the data flywheel is modern AI's main competitive advantage.
2. A Full Taxonomy of Label Types: 14 Formats, from Classification to Preference
The 14 main label formats: single-label, multi-label, ordinal, NER, span, bounding box (BBox), polygon, segmentation, keypoint, ranking, preference, free-form, structured, and hybrid. For each: use cases, tooling, typical metrics, and pitfalls.
3. Supervised, Semi-supervised, Self-supervised: How Labeling Needs Change Across Paradigms
Modern AI's five major learning paradigms (supervised, semi-supervised, self-supervised, weakly supervised, and few-shot/in-context), each with its labeling needs, cost profile, and use cases.
4. [CASE STUDY] Label the Same Data with 3 Different Schemas: Comparing Binary, Multi-class and Hierarchical
Label the same dataset of 1,000 Turkish reviews with three different schemas (binary, 5-class fine-grained, hierarchical), train a model on each, and compare performance, cost, and utility. A complete case study of the practical impact of schema decisions.
5. The Ground Truth Illusion: Does a "Correct Label" Actually Exist?
The philosophical foundation of data labeling: whether ground truth really exists, why annotator subjectivity is inevitable, the problems the "single correct answer" assumption creates in modern AI, and the new paradigm of treating disagreement as a signal.
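The idea of treating disagreement as a signal can be made concrete with a small sketch. It assumes each item was labeled by several annotators and keeps their votes as a soft label (a probability distribution) instead of collapsing them to one majority label; a normalized entropy score then flags contested items. Cohen's kappa is the standard chance-corrected agreement statistic for two annotators. The function names here are hypothetical illustrations, not the course's own implementation.

```python
from collections import Counter
import math


def soft_label(votes):
    """Turn one item's annotator votes into a probability distribution."""
    counts = Counter(votes)
    total = len(votes)
    return {label: n / total for label, n in counts.items()}


def normalized_entropy(dist, num_classes):
    """0.0 means full agreement; 1.0 means votes spread evenly over all classes.

    num_classes is the size of the label schema (an assumption of this sketch).
    """
    if num_classes < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(num_classes)


def cohen_kappa(a, b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotators must label the same non-empty item list")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Agreement expected by chance, given each annotator's label frequencies.
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        # Both used a single identical label everywhere; kappa is undefined,
        # and this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    votes = {"review_1": ["pos", "pos", "pos"], "review_2": ["pos", "neg", "neutral"]}
    for item, v in votes.items():
        dist = soft_label(v)
        print(item, dist, round(normalized_entropy(dist, num_classes=3), 3))
    print(cohen_kappa(["pos", "neg", "pos", "pos"], ["pos", "neg", "neg", "pos"]))
```

An item with high normalized entropy is not necessarily "noise"; in the disagreement-as-signal view it may mark a genuinely ambiguous example worth reviewing or keeping as a soft target.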
Frequently Asked Questions
- Modules are designed to be followed in the order shown in the table of contents: the first module lays the groundwork and later modules build on it. You can skip a section, but if a module you jump to includes a "Prerequisites" block, complete those lessons first.