Anomaly Detection
Table of Contents
Module 0: Course Framework & Workshop Setup
- 1. Who Is an Anomaly Detection Engineer? Differences from Fraud, SRE, Quality Engineer Roles and the Turkey Salary Landscape
  How the Anomaly Detection Engineer role differs from ML Engineer, Fraud Analyst, SRE, Quality Engineer; skill matrix, seniority levels, Turkey and global salary ranges, daily workflow, sector expectations.
- 2. Course Philosophy: Why This Path, Why This Order — The Anomaly Detection Learning River
  Why we chose the statistics → classical ML → deep learning → time series → domain → production order; the learning-river model, which capstone builds which skill, and the five principles we'll follow.
- 3. Workshop Setup: uv + Python 3.12 + PyOD + anomalib + PyTorch — From Zero to a Production-Ready Anomaly Detection Environment
  Step-by-step setup for the anomaly detection workshop: uv + Python 3.12 venv, PyTorch 2.5+, PyOD, anomalib, alibi-detect, river, and Jupyter Lab on Windows WSL2, macOS MPS, and Linux CUDA.
- 4. Dataset Accounts & Cloud: Kaggle, HuggingFace, Numenta, MVTec, NASA — The Anomaly Detection Data Arsenal
  Where to download the 18 datasets used in this course: Kaggle API setup, HuggingFace datasets, Numenta NAB, MVTec AD, NASA Turbofan, CWRU bearing, IEEE-CIS Fraud — plus free GPU access via Google Colab and RunPod.
Module 1: Anomaly Definition, Typology and Taxonomy
- 1. Anomaly, Outlier, Novelty, Noise: Precise Differences Between Four Frequently Confused Concepts
  Precise distinctions between anomaly, outlier, novelty, and noise — terms often used interchangeably in academia and industry; Hawkins's definition; and why these distinctions are critical in production.
- 2. Three Anomaly Types: Point, Contextual, and Collective — Which Method for Which?
  The three fundamental types of anomalies: point, contextual, and collective. Six sector-specific examples per type, visual intuition, and method mapping.
- 3. Learning Regimes: Supervised, Semi-Supervised, Unsupervised, Weakly-Supervised — Decision Making Under Label Scarcity
  Four learning regimes for anomaly detection: supervised, semi-supervised, unsupervised, and weakly-supervised. Label cost tables, which regime fits which sector, and hybrid approaches.
- 4. Anomaly Detection Pipeline Anatomy: Seven End-to-End Layers from Ingestion to Alert
  The seven layers of a production-grade anomaly detection pipeline: ingestion, feature engineering, scoring, thresholding, alerting, feedback loop, and monitoring — with the critical decisions and measurement points at each layer.
- 5. Hands-on Lab: Visualizing Three Anomaly Types with Synthetic Data — Python + Matplotlib + Plotly
  Hands-on lab: generate synthetic data with Python to visualize the three anomaly types (point, contextual, collective); detect each with iForest, Prophet residuals, and LSTM-AE; build an interactive Plotly dashboard.
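The synthetic data for this lab can be sketched in a few lines. The series shape, anomaly positions, and noise level below are illustrative assumptions, not the lab's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(500)
# baseline: a noisy sine wave with a 50-step period
signal = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.1, t.size)

# point anomaly: a single extreme spike
signal[100] += 5.0

# contextual anomaly: a value ordinary in magnitude, but wrong for its
# phase (the clean sine is 0 at t=250, so 1.0 is out of context here)
signal[250] = 1.0

# collective anomaly: a flat segment where oscillation is expected
signal[350:380] = 0.0
```

From here the lab would plot the series with Matplotlib/Plotly and run the named detectors against each injected anomaly.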
Module 2: Statistical Foundations
- 1. Normal Distribution, Z-Score, Modified Z-Score, and MAD: The Statistical Tools of Anomaly Detection
  What the normal distribution means for anomaly detection; the z-score formula, its intuition, and its limits; the modified z-score and MAD (Median Absolute Deviation) as outlier-robust alternatives; from-scratch Python implementation.
- 2. IQR, Tukey's Fences, and the Adjusted Boxplot: Outlier Detection in Skewed Data
  Interquartile Range (IQR), Tukey's fences (k=1.5 / k=3), boxplot anatomy, and the medcouple-based adjusted boxplot for skewed data — robust alternatives where the z-score fails.
- 3. Grubbs, Dixon, and Generalized ESD: Turning Outlier Detection into a Hypothesis Test
  Classical statistical hypothesis tests for outlier detection: Grubbs' test (single outlier), Dixon's Q-test (small samples), and the Generalized ESD test (multiple outliers) — p-values, formulas, SciPy implementation, and when to use which.
- 4. Chebyshev, Extreme Value Theory, and Peak Over Threshold: The Statistics of Extreme Events
  When normality fails: distribution-agnostic bounds from Chebyshev's inequality; Extreme Value Theory (block maxima, GEV); and Peak Over Threshold (POT) with the Generalized Pareto Distribution — the workhorse in banking and telecom.
- 5. Robust Statistics: Huber, M-Estimators, Tukey's Biweight, and MCD — Outlier-Resistant Estimation
  The brittleness of classical statistics to outliers; the philosophy of robust statistics; the M-estimator framework; Huber and Tukey biweight losses; and Minimum Covariance Determinant (MCD) for robust multivariate estimation — the hidden foundation of modern anomaly detection.
- 6. Hands-on Lab: Comparing 5 Statistical Detectors on NYC Taxi Demand Anomalies
  NYC Taxi hourly demand data from the Numenta NAB benchmark: run z-score, modified z-score, IQR, adjusted boxplot, and POT detectors side by side with a PR-AUC comparison — the course's first hands-on lab on a real dataset.
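A minimal from-scratch sketch of this module's first two detectors (the data values are made up for illustration): the classical z-score can miss an outlier that inflates its own standard deviation, while the MAD-based modified z-score resists this masking effect.

```python
import numpy as np

def z_scores(x):
    """Classical z-score: distance from the mean in standard deviations."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def modified_z_scores(x):
    """Modified z-score: median and MAD replace mean and std, so a single
    extreme value cannot distort the location and scale estimates.
    The factor 0.6745 makes MAD consistent with the std under normality."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return 0.6745 * (x - med) / mad

data = np.array([10.0, 10.2, 9.9, 10.1, 10.0, 30.0])  # one obvious outlier

# the outlier inflates the mean and std, masking itself from the z-score
print(np.abs(z_scores(data)).max() < 3.0)       # True: z-score misses it
print(np.abs(modified_z_scores(data)) > 3.5)    # flags only index 5
```

The common cutoffs (|z| > 3 and |modified z| > 3.5) follow standard practice; for the lab itself these would be tuned against the NAB labels.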
Module 3: Data Preparation, Imbalanced Data and Labeling
- 1. The Class Imbalance Problem: 1:1,000,000 Fraud Ratios and Why Accuracy Lies
  The fundamental challenge of anomaly detection: imbalanced class distribution. Why classical ML fails at 1:1,000,000 ratios, the accuracy paradox, the mathematical and practical impacts of imbalanced learning, and a sector-by-sector imbalance table.
- 2. Sampling Strategies: SMOTE, ADASYN, Borderline-SMOTE, SMOTE-NC — The Art of Synthesizing Positive Samples
  Generating synthetic positive samples for imbalanced data: random over-/undersampling, SMOTE, ADASYN, Borderline-SMOTE, SMOTE-NC (numeric + categorical), and the SMOTE-Tomek hybrid; the imblearn pipeline and common pitfalls.
- 3. Cost-Sensitive Learning and Focal Loss: Training the Loss Function for Imbalanced Data
  The alternative to sampling: modifying the loss function. Cost matrices, class weights, sample weights, asymmetric loss, focal loss (Lin et al., 2017), Tversky loss, and practical applications in imbalanced anomaly detection.
- 4. Weak Supervision and Snorkel: Programmatic Labeling When Labels Are Expensive
  Programmatic labeling for when manual labeling is expensive: the Snorkel framework, labeling functions, the (generative) label model, Cleanlab for label correction, and the strengths and weaknesses of weak supervision.
- 5. Hands-on Lab: Benchmarking 4 Sampling Strategies on IEEE-CIS Fraud Data
  A side-by-side benchmark of 4 imbalance strategies (baseline / SMOTE / class_weight / focal loss) on Kaggle's IEEE-CIS Fraud data with PR-AUC, recall@k, and cost comparison — the foundation for Capstone 1.
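The accuracy paradox from this module's first lesson fits in a few lines. The 1:1,000 ratio below is a toy assumption, far milder than the 1:1,000,000 cases the lesson discusses:

```python
import numpy as np

# toy imbalance: 10 fraud cases among 10,000 transactions (1:1,000)
y_true = np.zeros(10_000, dtype=int)
y_true[:10] = 1

# a "detector" that simply predicts everything is normal
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
recall = y_pred[y_true == 1].mean()  # fraction of fraud actually caught

print(accuracy)  # 0.999 -- looks excellent
print(recall)    # 0.0   -- catches zero fraud
```

This is why the course benchmarks report PR-AUC and recall@k instead of accuracy.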