Category

Data Science and Data Management

98 terms in the Data Science and Data Management domain, each presented bilingually (TR/EN) with a related-term graph.

Data Collection, Data Cleaning, Data Preprocessing, Feature Engineering, Data Labeling, Data Quality, Data Governance, Data Privacy, Synthetic Data, Imbalanced Data Problems

All Terms (98)

D
19 terms
🔁

Data Augmentation

An approach that expands the training set by transforming existing data to improve model robustness.

📚

Data Catalog

A centralized inventory that presents definitions, ownership, usage, and discovery information for enterprise data assets.

📥

Data Collection

The systematic process of acquiring data for analysis, reporting, and modeling workflows.

⏱️

Data Collection SLA

An operational service-level framework that defines timeliness, completeness, and availability standards for data flows.

📄

Data Contracts

An agreement approach that explicitly defines schema, quality, and delivery expectations between data producers and consumers.

🏛️

Data Governance

The enterprise framework for managing data through ownership, quality, access, usage, and control principles.

🔍

Data Lineage

The visible trace of all movements and transformations a data element undergoes from source to report or model.

📉

Data Minimization

The principle of collecting and processing only the data that is truly necessary for a defined purpose.

📡

Data Observability

A monitoring approach that aims to detect data issues, anomalies, and silent quality degradation early.

👤

Data Ownership

The principle that defines which business or technical role is responsible for the quality, definition, and use of specific data domains.

🔬

Data Profiling

The process of systematically examining a dataset’s content, distribution, missingness, uniqueness, and rule violations.

🗂️

Data Source

The system, platform, or operational touchpoint where data is generated, stored, or retrieved.

🧑‍💼

Data Stewardship

An operational approach in which designated stewards actively maintain the definition, quality, and appropriate use of specific data domains.

🔠

Data Type Mismatch

A problem arising when the expected data type of a field differs from the actual stored content type.

🧪

Derived Feature

A new feature computed or transformed from existing fields rather than taken directly from raw data.

🔐

Differential Privacy

A mathematical privacy framework that limits the extent to which any single individual’s data can affect published results.

🌫️

Diffusion-Based Synthetic Data

A synthetic data generation approach that learns a data distribution by progressively adding noise to real samples and then produces new samples by reversing that noising process.

🎲

Domain Randomization

An approach that varies environmental factors in synthetic data generation so that models trained on it are more robust to real-world conditions.

🪞

Duplicate Record

A repeated data record that represents the same real-world entity or event more than once.
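
As an illustration of the Duplicate Record entry above, here is a minimal sketch of how repeated records might be detected by grouping on a normalized key. The function names (`normalize`, `find_duplicates`), the sample `customers` data, and the choice of `name` and `email` as matching fields are all assumptions made for this example. Real deduplication often needs fuzzy matching or domain-specific rules.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(value.lower().split())


def find_duplicates(records, key_fields):
    """Group records sharing the same normalized key; return groups with more than one record."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalize(str(record[field])) for field in key_fields)
        groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data: records 1 and 2 describe the same person.
customers = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.com"},
    {"id": 2, "name": "ada  lovelace", "email": "ADA@example.com"},
    {"id": 3, "name": "Alan Turing", "email": "alan@example.com"},
]

duplicate_groups = find_duplicates(customers, key_fields=("name", "email"))
duplicate_ids = [[r["id"] for r in group] for group in duplicate_groups]
print(duplicate_ids)  # prints [[1, 2]]
```

Exact matching on a normalized key only catches records that differ in case or spacing. Records that differ by typos or abbreviations would need a similarity measure instead.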