Technical GlossaryData Science and Data Management
Duplicate Record
A repeated data record that represents the same real-world entity or event more than once.
Duplicate records are one of the core issues that damage data quality and distort analytical results. Typical examples include the same customer appearing in multiple rows, the same transaction being recorded more than once, or duplication caused by integration workflows. Duplicates may inflate totals, distort ratios, and change effective sample weighting during model training. For that reason, duplicate detection should not rely only on exact row equality, but also on key fields, fuzzy matching, and business rules.
You Might Also Like
Explore these concepts to continue your artificial intelligence journey.
