
Entity Resolution

The process of determining whether different records actually refer to the same real-world entity.

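Entity resolution is one of the hardest, yet most critical, tasks in data cleaning and integration. A common example is the same customer appearing in several systems with slight spelling differences or incomplete information. This is not just a duplicate-removal problem; it requires correctly reconstructing the real-world entity behind the records. Common approaches include rule-based matching (deterministic rules over normalized fields), probabilistic matching (weighing agreement across fields to score how likely two records are to refer to the same entity), and graph-based methods (treating records as nodes and matches as edges, then grouping connected records). If handled poorly, entity resolution can split one customer, product, or supplier into several records or merge distinct ones, producing significant reporting and decision errors.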
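The following is a minimal sketch, in Python, of one way rule-based pairwise matching and graph-based grouping can be combined. It is illustrative only. The records, the `NAME_THRESHOLD` value, and the helper names `normalize_name`, `is_match` and `resolve` are hypothetical. `difflib.SequenceMatcher` stands in for a dedicated string-similarity measure, and the rules are deliberately simple rather than a probabilistic model.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer records from three systems (illustrative data only).
RECORDS = [
    {"id": "crm-1", "name": "Jon Smith", "email": "jon.smith@example.com"},
    {"id": "erp-7", "name": "Jonathan Smith", "email": "JON.SMITH@example.com"},
    {"id": "web-3", "name": "Jonathan Smyth", "email": ""},
    {"id": "crm-2", "name": "Maria Garcia", "email": "maria@example.com"},
]

NAME_THRESHOLD = 0.9  # assumed cutoff; would need tuning on real data


def normalize_name(name: str) -> str:
    """Lowercase, replace non-alphanumeric runs with a space, collapse whitespace."""
    return " ".join(re.sub(r"[^0-9a-z]+", " ", name.lower()).split())


def is_match(a: dict, b: dict) -> bool:
    """Rule-based pairwise decision: same non-empty email OR very similar name."""
    email_a = a["email"].strip().lower()
    email_b = b["email"].strip().lower()
    if email_a and email_a == email_b:
        return True
    ratio = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    return ratio >= NAME_THRESHOLD


def resolve(records: list[dict]) -> list[set[str]]:
    """Link matching pairs (graph edges), then return connected components."""
    parent = {r["id"]: r["id"] for r in records}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Compares every pair of records, so the cost grows as O(n^2).
    for a, b in combinations(records, 2):
        if is_match(a, b):
            parent[find(a["id"])] = find(b["id"])

    # Group record ids by their root: each group is one resolved entity.
    clusters: dict[str, set[str]] = {}
    for r in records:
        clusters.setdefault(find(r["id"]), set()).add(r["id"])
    return list(clusters.values())


if __name__ == "__main__":
    for entity in resolve(RECORDS):
        print(sorted(entity))
```

In this sample, `crm-1` and `web-3` never match each other directly. However, `crm-1` shares an email with `erp-7`, and `erp-7` has a near-identical name to `web-3`, so all three end up in one entity. That transitivity is what makes the graph step useful. It is also a risk, because a single bad link can merge two real-world entities. Production systems typically add blocking (comparing only records that share some key) to avoid checking every pair.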