Class Imbalance

A condition in which some classes in a dataset are heavily represented while others appear only sparsely.

Class imbalance is a core challenge in machine learning problems that involve rare-event detection. Critical classes such as fraud, failure, disease, or security violations are often sparsely represented in the data. In such cases a model can appear highly accurate overall while still failing on the minority class: a classifier that always predicts the majority class reaches 99% accuracy on a dataset with 1% positives, yet detects none of them. For this reason, metric choice, resampling strategies, and cost-sensitive learning become especially important in imbalanced settings. The key point is that the rare class is frequently the one that matters most.
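
The gap between overall accuracy and minority-class performance can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the dataset (990 negatives, 10 positives), the helper names `accuracy`, `recall`, and `balanced_class_weights`, and the inverse-frequency weighting rule, which is one common heuristic for cost-sensitive learning and not the only option.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually detects."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Hypothetical imbalanced dataset: 1% positives (label 1).
y_true = [0] * 990 + [1] * 10

# A trivial "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)            # high, despite missing every positive
rec = recall(y_true, y_pred)              # minority-class recall
weights = balanced_class_weights(y_true)  # larger weight for the rare class

print(f"accuracy={acc:.2f} recall={rec:.2f} weights={weights}")
```

Under these assumptions, accuracy is 0.99 while recall on the positive class is 0.0, which is why recall, precision, or similar minority-focused metrics are usually reported alongside accuracy. The weights show how a cost-sensitive learner could penalize errors on the rare class more heavily, here by a factor of about 100 relative to the majority class.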