Poster

Minority Oversampling for Imbalanced Data via Class-Preserving Regularized Auto-Encoders

Arnab Mondal · Lakshya Singhal · Piyush Lalitkumar Tiwary · Parag Singla · Prathosh A P

Auditorium 1 Foyer 145

Abstract:

Class imbalance is a common phenomenon in many application domains such as healthcare, where samples of one or a few classes are far more prevalent in the dataset than those of the rest. This work addresses the class-imbalance issue by proposing an over-sampling method for the minority classes in the latent space of a Regularized Auto-Encoder (RAE). Specifically, we construct a latent space by maximizing the conditional data likelihood using an Encoder-Decoder structure, such that oversampling through convex combinations of latent samples preserves the class identity. A jointly-trained linear classifier that separates convexly coupled latent vectors from different classes is used to impose this property on the RAE's latent space. Further, the aforesaid linear classifier is used for final classification without retraining. We theoretically show that our method can achieve a low-variance risk estimate compared to naive oversampling methods and is robust to overfitting. We conduct several experiments on benchmark datasets and show that our method outperforms existing oversampling techniques for handling class imbalance.
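The following is a minimal PyTorch sketch of the mechanism the abstract describes (reconstruction-based RAE training, a jointly trained linear classifier on convexly mixed latent codes, and minority oversampling by decoding convex combinations of minority latents). It is not the authors' implementation: network sizes, loss weights, and the intra-class mixing scheme are illustrative assumptions.

```python
# Illustrative sketch only; hyperparameters and architecture are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassPreservingRAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        # Linear classifier trained jointly on (convexly mixed) latent codes;
        # it is reused for the final classification without retraining.
        self.classifier = nn.Linear(latent_dim, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def training_step(model, x, y, recon_weight=1.0, clf_weight=1.0):
    z, x_hat = model(x)
    # Reconstruction term (conditional likelihood under a Gaussian assumption).
    recon_loss = F.mse_loss(x_hat, x)

    # Convexly combine latent codes of randomly paired samples; pairs sharing a
    # class keep that label, and the linear classifier must separate such
    # mixtures across classes, encouraging class-preserving convexity in latent space.
    perm = torch.randperm(x.size(0))
    same_class = y == y[perm]
    lam = torch.rand(x.size(0), 1, device=x.device)
    z_mix = lam * z + (1 - lam) * z[perm]
    logits = model.classifier(z_mix[same_class])
    clf_loss = F.cross_entropy(logits, y[same_class])

    return recon_weight * recon_loss + clf_weight * clf_loss

def oversample_minority(model, x_minority, n_new):
    """Generate synthetic minority samples by decoding convex combinations
    of minority-class latent codes."""
    with torch.no_grad():
        z = model.encoder(x_minority)
        i = torch.randint(0, z.size(0), (n_new,))
        j = torch.randint(0, z.size(0), (n_new,))
        lam = torch.rand(n_new, 1)
        z_new = lam * z[i] + (1 - lam) * z[j]
        return model.decoder(z_new)
```

In this sketch, oversampling happens purely in latent space, so the same linear classifier trained during the joint step can be applied directly to the mixed latents for classification.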