Investigating Oversampling Techniques for Fair Machine Learning Models

Abstract

Applying machine learning in real-world applications has implications not only for companies but for individuals as well. Besides lower costs, faster time to decision, and higher decision accuracy, automating decisions can have unethical and illegal consequences. More specifically, predictions can systematically discriminate against a certain group of people, mainly due to bias in the dataset. In this paper, we investigate instance oversampling as a way to improve fairness. We evaluated several strategies with two techniques, namely SMOTE and random oversampling. Besides traditional oversampling, we also oversampled instances based on sensitive attributes (e.g. gender or race). We demonstrate on real-world datasets (Adult and COMPAS) that oversampling techniques increase fairness without a substantial decrease in predictive accuracy: fairness improved by up to 15% and AUPRC by up to 3%, with a loss in AUC of 2%.
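As a rough illustration of oversampling based on a sensitive attribute, the sketch below randomly duplicates favourable-outcome instances of the under-represented sensitive group until both groups appear equally often in the favourable class. The function name, interface, and balancing rule are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def oversample_by_sensitive(X, y, s, favourable=1, rng=None):
    """Random oversampling conditioned on a sensitive attribute.

    Hypothetical sketch: among instances with the favourable label,
    duplicate randomly chosen members of each smaller sensitive group
    (e.g. gender or race) until all groups are equally represented.
    """
    rng = np.random.default_rng(rng)
    groups, counts = np.unique(s[y == favourable], return_counts=True)
    majority = counts.max()
    X_parts, y_parts, s_parts = [X], [y], [s]
    for g, c in zip(groups, counts):
        deficit = majority - c
        if deficit == 0:
            continue
        # candidate rows: favourable outcome AND sensitive group g
        idx = np.where((s == g) & (y == favourable))[0]
        picks = rng.choice(idx, size=deficit, replace=True)
        X_parts.append(X[picks])
        y_parts.append(y[picks])
        s_parts.append(s[picks])
    return (np.concatenate(X_parts),
            np.concatenate(y_parts),
            np.concatenate(s_parts))
```

SMOTE-based variants would instead synthesize new minority-group instances by interpolating between nearest neighbours within the group, rather than duplicating existing rows.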

Publication
In International Conference on Decision Support System Technology 2021
Sandro Radovanović
Assistant Professor at University of Belgrade

My research interests include machine learning, development and design of decision support systems, decision theory, and fairness and justice concepts in algorithmic decision making.