Da Sun and Jakramate Bootkrajang
Published in Data Science and Engineering (DSE) Record, 2025, Vol. 6, No. 1, pp. 582–606
Abstract
This study empirically tests a counter-intuitive hypothesis: injecting random label noise (Noise Completely at Random, NCAR) into the training data can act as a robust implicit regularizer for classification models. The conventional view holds that label noise degrades model performance; we propose instead that, for high-capacity models trained on limited data, a controlled amount of label noise can prevent overfitting to the training set. We test this hypothesis on 10 binary classification datasets (UCI/OpenML), benchmarking logistic regression, decision trees, and multilayer perceptrons (MLPs) against standard explicit regularizers (Dropout, L1, L2) at noise levels of {0%, 1%, 5%, 10%, 15%}. Our results, validated over 10 random seeds with stratified train/test splits, show that label noise often yields better generalization, especially under low signal-to-noise ratio (SNR) and severe class imbalance. We provide evidence that noise injection drives optimization toward flatter minima of the loss landscape, thereby improving test-set accuracy and F1-score.
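The NCAR noise model described above flips a fixed fraction of training labels uniformly at random, independently of both the features and the true labels. A minimal sketch of such an injection step is shown below; the function name `inject_ncar_noise` and its signature are hypothetical illustrations, not the authors' released code.

```python
import numpy as np

def inject_ncar_noise(y, flip_rate, rng=None):
    """Flip a fraction `flip_rate` of binary labels chosen uniformly at random.

    NCAR: the corruption is independent of the features x and the label y.
    Hypothetical helper for illustration -- not the paper's implementation.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y).copy()
    n_flip = int(round(flip_rate * len(y)))          # number of labels to corrupt
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y[idx] = 1 - y[idx]                              # flip 0 <-> 1
    return y

# Example: 10% NCAR noise on 1000 binary labels
y_train = np.zeros(1000, dtype=int)
y_noisy = inject_ncar_noise(y_train, 0.10, rng=0)
print((y_noisy != y_train).mean())  # fraction of flipped labels, here 0.1
```

In an experiment like the one described, this step would be applied only to the training labels (the test labels stay clean), once per noise level and per random seed, before fitting each model.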