Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels

Part of Proceedings of the International Conference on Machine Learning 1 pre-proceedings (ICML 2020)

Bibtex »Metadata »Paper »

Bibtek download is not availble in the pre-proceeding


Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama


<p>In weakly supervised learning, unbiased risk estimators (URE) are powerful tools for estimating the risk of classifiers when the training distribution differs from the test distribution. However, they lead to overfitting in many problem settings if deep networks are chosen as the classifiers. In this paper, we investigate reasons for such overfitting by studying learning with complementary labels. We argue that the quality of gradient estimation matters more than risk estimation in risk minimization. Theoretically, we find UREs give unbiased gradient estimators (UGE). Empirically, we find UGEs have a huge variance, though the bias is zero; their direction is far away from the true gradient in expectation, though the expected direction is the same as the true gradient. Hence we advocate to use biased risk estimators by taking into account the bias-variance tradeoff and the directional similarity of gradient estimation, and experiments show that they successfully mitigate the overfitting due to UREs/UGEs.</p>