Abstract
We propose a new active learning method for classification that handles label noise without relying on multiple oracles (i.e., crowdsourcing). The first strategy we propose selects, for labeling, instances with a high influence on the learned model: an instance x has a high influence on the model h if training h on x (with label \(y = h(x)\)) would yield a model that greatly disagrees with h on the labels of other instances. The second strategy selects instances that are highly influenced by changes in the learned model: an instance x is highly influenced if training h with a set of instances would yield a committee of models that agree on a common label for x but disagree with h(x). We compare the two strategies and show, on several publicly available datasets, that selecting instances according to the first strategy while eliminating noisy labels according to the second greatly improves accuracy compared to several benchmark methods, even when a significant proportion of the instances are mislabeled.
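The influence-based selection idea above can be sketched in a few lines: retrain the current classifier with a candidate instance added under its own predicted label, then count how often the retrained model disagrees with the original on the unlabeled pool. This is only an illustration of the idea as stated in the abstract, not the authors' exact formulation; the classifier, data, and `influence_score` helper are assumptions for the sketch.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def influence_score(h, X_labeled, y_labeled, X_pool, x):
    """Illustrative influence of candidate x on classifier h:
    retrain with (x, h(x)) added, then measure the fraction of
    pool instances on which the retrained model disagrees with h."""
    y_hat = h.predict(x.reshape(1, -1))          # assumed label y = h(x)
    h_new = clone(h)
    h_new.fit(np.vstack([X_labeled, x]), np.append(y_labeled, y_hat))
    return float(np.mean(h_new.predict(X_pool) != h.predict(X_pool)))

# Toy usage: query the pool instance with the highest influence.
rng = np.random.RandomState(0)
X_lab = rng.randn(20, 2)
y_lab = (X_lab[:, 0] > 0).astype(int)
X_pool = rng.randn(50, 2)

h = LogisticRegression().fit(X_lab, y_lab)
scores = [influence_score(h, X_lab, y_lab, X_pool, x) for x in X_pool]
query_idx = int(np.argmax(scores))               # instance to send to the oracle
```

In a full active learning loop, the oracle's answer for `query_idx` would be added to the labeled set and the model retrained before the next query.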
Notes
This is optimal given that we are only allowed to query for the label of one instance at each iteration, and it is only optimal for the given classifier.
As the decision boundary becomes more stable over time, fine-tuning it becomes more effective.
For more information about the one-against-one multiclass strategy and the hyper-parameter selection used here, see the sklearn.multiclass.OneVsOneClassifier, sklearn.model_selection.GridSearchCV, and sklearn.svm.SVC APIs at http://scikit-learn.org.
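The setup the note refers to can be sketched as follows: a one-against-one multiclass wrapper around an SVM whose hyper-parameters are chosen by grid search. The dataset and parameter grid below are illustrative assumptions; the note does not specify the values the authors searched over.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

# One-against-one multiclass SVM with hyper-parameters selected by
# cross-validated grid search, per binary subproblem.
X, y = load_iris(return_X_y=True)
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.1]},  # illustrative grid
    cv=3,
)
clf = OneVsOneClassifier(grid).fit(X, y)
pred = clf.predict(X[:5])
```

Wrapping `GridSearchCV` inside `OneVsOneClassifier` tunes the SVM separately for each pair of classes; swapping the order would instead tune a single setting shared by all pairwise classifiers.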
Cite this article
Bouguelia, MR., Nowaczyk, S., Santosh, K.C. et al. Agreeing to disagree: active learning with noisy labels without crowdsourcing. Int. J. Mach. Learn. & Cyber. 9, 1307–1319 (2018). https://doi.org/10.1007/s13042-017-0645-0