ABSTRACT
We present a comprehensive suite of experiments on learning from imbalanced data. When classes are imbalanced, many learning algorithms exhibit reduced performance. Can data sampling be used to improve the performance of learners built from imbalanced data? Is the effectiveness of sampling related to the type of learner? Do the results change if the objective is to optimize different performance metrics? We address these and other questions in this work, showing that sampling in many cases will improve classifier performance.
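As a minimal illustration of the kind of data sampling the abstract refers to, the sketch below implements random oversampling, i.e. duplicating minority-class examples until the class counts are equal. This is an assumed, simplified example for exposition (the function name and data are illustrative), not the experimental setup used in the paper.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class examples at random until all classes
    reach the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # indices of examples belonging to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        # append random duplicates until the class reaches `target`
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

# Toy imbalanced data: four majority examples, one minority example.
X = [[0], [1], [2], [3], [4]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 4 examples
```

Random undersampling is the mirror image (discarding majority-class examples), and methods such as SMOTE instead synthesize new minority examples rather than duplicating existing ones.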
Experimental perspectives on learning from imbalanced data