DOI: 10.1145/2020408.2020553
poster

Active learning using on-line algorithms

Published: 21 August 2011

ABSTRACT

This paper describes a new technique and analysis for using on-line learning algorithms to solve active learning problems. Our algorithm is called Active Vote, and it works by actively selecting instances that force several perturbed copies of an on-line algorithm to make mistakes. The main intuition for our result is based on the fact that the number of mistakes made by the optimal on-line algorithm is a lower bound on the number of labels needed for active learning. We provide performance bounds for Active Vote in both a batch and on-line model of active learning. These performance bounds depend on the algorithm having a set of unlabeled instances in which the various perturbed on-line algorithms disagree. The motivating application for Active Vote is an Internet advertisement rating program. We conduct experiments using data collected for this advertisement problem along with experiments using standard datasets. We show Active Vote can achieve an order of magnitude decrease in the number of labeled instances over various passive learning algorithms such as Support Vector Machines.
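
The abstract describes Active Vote as querying labels on instances where several perturbed copies of an on-line learner disagree, so that at least one copy is forced to make a mistake. The following is a minimal sketch of that disagreement-based selection loop, assuming perceptrons as the on-line learners and small random weight offsets as the perturbation; the names, committee size, and perturbation scheme are illustrative assumptions, not the authors' implementation.

    import numpy as np

    class PerturbedPerceptron:
        """Perceptron whose weights start at a small random offset (the assumed 'perturbation')."""
        def __init__(self, dim, rng, noise=0.01):
            self.w = noise * rng.standard_normal(dim)

        def predict(self, x):
            return 1 if self.w @ x >= 0 else -1

        def update(self, x, y):
            # Classic mistake-driven perceptron update.
            if self.predict(x) != y:
                self.w += y * x

    def active_vote_sketch(unlabeled_stream, label_oracle, dim, n_copies=5, seed=0):
        """Query a label only when the perturbed copies disagree on an instance."""
        rng = np.random.default_rng(seed)
        committee = [PerturbedPerceptron(dim, rng) for _ in range(n_copies)]
        labels_used = 0
        for x in unlabeled_stream:
            votes = {m.predict(x) for m in committee}
            if len(votes) > 1:              # disagreement: at least one copy must be wrong here
                y = label_oracle(x)         # spend a label only on disagreement instances
                labels_used += 1
                for m in committee:
                    m.update(x, y)
        return committee, labels_used

In this sketch a label is requested only when the committee disagrees, which is the mechanism the abstract ties to the mistake bound of the optimal on-line algorithm: every queried instance causes at least one perturbed copy to err and update.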

Published in

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011, 1446 pages
ISBN: 9781450308137
DOI: 10.1145/2020408

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%
