
Large-width machine learning algorithm

  • Regular Paper
  • Published:
Progress in Artificial Intelligence

Abstract

We introduce an algorithm, called Large Width (LW), that produces a multi-category classifier (defined on a distance space) with a large 'sample width,' a notion similar to the classification margin. LW is an incremental instance-based (also known as 'lazy') learning algorithm. Given a sample of labeled and unlabeled examples, it iteratively picks the next unlabeled example and classifies it while maintaining a large distance between each labeled example and its nearest-unlike prototype, where a prototype is either a labeled example or an unlabeled example that has already been classified. Thus, LW gives higher priority to unlabeled points whose classification decision 'interferes' less with the labeled sample. On a collection of UCI benchmark datasets, LW ranks at the top when compared with 11 instance-based learning algorithms (or configurations). When compared with the best of these instance-based learners together with MLP, SVM, a decision tree learner (C4.5), and Naive Bayes, LW ranks second, behind only MLP, which takes first place by a single extra win over LW. The LW algorithm can be implemented as a parallel distributed-processing algorithm with a high speedup factor, and it is suitable for any distance space, with a distance function that need not satisfy the conditions of a metric.
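
To fix ideas, the following is a minimal, illustrative sketch of an LW-style greedy loop. It is a reconstruction from the description above, not the authors' implementation; the class name LWSketch, the brute-force search over all remaining points and labels, and the exact way the width score is computed here are assumptions.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Illustrative sketch only (not the authors' code): greedily classify
// unlabeled points so that each labeled example stays as far as possible
// from its nearest prototype of a different class ("nearest-unlike").
class LWSketch {
    static int[] classify(double[][] labeled, int[] labels,
                          double[][] unlabeled, int numClasses,
                          BiFunction<double[], double[], Double> dist) {
        // Prototypes start as the labeled sample; classified unlabeled
        // points are promoted to prototypes as the loop proceeds.
        List<double[]> protos = new ArrayList<>(Arrays.asList(labeled));
        List<Integer> protoLabels = new ArrayList<>();
        for (int y : labels) protoLabels.add(y);
        int[] assigned = new int[unlabeled.length];
        boolean[] done = new boolean[unlabeled.length];
        for (int step = 0; step < unlabeled.length; step++) {
            double bestWidth = Double.NEGATIVE_INFINITY;
            int bestIdx = -1, bestLabel = 0;
            for (int i = 0; i < unlabeled.length; i++) {
                if (done[i]) continue;
                for (int y = 0; y < numClasses; y++) {
                    // Width if unlabeled[i] were committed to class y: the
                    // smallest distance from any labeled example to a
                    // prototype carrying a different label.
                    double width = Double.POSITIVE_INFINITY;
                    for (int j = 0; j < labeled.length; j++) {
                        double nearestUnlike = (labels[j] == y)
                                ? Double.POSITIVE_INFINITY
                                : dist.apply(labeled[j], unlabeled[i]);
                        for (int p = 0; p < protos.size(); p++) {
                            if (protoLabels.get(p) != labels[j]) {
                                nearestUnlike = Math.min(nearestUnlike,
                                        dist.apply(labeled[j], protos.get(p)));
                            }
                        }
                        width = Math.min(width, nearestUnlike);
                    }
                    if (width > bestWidth) {
                        bestWidth = width;
                        bestIdx = i;
                        bestLabel = y;
                    }
                }
            }
            // Commit the least "interfering" choice and promote the point
            // to prototype status for later iterations.
            done[bestIdx] = true;
            assigned[bestIdx] = bestLabel;
            protos.add(unlabeled[bestIdx]);
            protoLabels.add(bestLabel);
        }
        return assigned;
    }
}

This brute-force loop serves only to make the selection rule concrete; an efficient or parallel implementation, as discussed in the paper, would organize the same computation differently.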

Acknowledgements

This work was supported in part by the Ministry of Science & Technology, ISRAEL. The authors thank the reviewers for their thoughtful comments and suggestions.

Author information

Corresponding author

Correspondence to Joel Ratsaby.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

For the experiment reported in Table 2, the algorithms' parameter settings are as follows (an illustrative WEKA API sketch for one of these configurations appears after the list):

(1)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.LW -- -M 0 -I 0 -R true' -4523450618538717400

(2)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 1 -W 0 -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(3)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 1 -W 0 -I -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(4)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 1 -W 0 -F -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(5)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 5 -W 0 -X -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(6)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 5 -W 0 -X -I -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(7)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 5 -W 0 -X -F -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(8)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 10 -W 0 -X -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(9)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 10 -W 0 -X -I -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(10)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 10 -W 0 -X -F -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(11)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.KStar -- -B 20 -M a' -4523450618538717400

(12)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.LWL -- -U 0 -K -1 -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\" -W trees.DecisionStump' -4523450618538717400
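
As a reproduction aid, the following is a minimal sketch, not taken from the paper, of how configuration (2) above (1-nearest-neighbour IBk wrapped in FilteredClassifier with the ClassBalancer filter) might be assembled and cross-validated through the WEKA Java API. The class name Table2Config2 and the dataset path data.arff are placeholders.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.supervised.instance.ClassBalancer;

// Sketch of Table 2, configuration (2): ClassBalancer filter plus 1-NN IBk
// inside a FilteredClassifier, evaluated by 10-fold cross-validation.
public class Table2Config2 {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff"); // placeholder dataset
        data.setClassIndex(data.numAttributes() - 1);  // class = last attribute

        ClassBalancer balancer = new ClassBalancer();  // "-num-intervals 10"
        balancer.setOptions(new String[] {"-num-intervals", "10"});

        IBk knn = new IBk();
        knn.setKNN(1); // "-K 1"; IBk's default LinearNNSearch with
                       // EuclideanDistance matches the "-A" part of the listing.

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(balancer);
        fc.setClassifier(knn);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(fc, data, 10, new Random(1)); // Random(1) seeds the fold split
        System.out.println(eval.toSummaryString());
    }
}

The same pattern applies to the other configurations by swapping in the corresponding base classifier and options.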

For the experiment displayed in Table 6, the algorithms’ parameter settings are as follows:

(1)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.LW -- -M 0 -I 0 -R true' -4523450618538717400

(2)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W lazy.IBk -- -K 5 -W 0 -X -F -A \"weka.core.neighboursearch.LinearNNSearch -A /\"weka.core.EuclideanDistance -R first-last/\\"' -4523450618538717400

(3)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W functions.MultilayerPerceptron -- -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a' -4523450618538717400

(4)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W trees.J48 -- -C 0.25 -M 2' -4523450618538717400

(5)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W functions.SMO -- -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K \"functions.supportVector.PolyKernel -E 1.0 -C 250007\" -calibrator \"functions.Logistic -R 1.0E-8 -M -1 -num-decimal-places 4\"' -4523450618538717400

(6)

meta.FilteredClassifier '-F \"supervised.instance.ClassBalancer -num-intervals 10\" -S 1 -W bayes.NaiveBayes' -4523450618538717400

About this article

Cite this article

Anthony, M., Ratsaby, J. Large-width machine learning algorithm. Prog Artif Intell 9, 275–285 (2020). https://doi.org/10.1007/s13748-020-00212-4

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13748-020-00212-4
