Linear and Order Statistics Combiners for Pattern Classification

Chapter in: Combining Artificial Neural Nets

Part of the book series: Perspectives in Neural Computing

Summary

Several researchers have shown experimentally that substantial improvements can be obtained on difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework for quantifying the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first-order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance and hence reduces the “added” error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners that are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order-statistics-based non-linear combiners, we derive expressions that indicate how much the median, the maximum, and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public-domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
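To make the averaging claim concrete, the result stated above can be written out as follows. This is a sketch reconstructed from the summary, not the chapter's full derivation; the correlated-error form and the symbol δ (the average correlation among the individual classifiers' boundary errors) are notation assumed here for illustration.

```latex
% E_add : error a single classifier adds over the Bayes error rate.
% N unbiased classifiers combined by simple averaging, with
% uncorrelated errors in approximating the decision boundaries:
\[
  E_{\mathrm{add}}^{\mathrm{ave}} \;=\; \frac{E_{\mathrm{add}}}{N}.
\]
% With correlated errors (average correlation \delta among the
% individual boundary errors -- notation assumed here), the
% factor-of-N reduction degrades smoothly:
\[
  E_{\mathrm{add}}^{\mathrm{ave}} \;=\; E_{\mathrm{add}}\,\frac{1 + \delta(N-1)}{N}.
\]
```

The two families of combiners the chapter analyses are also easy to state in code. The following is a minimal sketch operating on classifier outputs in output space, not the chapter's implementation; the ensemble size, the posterior values, and the Gaussian noise model are all hypothetical.

```python
# Sketch: linear (averaging) and order statistics combiners in output space.
# Each row of `outputs` holds one classifier's noisy posterior estimates
# for the same input; all values here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

true_posterior = np.array([0.6, 0.3, 0.1])  # hypothetical Bayes posteriors
N = 5                                       # hypothetical ensemble size
outputs = np.clip(true_posterior + 0.1 * rng.standard_normal((N, 3)), 0.0, 1.0)

ave = outputs.mean(axis=0)        # linear combiner: simple averaging
med = np.median(outputs, axis=0)  # order statistics combiner: median
mx = outputs.max(axis=0)          # order statistics combiner: maximum

for name, comb in (("ave", ave), ("med", med), ("max", mx)):
    print(f"{name} combiner predicts class {int(np.argmax(comb))}")
```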

Copyright information

© 1999 Springer-Verlag London Limited

About this chapter

Cite this chapter

Tumer, K., Ghosh, J. (1999). Linear and Order Statistics Combiners for Pattern Classification. In: Sharkey, A.J.C. (eds) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_6

  • DOI: https://doi.org/10.1007/978-1-4471-0793-4_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-004-0

  • Online ISBN: 978-1-4471-0793-4
