Summary
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first-order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the “added” error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners that are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions indicating how much the median, the maximum, and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates an understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
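The variance-reduction argument in the summary can be illustrated numerically. The sketch below is not the chapter's derivation or its experiments; it is a minimal simulation under assumed conditions (each classifier's output is an unbiased, uncorrelated noisy estimate of a hypothetical true posterior), showing that the averaging combiner's output variance shrinks by roughly a factor of N, and that an order statistics combiner (the median) also reduces variance relative to a single classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (all values hypothetical): N classifiers each emit a
# noisy, unbiased estimate of the true posterior for a given input.
N = 25                       # ensemble size
true_posterior = 0.7         # assumed Bayes-optimal output
trials = 20000               # repeated draws to estimate variances

# Independent (uncorrelated) errors around the true posterior.
outputs = true_posterior + 0.1 * rng.standard_normal((trials, N))

single_var = outputs[:, 0].var()             # one classifier alone
avg_var = outputs.mean(axis=1).var()         # linear (averaging) combiner
med_var = np.median(outputs, axis=1).var()   # order statistics (median) combiner

# With unbiased, uncorrelated errors, the averaging combiner's variance —
# and hence, by the chapter's argument, the "added" error — drops by
# roughly a factor of N.
print(single_var / avg_var)   # close to N = 25
print(med_var < single_var)   # the median combiner also reduces variance
```

When the individual errors are correlated or biased, the factor-of-N reduction no longer holds, which is precisely the regime the chapter's later expressions quantify.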
© 1999 Springer-Verlag London Limited
Cite this chapter
Tumer, K., Ghosh, J. (1999). Linear and Order Statistics Combiners for Pattern Classification. In: Sharkey, A.J.C. (eds) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_6
DOI: https://doi.org/10.1007/978-1-4471-0793-4_6
Publisher Name: Springer, London
Print ISBN: 978-1-85233-004-0
Online ISBN: 978-1-4471-0793-4