Summary
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first-order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the “added” error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners that are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions indicating how much the median, the maximum, and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates an understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
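The variance-reduction argument in the summary can be illustrated numerically. The sketch below is not the chapter's derivation or its experiments; it is a minimal simulation under assumed conditions (each classifier's output is an unbiased, uncorrelated noisy estimate of a hypothetical true posterior), showing that the averaging combiner's output variance shrinks by roughly a factor of N, and that an order statistics combiner (the median) also reduces variance relative to a single classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (all values hypothetical): N classifiers each emit a
# noisy, unbiased estimate of the true posterior for a given input.
N = 25                       # ensemble size
true_posterior = 0.7         # assumed Bayes-optimal output
trials = 20000               # repeated draws to estimate variances

# Independent (uncorrelated) errors around the true posterior.
outputs = true_posterior + 0.1 * rng.standard_normal((trials, N))

single_var = outputs[:, 0].var()             # one classifier alone
avg_var = outputs.mean(axis=1).var()         # linear (averaging) combiner
med_var = np.median(outputs, axis=1).var()   # order statistics (median) combiner

# With unbiased, uncorrelated errors, the averaging combiner's variance —
# and hence, by the chapter's argument, the "added" error — drops by
# roughly a factor of N.
print(single_var / avg_var)   # close to N = 25
print(med_var < single_var)   # the median combiner also reduces variance
```

When the individual errors are correlated or biased, the factor-of-N reduction no longer holds, which is precisely the regime the chapter's later expressions quantify.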
© 1999 Springer-Verlag London Limited
Cite this chapter
Tumer, K., Ghosh, J. (1999). Linear and Order Statistics Combiners for Pattern Classification. In: Sharkey, A.J.C. (eds) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_6
DOI: https://doi.org/10.1007/978-1-4471-0793-4_6
Publisher Name: Springer, London
Print ISBN: 978-1-85233-004-0
Online ISBN: 978-1-4471-0793-4