Abstract
Concept learning depends on data character. To discover how, some researchers have used theoretical analysis to relate the behavior of idealized learning algorithms to classes of concepts. Others have developed pragmatic measures that relate the behavior of empirical systems such as ID3 and PLS1 to the kinds of concepts encountered in practice. But before learning behavior can be predicted, concepts and data must be characterized. Data characteristics include their number, error, “size,” and so forth. Although potential characteristics are numerous, they are constrained by the way one views concepts. Viewing concepts as functions over instance space leads to geometric characteristics such as concept size (the proportion of positive instances) and concentration (not too many “peaks”). Experiments show that some of these characteristics drastically affect the accuracy of concept learning. Sometimes data characteristics interact in non-intuitive ways; for example, noisy data may degrade accuracy differently depending on the size of the concept. Compared with effects of some data characteristics, the choice of learning algorithm appears less important: performance accuracy is degraded only slightly when the splitting criterion is replaced with random selection. Analyzing such observations suggests directions for concept learning research.
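The abstract's geometric view treats a concept as a function over instance space, with concept size defined as the proportion of positive instances. For a small boolean instance space that quantity can be computed by direct enumeration. The sketch below is illustrative only, not the paper's implementation; the `concept_size` function and the majority concept are assumptions for the example:

```python
from itertools import product

def concept_size(concept, n_features):
    """Proportion of a boolean instance space labeled positive.

    `concept` is any predicate mapping an instance (a tuple of 0/1
    feature values) to True/False, i.e., a concept viewed as a
    function over instance space.
    """
    instances = list(product([0, 1], repeat=n_features))
    positives = sum(1 for x in instances if concept(x))
    return positives / len(instances)

# A majority concept over 5 boolean features: exactly half of the
# 32 instances are positive, so its concept size is 0.5.
majority = lambda x: sum(x) >= 3
print(concept_size(majority, 5))  # 0.5
```

By contrast, a conjunction such as `lambda x: all(x)` has size 1/32; variation of this kind is what the experiments manipulate when relating concept character to learning accuracy.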
References
Abbott, A.L. (1987). Cohesion methods in inductive learning. Computational Intelligence, 3, 267-282.
Anderberg, M.R. (1973). Cluster analysis for applications. New York: Academic Press.
Barron, A.R., and Barron, R.L. (1988). Statistical learning networks: A unifying view. Proceedings of the 20th Interface Symposium on Statistics and Computing (pp. 192-203). Reston, VA: American Statistical Association.
Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. (1984). Classification and regression trees. Belmont, CA: Wadsworth.
Buchanan, B.G., Rissland, E.L., Rosenbloom, P.S., Ng, H.T., and Sullivan, J. (1987). The role of intelligent instance selection in learning systems: The near miss. Unpublished manuscript, Department of Computer Science, University of Pittsburgh, Pittsburgh, PA.
Clark, P., and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3, 261-283.
Coles, D., and Rendell, L.A. (1984). Some issues in training learning systems and an autonomous design. Proceedings of the Fifth Biennial Conference of the Canadian Society for Computational Studies of Intelligence (pp.99-102). Toronto, Canada: Canadian Information Processing Society.
Cover, T. (1965). Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, 14, 326-334.
Devijver, P.A., and Kittler, J. (1982). Pattern recognition: A statistical approach. Englewood Cliffs, NJ: Prentice Hall.
Dietterich, T.G., London, B., Clarkson, K., and Dromey, G. (1982). Learning and inductive inference. In P.R. Cohen and E.A. Feigenbaum (Eds.), The handbook of artificial intelligence. Los Altos, CA: Kaufmann.
Draper, N.R., and Smith, H. (1981). Applied regression analysis. New York: Wiley.
Drastal, G., and Raatz, S. (1989). Empirical results on learning in an abstraction space (Technical Report DCS-TR-248). New Brunswick, NJ: Rutgers University, Department of Computer Science.
Drastal, G., Raatz, S., and Meunier, R. (1989). Induction in an abstraction space: A form of constructive induction. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 708-712). Detroit, MI: Morgan Kaufmann.
Ehrenfeucht, A., Haussler, D., Kearns, M., and Valiant, L.G. (1988). A general lower bound on the number of examples needed for learning. Proceedings of the Workshop on Computational Learning Theory (pp. 139-154). Boston, MA: Morgan Kaufmann.
Gams, M., and Lavrac, N. (1987). Review of five empirical learning systems within a proposed schemata. In I. Bratko and N. Lavrac (Eds.), Progress in machine learning: Proceedings of the Second European Working Session on Learning. Wilmslow, England: Sigma Press.
Haussler, D. (1986). Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence, 36, 177-221.
Holte, R.C., Acker, L.E., and Porter, B.W. (1989). Concept learning and the problem of small disjuncts. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 813-818). Detroit, MI: Morgan Kaufmann.
Hogg, R.V., and Craig, A.T. (1965). Introduction to mathematical statistics. New York: Macmillan.
Hunt, E.B., Marin, J., and Stone, P.J. (1966). Experiments in induction. New York: Academic Press.
Kearns, M., Li, M., Pitt, L., and Valiant, L.G. (1987). Recent results on boolean concept learning. Proceedings of the Fourth International Workshop on Machine Learning (pp.337-352). Irvine, CA: Morgan Kaufmann.
Lakoff, G., and Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Langley, P.A. (1987). A general theory of discrimination learning. In D. Klahr, P. Langley, and R. Neches (Eds.), Production system models of learning and development. Cambridge, MA: MIT Press.
Lavrac, N., Mozetic, I., and Kononenko, I. (1986). An experimental comparison of two learning programs in three medical domains. Unpublished manuscript, Computer Science Department, University of Illinois, Urbana, IL.
Matheus, C.J., and Rendell, L.A. (1989). Constructive induction on decision trees. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp.645-650). Detroit, MI: Morgan Kaufmann.
Michalski, R.S. (1983). A theory and methodology of inductive learning. In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol.1). San Mateo, CA: Morgan Kaufmann.
Mingers, J. (1989). An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3, 319-342.
Mitchell, T.M. (1978). Version spaces: An approach to concept learning. Doctoral dissertation, Department of Electrical Engineering, Stanford University, Stanford, CA.
Mitchell, T.M., Keller, R.M., and Kedar-Cabelli, S.T. (1986). Explanation-based generalization: A unifying view. Machine Learning, 1, 47-80.
O'Rorke, P. (1982). A comparative study of inductive learning systems AQ11P and ID3 using a chess endgame test problem (Technical Report No. UIUCDCS-F-82-899). Urbana, IL: University of Illinois, Department of Computer Science.
Pagallo, G. (1989). Learning DNF by decision trees. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp.639-644). Detroit, MI: Morgan Kaufmann.
Pagallo, G., and Haussler, D. (1988). Feature discovery in empirical learning (Technical Report No. UCSC-CRL-88-08). Santa Cruz, CA: University of California, Computer Research Laboratory.
Pitt, L., and Valiant, L. (1986). Computational limitations on learning from examples (Technical Report TR-05-86). Cambridge, MA: Harvard University, Aiken Computation Laboratory.
Quinlan, J.R. (1979). Discovering rules by induction from large collections of examples. In D. Michie (Ed.), Expert systems in the microelectronic age. Edinburgh, Scotland: Edinburgh University Press.
Quinlan, J.R. (1983). Learning efficient classification procedures and their application to chess end games. In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol.1). San Mateo, CA: Morgan Kaufmann.
Quinlan, J.R. (1986). The effect of noise on concept learning. In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol.2). San Mateo, CA: Morgan Kaufmann.
Quinlan, J.R. (1987a). Decision trees as probabilistic classifiers. Proceedings of the Fourth International Workshop on Machine Learning (pp.31-37). Irvine, CA: Morgan Kaufmann.
Quinlan, J.R. (1987b). Simplifying decision trees. International Journal of Man-Machine Studies, 27, 221-234.
Rendell, L.A. (1983). A new basis for state-space learning systems and a successful implementation. Artificial Intelligence, 20, 369-392.
Rendell, L.A. (1985). Substantial constructive induction using layered information compression: Tractable feature formation in search. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 650-658). Los Angeles, CA: Morgan Kaufmann.
Rendell, L.A. (1986a). Induction, of and by probability. In L.N. Kanal and J. Lemmer (Eds.), Uncertainty in artificial intelligence. Amsterdam: Elsevier Science Publishers.
Rendell, L.A. (1986b). A general framework for induction and a study of selective induction. Machine Learning, 1, 177-226.
Rendell, L.A. (1988). Learning hard concepts. Proceedings of the Third European Working Session on Learning (pp.177-200). London: Pitman.
Rendell, L.A. (1989). Comparing systems and analyzing functions to improve constructive induction. Proceedings of the Fifth International Machine Learning Workshop (pp.461-464). Ithaca, NY: Morgan Kaufmann.
Rendell, L.A. (in press). Learning hard concepts: Framework and rationale. Computational Intelligence.
Rendell, L.A., Cho, H.H. and Seshu, R. (1989). Improving the design of similarity-based rule-learning systems. International Journal of Expert Systems, 2, 97-133.
Samuel, A.L. (1963). Some studies in machine learning using the game of checkers. In E.A. Feigenbaum and J. Feldman (Eds.), Computers and thought. New York: McGraw-Hill.
Simon, H.A., and Lea, G. (1974). Problem solving and rule induction: A unified view. In L.W. Gregg (Ed.), Knowledge and cognition. Potomac, MD: Erlbaum.
Sleeman, D.H. (1981). A rule-based task generation system. Proceedings of the Seventh International Joint Conference on Artificial Intelligence (pp.882-887). Vancouver, Canada: Morgan Kaufmann.
Tou, J.T., and Gonzalez, R.C. (1974). Pattern recognition principles. Reading, MA: Addison-Wesley.
Valiant, L.G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134-1142.
Winston, P.H. (1975). Learning structural descriptions from examples. In P.H. Winston (Ed.), The psychology of computer vision. New York: McGraw-Hill.
Rendell, L., Cho, H. Empirical Learning as a Function of Concept Character. Machine Learning 5, 267–298 (1990). https://doi.org/10.1023/A:1022651406695