Abstract
In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with other learning algorithms show that our nearest neighbor algorithm is comparable or superior in all three domains. In addition, our algorithm has advantages in training speed, simplicity, and perspicuity. We conclude that experimental evidence favors the use and continued development of nearest neighbor algorithms for domains such as the ones studied here.
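The distance-table idea the abstract describes can be sketched as follows. This is an illustrative value-difference metric in the spirit of the paper: the distance between two symbolic values is the difference between the class distributions they induce, and per-instance weights stretch the distance to unreliable exemplars. Function names, the L1 normalization, and the toy data are illustrative choices, not the authors' exact implementation.

```python
from collections import defaultdict

def value_difference_tables(examples, labels):
    """Build, for each feature, a table of real-valued distances
    between that feature's symbolic values.  Two values are close
    when they induce similar class distributions."""
    n_features = len(examples[0])
    classes = sorted(set(labels))
    tables = []
    for f in range(n_features):
        # Count how often each value of feature f co-occurs with each class.
        counts = defaultdict(lambda: defaultdict(int))
        for x, y in zip(examples, labels):
            counts[x[f]][y] += 1
        # Distance between values v1, v2: L1 difference of P(class | value).
        table = {}
        values = list(counts)
        for v1 in values:
            n1 = sum(counts[v1].values())
            for v2 in values:
                n2 = sum(counts[v2].values())
                table[v1, v2] = sum(
                    abs(counts[v1][c] / n1 - counts[v2][c] / n2)
                    for c in classes
                )
        tables.append(table)
    return tables

def distance(x, y, tables):
    """Real-valued distance between two symbolic instances:
    sum of per-feature table lookups (unseen pairs default to 1)."""
    return sum(t.get((a, b), 1.0) for a, b, t in zip(x, y, tables))

def classify(query, examples, labels, tables, weights=None):
    """1-nearest-neighbor prediction.  Optional per-instance weights
    multiply the distance to each stored instance, so exceptional
    (unreliable) instances attract fewer queries."""
    if weights is None:
        weights = [1.0] * len(examples)
    best = min(range(len(examples)),
               key=lambda i: weights[i] * distance(query, examples[i], tables))
    return labels[best]
```

On a toy training set where one feature perfectly predicts the class and another is pure noise, the learned table assigns distance 0 between the noise feature's values, so nearest-neighbor lookups effectively ignore it.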
Cite this article
Cost, S., Salzberg, S. A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features. Machine Learning 10, 57–78 (1993). https://doi.org/10.1023/A:1022664626993