Abstract
This study reports the results of a series of experiments in the techniques of automatic document classification. Two classification schedules are compared, along with two methods of automatically classifying documents into categories. It is concluded that, while there is no significant difference in predictive efficiency between the Bayesian and the Factor Score methods, automatic document classification is enhanced by the use of a factor-analytically derived classification schedule. Approximately 55 percent of the documents were automatically and correctly classified.
Index Terms
- Automatic Document Classification Part II. Additional Experiments