Mining Text Using Keyword Distributions

Feldman, Ronen; Dagan, Ido; Hirsh, Haym

doi:10.1023/A:1008623632443

Published: May 1998

Journal of Intelligent Information Systems, Volume 10, pages 281–300 (1998)

Ronen Feldman¹,
Ido Dagan¹ &
Haym Hirsh²

Abstract

Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery and exploration for collections of unstructured text.
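
As a rough illustration of the keyword-frequency idea described in the abstract, the sketch below counts keyword co-occurrences over a small labeled collection and derives a simple conditional keyword profile. It is not the KDT implementation: the toy documents, the keyword names, and the helper functions `cooccurrence_counts` and `conditional_distribution` are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy collection: each document is represented only by its keyword labels.
documents = [
    {"iran", "oil", "opec"},
    {"iran", "oil"},
    {"usa", "oil", "trade"},
    {"usa", "trade"},
]

def cooccurrence_counts(docs):
    """Count documents per keyword and per unordered keyword pair."""
    single = Counter()
    pair = Counter()
    for labels in docs:
        single.update(labels)
        # Sort so each pair has one canonical ordering, e.g. ("iran", "oil").
        pair.update(combinations(sorted(labels), 2))
    return single, pair

def conditional_distribution(docs, given):
    """Proportion of documents labeled `given` that also carry each other keyword."""
    subset = [labels for labels in docs if given in labels]
    if not subset:
        return {}
    counts = Counter(k for labels in subset for k in labels if k != given)
    return {k: c / len(subset) for k, c in counts.items()}

single, pair = cooccurrence_counts(documents)
oil_profile = conditional_distribution(documents, "oil")
```

Here `pair` holds the raw co-occurrence frequencies, and `oil_profile` gives, for each keyword, the share of "oil" documents that also carry it. Profiles of this kind could then be compared across keywords or document subsets to surface unusual patterns.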

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan, Israel
Ronen Feldman & Ido Dagan
Department of Computer Science, Rutgers University, Piscataway, NJ 08855, USA
Haym Hirsh

About this article

Cite this article

Feldman, R., Dagan, I. & Hirsh, H. Mining Text Using Keyword Distributions. Journal of Intelligent Information Systems 10, 281–300 (1998). https://doi.org/10.1023/A:1008623632443

Issue Date: May 1998
DOI: https://doi.org/10.1023/A:1008623632443
