Abstract
In the general context of Knowledge Discovery, specific techniques, called Text Mining techniques, are needed to extract information from unstructured textual data. The extracted information can then be used to classify the content of large text collections. In this paper, we present two examples of information that can be automatically extracted from text collections: probabilistic associations of keywords and prototypical document instances. We also present the Natural Language Processing (NLP) tools required for such extractions.
Copyright information
© 1998 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Rajman, M., Besançon, R. (1998). Text Mining - Knowledge extraction from unstructured textual data. In: Rizzi, A., Vichi, M., Bock, HH. (eds) Advances in Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-72253-0_64
DOI: https://doi.org/10.1007/978-3-642-72253-0_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64641-9
Online ISBN: 978-3-642-72253-0
eBook Packages: Springer Book Archive