Text Mining - Knowledge extraction from unstructured textual data

Rajman, Martin; Besançon, Romaric

doi:10.1007/978-3-642-72253-0_64

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In the general context of Knowledge Discovery, specific techniques, called Text Mining techniques, are needed to extract information from unstructured textual data. The extracted information can then be used to classify the content of large text collections. In this paper, we present two examples of information that can be automatically extracted from text collections: probabilistic associations of keywords and prototypical document instances. The Natural Language Processing (NLP) tools necessary for such extractions are also presented.
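As an illustration of what a probabilistic association of keywords can look like, the following minimal sketch counts keyword co-occurrences in a toy collection and reports rules of the form "keyword A implies keyword B" together with their support (number of documents containing both) and confidence (estimated probability of B given A). This is not the authors' algorithm: the function name `keyword_associations`, the thresholds, and the sample documents are all assumptions made for this example, and only keyword pairs are considered.

```python
from itertools import combinations


def keyword_associations(docs, min_support=2, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between keyword pairs.

    docs is a list of keyword sets, one set per document (hypothetical input
    format). Support is the number of documents containing both keywords;
    confidence is support divided by the number of documents containing lhs.
    """
    counts = {}
    for keywords in docs:
        # Count each single keyword once per document.
        for k in keywords:
            key = frozenset([k])
            counts[key] = counts.get(key, 0) + 1
        # Count each unordered keyword pair once per document.
        for a, b in combinations(sorted(keywords), 2):
            key = frozenset([a, b])
            counts[key] = counts.get(key, 0) + 1

    rules = []
    for itemset, support in counts.items():
        if len(itemset) != 2 or support < min_support:
            continue
        a, b = sorted(itemset)
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / counts[frozenset([lhs])]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy collection (invented for illustration): each document is a keyword set.
docs = [
    {"oil", "price", "opec"},
    {"oil", "price"},
    {"oil", "export"},
    {"price", "inflation"},
]
rules = keyword_associations(docs)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} => {rhs}  support={support}  confidence={confidence:.2f}")
```

On a real collection, the keyword sets would first have to be produced by NLP preprocessing such as tagging and term extraction, and larger keyword sets would need a more efficient search than this pairwise count.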


Author information

Authors and Affiliations

Artificial Intelligence Laboratory, Computer Science Department, Swiss Federal Institute of Technology, Switzerland
Martin Rajman & Romaric Besançon

Editor information

Editors and Affiliations

Dipartimento di Statistica, Probabilità e Statistiche Applicate, Università di Roma “La Sapienza”, Piazzale Aldo Moro 5, I-00185, Roma, Italia
Alfredo Rizzi
Dipartimento di Metodi Quantitativi e Teoria Economica, Università “G. D’Annunzio” di Chieti, Viale Pindaro 42, I-65127, Pescara, Italia
Maurizio Vichi
Institut für Statistik und Wirtschaftsmathematik, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Wüllnerstraße 3, D-52056, Aachen, Germany
Hans-Hermann Bock

Cite this paper

Rajman, M., Besançon, R. (1998). Text Mining - Knowledge extraction from unstructured textual data. In: Rizzi, A., Vichi, M., Bock, H.-H. (eds) Advances in Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-72253-0_64

DOI: https://doi.org/10.1007/978-3-642-72253-0_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64641-9
Online ISBN: 978-3-642-72253-0
eBook Packages: Springer Book Archive
