Text Mining - Knowledge extraction from unstructured textual data

  • Conference paper
Advances in Data Science and Classification

Abstract

In the general context of Knowledge Discovery, specific techniques, called Text Mining techniques, are necessary to extract information from unstructured textual data. The extracted information can then be used to classify the content of large text collections. In this paper, we present two examples of information that can be automatically extracted from text collections: probabilistic associations of keywords and prototypical document instances. The Natural Language Processing (NLP) tools necessary for such extractions are also presented.



© 1998 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Rajman, M., Besançon, R. (1998). Text Mining - Knowledge extraction from unstructured textual data. In: Rizzi, A., Vichi, M., Bock, HH. (eds) Advances in Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-72253-0_64

  • DOI: https://doi.org/10.1007/978-3-642-72253-0_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64641-9

  • Online ISBN: 978-3-642-72253-0

  • eBook Packages: Springer Book Archive
