Automatic Topic Identification Using Ontology Hierarchy

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2004)

Abstract

This paper proposes a method that uses an ontology hierarchy for automatic topic identification. The fundamental idea is to exploit the hierarchical structure of an ontology to find the topic of a text. Keywords extracted from a given text are mapped onto their corresponding concepts in the ontology. By optimizing over these concepts, we pick a single node among the concept nodes that we believe is the topic of the target text. However, a limited-vocabulary problem arises when mapping keywords onto their corresponding concepts. This forces us to extend the ontology by enriching each of its concepts with new concepts drawn from an external linguistic knowledge base (WordNet). Our intuition is that the more keywords are mapped onto ontology concepts, the better our topic identification technique can perform.
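
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the concepts are optimized, so the selection rule here (pick the deepest node whose subtree covers a given fraction of the mapped keywords) is only an assumed heuristic, not the authors' method. The ontology `PARENT`, the keyword lexicon `LEXICON`, the `coverage` parameter, and the hand-written `SYNONYMS` table (a stand-in for WordNet-based enrichment) are all hypothetical.

```python
from collections import Counter

# Hypothetical toy ontology: child concept -> parent concept (root maps to None).
PARENT = {
    "entity": None,
    "science": "entity",
    "sport": "entity",
    "computing": "science",
    "biology": "science",
    "football": "sport",
}

# Hypothetical lexicon: keyword -> ontology concept.
LEXICON = {
    "algorithm": "computing",
    "compiler": "computing",
    "cell": "biology",
    "goal": "football",
}

# Hand-written stand-in for WordNet-style enrichment: extra word -> known keyword.
SYNONYMS = {"procedure": "algorithm", "program": "compiler"}


def ancestors(concept):
    """Yield the concept itself and then each ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT[concept]


def depth(concept):
    """Number of edges between the concept and the root."""
    return sum(1 for _ in ancestors(concept)) - 1


def map_keywords(keywords, enrich=False):
    """Map keywords onto concepts; keywords with no mapping are dropped."""
    concepts = []
    for word in keywords:
        if enrich:
            word = SYNONYMS.get(word, word)
        if word in LEXICON:
            concepts.append(LEXICON[word])
    return concepts


def identify_topic(keywords, coverage=0.6, enrich=False):
    """Return the deepest node whose subtree covers >= coverage of mapped keywords."""
    concepts = map_keywords(keywords, enrich)
    if not concepts:
        return None
    # Each mapped keyword votes for its concept and for every ancestor of it.
    votes = Counter()
    for concept in concepts:
        for node in ancestors(concept):
            votes[node] += 1
    needed = coverage * len(concepts)
    candidates = [node for node, count in votes.items() if count >= needed]
    # Deepest candidate wins; ties are broken by vote count, then by name.
    return max(candidates, key=lambda node: (depth(node), votes[node], node))
```

With this toy data, `identify_topic(["algorithm", "compiler", "cell"])` selects `computing`, while `identify_topic(["algorithm", "cell", "goal"])` falls back to the more general `science`. The limited-vocabulary effect shows up with `["procedure", "program", "goal"]`: without enrichment only `goal` is mapped and the result is `football`, whereas with `enrich=True` the result becomes `computing`.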

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tiun, S., Abdullah, R., Kong, T.E. (2001). Automatic Topic Identification Using Ontology Hierarchy. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2001. Lecture Notes in Computer Science, vol 2004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44686-9_43

  • DOI: https://doi.org/10.1007/3-540-44686-9_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41687-6

  • Online ISBN: 978-3-540-44686-6

  • eBook Packages: Springer Book Archive
