DOI: 10.1145/1410140.1410180
research-article

Automatic keyphrase extraction from scientific documents using N-gram filtration technique

Published: 16 September 2008

ABSTRACT

In this paper we present an automatic keyphrase extraction technique for English documents in the scientific domain. The devised algorithm uses an n-gram filtration technique, which filters sophisticated n-grams (1 ≤ n ≤ 4), along with their weights, from the words of the input document. To develop the n-gram filtration technique, we have used (1) an LZ78 data compression based technique, (2) a simple refinement step, (3) a simple pattern filtration algorithm, and (4) a term weighting scheme. In the term weighting scheme, we have introduced the importance of the position of the sentence (where a given phrase first occurs) in the document and the position of the phrase in the sentence for documents of the scientific domain (which is typically more organized than other domains). The entire system is based upon statistical observations, simple grammatical facts, heuristics, and lexical information of the English language. We remark that the devised system does not require a learning phase. Our experimental results with a publicly available text dataset show that the devised system is comparable with other known algorithms.
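
The abstract does not give the details of the pipeline, so the following is only a minimal sketch of two of the listed ingredients under stated assumptions: an LZ78-style dictionary parse applied to word tokens instead of characters, with phrases capped at four words, and a position-aware weight that favours phrases first seen in early sentences and early within a sentence. The function names `lz78_candidates` and `weigh`, the way the cap is applied, and the weighting formula are all hypothetical and are not taken from the paper; the refinement and pattern filtration steps are not modelled.

```python
from collections import Counter

MAX_N = 4  # the abstract restricts candidates to n-grams with 1 <= n <= 4


def lz78_candidates(sentences, max_n=MAX_N):
    """LZ78-style parse over word tokens (assumed adaptation, not the paper's).

    Classic LZ78 emits the longest phrase already in its dictionary plus one
    new symbol, then adds that extended phrase to the dictionary. Here the
    symbols are words, the dictionary is shared across sentences, and a
    phrase is closed early once it reaches max_n words.
    Returns a list of (phrase, sentence_index, word_offset) tuples.
    """
    dictionary = set()
    occurrences = []
    for s_idx, words in enumerate(sentences):
        current, start = (), 0
        for w_idx, word in enumerate(words):
            if not current:
                start = w_idx  # remember where this phrase begins
            current += (word,)
            if current not in dictionary or len(current) == max_n:
                dictionary.add(current)
                occurrences.append((current, s_idx, start))
                current = ()
        if current:  # flush a known phrase cut off by the sentence end
            occurrences.append((current, s_idx, start))
    return occurrences


def weigh(occurrences, num_sentences):
    """Hypothetical weight: emission count scaled by first-occurrence position."""
    counts = Counter(phrase for phrase, _, _ in occurrences)
    first_seen = {}
    for phrase, s_idx, w_idx in occurrences:
        first_seen.setdefault(phrase, (s_idx, w_idx))
    weights = {}
    for phrase, count in counts.items():
        s_idx, w_idx = first_seen[phrase]
        sentence_bonus = 1.0 - s_idx / num_sentences  # earlier sentence scores higher
        position_bonus = 1.0 / (1 + w_idx)            # earlier in sentence scores higher
        weights[phrase] = count * (1.0 + sentence_bonus + position_bonus)
    return weights


if __name__ == "__main__":
    # Toy input: each sentence is already lower-cased and tokenised.
    doc = [
        ["keyphrase", "extraction", "uses", "keyphrase", "extraction"],
        ["keyphrase", "extraction", "needs", "no", "training"],
    ]
    ranked = sorted(weigh(lz78_candidates(doc), len(doc)).items(),
                    key=lambda item: item[1], reverse=True)
    for phrase, weight in ranked:
        print(" ".join(phrase), round(weight, 3))
```

Because an LZ78 parse emits most phrases only once, the count term here is a weak signal; a fuller system would presumably recount the surviving candidates against the whole document before weighting them.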


Published in

DocEng '08: Proceedings of the eighth ACM symposium on Document engineering
September 2008
312 pages
ISBN: 9781605580814
DOI: 10.1145/1410140

Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

DocEng '08 paper acceptance rate: 21 of 62 submissions, 34%. Overall acceptance rate: 178 of 537 submissions, 33%.
