ABSTRACT
In this paper, we present an automatic keyphrase extraction technique for English documents in the scientific domain. The devised algorithm uses an n-gram filtration technique, which filters sophisticated n-grams {1 ≤ n ≤ 4}, along with their weights, from the words of the input document. To develop the n-gram filtration technique, we use (1) an LZ78 data compression based technique, (2) a simple refinement step, (3) a simple pattern filtration algorithm, and (4) a term weighting scheme. In the term weighting scheme, we introduce the importance of the position of the sentence in which a given phrase first occurs in the document, and of the position of the phrase within that sentence, for documents in the scientific domain (which are typically more organized than those of other domains). The entire system is based on statistical observations, simple grammatical facts, heuristics, and lexical information about the English language. We remark that the devised system does not require a learning phase. Our experimental results on a publicly available text dataset show that the devised system is comparable with other known algorithms.
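The abstract does not spell out how the LZ78-based step works, so the following is only a minimal sketch of one plausible reading: standard LZ78 dictionary parsing applied to word tokens rather than characters, with phrases capped at four words to match the stated n-gram range. The function name `lz78_word_phrases`, the `max_n` parameter, and the cap itself are assumptions made for illustration. The refinement, pattern filtration, and term weighting steps are not shown.

```python
from typing import List, Tuple


def lz78_word_phrases(tokens: List[str], max_n: int = 4) -> List[Tuple[str, ...]]:
    """Parse a token list with an LZ78-style dictionary over words.

    Each emitted phrase is the longest previously seen phrase plus one
    new token. Capping phrases at max_n tokens is an assumption of this
    sketch, not a detail taken from the paper.
    """
    seen = set()                      # phrases emitted so far
    phrases = []                      # phrases in order of emission
    current: Tuple[str, ...] = ()     # known prefix being extended
    for tok in tokens:
        candidate = current + (tok,)
        if candidate in seen and len(candidate) < max_n:
            current = candidate       # known phrase: keep extending it
        else:
            seen.add(candidate)       # new (or length-capped) phrase: emit it
            phrases.append(candidate)
            current = ()
    if current:
        phrases.append(current)       # flush a trailing known prefix
    return phrases


if __name__ == "__main__":
    text = "keyphrase extraction uses keyphrase extraction for keyphrase ranking"
    for phrase in lz78_word_phrases(text.split()):
        print(" ".join(phrase))
```

Under this reading, word sequences that recur in the document grow into longer dictionary entries, which is what makes them natural candidate n-grams for the later filtration and weighting steps.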
Index Terms
- Automatic keyphrase extraction from scientific documents using N-gram filtration technique