Keyword Extraction from Hindi Documents Using Statistical Approach

Sharan, Aditi; Siddiqi, Sifatullah; Singh, Jagendra

doi:10.1007/978-81-322-2009-1_57

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 309)

Abstract

Keywords give a reader an idea of a document's important points without requiring a read of the whole text. In this paper, we propose an unsupervised, domain-independent, and corpus-independent approach to automatic keyword extraction. The approach is general and can be applied to any language; we have tested it on Hindi. It combines the information contained in the frequency and the spatial distribution of a word to extract keywords from a document. Our work is especially significant because it has been implemented and tested on Hindi, a resource-poor and underrepresented language.
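
The abstract does not state the scoring formula, so the following is only a minimal sketch of how frequency and spatial distribution might be combined, not the authors' method. It assumes a common spatial statistic: the standard deviation of the gaps between successive occurrences of a word, normalized by the mean gap, so that words occurring in clusters score higher than evenly spread function words. The names `keyword_scores`, `top_keywords`, and `min_count`, and the choice to multiply the spatial score by relative frequency, are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def keyword_scores(tokens, min_count=3):
    """Score each word by how clustered its occurrences are, weighted by frequency.

    Assumption: this weighting is illustrative only; the abstract does not
    specify how the two signals are combined.
    """
    # Record the token positions at which each word occurs.
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    scores = {}
    for word, pos in positions.items():
        if len(pos) < min_count:
            continue  # too few occurrences to estimate a gap spread
        # Distances between successive occurrences of the word.
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        # Normalized spread: 0 for perfectly regular spacing, larger when clustered.
        spatial = pstdev(gaps) / mean(gaps)
        # Hypothetical combination: spatial score times relative frequency.
        scores[word] = spatial * len(pos) / len(tokens)
    return scores


def top_keywords(tokens, k=5, min_count=3):
    """Return the k highest-scoring words."""
    scores = keyword_scores(tokens, min_count)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The sketch needs only a list of tokens, so it relies on no stop-word list, training data, or reference corpus, which matches the unsupervised and corpus-independent setting described above. Tokenizing Hindi text (for example, splitting on whitespace and the danda "।") is left to the caller.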

Author information

Authors and Affiliations

Jawaharlal Nehru University, New Delhi, India
Aditi Sharan, Sifatullah Siddiqi & Jagendra Singh

Corresponding author

Correspondence to Aditi Sharan.

Editor information

Editors and Affiliations

University of Canberra, Faculty of Education, Science, Technology and Mathematics, Canberra, Australia, and University of South Australia, Adelaide, South Australia, Australia
Lakhmi C. Jain
Department of Computer Science and Engin, SOA University, Bhubaneswar, Odisha, India
Srikanta Patnaik
Department of Premier and Cabinet, Office of the Chief Information Officer, Adelaide, South Australia, Australia
Nikhil Ichalkaranje

About this paper

Cite this paper

Sharan, A., Siddiqi, S., Singh, J. (2015). Keyword Extraction from Hindi Documents Using Statistical Approach. In: Jain, L., Patnaik, S., Ichalkaranje, N. (eds) Intelligent Computing, Communication and Devices. Advances in Intelligent Systems and Computing, vol 309. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2009-1_57

DOI: https://doi.org/10.1007/978-81-322-2009-1_57
Published: 29 August 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2008-4
Online ISBN: 978-81-322-2009-1
eBook Packages: Engineering, Engineering (R0)
