A Heuristic Approach for Designing Regional Language Based Raw–Text Extractor and Unicode Font–Mapping Tool

Bhattacharyya, Debnath; Das, Poulami; Ganguly, Debashis; Mitra, Kheyali; Mukherjee, Swarnendu; Bandyopadhyay, Samir Kumar; Kim, Tai-hoon

doi:10.1007/978-3-642-10238-7_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 28)

Included in the following conference series:

International Conference on Future Generation Communication and Networking

Abstract

Information Extraction (IE) is a type of information retrieval aimed at extracting structured information. Information on the web is generally structured in HTML or XML format, and IE structures these documents further by applying learning techniques for pattern matching to their content. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the extracted information. In this paper, we present a heuristic approach to interactive information extraction where the information is in an Indian regional language. The approach enables a naive user to efficiently extract Indian regional language text from a web document. It behaves much like a pre-programmed information extraction engine.
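The two stages named in the title, raw-text extraction and font-to-Unicode mapping, can be illustrated with a minimal sketch. It is not the authors' method, which is not visible on this page. It assumes the input is an HTML string and that a legacy font reuses Latin characters for Bengali glyphs. The names `RawTextExtractor`, `extract_raw_text`, `map_to_unicode`, and the table `LEGACY_TO_UNICODE` are hypothetical, and the table does not reproduce the encoding of any real font.

```python
from html.parser import HTMLParser

# Hypothetical glyph table: NOT the encoding of any real legacy font.
# It pretends a font reuses these Latin characters for Bengali glyphs.
LEGACY_TO_UNICODE = {
    "A": "\u0985",  # BENGALI LETTER A
    "k": "\u0995",  # BENGALI LETTER KA
    "b": "\u09AC",  # BENGALI LETTER BA
    "l": "\u09B2",  # BENGALI LETTER LA
    "a": "\u09BE",  # BENGALI VOWEL SIGN AA
}


class RawTextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_raw_text(html):
    """Return the visible text of an HTML string, joined by spaces."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def map_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Map each character through the table; unknown ones pass through."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    page = "<html><body><p>kal</p><script>var x;</script></body></html>"
    print(map_to_unicode(extract_raw_text(page)))
```

A character-by-character table is the simplest possible mapping. Legacy fonts for Indic scripts often store glyphs in visual rather than logical order, so a practical mapper would likely also need a reordering step, which this sketch omits.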

Author information

Authors and Affiliations

Computer Science and Engineering Department, Heritage Institute of Technology, Kolkata, 700107, India
Debnath Bhattacharyya, Poulami Das, Debashis Ganguly, Kheyali Mitra & Swarnendu Mukherjee
Department of Computer Science and Engineering, University of Calcutta, Kolkata, 700009, India
Samir Kumar Bandyopadhyay
Hannam University, Daejeon, 306791, Korea
Tai-hoon Kim

Editor information

Editors and Affiliations

Hannam University, 306-791, Daejeon, South Korea
Tai-hoon Kim
Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada
Laurence T. Yang
Department of Computer Science and Engineering, Masan, Kyungnam University, Kyungnam, South Korea
Jong Hyuk Park
National Chung Cheng University, Chiayi County, Taiwan
Alan Chin-Chen Chang
University of Western Macedonia, West Macedonia, Greece
Thanos Vasilakos
University of Western Sydney, Penrith South, NSW, Australia
Yan Zhang
University of Limoges/CNRS, Site Jidé, 83 rue d’Isle, 87000, Limoges, France
Damien Sauveron
University of Plymouth, Plymouth, UK
Xingang Wang
Wonkwang University, Iksan Chonbuk, South Korea
Young-Sik Jeong

About this paper

Cite this paper

Bhattacharyya, D. et al. (2009). A Heuristic Approach for Designing Regional Language Based Raw–Text Extractor and Unicode Font–Mapping Tool. In: Kim, Th., et al. Advances in Computational Science and Engineering. FGCN 2008. Communications in Computer and Information Science, vol 28. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10238-7_1

DOI: https://doi.org/10.1007/978-3-642-10238-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10237-0
Online ISBN: 978-3-642-10238-7
eBook Packages: Computer Science, Computer Science (R0)
