Advanced Applications on Bilingual Document Analysis and Processing Systems

Shalini Puri, Satya Prakash Singh
Copyright: © 2022 | Pages: 50
ISBN13: 9781668436905 | ISBN10: 1668436906 | EISBN13: 9781668436912
DOI: 10.4018/978-1-6684-3690-5.ch032
Cite Chapter

MLA

Puri, Shalini, and Satya Prakash Singh. "Advanced Applications on Bilingual Document Analysis and Processing Systems." Research Anthology on Bilingual and Multilingual Education, edited by Information Resources Management Association, IGI Global, 2022, pp. 625-674. https://doi.org/10.4018/978-1-6684-3690-5.ch032

APA

Puri, S., & Singh, S. P. (2022). Advanced Applications on Bilingual Document Analysis and Processing Systems. In Information Resources Management Association (Ed.), Research Anthology on Bilingual and Multilingual Education (pp. 625-674). IGI Global. https://doi.org/10.4018/978-1-6684-3690-5.ch032

Chicago

Puri, Shalini, and Satya Prakash Singh. "Advanced Applications on Bilingual Document Analysis and Processing Systems." In Research Anthology on Bilingual and Multilingual Education, edited by Information Resources Management Association, 625-674. Hershey, PA: IGI Global, 2022. https://doi.org/10.4018/978-1-6684-3690-5.ch032

Abstract

Today, rapid digitization demands efficient bilingual classification systems for both non-image and image documents. Although many bilingual natural language processing (NLP) and image-based systems solve real-world problems, they focus primarily on text extraction, identification, and recognition, and they handle only a limited range of document types. This article traces the evolution of these systems and surveys their methods, feature extraction techniques, document sets, classifiers, and accuracy for English-Hindi and other language pairs. The gaps identified motivate a generic, integrated bilingual English-Hindi document classification system that classifies heterogeneous documents using a dual class feeder and two character corpora. Its non-image and image modules include pre- and post-processing stages as well as pre- and post-segmentation stages to classify documents into predefined classes. The article also discusses several real-life applications addressing societal and commercial problems. The analytical results present key findings on both the existing and the proposed systems.
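The abstract does not describe the internals of the dual class feeder or the two character corpora. As a rough illustration only, the sketch below shows one way a bilingual non-image pipeline might route each text to an English or a Hindi resource by counting Devanagari (U+0900 to U+097F) versus Latin letters, then assign a predefined class. The names CLASS_KEYWORDS, detect_script, and classify, the example classes, and the keyword-overlap scoring are all hypothetical stand-ins, not the authors' method or classifier.

```python
# Hypothetical sketch: script-based routing of non-image text to one of two
# language-specific resources, followed by a toy keyword-overlap classifier.
# None of these names or classes come from the chapter itself.

# Hypothetical per-language keyword lists standing in for the two corpora.
CLASS_KEYWORDS = {
    "english": {
        "finance": {"invoice", "payment", "bank"},
        "health": {"doctor", "hospital", "medicine"},
    },
    "hindi": {
        "finance": {"बैंक", "भुगतान"},
        "health": {"अस्पताल", "दवा"},
    },
}

PUNCTUATION = ".,!?;:।"  # includes the Devanagari danda (U+0964)


def detect_script(text: str) -> str:
    """Return 'hindi' if Devanagari characters outnumber Latin letters, else 'english'."""
    devanagari = sum(1 for ch in text if "\u0900" <= ch <= "\u097F")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return "hindi" if devanagari > latin else "english"


def classify(text: str) -> tuple[str, str]:
    """Route text by script, then pick the class with the most keyword hits."""
    lang = detect_script(text)
    tokens = [tok.strip(PUNCTUATION) for tok in text.lower().split()]
    scores = {
        label: sum(1 for tok in tokens if tok in words)
        for label, words in CLASS_KEYWORDS[lang].items()
    }
    best = max(scores, key=scores.get)  # first class with the highest score
    return lang, (best if scores[best] > 0 else "unknown")


if __name__ == "__main__":
    print(classify("Please pay the invoice at the bank."))
    print(classify("अस्पताल में दवा दी।"))
```

Routing by script first keeps the two language resources separate, which mirrors the idea of feeding each document to a language-specific corpus; a real system would replace the keyword lists with trained classifiers and handle mixed-script (code-switched) text explicitly.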