Open Access

BCE-Arabic-v1 dataset: Towards interpreting Arabic document images for people with visual impairments

Authors:

Rana S.M. Saad, Department of Computers and Systems, Electronics Research Institute, Cairo, Egypt

Randa I. Elanwar, Department of Computers and Systems, Electronics Research Institute, Cairo, Egypt and Department of Computer Science, Boston University, USA

N. S. Abdel Kader, Department of Electronics and Communications Engineering, Cairo University, Egypt

Samia Mashali, Department of Computer Science, Boston University, USA

Margrit Betke, Department of Computer Science, Boston University, USA

PETRA '16: Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments, June 2016, Article No.: 25, Pages 1–8, https://doi.org/10.1145/2910674.2910725

Published: 29 June 2016

ABSTRACT

Millions of individuals in the Arab world have significant visual impairments that make it difficult for them to access printed text. Assistive technologies such as scanners and screen readers often fail to turn text into speech because optical character recognition (OCR) software has difficulty interpreting the textual content of Arabic documents. In this paper, we show that the inaccessibility of scanned PDF documents is in large part due to the failure of the OCR engine to understand the layout of an Arabic document. Arabic document layout analysis (DLA) is therefore an urgent research topic, motivated by the goal of providing assistive technology that serves people with visual impairments. We announce the launch of a large annotated dataset of Arabic document images, called BCE-Arabic-v1, to be used as a benchmark for DLA, OCR, and text-to-speech research. Our dataset contains 1,833 images of pages scanned from 180 books and represents a variety of page content and layouts, in particular Arabic text in various fonts and sizes, photographs, tables, diagrams, and charts, arranged in single or multiple columns. We report the results of a formative study that investigated the performance of state-of-the-art document annotation tools. We found significant differences and limitations in the functionality and labeling speed of these tools, and selected the best-performing tool for annotating our benchmark BCE-Arabic-v1.

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document capture

Published in

PETRA '16: Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments
June 2016
455 pages
ISBN:9781450343374
DOI:10.1145/2910674

Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Arabic document analysis
Assistive technology for blind users
annotation tools
benchmark
image analysis
optical character recognition (OCR)
page layout analysis
performance evaluation
screen readers
training data
Qualifiers
- research-article
- Research
- Refereed limited