ABSTRACT
Millions of individuals in the Arab world have significant visual impairments that make it difficult for them to access printed text. Assistive technologies such as scanners and screen readers often fail to turn text into speech because optical character recognition (OCR) software has difficulty interpreting the textual content of Arabic documents. In this paper, we show that the inaccessibility of scanned PDF documents is in large part due to the failure of the OCR engine to understand the layout of an Arabic document. Arabic document layout analysis (DLA) is therefore an urgent research topic, motivated by the goal of providing assistive technology that serves people with visual impairments. We announce the release of a large annotated dataset of Arabic document images, called BCE-Arabic-v1, to be used as a benchmark for DLA, OCR, and text-to-speech research. Our dataset contains 1,833 images of pages scanned from 180 books and covers a variety of page content and layouts, in particular Arabic text in various fonts and sizes, photographs, tables, diagrams, and charts, in single or multiple columns. We report the results of a formative study that investigated the performance of state-of-the-art document annotation tools. We found significant differences and limitations in the functionality and labeling speed of these tools, and selected the best-performing tool for annotating our benchmark BCE-Arabic-v1.
Index Terms
- BCE-Arabic-v1 dataset: Towards interpreting Arabic document images for people with visual impairments