DOI: 10.1145/2910674.2910725
research-article
Open Access

BCE-Arabic-v1 dataset: Towards interpreting Arabic document images for people with visual impairments

Published: 29 June 2016

ABSTRACT

Millions of individuals in the Arab world have significant visual impairments that make it difficult for them to access printed text. Assistive technologies such as scanners and screen readers often fail to turn text into speech because optical character recognition (OCR) software has difficulty interpreting the textual content of Arabic documents. In this paper, we show that the inaccessibility of scanned PDF documents is in large part due to the failure of the OCR engine to understand the layout of an Arabic document. Arabic document layout analysis (DLA) is therefore an urgent research topic, motivated by the goal of providing assistive technology that serves people with visual impairments. We announce the launch of a large annotated dataset of Arabic document images, called BCE-Arabic-v1, to be used as a benchmark for DLA, OCR, and text-to-speech research. Our dataset contains 1,833 images of pages scanned from 180 books and represents a variety of page content and layouts: in particular, Arabic text in various fonts and sizes, photographs, tables, diagrams, and charts in single or multiple columns. We report the results of a formative study that investigated the performance of state-of-the-art document annotation tools. We found significant differences and limitations in the functionality and labeling speed of these tools, and selected the best-performing tool for annotating our benchmark BCE-Arabic-v1.
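To make the dataset description more concrete, here is a minimal Python sketch of how one annotated page might be represented and summarized. The class names (`Region`, `PageAnnotation`), the function `label_counts`, the identifiers, and the use of axis-aligned pixel bounding boxes are all hypothetical illustrations. The label set is taken from the content types listed in the abstract. This is not the actual BCE-Arabic-v1 annotation format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Hypothetical label set, based on the content types named in the abstract.
# This is NOT the official BCE-Arabic-v1 schema.
LABELS = {"text", "photograph", "table", "diagram", "chart"}


@dataclass(frozen=True)
class Region:
    """One annotated page region (hypothetical representation)."""
    label: str                       # must be one of LABELS
    bbox: Tuple[int, int, int, int]  # assumed (x, y, width, height) in pixels

    def __post_init__(self) -> None:
        # Reject unknown labels and degenerate boxes when the region is created.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")
        if self.bbox[2] <= 0 or self.bbox[3] <= 0:
            raise ValueError("bbox width and height must be positive")


@dataclass
class PageAnnotation:
    """All annotated regions of one scanned page (hypothetical representation)."""
    book_id: str   # hypothetical identifier of the source book
    page_id: str   # hypothetical identifier of the scanned page
    regions: List[Region] = field(default_factory=list)


def label_counts(pages: Iterable[PageAnnotation]) -> Counter:
    """Count annotated regions per label across a collection of pages."""
    counts: Counter = Counter()
    for page in pages:
        counts.update(region.label for region in page.regions)
    return counts


if __name__ == "__main__":
    # A two-column page: a text block next to a photograph.
    page = PageAnnotation("book_001", "page_005", [
        Region("text", (40, 60, 500, 700)),
        Region("photograph", (560, 60, 300, 250)),
    ])
    print(label_counts([page])["text"])  # prints 1
```

Validating labels when a region is created catches typos in annotations early. A real pipeline would more likely parse the ground-truth files exported by the chosen annotation tool than build these objects by hand.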

      Published in

      ACM Other conferences
      PETRA '16: Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments
      June 2016
      455 pages
      ISBN: 9781450343374
      DOI: 10.1145/2910674

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 June 2016


      Qualifiers

      • research-article
      • Research
      • Refereed limited
