ABSTRACT
The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative for answering visual questions in nearly real time: asking multiple people on the web. To support answering questions quickly, we introduce quikTurkit, a general approach for intelligently recruiting human workers in advance so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.
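The core idea behind quikTurkit, as described above, is to recruit crowd workers before a question arrives so that a warm pool is already waiting when one does. The minimal sketch below illustrates that scheduling idea only; the class and method names (`WorkerPool`, `workers_to_recruit`, `ask`) are hypothetical and do not reflect the paper's actual implementation or the Mechanical Turk API.

```python
import collections


class WorkerPool:
    """Toy sketch of pre-recruitment: keep enough workers 'warm' that a
    new question can usually be answered immediately. Illustrative only."""

    def __init__(self, target_pool_size=3):
        self.target_pool_size = target_pool_size
        self.idle_workers = collections.deque()       # pre-recruited, waiting
        self.pending_questions = collections.deque()  # questions with no worker yet

    def workers_to_recruit(self):
        # Post recruitment tasks whenever the warm pool falls short of the
        # target plus any backlog of unanswered questions.
        shortfall = (self.target_pool_size + len(self.pending_questions)
                     - len(self.idle_workers))
        return max(0, shortfall)

    def worker_arrives(self, worker_id):
        self.idle_workers.append(worker_id)

    def ask(self, question):
        # If a pre-recruited worker is waiting, pair them with the question
        # right away; otherwise queue the question until one arrives.
        if self.idle_workers:
            return self.idle_workers.popleft(), question
        self.pending_questions.append(question)
        return None, question
```

In this simplified model, latency hinges on `workers_to_recruit` being polled often enough that the pool rarely empties; the paper's actual system additionally keeps recruited workers engaged (for example, on older questions) until a new one arrives.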