VTQuest: a voice-based multimodal web-based software system for maps and directions

ABSTRACT
Finding one's way around a large university campus can be difficult. We developed VTQuest (http://sunfish.cs.vt.edu/VTQuestV), a web-based software system that addresses this problem for the campus of Virginia Tech (http://www.vt.edu/). VTQuest enables (a) multimodal interaction with voice, mouse, and keyboard; (b) browsing the campus map; (c) locating a building by name, abbreviation, category, or within a specified distance on the campus map; (d) locating a room on the floor plan of a building; and (e) obtaining walking directions from one building to another. VTQuest provides these capabilities for 103 buildings, with floor plans for most of them. VTQuest is engineered on the Java 2 Platform, Enterprise Edition (J2EE) using Scalable Vector Graphics (SVG) and Speech Application Language Tags (SALT). SVG enables zooming into the maps without loss of image quality. The voice interface offers a variety of features, including an extensive grammar and out-of-turn interaction.
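To illustrate why SVG suits zoomable maps, a minimal sketch of a vector building footprint follows; the building name, coordinates, and colors are hypothetical and are not taken from VTQuest:

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 300">
  <!-- A building footprint drawn as vector geometry: shrinking the
       viewBox (e.g., to "120 80 140 90") zooms in on the building,
       and the shapes re-render crisply at any scale, with none of
       the pixelation a raster map image would show. -->
  <rect x="120" y="80" width="140" height="90"
        fill="#c8b18b" stroke="#5c4a1e" stroke-width="2"/>
  <text x="190" y="130" text-anchor="middle" font-size="14">
    Example Hall
  </text>
</svg>
```

Because zooming only changes the viewport transform rather than resampling pixels, a single SVG campus map can serve every zoom level.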