Abstract
Research on multimodal interaction has led to a multitude of proposals for suitable software architectures. Because each architecture describes multimodal systems differently, interoperability is severely hindered. The W3C MMI architecture is a Proposed Recommendation for a common architecture. In this article, we describe our experiences integrating JVoiceXML as a modality component into the W3C MMI architecture and identify general limitations with regard to the available design space.
Notes
It is unfortunate and confusing that the W3C MMI framework describes concepts comparable in granularity to what related work calls architectures, while the actual W3C MMI architecture describes only a subset of them.
A video of the scenario is available at http://www.youtube.com/watch?v=edXjU5ZVVnM.
About this article
Cite this article
Schnelle-Walka, D., Radomski, S. & Mühlhäuser, M. JVoiceXML as a modality component in the W3C multimodal architecture. J Multimodal User Interfaces 7, 183–194 (2013). https://doi.org/10.1007/s12193-013-0119-y
DOI: https://doi.org/10.1007/s12193-013-0119-y