Abstract
As advances in technology and computing power make dialog systems increasingly multimodal and distributed, such systems also become more complicated to design and implement. However, open industry and W3C standards provide a silver lining here, allowing different components to be designed independently while remaining interoperable. In this chapter we examine how an open-source, modular, multimodal dialog system—HALEF—can be seamlessly assembled, much like a jigsaw puzzle, from multiple distributed components that comply with the W3C recommendations or other open industry standards. We highlight the specific standards that HALEF currently uses, along with a perspective on other useful standards that could be adopted in the future. HALEF has an open codebase to encourage progressive community contribution and to serve as a common, standards-based testbed for multimodal dialog system development and benchmarking.
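To make the standards-based assembly concrete, the fragment below is a minimal, hypothetical VoiceXML 2.1 document of the kind a standards-compliant voice browser in a HALEF-style system could interpret. The form name, grammar URI, and submit endpoint are illustrative assumptions, not taken from HALEF's actual codebase.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative VoiceXML document: one form that prompts the
     caller and listens using an SRGS grammar. All names are hypothetical. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <field name="answer">
      <prompt>Hello! Would you like to take the demo interview?</prompt>
      <grammar src="yesno.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Hand the recognized value back to the application server. -->
        <submit next="http://example.org/next-turn" namelist="answer"/>
      </filled>
    </field>
  </form>
</vxml>
```

Because the document, the grammar format (SRGS), and the browser behavior are all governed by W3C recommendations, each component here could be supplied by a different implementation and still interoperate.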
Notes
- 7. Find a comprehensive list at https://www.w3.org/Voice/voice-implementations.html.
- 14. Partially Observable Markov Decision Processes.
- 28. JSGF (see http://www.w3.org/TR/jsgf/) is technically not a W3C standard. It is a member submission and is published as a W3C note.
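As an illustration of the JSGF member-submission format mentioned in the note above, here is a minimal grammar sketch; the grammar name and rule alternatives are hypothetical, not drawn from HALEF:

```jsgf
#JSGF V1.0;

// A tiny illustrative yes/no grammar in Java Speech Grammar Format.
grammar yesno;

// The public rule a recognizer can match against caller speech.
public <answer> = yes | no | maybe;
```

The same grammar could equivalently be expressed in SRGS XML, which is the corresponding W3C recommendation.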
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ramanarayanan, V. et al. (2017). Assembling the Jigsaw: How Multiple Open Standards Are Synergistically Combined in the HALEF Multimodal Dialog System. In: Dahl, D. (eds) Multimodal Interaction with W3C Standards. Springer, Cham. https://doi.org/10.1007/978-3-319-42816-1_13
DOI: https://doi.org/10.1007/978-3-319-42816-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42814-7
Online ISBN: 978-3-319-42816-1