Abstract
As advances in technology and computing power make dialog systems increasingly multimodal and distributed, such systems also become more complicated to design and implement. However, open industry and W3C standards provide a silver lining here, allowing different components to be designed independently while remaining interoperable. In this chapter we examine how an open-source, modular, multimodal dialog system—HALEF—can be seamlessly assembled, much like a jigsaw puzzle, from multiple distributed components that comply with the W3C recommendations or other open industry standards. We highlight the specific standards that HALEF currently uses, along with a perspective on other useful standards that could be adopted in the future. HALEF has an open codebase to encourage progressive community contribution and to serve as a common, standards-based testbed for multimodal dialog system development and benchmarking.
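To make the standards-based assembly concrete, the fragment below is a minimal, hypothetical VoiceXML 2.1 document of the kind a standards-compliant voice browser in a HALEF-style system could interpret. The form name, grammar URI, and submit endpoint are illustrative assumptions, not taken from HALEF's actual codebase.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative VoiceXML document: one form that prompts the
     caller and listens using an SRGS grammar. All names are hypothetical. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <field name="answer">
      <prompt>Hello! Would you like to take the demo interview?</prompt>
      <grammar src="yesno.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Hand the recognized value back to the application server. -->
        <submit next="http://example.org/next-turn" namelist="answer"/>
      </filled>
    </field>
  </form>
</vxml>
```

Because the document, the grammar format (SRGS), and the browser behavior are all governed by W3C recommendations, each component here could be supplied by a different implementation and still interoperate.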
Notes
- 7. Find a comprehensive list at https://www.w3.org/Voice/voice-implementations.html.
- 14. Partially Observable Markov Decision Processes.
- 28. JSGF (see http://www.w3.org/TR/jsgf/) is technically not a W3C standard. It is a member submission and is published as a W3C note.
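As an illustration of the JSGF member-submission format mentioned in the note above, here is a minimal grammar sketch; the grammar name and rule alternatives are hypothetical, not drawn from HALEF:

```jsgf
#JSGF V1.0;

// A tiny illustrative yes/no grammar in Java Speech Grammar Format.
grammar yesno;

// The public rule a recognizer can match against caller speech.
public <answer> = yes | no | maybe;
```

The same grammar could equivalently be expressed in SRGS XML, which is the corresponding W3C recommendation.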
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ramanarayanan, V. et al. (2017). Assembling the Jigsaw: How Multiple Open Standards Are Synergistically Combined in the HALEF Multimodal Dialog System. In: Dahl, D. (eds) Multimodal Interaction with W3C Standards. Springer, Cham. https://doi.org/10.1007/978-3-319-42816-1_13
DOI: https://doi.org/10.1007/978-3-319-42816-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42814-7
Online ISBN: 978-3-319-42816-1