
Assembling the Jigsaw: How Multiple Open Standards Are Synergistically Combined in the HALEF Multimodal Dialog System

Chapter in: Multimodal Interaction with W3C Standards

Abstract

As dialog systems become increasingly multimodal and distributed with advances in technology and computing power, they become correspondingly more complicated to design and implement. However, open industry and W3C standards provide a silver lining here, enabling the distributed design of different components that nonetheless remain compliant with one another. In this chapter we examine how an open-source, modular, multimodal dialog system—HALEF—can be seamlessly assembled, much like a jigsaw puzzle, by putting together multiple distributed components that comply with W3C recommendations or other open industry standards. We highlight the specific standards that HALEF currently uses, along with a perspective on other useful standards that could be incorporated in the future. HALEF has an open codebase to encourage progressive community contribution and to serve as a common standards-based testbed for multimodal dialog system development and benchmarking.


Notes

  1. http://www.w3.org/TR/2000/NOTE-voicexml-20000505
  2. https://voxeo.com/prophecy/
  3. https://studio.tellme.com/
  4. http://www.plumvoice.com/products/plum-d-e-v/
  5. http://www.cisco.com/c/en/us/products/customer-collaboration/unified-customer-voice-portal
  6. https://support.avaya.com/products/P0979/voice-portal
  7. Find a comprehensive list at https://www.w3.org/Voice/voice-implementations.html
  8. https://github.com/JVoiceXML/JVoiceXML
  9. https://github.com/UFAL-DSG/alex
  10. https://bitbucket.org/inpro/inprotk
  11. http://www.opendial-toolkit.net
  12. http://www.metalogue.eu
  13. http://www.iristk.net
  14. Partially Observable Markov Decision Processes.
  15. http://tomcat.apache.org/
  16. https://www.mysql.com/
  17. https://www.w3.org/TR/mediacapture-streams
  18. http://www.w3.org/TR/webrtc/
  19. http://www.ipkall.com/
  20. http://peers.sourceforge.net/
  21. http://www.3cx.com/voip/sip-phone/
  22. https://www.doubango.org/sipml5/
  23. http://www.jssip.net/
  24. http://webrtc2sip.org/
  25. http://www.w3.org/TR/voicexml20/
  26. https://github.com/OpenMethods/OpenVXML
  27. www.eclipse.org
  28. JSGF (see http://www.w3.org/TR/jsgf/) is technically not a W3C standard; it is a member submission published as a W3C Note.
  29. http://www.w3.org/TR/speech-grammar/
  30. See http://www.w3.org/TR/webrtc/ and https://webrtc.org/
  31. http://www.w3.org/TR/emma
  32. https://www.w3.org/TR/emotionml/
  33. https://www.w3.org/TR/scxml/
  34. https://www.w3.org/TR/ccxml/
  35. https://www.w3.org/TR/speech-synthesis/
  36. http://halef.org



Author information

Corresponding author: Vikram Ramanarayanan.



Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Ramanarayanan, V. et al. (2017). Assembling the Jigsaw: How Multiple Open Standards Are Synergistically Combined in the HALEF Multimodal Dialog System. In: Dahl, D. (Ed.), Multimodal Interaction with W3C Standards. Springer, Cham. https://doi.org/10.1007/978-3-319-42816-1_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-42816-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42814-7

  • Online ISBN: 978-3-319-42816-1

  • eBook Packages: Engineering (R0)
