
Gesture controllers

Published: 26 July 2010

Abstract

We introduce gesture controllers, a method for animating the body language of avatars engaged in live spoken conversation. A gesture controller is an optimal-policy controller that schedules gesture animations in real time based on acoustic features in the user's speech. The controller consists of an inference layer, which infers a distribution over a set of hidden states from the speech signal, and a control layer, which selects the optimal motion based on the inferred state distribution. The inference layer, consisting of a specialized conditional random field, learns the hidden structure in body language style and associates it with acoustic features in speech. The control layer uses reinforcement learning to construct an optimal policy for selecting motion clips from a distribution over the learned hidden states. The modularity of the proposed method allows customization of a character's gesture repertoire, animation of non-human characters, and the use of additional inputs such as speech recognition or direct user control.
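To make the two-layer architecture concrete, here is a minimal sketch of the pipeline the abstract describes: an inference layer that maintains a distribution over hidden body-language states from streaming acoustic features, and a control layer that selects the motion clip with the highest expected value under that distribution. All names, dimensions, and the simplified recursive-filter and value-table forms below are illustrative assumptions; the paper's actual conditional random field and reinforcement-learning formulations are not reproduced here.

```python
import numpy as np

N_STATES = 8      # number of hidden body-language states (assumed)
N_FEATURES = 4    # acoustic features per audio frame, e.g. prosody (assumed)
N_CLIPS = 32      # motion clips in the gesture repertoire (assumed)


class InferenceLayer:
    """Maintains a belief over hidden gesture states given streaming
    acoustic features. The paper uses a specialized conditional random
    field; this sketch substitutes a simple recursive Bayesian filter
    with log-linear observation potentials."""

    def __init__(self, transition, feature_weights):
        self.T = transition          # (N_STATES, N_STATES) row-stochastic
        self.W = feature_weights     # (N_STATES, N_FEATURES)
        self.belief = np.full(N_STATES, 1.0 / N_STATES)

    def update(self, features):
        predicted = self.T.T @ self.belief      # propagate state dynamics
        potential = np.exp(self.W @ features)   # reweight by observation
        self.belief = predicted * potential
        self.belief /= self.belief.sum()        # renormalize
        return self.belief


class ControlLayer:
    """Selects the clip that is optimal in expectation under the current
    state distribution. The paper constructs its policy offline with
    reinforcement learning; here a precomputed value table stands in."""

    def __init__(self, clip_values):
        self.Q = clip_values         # (N_STATES, N_CLIPS) value table

    def select_clip(self, belief):
        expected_value = belief @ self.Q        # expected value per clip
        return int(np.argmax(expected_value))


# Usage sketch: one inference update and one clip selection per audio frame.
rng = np.random.default_rng(0)
inference = InferenceLayer(
    rng.dirichlet(np.ones(N_STATES), size=N_STATES),
    rng.normal(size=(N_STATES, N_FEATURES)),
)
control = ControlLayer(rng.normal(size=(N_STATES, N_CLIPS)))

for _ in range(3):
    feats = rng.normal(size=N_FEATURES)   # stand-in for extracted prosody
    clip = control.select_clip(inference.update(feats))
```

Separating inference from control in this way mirrors the modularity claim in the abstract: the value table (and hence the gesture repertoire) can be swapped without retraining the inference layer, and other inputs could reweight the belief before clip selection.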


Supplemental Material

tp070-10.mp4 (MP4, 32.9 MB)



Published in

ACM Transactions on Graphics, Volume 29, Issue 4 (July 2010), 942 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/1778765

      Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
