research-article

Gesture controllers

Authors: Sergey Levine, Philipp Krähenbühl, Sebastian Thrun, Vladlen Koltun (all Stanford University)

ACM Transactions on Graphics, Volume 29, Issue 4, Article No. 124, pp. 1–11
https://doi.org/10.1145/1778765.1778861

Published: 26 July 2010

Abstract

We introduce gesture controllers, a method for animating the body language of avatars engaged in live spoken conversation. A gesture controller is an optimal-policy controller that schedules gesture animations in real time based on acoustic features in the user's speech. The controller consists of an inference layer, which infers a distribution over a set of hidden states from the speech signal, and a control layer, which selects the optimal motion based on the inferred state distribution. The inference layer, consisting of a specialized conditional random field, learns the hidden structure in body language style and associates it with acoustic features in speech. The control layer uses reinforcement learning to construct an optimal policy for selecting motion clips from a distribution over the learned hidden states. The modularity of the proposed method allows customization of a character's gesture repertoire, animation of non-human characters, and the use of additional inputs such as speech recognition or direct user control.
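The two-layer structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a simple forward filter over two hypothetical hidden states stands in for the specialized conditional random field, and the value table `Q_VALUES` is set by hand rather than learned with reinforcement learning. All names, states, clips, and numbers below are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical hidden states: index 0 = "calm", index 1 = "emphatic".
# TRANSITION[i][j] is an assumed probability of moving from state i to state j.
TRANSITION: List[List[float]] = [[0.9, 0.1],
                                 [0.2, 0.8]]

# Hand-set value of playing each clip in each hidden state (assumption;
# the paper learns its policy, this table is only a placeholder).
Q_VALUES: Dict[str, List[float]] = {
    "rest": [1.0, 0.0],
    "beat": [0.0, 1.0],
}


def update_belief(belief: Sequence[float],
                  transition: Sequence[Sequence[float]],
                  likelihood: Sequence[float]) -> List[float]:
    """Inference layer stand-in: one forward-filtering step.

    Predicts the next state distribution with the transition matrix, then
    reweights it by how well each state explains the current acoustic frame.
    """
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        # No state explains the frame; fall back to a uniform belief.
        return [1.0 / n] * n
    return [u / total for u in unnormalized]


def select_clip(belief: Sequence[float],
                q_values: Dict[str, Sequence[float]]) -> str:
    """Control layer stand-in: pick the clip with the highest expected value
    under the current distribution over hidden states."""
    def expected_value(clip: str) -> float:
        return sum(b * q for b, q in zip(belief, q_values[clip]))
    return max(q_values, key=expected_value)


def run_controller(frame_likelihoods: Sequence[Sequence[float]]) -> List[str]:
    """Schedule one clip per acoustic frame, starting from a uniform belief."""
    belief: List[float] = [0.5, 0.5]
    schedule: List[str] = []
    for likelihood in frame_likelihoods:
        belief = update_belief(belief, TRANSITION, likelihood)
        schedule.append(select_clip(belief, Q_VALUES))
    return schedule


# Assumed per-frame likelihoods of the acoustic features under each state.
DEMO_FRAMES: List[List[float]] = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
```

Calling `run_controller(DEMO_FRAMES)` yields one clip name per frame. Because the control layer reads only the belief vector, the inference stand-in could be swapped for another model, or the clip table replaced, without touching the other layer, which is the kind of modularity the abstract points to.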

Supplemental Material

tp070-10.mp4 (MP4, 32.9 MB)
124.zip (ZIP, 200.9 MB): The auxiliary material contains the accompanying video showing various gesture controllers.

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Animation

Published in

ACM Transactions on Graphics Volume 29, Issue 4
July 2010
942 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/1778765

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data-driven animation
gesture synthesis
human animation
nonverbal behavior generation
optimal control