
Speech Communication

Volume 51, Issue 10, October 2009, Pages 1024-1037

Embodied conversational agents in computer assisted language learning

https://doi.org/10.1016/j.specom.2009.05.006

Abstract

This paper describes two systems using embodied conversational agents (ECAs) for language learning. The first system, called Ville, is a virtual language teacher for vocabulary and pronunciation training. The second system, a dialogue system called DEAL, is a role-playing game for practicing conversational skills. Whereas DEAL acts as a conversational partner with the objective of creating and keeping an interesting dialogue, Ville takes the role of a teacher who guides, encourages and gives feedback to the students.

Introduction

An important aspect of research on computer assisted language learning (CALL) and computer assisted pronunciation training (CAPT) at the Centre for Speech Technology (CTT), KTH, is its focus on using embodied conversational agents (ECAs) for language learning. Using the person metaphor rather than the desktop metaphor as an instructional interface could be beneficial in CAPT for several reasons:

  • Users interacting with animated agents have been shown to spend more time with the system, think that it performs better, and enjoy the interaction more compared to interaction with a desktop interface (Walker et al., 1994, Koda and Maes, 1996, Lester and Stone, 1997, van Mulken and Andre, 1998).

  • Speech is multimodal: we communicate through our facial expressions as well as verbally. It is well established that visual information supports speech perception (Sumby and Pollack, 1954). Since acoustic and visual speech are complementary modalities, introducing an ECA could make learning more robust and efficient.

  • Subjects listening to a foreign language often make use of visual information to a greater extent than subjects listening to their own language (Burnham and Lau, 1999, Granström et al., 1999).

  • The efficiency of ECAs for language training of deaf children has been demonstrated by Massaro and Light (2004). Bosseler and Massaro (2003) have also shown that using an ECA as an automatic tutor for vocabulary and language learning is advantageous for children with autism.

  • ECAs are able to give feedback on articulations that a human tutor cannot easily demonstrate. An augmented reality display of the face, showing the position and movement of intra-oral articulators together with the speech signal, may improve the learner’s perception and production of new language sounds by helping the learner internalize the relationships between speech sounds and articulatory gestures (Engwall, 2008).

We believe that using ECAs for language learning holds great promise for the future of CALL and CAPT. Making a virtual complement to a human tutor or classroom teacher that is infinitely patient, always available, and yet affordable is an intriguing prospect.

This article describes two systems in which ECAs or computer-animated talking heads are used. Both systems are designed for language learning, but because of their very different roles and agendas, they behave differently, and have different design criteria.

The first system, called Ville, is a virtual teacher, guiding, encouraging and giving corrections and feedback on a student’s pronunciation and language use. Ville and the underlying design criteria for the system are described in Section 3. A version of Ville without pronunciation analysis, but with logging and data collection abilities, has also been used by foreign students at KTH; this version is described in Section 4.

The feedback a learner of a new language (L2) receives when talking to a language teacher differs dramatically from the feedback one usually gets when talking to a native speaker. When a student makes a pronunciation error, a teacher or a proficient CAPT system should give explicit feedback on the L2 learner’s utterance. In a communicative context, however, when two people meet in a real dialogue, the pragmatic content of the exchange is what matters, and pronunciation errors usually go uncommented.

The second system, called DEAL, is a role-play dialogue system for conversation training. The ECA in DEAL does not comment on a user’s performance, but acts as a conversational partner, with the objective of creating and maintaining an interesting conversation. The feedback one can expect from a conversational exchange is back-channels and, when necessary, clarification questions. Mutual intelligibility is what is sought, and when it fails, communication breakdown is the result.

DEAL can also be seen as a meta-task, within the framework of Ville, of being a diagnostic gate-keeper to the next level of training. (A gate-keeper is a gaming term for an entity guarding the gate to a new level, which the player must overcome in order to pass.) If the student is able to interact successfully with the ECA in DEAL, it is a sign both to the student and to the system that the student can use the content of the previous lesson in a communicative context. The DEAL system is described at the end of the article, in Section 6.

Expressive abilities of the ECAs

The ECAs, or talking heads, developed at KTH (Beskow, 2003) can link phonemes to visemes, thus synchronizing speech with lip movements. The architecture supports both synthetic speech from text (TTS) and pre-recorded utterances. TTS still has limitations that could affect the CAPT program negatively, given that Ville is supposed to serve as a pronunciation model for the students. Ville’s voice is therefore created from pre-recorded utterances.
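The phoneme-to-viseme link mentioned above is, at its core, a many-to-one mapping: several phonemes share the same visible mouth shape. A minimal sketch of such a lookup is shown below; the phoneme and viseme labels are invented for illustration and do not come from the KTH implementation.

```python
# Illustrative sketch (not the KTH system): mapping phonemes to visemes
# so that lip-movement keyframes can be synchronized with the audio.
# All phoneme/viseme labels here are hypothetical examples.

# Many-to-one mapping: several phonemes share one mouth shape (viseme).
PHONEME_TO_VISEME = {
    "p": "bilabial_closed", "b": "bilabial_closed", "m": "bilabial_closed",
    "f": "labiodental",     "v": "labiodental",
    "a": "open",            "o": "rounded",         "u": "rounded",
}

def visemes_for(phonemes, default="neutral"):
    """Return the viseme keyframe sequence for a phoneme sequence."""
    return [PHONEME_TO_VISEME.get(p, default) for p in phonemes]

print(visemes_for(["p", "a", "m", "u"]))
# → ['bilabial_closed', 'open', 'bilabial_closed', 'rounded']
```

In a full system each viseme would drive facial animation parameters with timing taken from the phone durations in the TTS or pre-recorded audio.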

Ville – the virtual language teacher

Ville is a virtual language teacher who guides, encourages and gives feedback to students who wish to develop or improve their language skills. Ville takes on the teacher’s role, selecting the words the students should say. This is a great advantage for the analysis stage in Ville: an implicit hypothesis about which pronunciation errors a student is likely to make simplifies the task of detecting and correcting them. The focus of Ville is on pronunciation and perception
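The advantage of a known target word can be made concrete: instead of open-ended recognition, the system only needs to check the student’s phones against the canonical form and a short list of likely substitutions. The sketch below is hypothetical; the word, phone strings and error patterns are invented examples, not data from the paper.

```python
# Hypothetical sketch of constrained error analysis: because the system
# chose the target word, it holds an implicit hypothesis of likely
# mispronunciations and only needs to test against those.

EXPECTED_ERRORS = {
    # target word: (canonical phones, likely L2 substitutions per phone)
    "sju": (["ɧ", "ʉ"], {"ɧ": ["ʃ", "h"], "ʉ": ["u"]}),
}

def diagnose(word, heard_phones):
    """Compare heard phones with the canonical form of a known word."""
    canonical, likely = EXPECTED_ERRORS[word]
    feedback = []
    for target, heard in zip(canonical, heard_phones):
        if heard != target and heard in likely.get(target, []):
            feedback.append(f"expected /{target}/, heard common error /{heard}/")
        elif heard != target:
            feedback.append(f"expected /{target}/, heard /{heard}/")
    return feedback

print(diagnose("sju", ["ʃ", "ʉ"]))
# → ['expected /ɧ/, heard common error /ʃ/']
```

A detected "common error" can then be mapped to targeted feedback (e.g. an articulation hint), whereas an unexpected mismatch might simply trigger a repeat request.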

Ville 1.0

Foreign students at KTH can study Swedish as a second language at the Unit for Language and Communication. The growing demand for classes, due to the large increase in foreign Master students in recent years, has inspired the language unit to seek new methods in their language training strategies. As a complementary self-study resource, the unit has created a web-based beginner course in Swedish called SWELL (Swedish for Elementary Learners). For pronunciation and vocabulary practice, a version

Motivations for language learning

Learning a language requires a substantial effort, and the motivation for doing so varies both over time and between individuals. People learn a language for different reasons: a wish to be like the speakers of the language (integrative motivation) is often a strong motivating factor for younger learners, whereas the utility of what is learnt (instrumental motivation) is often a stronger motivator for adults. Motivation can also come from the pleasure of learning (intrinsic motivation), or from

DEAL – a role-playing dialogue system for L2 learners

The same design principles that are used by game developers are starting to find their way into other fields as well. ‘Serious games’ is an initiative focusing on using game design principles for purposes other than pure entertainment, such as training, advertising, simulation, or education (Iuppa and Borst, 2007). Good gameplay adds to any existing motivation to learn, and may otherwise create motivation by itself. The idea of transforming education and

Conclusions

Both the Ville and DEAL systems described in this article are ongoing projects, under constant revision and development. We intend them to serve as platforms for research in narrower sub-disciplines such as utterance generation, error handling and turn-taking in dialogue systems, and as testbeds for research on feedback strategies, pronunciation detectors, and other CALL-related research areas.

The release of Ville to real students has been quite successful. Although it has officially only

Acknowledgements

This research was carried out at the Centre for Speech Technology, KTH. The research is also supported by the Graduate School for Language Technology (GSLT). Many thanks to Jenny Brusk for her work in the DEAL project. We would also like to thank Björn Granström, Rolf Carlson, Olov Engwall, Jens Edlund, and Julia Hirschberg for their valuable comments.

References (40)

  • Aist, G., Allen, J.F., Campana, E., Galescu, L., Gómez Gallo, C.A., Stoness, S.C., Swift, M., Tanenhaus, M., 2006....
  • Bannert, R., 2004. På väg mot svenskt uttal. Studentlitteratur...
  • Beskow, J., 2003. Talking heads – models and applications for multimodal speech synthesis. Doctoral Dissertation, KTH,...
  • Bosseler, A., et al., 2003. Development and evaluation of a computer-animated tutor for vocabulary and language learning for children with autism. J. Autism Develop. Disorders.
  • Brennan, S., 2000. Processes that shape conversation and their implications for computational. In: Proc. 38th Annual...
  • Brusk, J., Lager, T., Hjalmarsson, A., Wik, P., 2007. DEAL – dialogue management in SCXML for believable game...
  • Burnham, D., Lau, S., 1999. The integration of auditory and visual speech information with foreign speakers: the role...
  • Carlson, R., Granström, B., Heldner, M., House, D., Megyesi, B., Strangert, E., Swerts, M., 2002. Boundaries and...
  • Ellis, R., 1994. The Study of Second Language Acquisition.
  • Engwall, O., et al., 2007. Pronunciation feedback from real and virtual language teachers. J. Comput. Assist. Lang. Learn.
  • Engwall, O., Wik, P., Beskow, J., Granström, B., 2004. Design strategies for a virtual language tutor. In: Kim, S.H.,...
  • Engwall, O., 2008. Can audio-visual instructions help learners improve their articulation? – an ultrasound study of...
  • Eskenazi, M., 1999. Using automatic speech processing for foreign language pronunciation tutoring: some issues and a prototype. Lang. Learn. Technol.
  • Flege, J.E., 1998. Second-language learning: the role of subject and phonetic variables. In: STiLL-Speech Technology in...
  • Gee, J.P., 2003. What Video Games Have to Teach Us About Literacy and Learning.
  • Granström, B., House, D., Lundeberg, M., 1999. Prosodic cues in multi-modal speech perception. In: Proc. ICPhS-99, pp....
  • Gustafson, J., Bell, L., Boye, J., Lindström, A., Wirén, M., 2004. The NICE fairy-tale game system. In: Proc....
  • Hjalmarsson, A., Wik, P., Brusk, J., 2007. Dealing with DEAL: a dialogue system for conversation training. In: Proc....
  • Hjalmarsson, A., 2008. Speaking without knowing what to say… or when to end. In: Proc. SIGDial 2008, Columbus, Ohio,...
  • Iuppa, N., et al., 2007. Story and Simulations for Serious Games: Tales from the Trenches.