PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback

ABSTRACT
Second language (L2) English learners often find it difficult to improve their pronunciation due to the lack of expressive and personalized corrective feedback. In this paper, we present Pronunciation Teacher (PTeacher), a Computer-Aided Pronunciation Training (CAPT) system that provides personalized, exaggerated audio-visual corrective feedback for mispronunciations. Although the effectiveness of exaggerated feedback has been demonstrated, it remains unclear how to determine the appropriate degree of exaggeration when interacting with individual learners. To fill this gap, we interviewed 100 L2 English learners and 22 professional native teachers to understand their needs and experiences. We propose three critical metrics for both learners and teachers to identify the best exaggeration levels in both the audio and visual modalities. In addition, we incorporate a dynamic feedback mechanism personalized to each learner's English proficiency. Based on these insights, we design a comprehensive interactive pronunciation training course that helps L2 learners rectify mispronunciations in a more perceptible, understandable, and discriminative manner. Extensive user studies demonstrate that our system significantly improves learners' learning efficiency.
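The proficiency-adaptive feedback described above can be illustrated with a minimal sketch. Note that this is not the paper's actual mechanism: the function name, the proficiency scale, and the exaggeration bounds are all illustrative assumptions, showing only the general idea that lower-proficiency learners receive stronger exaggeration.

```python
# Hypothetical sketch (not from the paper): one way a CAPT system could map a
# learner's proficiency score to an exaggeration degree for corrective feedback.
# All names, scales, and numeric bounds here are illustrative assumptions.

def exaggeration_degree(proficiency: float) -> float:
    """Map a proficiency score in [0, 1] to an exaggeration factor.

    Lower-proficiency learners receive stronger exaggeration so that the
    mispronounced feature is more perceptible; advanced learners receive
    feedback closer to natural speech.
    """
    if not 0.0 <= proficiency <= 1.0:
        raise ValueError("proficiency must lie in [0, 1]")
    max_factor, min_factor = 2.0, 1.0  # illustrative bounds
    # Linear interpolation: proficiency 0 -> 2.0x, proficiency 1 -> 1.0x (no exaggeration)
    return max_factor - (max_factor - min_factor) * proficiency


# Example: a beginner (0.2) gets stronger exaggeration than an advanced learner (0.9).
print(exaggeration_degree(0.2))  # 1.8
print(exaggeration_degree(0.9))  # ~1.1
```

A real system would likely replace the linear mapping with levels calibrated from the learner and teacher studies, and might set audio and visual exaggeration degrees independently per modality.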