A HMM recognition of consonant-vowel syllables from lip contours: the cued speech case

Aboutabit, Noureddine; Beautemps, Denis; Clarke, Jeanne; Besacier, Laurent

doi:10.21437/Interspeech.2007-280

Cued Speech (CS) is a manual code that complements lip-reading to enhance speech perception from visual input. The phonetic translation of CS gestures needs to combine the manual CS information with information from the lips, taking into account the desynchronization delay (Attina et al. [1], Aboutabit et al. [2]) between these two flows of information. This paper focuses on HMM recognition of the lip flow for consonant-vowel (CV) syllables in the context of French Cued Speech production. The CV syllables are considered in terms of viseme groups that are compatible with the CS system. The HMM modeling is based on parameters derived from both the inner and outer lip contours. The global recognition score for CV syllables reaches 80.3%. This study shows that the errors are mainly observed on consonant groups in the context of high and mid-high rounded vowels. In contrast, CV syllables with anterior non-rounded vowels ([a, ε͂, i, œ͂, e, ε]) and with low and mid-low rounded vowels ([ã, ɔ, œ]) are well recognized (87% on average).
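The abstract does not spell out the decoding step. A common way to use HMMs for isolated-unit recognition, assumed here purely for illustration and not presented as the authors' implementation, is to train one HMM per class (here, per CV viseme group) and assign a sequence of lip parameters to the class whose model yields the highest likelihood. The sketch computes that likelihood with the forward algorithm in log space, using diagonal-Gaussian emissions and a two-state left-to-right topology. The class names `group_A`/`group_B`, the two per-frame features (a lip width and a lip aperture), and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

NEG_INF = -math.inf  # log(0), used for forbidden transitions


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


@dataclass
class GaussianHMM:
    log_start: List[float]        # log initial-state probabilities
    log_trans: List[List[float]]  # log_trans[i][j] = log P(next state j | state i)
    means: List[List[float]]      # per-state emission means (one value per feature)
    variances: List[List[float]]  # per-state diagonal emission variances

    def log_likelihood(self, obs: Sequence[Sequence[float]]) -> float:
        """Forward algorithm in log space: returns log P(obs | model)."""
        n = len(self.log_start)

        def emit(j: int, x: Sequence[float]) -> float:
            return log_gauss_diag(x, self.means[j], self.variances[j])

        alpha = [self.log_start[j] + emit(j, obs[0]) for j in range(n)]
        for x in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)]) + emit(j, x)
                for j in range(n)
            ]
        return logsumexp(alpha)


def classify(obs: Sequence[Sequence[float]], models: Dict[str, GaussianHMM]) -> str:
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(obs))


def left_to_right(means: List[List[float]], variances: List[List[float]], stay: float = 0.6) -> GaussianHMM:
    """Hypothetical two-state left-to-right HMM: start in state 0, never move back."""
    return GaussianHMM(
        log_start=[0.0, NEG_INF],
        log_trans=[[math.log(stay), math.log(1.0 - stay)], [NEG_INF, 0.0]],
        means=means,
        variances=variances,
    )


# Hypothetical per-frame features: (lip width, lip aperture), arbitrary units.
models = {
    "group_A": left_to_right([[3.0, 0.2], [3.0, 0.8]], [[0.1, 0.05], [0.1, 0.05]]),
    "group_B": left_to_right([[2.0, 0.2], [1.5, 0.5]], [[0.1, 0.05], [0.1, 0.05]]),
}

spread = [[3.0, 0.2], [3.1, 0.3], [2.9, 0.7], [3.0, 0.8]]
rounded = [[2.0, 0.2], [1.9, 0.3], [1.5, 0.5], [1.6, 0.5]]
print(classify(spread, models))   # group_A
print(classify(rounded, models))  # group_B
```

Working in log space keeps the forward recursion from underflowing on longer sequences, and the left-to-right topology (no backward transitions) is a frequent choice for short speech units; both are illustrative assumptions rather than details reported in the abstract.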

Cite as: Aboutabit, N., Beautemps, D., Clarke, J., Besacier, L. (2007) A HMM recognition of consonant-vowel syllables from lip contours: the cued speech case. Proc. Interspeech 2007, 646-649, doi: 10.21437/Interspeech.2007-280

@inproceedings{aboutabit07_interspeech,
  author={Noureddine Aboutabit and Denis Beautemps and Jeanne Clarke and Laurent Besacier},
  title={{A HMM recognition of consonant-vowel syllables from lip contours: the cued speech case}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={646--649},
  doi={10.21437/Interspeech.2007-280}
}