ISCA Archive Interspeech 2015

Improving automatic forced alignment for dysarthric speech transcription

Yu Ting Yeung, Ka Ho Wong, Helen Meng

Dysarthria is a motor speech disorder due to neurologic deficits. The impaired movement of the muscles used for speech production leads to disordered speech in which utterances have prolonged pause intervals, slow speaking rates, poor articulation of phonemes, syllable deletions, etc. These characteristics pose challenges for the use of speech technologies in the automatic processing of dysarthric speech data. As a first step towards addressing these challenges, this work tackles the performance degradation observed in forced alignment. We perform initial alignments to locate long pauses in dysarthric speech and use the pause intervals as anchor points. We apply speech recognition to obtain word lattices, from which we recover the time-stamps of words with disordered or incomplete pronunciations. By verifying the initial alignments against the word lattices, we obtain reliably aligned segments. These segments provide constraints for new alignment grammars that can improve alignment and transcription quality. We have applied the proposed strategy to the TORGO corpus and obtained improved alignments for most dysarthric speech data, while maintaining good alignments for non-dysarthric speech data.
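
The two verification ideas in the abstract (long pauses as anchor points, and checking initial alignments against word lattice output) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Interval` type, the function names, the `"sil"` pause label, the 0.5 s pause threshold, and the overlap criterion are all assumptions made for illustration; the abstract does not specify any of them. The lattice is also simplified to a flat list of timed word hypotheses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """A labelled time span in seconds (hypothetical type for this sketch)."""
    label: str    # a word, or "sil" for a pause
    start: float
    end: float


def find_anchor_pauses(alignment: List[Interval],
                       min_pause: float = 0.5,
                       pause_label: str = "sil") -> List[Interval]:
    """Return pauses at least `min_pause` seconds long (assumed threshold)."""
    return [iv for iv in alignment
            if iv.label == pause_label and iv.end - iv.start >= min_pause]


def overlap_ratio(a: Interval, b: Interval) -> float:
    """Shared duration divided by the total span covered by both intervals."""
    inter = min(a.end, b.end) - max(a.start, b.start)
    span = max(a.end, b.end) - min(a.start, b.start)
    return max(0.0, inter) / span if span > 0 else 0.0


def reliable_words(alignment: List[Interval],
                   lattice_words: List[Interval],
                   min_overlap: float = 0.5,
                   pause_label: str = "sil") -> List[Interval]:
    """Keep aligned words confirmed by a same-label lattice hypothesis
    whose time span overlaps sufficiently (assumed criterion)."""
    confirmed = []
    for iv in alignment:
        if iv.label == pause_label:
            continue
        if any(w.label == iv.label and overlap_ratio(iv, w) >= min_overlap
               for w in lattice_words):
            confirmed.append(iv)
    return confirmed
```

In this sketch, the words returned by `reliable_words` and the pauses returned by `find_anchor_pauses` would together mark the fixed points between which a later alignment pass could be constrained. How those constraints are encoded in an alignment grammar is left out here.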


doi: 10.21437/Interspeech.2015-619

Cite as: Yeung, Y.T., Wong, K.H., Meng, H. (2015) Improving automatic forced alignment for dysarthric speech transcription. Proc. Interspeech 2015, 2991-2995, doi: 10.21437/Interspeech.2015-619

@inproceedings{yeung15_interspeech,
  author={Yu Ting Yeung and Ka Ho Wong and Helen Meng},
  title={{Improving automatic forced alignment for dysarthric speech transcription}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={2991--2995},
  doi={10.21437/Interspeech.2015-619}
}