Behavioral synchronization between speech and finger tapping provides a novel approach to improving speech recognition accuracy. We combine a sequence of finger tapping timings, recorded alongside an utterance, with speech recognition using two distinct methods: in the first, HMM state transition probabilities at word boundaries are controlled by the timing of the finger taps; in the second, the probability (relative frequency) of finger tapping is used as a feature and combined with MFCCs in an HMM recognition system. We evaluate these methods on connected digit recognition under different noise conditions (AURORA-2J) and on LVCSR tasks. Leveraging the synchrony between speech and finger tapping yields a 46% relative improvement in the connected digit recognition experiments and a 1% absolute improvement in the LVCSR experiments.
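The first method can be illustrated with a minimal sketch: a word-boundary transition probability is reweighted by how close the current frame is to a recorded tap. The Gaussian weighting, the `sigma` width, and the function names here are illustrative assumptions, not the paper's exact scheme.

```python
import math

def reweight_transition(p_trans, tap_times, t, frame_rate=100.0, sigma=0.05):
    """Scale an HMM word-boundary transition probability by the proximity
    of frame t to the nearest finger tap.

    Hypothetical Gaussian weighting (assumption, not the paper's exact
    formulation): frames near a tap keep their transition probability,
    frames far from any tap have it suppressed.
    """
    t_sec = t / frame_rate                      # frame index -> seconds
    d = min(abs(t_sec - tap) for tap in tap_times)  # distance to nearest tap
    w = math.exp(-d ** 2 / (2 * sigma ** 2))    # weight ~1 near a tap, ~0 far away
    return p_trans * w
```

In a Viterbi pass, such a weight would multiply only the inter-word transition arcs, biasing word boundaries toward tap instants while leaving within-word transitions untouched.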
Cite as: Ban, H., Miyajima, C., Itou, K., Itakura, F., Takeda, K. (2004) Speech recognition using synchronization between speech and finger tapping. Proc. Interspeech 2004, 2289-2292, doi: 10.21437/Interspeech.2004-678
@inproceedings{ban04_interspeech,
  author={Hiromitsu Ban and Chiyomi Miyajima and Katsunobu Itou and Fumitada Itakura and Kazuya Takeda},
  title={{Speech recognition using synchronization between speech and finger tapping}},
  year=2004,
  booktitle={Proc. Interspeech 2004},
  pages={2289--2292},
  doi={10.21437/Interspeech.2004-678}
}