Non-negative matrix factorization based compensation of music for automatic speech recognition

Raj, Bhiksha; Virtanen, Tuomas; Chaudhuri, Sourish; Singh, Rita

doi:10.21437/Interspeech.2010-268

Bhiksha Raj, Tuomas Virtanen, Sourish Chaudhuri, Rita Singh

This paper proposes the use of non-negative matrix factorization based speech enhancement for robust automatic recognition of mixtures of speech and music. We represent the magnitude spectra of noisy speech signals as non-negative weighted linear combinations of speech and noise spectral basis vectors, which are obtained from training corpora of speech and music. We use overcomplete dictionaries consisting of random exemplars of the training data. The method is tested on the Wall Street Journal large vocabulary speech corpus, artificially corrupted with polyphonic music from the RWC music database. Various music styles and speech-to-music ratios are evaluated. The proposed methods are shown to produce a consistent, significant improvement in recognition performance compared with the baseline method.
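
The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not state the cost function, the update rule, or how the enhanced spectrum is reconstructed, so the choices below are assumptions: multiplicative updates for the generalized Kullback-Leibler divergence with the dictionaries held fixed, followed by a soft (Wiener-style) mask. The function names `sample_exemplars`, `estimate_activations`, and `separate_speech` are hypothetical and do not come from the paper.

```python
import numpy as np


def sample_exemplars(train_spectra, n_atoms, rng):
    """Build a dictionary from randomly chosen training frames (columns).

    train_spectra: non-negative array of shape (n_freq, n_frames).
    """
    idx = rng.choice(train_spectra.shape[1], size=n_atoms, replace=False)
    return train_spectra[:, idx]


def estimate_activations(V, B, n_iter=200, eps=1e-12):
    """Find non-negative weights H such that V is approximately B @ H.

    B stays fixed. The update rule is the standard multiplicative update
    for the generalized KL divergence (an assumption, not from the abstract).
    """
    rng = np.random.default_rng(0)
    H = rng.random((B.shape[1], V.shape[1])) + eps
    col_sums = B.sum(axis=0)[:, None] + eps  # B^T 1, one value per atom
    for _ in range(n_iter):
        ratio = V / (B @ H + eps)
        H *= (B.T @ ratio) / col_sums
    return H


def separate_speech(V, B_speech, B_music, n_iter=200, eps=1e-12):
    """Estimate the speech part of the mixture magnitude spectrogram V."""
    B = np.hstack([B_speech, B_music])
    H = estimate_activations(V, B, n_iter=n_iter, eps=eps)
    k = B_speech.shape[1]
    speech_model = B_speech @ H[:k]
    music_model = B_music @ H[k:]
    # Soft mask in [0, 1] (assumed reconstruction step).
    mask = speech_model / (speech_model + music_model + eps)
    return mask * V
```

In this sketch, `V` would be the magnitude spectrogram of the speech-plus-music mixture, and the two dictionaries would be drawn from separate speech and music training spectrograms with `sample_exemplars`. The masked output could then be passed to a recognizer's feature extraction front end.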

Cite as: Raj, B., Virtanen, T., Chaudhuri, S., Singh, R. (2010) Non-negative matrix factorization based compensation of music for automatic speech recognition. Proc. Interspeech 2010, 717-720, doi: 10.21437/Interspeech.2010-268

@inproceedings{raj10b_interspeech,
  author={Bhiksha Raj and Tuomas Virtanen and Sourish Chaudhuri and Rita Singh},
  title={{Non-negative matrix factorization based compensation of music for automatic speech recognition}},
  year=2010,
  booktitle={Proc. Interspeech 2010},
  pages={717--720},
  doi={10.21437/Interspeech.2010-268}
}