This paper addresses the problem of voice activity detection (VAD) in noisy environments. The proposed VAD method takes a statistical model approach and estimates the statistical models sequentially, without a priori knowledge of the noise. Specifically, the method constructs a clean speech / silence state transition model beforehand and, as each signal is observed, sequentially adapts the model to the noisy environment using a switching Kalman filter. The method is evaluated with the CENSREC-1-C VAD evaluation framework. The results show that the proposed method significantly outperforms the CENSREC-1-C baseline in VAD accuracy in real environments.
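The idea of a two-state switching model that is adapted frame by frame can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the features, model structure, or parameters, so everything below is an assumption made for illustration. The sketch tracks a scalar noise level (for example, per-frame log energy) with a random-walk Kalman filter, treats speech as the noise level plus a fixed hypothetical offset, and collapses the two state hypotheses by moment matching after each frame (a GPB1-style approximation chosen here for brevity). The function name `switching_kalman_vad` and all default parameter values are hypothetical.

```python
import math


def switching_kalman_vad(obs, speech_offset=10.0, q=0.01, r=1.0,
                         p_stay=0.9, init_var=10.0):
    """Return P(speech) per frame for a list of scalar observations.

    Assumed toy model (not taken from the paper):
      noise level n_t = n_{t-1} + w,  w ~ N(0, q)
      silence frame:  y_t = n_t + v,                  v ~ N(0, r)
      speech frame:   y_t = n_t + speech_offset + v
    """
    offsets = (0.0, speech_offset)            # state 0 = silence, 1 = speech
    trans = ((p_stay, 1.0 - p_stay),          # trans[i][j] = P(j | i)
             (1.0 - p_stay, p_stay))
    mean, var = obs[0], init_var              # initial noise-level estimate
    prob = [0.5, 0.5]                         # initial state probabilities
    speech_probs = []
    for y in obs:
        var_pred = var + q                    # Kalman predict (random walk)
        prior = [sum(prob[i] * trans[i][j] for i in range(2))
                 for j in range(2)]
        means, variances, weights = [], [], []
        for j in range(2):
            innov = y - (mean + offsets[j])   # innovation under state j
            s = var_pred + r                  # innovation variance
            k = var_pred / s                  # Kalman gain
            means.append(mean + k * innov)
            variances.append((1.0 - k) * var_pred)
            lik = math.exp(-0.5 * innov * innov / s) / math.sqrt(2 * math.pi * s)
            weights.append(prior[j] * lik)
        total = sum(weights)
        # Fall back to the prior if both likelihoods underflow to zero.
        prob = [w / total for w in weights] if total > 0.0 else prior
        # Collapse the two per-state posteriors into one Gaussian.
        mean = sum(p * m for p, m in zip(prob, means))
        var = sum(p * (v + (m - mean) ** 2)
                  for p, m, v in zip(prob, means, variances))
        speech_probs.append(prob[1])
    return speech_probs


# Synthetic example: silence, then a louder segment, then silence again.
frames = [0.0] * 20 + [10.0] * 20 + [0.0] * 20
decisions = [p > 0.5 for p in switching_kalman_vad(frames)]
```

Thresholding the returned probabilities at 0.5 gives a per-frame speech / non-speech decision. A practical system would use richer features and models than this scalar toy.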
Cite as: Fujimoto, M., Ishizuka, K. (2007) Noise robust voice activity detection based on switching Kalman filter. Proc. Interspeech 2007, 2933-2936, doi: 10.21437/Interspeech.2007-731
@inproceedings{fujimoto07_interspeech,
  author={Masakiyo Fujimoto and Kentaro Ishizuka},
  title={{Noise robust voice activity detection based on switching Kalman filter}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2933--2936},
  doi={10.21437/Interspeech.2007-731}
}