ISCA Archive Interspeech 2014

Adaptation of deep neural network acoustic models using factorised i-vectors

Penny Karanasou, Yongqiang Wang, Mark J. F. Gales, Philip C. Woodland

The use of deep neural networks (DNNs) in a hybrid configuration is becoming increasingly popular and successful for speech recognition. One issue with these systems is how to efficiently adapt them to reflect an individual speaker or noise condition. Recently, speaker i-vectors have been successfully used as an additional input feature for unsupervised speaker adaptation. In this work the use of i-vectors for adaptation is extended to incorporate acoustic factorisation. In particular, separate i-vectors are computed to represent the speaker and the acoustic environment. By ensuring “orthogonality” between the individual factor representations, it is possible to represent a wide range of speaker and environment pairs by simply combining the i-vectors for a particular speaker and a particular environment. In this paper the i-vectors are viewed as the weights of a cluster adaptive training (CAT) system, where the underlying models are GMMs rather than HMMs. This allows the factorisation approaches developed for CAT to be applied directly. Initial experiments were conducted on a noise-distorted version of the WSJ corpus. Compared to standard speaker-based i-vector adaptation, factorised i-vectors showed performance gains.
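To make the adaptation scheme concrete, the sketch below (not from the paper; all dimensions, names, and data are illustrative assumptions) shows the standard way such auxiliary i-vectors enter a hybrid DNN: a speaker i-vector and an environment i-vector are concatenated and appended to every acoustic frame of an utterance before the frames are fed to the network.

import numpy as np

def augment_features(frames, speaker_ivector, environment_ivector):
    """Append factorised i-vectors to every acoustic frame.

    frames:              (T, D) array of acoustic features for one utterance
    speaker_ivector:     (S,) i-vector representing the speaker
    environment_ivector: (E,) i-vector representing the acoustic environment

    Returns a (T, D + S + E) array used as input to the hybrid DNN.
    """
    T = frames.shape[0]
    # Concatenate the two factor representations into one auxiliary vector,
    # then repeat it for every frame of the utterance.
    ivecs = np.concatenate([speaker_ivector, environment_ivector])
    return np.hstack([frames, np.tile(ivecs, (T, 1))])

# Hypothetical usage: 40-dim filterbank frames with a 30-dim speaker
# i-vector and a 20-dim environment i-vector give 90-dim DNN inputs.
frames = np.random.randn(300, 40)
spk = np.random.randn(30)
env = np.random.randn(20)
dnn_input = augment_features(frames, spk, env)
assert dnn_input.shape == (300, 90)

Because the factor representations are kept “orthogonal”, an unseen speaker/environment pair can be covered by simply pairing the speaker i-vector estimated in one condition with the environment i-vector of another.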


doi: 10.21437/Interspeech.2014-488

Cite as: Karanasou, P., Wang, Y., Gales, M.J.F., Woodland, P.C. (2014) Adaptation of deep neural network acoustic models using factorised i-vectors. Proc. Interspeech 2014, 2180-2184, doi: 10.21437/Interspeech.2014-488

@inproceedings{karanasou14_interspeech,
  author={Penny Karanasou and Yongqiang Wang and Mark J. F. Gales and Philip C. Woodland},
  title={{Adaptation of deep neural network acoustic models using factorised i-vectors}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2180--2184},
  doi={10.21437/Interspeech.2014-488}
}