MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition

Garcia-Romero, Daniel; Sell, Greg; Mccree, Alan

doi:10.21437/Odyssey.2020-1

We present a magnitude estimation network that is combined with a modified ResNet x-vector system to generate embeddings whose inner product produces calibrated scores with increased discrimination. Training proceeds in three steps. First, the network is trained on short segments using a multi-class cross-entropy loss with angular margin softmax. Second, only a reduced subset of the DNN parameters is refined using full-length recordings. Finally, the magnitude estimation network is trained using a binary cross-entropy loss over pairs of target and non-target trials. The resulting system is evaluated on four widely used benchmarks and provides significant discrimination and calibration gains at multiple operating points.
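The scoring idea in the abstract can be illustrated with a small sketch. This is one possible reading, not the authors' implementation: it assumes each x-vector is length-normalized, scaled by a positive magnitude predicted per recording, and that a single scalar offset is added to the inner product so the result can be treated as a log-odds score. The function names (`unit_norm`, `trial_scores`, `binary_cross_entropy`) and the way the offset enters are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def unit_norm(x):
    """Scale each row of x to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def trial_scores(emb_a, emb_b, mag_a, mag_b, offset):
    """Score each trial (row pair) as an inner product plus an offset.

    Assumption: the magnitudes rescale the unit-length embeddings, so the
    score equals mag_a * mag_b * cosine(emb_a, emb_b) + offset.
    """
    scaled_a = mag_a[:, None] * unit_norm(emb_a)
    scaled_b = mag_b[:, None] * unit_norm(emb_b)
    return np.sum(scaled_a * scaled_b, axis=1) + offset


def binary_cross_entropy(scores, labels):
    """Mean binary cross-entropy, treating scores as logits.

    labels: 1.0 for target trials, 0.0 for non-target trials.
    Uses log(1 + exp(-s)) for targets and log(1 + exp(s)) for non-targets,
    computed stably with logaddexp.
    """
    signs = 2.0 * labels - 1.0
    return float(np.mean(np.logaddexp(0.0, -signs * scores)))
```

Under these assumptions, two identical embeddings (cosine 1) with magnitudes 2 and 3 and an offset of -1 score approximately 5. In a training loop, the magnitudes and offset would be the quantities adjusted to minimize the loss over target and non-target trials, while the embedding directions stay fixed.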

Cite as: Garcia-Romero, D., Sell, G., Mccree, A. (2020) MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition. Proc. The Speaker and Language Recognition Workshop (Odyssey 2020), 1-8, doi: 10.21437/Odyssey.2020-1

@inproceedings{garciaromero20_odyssey,
  author={Daniel Garcia-Romero and Greg Sell and Alan Mccree},
  title={{MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition}},
  year=2020,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2020)},
  pages={1--8},
  doi={10.21437/Odyssey.2020-1}
}