A Mahalanobis Distance Scoring with KISS Metric Learning Algorithm for Speaker Recognition

Lei, Zhenchun; Luo, Jian; Wan, Yanhong; Yang, Yingen

doi:10.1007/978-3-319-25417-3_51

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9428)

Included in the following conference series:

Chinese Conference on Biometric Recognition

Abstract

Cosine similarity scoring is widely used with the i-vector model in text-independent speaker recognition because of its computational efficiency and good performance. In this paper, we propose a new Mahalanobis distance scoring method based on distance metric learning. The Mahalanobis metric matrix is learned with the KISS (keep it simple and straightforward!) method, which is motivated by a statistical inference perspective based on a likelihood-ratio test. After whitening and length normalization, the i-vectors extracted from the development utterances are used to train the metric matrix. The score between a target i-vector and a test i-vector is then computed from the Mahalanobis distance between them. Results on the NIST 2008 telephone data show that the new scoring performs clearly better than cosine similarity scoring.
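The abstract describes a four-step pipeline: whiten the development i-vectors, length-normalize them, learn a Mahalanobis matrix with KISS, and score a target/test pair by their Mahalanobis distance. The following is a minimal NumPy sketch of that pipeline under stated assumptions, not the authors' code. It assumes the standard KISS formulation of Köstinger et al. (CVPR 2012), M = inv(Sigma_S) - inv(Sigma_D), where Sigma_S and Sigma_D are the covariances of pairwise i-vector differences for same-speaker and different-speaker pairs, with M clipped to be positive semi-definite. It also assumes symmetric (ZCA-style) whitening and a verification score equal to the negated distance. The function names, the eigenvalue floor, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_whitening(dev):
    """Estimate the mean and a symmetric whitening matrix Sigma^(-1/2) from development i-vectors (one per row)."""
    mu = dev.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(dev - mu, rowvar=False))
    # Assumed floor on eigenvalues to guard against a near-singular covariance.
    w = eigvec @ np.diag(1.0 / np.sqrt(np.maximum(eigval, 1e-10))) @ eigvec.T
    return mu, w


def preprocess(x, mu, w):
    """Whiten each row, then length-normalize it to unit Euclidean norm."""
    z = (x - mu) @ w.T
    return z / np.linalg.norm(z, axis=-1, keepdims=True)


def train_kiss(x, labels):
    """KISS metric: M = inv(Sigma_S) - inv(Sigma_D), clipped to the PSD cone.

    Sigma_S / Sigma_D are covariances of pairwise differences for
    same-speaker / different-speaker pairs. Enumerating all pairs is O(n^2)
    and only meant for small illustrative sets.
    """
    i, j = np.triu_indices(len(x), k=1)
    diff = x[i] - x[j]
    same = labels[i] == labels[j]
    cov_s = diff[same].T @ diff[same] / same.sum()
    cov_d = diff[~same].T @ diff[~same] / (~same).sum()
    m = np.linalg.inv(cov_s) - np.linalg.inv(cov_d)
    # Zero out negative eigenvalues so (x - y)^T M (x - y) is never negative.
    eigval, eigvec = np.linalg.eigh((m + m.T) / 2.0)
    return eigvec @ np.diag(np.maximum(eigval, 0.0)) @ eigvec.T


def mahalanobis_score(m, w_target, w_test):
    """Assumed score: negated Mahalanobis distance, so higher means more likely the same speaker."""
    d = w_target - w_test
    return -float(d @ m @ d)


def cosine_score(w_target, w_test):
    """Cosine similarity baseline."""
    return float(w_target @ w_test / (np.linalg.norm(w_target) * np.linalg.norm(w_test)))


if __name__ == "__main__":
    # Synthetic "i-vectors": 20 speakers x 10 utterances in 5 dimensions (illustrative only).
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(20), 10)
    dev = rng.normal(size=(20, 5))[labels] * 3.0 + rng.normal(size=(200, 5))
    mu, w = fit_whitening(dev)
    x = preprocess(dev, mu, w)
    m = train_kiss(x, labels)
    print("same speaker:", mahalanobis_score(m, x[0], x[1]))
    print("different speakers:", mahalanobis_score(m, x[0], x[10]))
```

Negating the distance keeps the usual convention that larger scores favor the target hypothesis, so in this sketch the Mahalanobis score can stand in for the cosine score in a threshold-based evaluation. For a realistic development set, Sigma_S and Sigma_D could be accumulated speaker by speaker instead of enumerating all O(n²) pairs.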

Author information

Authors and Affiliations

School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
Zhenchun Lei, Jian Luo, Yanhong Wan & Yingen Yang

Corresponding author

Correspondence to Zhenchun Lei.

Editor information

Editors and Affiliations

Civil Aviation University of China, Tianjin, China
Jinfeng Yang
Tianjin University of Science and Technology, Tianjin, China
Jucheng Yang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhenan Sun
Chinese Academy of Sciences, Beijing, China
Shiguang Shan
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Tsinghua University, Beijing, China
Jianjiang Feng

About this paper

Cite this paper

Lei, Z., Luo, J., Wan, Y., Yang, Y. (2015). A Mahalanobis Distance Scoring with KISS Metric Learning Algorithm for Speaker Recognition. In: Yang, J., Yang, J., Sun, Z., Shan, S., Zheng, W., Feng, J. (eds) Biometric Recognition. CCBR 2015. Lecture Notes in Computer Science, vol 9428. Springer, Cham. https://doi.org/10.1007/978-3-319-25417-3_51

DOI: https://doi.org/10.1007/978-3-319-25417-3_51
Published: 24 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25416-6
Online ISBN: 978-3-319-25417-3
eBook Packages: Computer Science, Computer Science (R0)