Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition

Nematollahi, Mohammad Ali; Gamboa-Rosales, Hamurabi; Martinez-Ruiz, Francisco J.; De la Rosa-Vargas, Jose I.; Al-Haddad, S. A. R.; Esmaeilpour, Mansour

doi:10.1007/s11042-016-3350-1

Published: 04 March 2016

Volume 76, pages 7251–7281, (2017)
Multimedia Tools and Applications

Mohammad Ali Nematollahi³,
Hamurabi Gamboa-Rosales¹,
Francisco J. Martinez-Ruiz¹,
Jose I. De la Rosa-Vargas¹,
S. A. R. Al-Haddad² &
Mansour Esmaeilpour³

Abstract

In this paper, a Multi-Factor Authentication (MFA) method is developed by combining a Personal Identification Number (PIN), a One Time Password (OTP), and a speaker biometric through speech watermarks. To this end, a multipurpose digital speech watermarking scheme is applied to embed semi-fragile and robust watermarks simultaneously in the speech signal, providing tamper detection and proof of ownership, respectively. For the blind semi-fragile speech watermarking technique, the Discrete Wavelet Packet Transform (DWPT) and Quantization Index Modulation (QIM) are used to embed the watermark in an angle of the wavelet sub-bands where more speaker-specific information is available. For copyright protection of the speech, a blind and robust speech watermarking technique is used, based on DWPT and multiplication: the robust watermark is embedded by manipulating the amplitude of the wavelet sub-bands where less speaker-specific information is available. Experimental results on TIMIT, MIT, and MOBIO demonstrate that there is a trade-off among the recognition performance of speaker recognition systems, robustness, and capacity, which is presented by various triangles. Furthermore, a threat model and attack analysis are used to evaluate the feasibility of the developed MFA model. Accordingly, the developed MFA model is able to enhance the security of systems against spoofing and communication attacks while improving the recognition performance via solving problems and overcoming limitations.

References

1. Akhaee MA, Kalantari NK, Marvasti F (2009) Robust multiplicative audio and speech watermarking using statistical modeling. In IEEE International Conference on Communications, ICC’09. 2009. IEEE
2. Akhaee MA, Kalantari NK, Marvasti F (2010) Robust audio and speech watermarking using Gaussian and Laplacian modeling. Signal Process 90(8):2487–2497
3. Al-Nuaimy W et al (2011) An SVD audio watermarking approach using chaotic encrypted images. Digit Sig Process 21(6):764–779
4. Baroughi AF, Craver S (2014) Additive attacks on speaker recognition. In IS&T/SPIE Electronic imaging. International Society for Optics and Photonics
5. Besacier L, Bonastre J-F, Fredouille C (2000) Localization and selection of speaker-specific information with statistical modeling. Speech Comm 31(2):89–106
6. Bimbot F et al (2004) A tutorial on text-independent speaker verification. EURASIP J Appl Sig Process 2004:430–451
7. Bolten JB (2003) E-authentication guidance for federal agencies. Office of Management and Budget, http://www.whitehouse.gov/omb/memoranda/fy04/m04-04.pdf. 2003
8. Brookes M (2006) VOICEBOX: a speech processing toolbox for MATLAB
9. Chaturvedi A, Mishra D, Mukhopadhyay S (2013) Improved biometric-based three-factor remote user authentication scheme with key agreement using smart card. In Information systems security, Springer, p 63–77
10. Dehak N et al (2011) Front-end factor analysis for speaker verification. Audio Speech Lang Process IEEE Trans 19(4):788–798
11. Faundez-Zanuy M, Hagmüller M, Kubin G (2006) Speaker verification security improvement by means of speech watermarking. Speech Comm 48(12):1608–1619
12. Faundez-Zanuy M, Hagmüller M, Kubin G (2007) Speaker identification security improvement by means of speech watermarking. Pattern Recogn 40(11):3027–3034
13. Garofolo JS, L.D. Consortium (1993) TIMIT: acoustic-phonetic continuous speech corpus, Linguistic Data Consortium
14. Hinkley DV (1969) On the ratio of two correlated normal random variables. Biometrika 56(3):635–639
15. Huber R, Stögner H, Uhl A (2011) Two-factor biometric recognition with integrated tamper-protection watermarking. In Communications and multimedia security, Springer
16. Hyon S (2012) An investigation of dependencies between frequency components and speaker characteristics based on phoneme mean F-ratio contribution. In Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012. Asia-Pacific: IEEE
17. Kenny P (2012) A small foot-print i-vector extractor. In Proc. Odyssey
18. Khitrov M (2013) Talking passwords: voice biometrics for data access and security. Biom Technol Today 2013(2):9–11
19. Kim J-J, Hong S-P (2011) A method of risk assessment for multi-factor authentication. J Inf Process Syst (JIPS) 7(1):187–198
20. Kumar A, Lee HJ (2013) Multi-factor authentication process using more than one token with watermark security. In Future information communication technology and applications, Springer, p 579–587
21. Li C-T, Hwang M-S (2010) An efficient biometrics-based remote user authentication scheme using smart cards. J Netw Comput Appl 33(1):1–5
22. Li Q, Memon N, Sencar HT (2006) Security issues in watermarking applications-A deeper look. In Proceedings of the 4th ACM international workshop on Contents protection and security. ACM
23. Lu X, Dang J (2008) An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification. Speech Comm 50(4):312–322
24. Mallat S (2008) A wavelet tour of signal processing: the sparse way. Academic press
25. McCool C et al (2012) Bi-modal person recognition on a mobile phone: using mobile phone data. In Multimedia and Expo Workshops (ICMEW), 2012 IEEE International Conference on, IEEE
26. Mohamed S et al (2013) A method for speech watermarking in speaker verification
27. Nematollahi MA, Akhaee MA, Al-Haddad SAR, Gamboa-Rosales H (2015) Semi-fragile digital speech watermarking for online speaker recognition. EURASIP J Audio Speech Music Process 2015(1):1–15
28. Nematollahi MA, Al-Haddad S (2015) Distant speaker recognition: an overview. Int J Humanoid Robot 12(03):1–45
29. Nematollahi MA, Gamboa-Rosales H, Akhaee MA, Al-Haddad SAR (2015) Robust digital speech watermarking for online speaker recognition. Mathematical Problems in Engineering, 2015
30. O’Gorman L (2003) Comparing passwords, tokens, and biometrics for user authentication. Proc IEEE 91(12):2021–2040
31. Pathak MA, Raj B (2013) Privacy-preserving speaker verification and identification using gaussian mixture models. Audio Speech Lang Process IEEE Trans 21(2):397–406
32. Reynolds DA (1995) Speaker identification and verification using Gaussian mixture speaker models. Speech Comm 17(1):91–108
33. Reynolds DA, Quatieri TF, Dunn RB (2000) Speaker verification using adapted Gaussian mixture models. Digit Sig Process 10(1):19–41
34. Roberts C (2007) Biometric attack vectors and defences. Comput Secur 26(1):14–25
35. Sadjadi SO, Slaney M, Heck L (2013) MSR Identity toolbox v1.0: A MATLAB toolbox for speaker recognition research, IEEE
36. Simon J (2012) DataHash
37. Woo RH, Park A, Hazen TJ (2006) The MIT mobile device speaker verification corpus: data collection and preliminary experiments. In Speaker and Language Recognition Workshop, IEEE Odyssey 2006: The. 2006. IEEE
38. Wu Z et al (2015) Spoofing and countermeasures for speaker verification: a survey. Speech Comm 66:130–153

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments on drafts of this paper.

Author information

Authors and Affiliations

Department of Electronics Engineering, Autonomous University of Zacatecas, 98000, Zacatecas, Zac., Mexico
Hamurabi Gamboa-Rosales, Francisco J. Martinez-Ruiz & Jose I. De la Rosa-Vargas
Department of Computer & Communication Systems Engineering, Faculty of Engineering, University Putra Malaysia, UPM, Serdang, 43400, Selangor Darul Ehsan, Malaysia
S. A. R. Al-Haddad
Computer Engineering Department, Hamedan Branch, Islamic Azad University, Hamedan, Iran
Mohammad Ali Nematollahi & Mansour Esmaeilpour

Corresponding author

Correspondence to Mohammad Ali Nematollahi.

Appendix A

The Discrete Fourier Transform (DFT) coefficients are assumed to follow a Weibull distribution, whereas the DWPT sub-bands are assumed to follow a Generalized Gaussian Distribution (GGD) [2]. The GGD is defined in Eq. (14), assuming zero mean ($ \mu = 0 $) and variance $ \sigma_s^2 $.

$$ {f}_s\left(s;\mu, {\sigma}_s,v\right)=\frac{1}{2\varGamma \left(1+\frac{1}{v}\right)A\left({\sigma}_s,v\right)} \exp\left\{-{\left|\frac{s-\mu }{A\left({\sigma}_s,v\right)}\right|}^v\right\} $$

(14)

where Γ(·) is the Gamma function, given by $ \varGamma (x)={\int}_0^{\infty }{t}^{x-1}{e}^{-t}dt\cong \sqrt{2\pi }{x}^{x-\frac{1}{2}}{e}^{-x} $, and $ v $ is the shape parameter of the distribution, which can be estimated from the statistical moments of the signal.
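The density in Eq. (14) can be sketched in a few lines of Python. The appendix does not spell out the scale factor A, so the form used below, A = σ_s·sqrt(Γ(1/v)/Γ(3/v)), is an assumption; it is the usual GGD normalisation and it reproduces the moments in Eqs. (20) and (21). The function names are illustrative only.

```python
import math


def ggd_scale(sigma_s: float, v: float) -> float:
    """Scale factor A(sigma_s, v).

    Assumed form (not stated in the appendix):
    A = sigma_s * sqrt(Gamma(1/v) / Gamma(3/v)).
    """
    return sigma_s * math.sqrt(math.gamma(1.0 / v) / math.gamma(3.0 / v))


def ggd_pdf(s: float, sigma_s: float, v: float, mu: float = 0.0) -> float:
    """Generalized Gaussian density of Eq. (14)."""
    a = ggd_scale(sigma_s, v)
    normaliser = 2.0 * math.gamma(1.0 + 1.0 / v) * a
    return math.exp(-abs((s - mu) / a) ** v) / normaliser
```

With v = 2 the function reduces to the Normal density and with v = 1 to the Laplacian density, which gives a quick sanity check on the assumed scale factor.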

If the watermarked speech signal passes through an AWGN channel, the watermarked speech signal at the receiver can be formulated as in Eqs. (15) and (16).

$$ {r}_i=\alpha \times {s}_i+{n}_i\ if\ {m}_i=1 $$

(15)

$$ {r}_i=\frac{1}{\alpha}\times {s}_i+{n}_i\ if\ {m}_i=0 $$

(16)

where n_i is the noise that contaminates the watermarked speech signal. To estimate the probability of a watermark bit when it is 1, Eq. (17) is expressed:

$$ \left.R\right|1=\frac{{\displaystyle {\sum}_A}{\left(\alpha \times {s}_i+{n}_i\right)}^4}{{\displaystyle {\sum}_B}{\left({s}_i+{n}_i\right)}^4}\Rightarrow \left.R\right|1=\frac{\alpha^4{\displaystyle {\sum}_A}{s}_i^4+4{\alpha}^3{\displaystyle {\sum}_A}{s}_i^3{n}_i+6{\alpha}^2{\displaystyle {\sum}_A}{s}_i^2{n}_i^2+4\alpha {\displaystyle {\sum}_A}{s}_i{n}_i^3+{\displaystyle {\sum}_A}{n}_i^4}{{\displaystyle {\sum}_B}{s}_i^4+4{\displaystyle {\sum}_B}{s}_i^3{n}_i+6{\displaystyle {\sum}_B}{s}_i^2{n}_i^2+4{\displaystyle {\sum}_B}{s}_i{n}_i^3+{\displaystyle {\sum}_B}{n}_i^4} $$

(17)
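As a minimal sketch of Eqs. (15) to (17), the snippet below applies the multiplicative gain for one watermark bit and computes the fourth-power ratio between two sample sets. The function names and the plain-list representation of the sets A and B are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def received_samples(s: Sequence[float], noise: Sequence[float],
                     bit: int, alpha: float) -> list:
    """Eqs. (15)-(16): scale by alpha for bit 1, by 1/alpha for bit 0, then add noise."""
    gain = alpha if bit == 1 else 1.0 / alpha
    return [gain * s_i + n_i for s_i, n_i in zip(s, noise)]


def fourth_power_ratio(set_a: Sequence[float], set_b: Sequence[float]) -> float:
    """Ratio of summed fourth powers, the statistic expanded in Eq. (17)."""
    return sum(x ** 4 for x in set_a) / sum(x ** 4 for x in set_b)


# Noise-free example with identical host samples in A and B:
# the ratio is alpha**4 for bit 1 and alpha**-4 for bit 0, up to rounding.
host = [0.5, -1.0, 0.25]
silence = [0.0, 0.0, 0.0]
ratio_bit1 = fourth_power_ratio(received_samples(host, silence, 1, 1.1), host)
ratio_bit0 = fourth_power_ratio(received_samples(host, silence, 0, 1.1), host)
```

Once noise is added, the cross terms of Eq. (17) appear and the ratio spreads around these two values, which is why the remainder of the appendix models its distribution.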

As seen, the sums in Eq. (17) affect the detection threshold. By the Central Limit Theorem (CLT), each sum in the numerator and denominator can be approximated by a Normal distribution. Because the mean μ is large and the speech frames are long, the Normal approximation yields almost exclusively positive values, so it can model quantities such as $ \sum_A n_i^4 $, which are always positive. Equations (18) and (19) give the mean and variance, respectively.

$$ E\left\{\sum {s}_i^4\right\}=\sum E\left\{{s}_i^4\right\}=M{\mu}_4 $$

(18)

$$ \begin{array}{l}var\left(\sum {s}_i^4\right)=E\left\{{\left(\sum {s}_i^4-M{\mu}_4\right)}^2\right\}=E\left\{{\left(\sum \left({s}_i^4-{\mu}_4\right)\right)}^2\right\}=\hfill \\ {}\sum E\left\{{\left({s}_i^4-{\mu}_4\right)}^2\right\}=\sum \left(E\left\{{s}_i^8\right\}-{\mu}_4^2\right)=M{\mu}_8-M{\mu}_4^2\hfill \end{array} $$

(19)

where M is the size of each of the sets A and B. Applying the moments of the GGD for r = 4 and r = 8 gives Eqs. (20) and (21).

$$ {\mu}_4=\frac{\sigma_s^4\ \Gamma \left(\frac{1}{v}\right)\ \Gamma \left(\frac{5}{v}\right)\ }{\Gamma^2\left(\frac{3}{v}\right)} $$

(20)

$$ {\mu}_8=\frac{\sigma_s^8\ {\Gamma}^3\left(\frac{1}{v}\right)\ \Gamma \left(\frac{9}{v}\right)\ }{\Gamma^4\left(\frac{3}{v}\right)} $$

(21)
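The even moments in Eqs. (20) and (21) share one pattern, which the sketch below writes as a single function and then uses for the mean and variance of Eqs. (18) and (19). The general-r expression is inferred from the r = 4 and r = 8 cases and is therefore an assumption; the function names are illustrative.

```python
import math


def ggd_even_moment(r: int, sigma_s: float, v: float) -> float:
    """Even moment mu_r of a zero-mean GGD with variance sigma_s**2 and shape v.

    For r = 4 and r = 8 this matches Eqs. (20) and (21).
    """
    if r % 2 != 0 or r < 2:
        raise ValueError("r must be a positive even integer")
    g1 = math.gamma(1.0 / v)
    g3 = math.gamma(3.0 / v)
    half = r // 2
    return sigma_s ** r * g1 ** (half - 1) * math.gamma((r + 1) / v) / g3 ** half


def sum_s4_normal_params(m: int, sigma_s: float, v: float) -> tuple:
    """Mean and variance of sum(s_i**4) over m samples, per Eqs. (18) and (19)."""
    mu4 = ggd_even_moment(4, sigma_s, v)
    mu8 = ggd_even_moment(8, sigma_s, v)
    return m * mu4, m * mu8 - m * mu4 ** 2
```

For v = 2 the GGD is Gaussian, so the function should return the familiar values 3σ⁴, 15σ⁶ and 105σ⁸ for r = 4, 6 and 8.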

By considering Eqs. (18) and (19), Eq. (22) is formulated.

$$ \sum {s}_i^4\sim \mathcal{N}\left(M{\mu}_4,M{\mu}_8-M{\mu}_4^2\right) $$

(22)

If the mean of the noise is assumed to be zero, Eq. (23) can be expressed.

$$ {n}_i\sim \mathcal{N}\left(0,{\sigma}_n^2\right)\ \Rightarrow E\left\{{n}_i^m\right\}=\left\{\begin{array}{ll}0\hfill & for\ m=2k+1\hfill \\ {}\left(m-1\right)\left(m-3\right)\dots \times 1\times {\sigma}_n^m\hfill & for\ m=2k\hfill \end{array}\right. $$

(23)

Then, the Normal approximation of the fourth-power noise term can be estimated as in Eq. (24).

$$ {\displaystyle \sum {n}_i^4\sim \mathcal{N}}\left(3M{\sigma}_n^4,96M{\sigma}_n^8\right) $$

(24)
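The constants 3 and 96 in Eq. (24) follow directly from Eq. (23): E{n⁴} = 3σ_n⁴ and E{n⁸} − (E{n⁴})² = 105σ_n⁸ − 9σ_n⁸. A short sketch, with illustrative function names, makes the arithmetic explicit.

```python
def gaussian_moment(m: int, sigma_n: float) -> float:
    """E{n**m} for n ~ N(0, sigma_n**2), per Eq. (23)."""
    if m % 2 == 1:
        return 0.0
    result = sigma_n ** m
    # Multiply by (m-1)(m-3)...3*1, the double factorial of m-1.
    for k in range(m - 1, 0, -2):
        result *= k
    return result


def sum_n4_normal_params(m_len: int, sigma_n: float) -> tuple:
    """Mean and variance of sum(n_i**4) over m_len independent samples, per Eq. (24)."""
    mean = m_len * gaussian_moment(4, sigma_n)
    var = m_len * (gaussian_moment(8, sigma_n) - gaussian_moment(4, sigma_n) ** 2)
    return mean, var
```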

The other terms in Eq. (17) can be computed from Eqs. (25) to (27).

$$ {\displaystyle \sum {s}_i^3{n}_i\sim \mathcal{N}}\left(0,M{\mu}_6{\sigma}_n^2\right)\ \&\ {\mu}_6=\frac{\sigma_s^6\ {\Gamma}^2\left(\frac{1}{v}\right)\ \Gamma \left(\frac{7}{v}\right)\ }{\Gamma^3\left(\frac{3}{v}\right)} $$

(25)

$$ {\displaystyle \sum {s}_i^2{n}_i^2\sim \mathcal{N}}\left(M{\sigma}_s^2{\sigma}_n^2,3M{\mu}_4{\sigma}_n^4-M{\sigma}_s^4{\sigma}_n^4\right) $$

(26)

$$ {\displaystyle \sum {s}_i{n}_i^3\sim \mathcal{N}}\left(0,15M{\sigma}_s^2{\sigma}_n^6\right) $$

(27)

To simplify the computation, two auxiliary parameters p and q are defined in Eq. (28). R|1,p,q can then be formulated as in Eq. (29).

$$ p={\displaystyle {\sum}_B{s}_i^4\ \&\ q}=\frac{{\displaystyle {\sum}_A{s}_i^4}}{{\displaystyle {\sum}_B{s}_i^4}} $$

(28)

$$ \left.R\right|1,p,q=\frac{\alpha^4pq+4{\alpha}^3{\displaystyle {\sum}_A{s}_i^3{n}_i+6{\alpha}^2}{\displaystyle {\sum}_A{s}_i^2{n}_i^2+4\alpha }{\displaystyle {\sum}_A{s}_i{n}_i^3+}{\displaystyle {\sum}_A{n}_i^4}}{p+4{\displaystyle {\sum}_B{s}_i^3{n}_i+6}{\displaystyle {\sum}_B{s}_i^2{n}_i^2+4}{\displaystyle {\sum}_B{s}_i{n}_i^3+{\displaystyle {\sum}_B{n}_i^4}}}=\frac{u}{w} $$

(29)

where the distributions of u and w are given by Eqs. (30) and (31).

$$ \begin{array}{l}{f}_U(u)\sim \mathcal{N}\left({\alpha}^4pq+6{\alpha}^2M{\sigma}_s^2{\sigma}_n^2+3M{\sigma}_n^4,\ 16{\alpha}^6M{\mu}_6{\sigma}_n^2+36{\alpha}^4\right.\left(3M{\mu}_4{\sigma}_n^4-M{\sigma}_s^4{\sigma}_n^4\right)+16{\alpha}^2\times 15M{\sigma}_s^2{\sigma}_n^6+\hfill \\ {}\left.96M{\sigma}_n^8\right)\hfill \end{array} $$

(30)

$$ {f}_W(w)\sim \mathcal{N}\left(p+6M{\sigma}_s^2{\sigma}_n^2+3M{\sigma}_n^4,\ 16M{\mu}_6{\sigma}_n^2+36\left(3M{\mu}_4{\sigma}_n^4-M{\sigma}_s^4{\sigma}_n^4\right)+16\times 15M{\sigma}_s^2{\sigma}_n^6+96M{\sigma}_n^8\right) $$

(31)
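The parameters in Eqs. (30) and (31) are assembled from the component terms in Eqs. (24) to (27), each variance weighted by the square of its binomial coefficient and gain. The sketch below follows that bookkeeping; the function names are illustrative, and the variances are simply summed as in the appendix.

```python
import math


def ggd_even_moment(r: int, sigma_s: float, v: float) -> float:
    """Even GGD moment; matches Eqs. (20), (21) and mu_6 in Eq. (25)."""
    g1, g3 = math.gamma(1.0 / v), math.gamma(3.0 / v)
    half = r // 2
    return sigma_s ** r * g1 ** (half - 1) * math.gamma((r + 1) / v) / g3 ** half


def u_w_normal_params(alpha, p, q, m, sigma_s, sigma_n, v):
    """Return ((mean_u, var_u), (mean_w, var_w)) following Eqs. (30) and (31)."""
    mu4 = ggd_even_moment(4, sigma_s, v)
    mu6 = ggd_even_moment(6, sigma_s, v)
    var_s3n = m * mu6 * sigma_n ** 2                                        # Eq. (25)
    mean_s2n2 = m * sigma_s ** 2 * sigma_n ** 2                             # Eq. (26)
    var_s2n2 = 3 * m * mu4 * sigma_n ** 4 - m * sigma_s ** 4 * sigma_n ** 4  # Eq. (26)
    var_sn3 = 15 * m * sigma_s ** 2 * sigma_n ** 6                          # Eq. (27)
    mean_n4 = 3 * m * sigma_n ** 4                                          # Eq. (24)
    var_n4 = 96 * m * sigma_n ** 8                                          # Eq. (24)

    def combine(gain, signal_term):
        mean = signal_term + 6 * gain ** 2 * mean_s2n2 + mean_n4
        var = (16 * gain ** 6 * var_s3n + 36 * gain ** 4 * var_s2n2
               + 16 * gain ** 2 * var_sn3 + var_n4)
        return mean, var

    return combine(alpha, alpha ** 4 * p * q), combine(1.0, p)
```

Setting alpha = 1 and q = 1 makes the two parameter pairs coincide, and setting sigma_n = 0 collapses them to the noise-free values α⁴pq and p with zero variance.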

The density of $ \frac{u}{w} $ is computed to estimate the pdf of R|1,p,q. Assuming that u and w are independent and normally distributed, Eq. (32) can be expressed:

$$ {f}_{R\Big|1,p,q}(r)={\displaystyle {\int}_{-\infty}^{\infty}\left|w\right|{f}_{U,W}\left(wr,w\right)\ dw} $$

(32)

Also, since U and W are assumed to be independent and normally distributed, f_{U,W}(u, w) factorizes as in Eq. (33):

$$ {f}_{U,W}\left(u,w\right)={f}_U(u)\times {f}_W(w) $$

(33)

Equation (34) is the closed-form solution of Eq. (32), which has already been discussed in the literature [14].

$$ D(r)=\frac{b(r)c(r)}{a^3(r)}\ \frac{1}{\sqrt{2\pi }{\sigma}_u{\sigma}_w}\left[2\Phi \left(\frac{b(r)}{a(r)}\right)-1\right]+\frac{1}{a^2(r)\pi {\sigma}_u{\sigma}_w}{e}^{-\frac{1}{2}\left(\frac{\mu_u^2}{\sigma_u^2}+\frac{\mu_w^2}{\sigma_w^2}\right)} $$

(34)

Each parameter in Eq. (34) is defined based on Eqs. (35) to (38):

$$ a(r)=\sqrt{\frac{r^2}{\sigma_u^2}+\frac{1}{\sigma_w^2}} $$

(35)

$$ b(r)=\frac{\mu_u}{\sigma_u^2}r+\frac{\mu_w}{\sigma_w^2} $$

(36)

$$ c(r)= exp\left\{\frac{1}{2}\frac{b^2(r)}{a^2(r)}-\frac{1}{2}\left(\frac{\mu_u^2}{\sigma_u^2}+\frac{\mu_w^2}{\sigma_w^2}\right)\ \right\} $$

(37)

$$ \Phi (r)={\int}_{-\infty}^r\frac{1}{\sqrt{2\pi }}\ {e}^{-\frac{1}{2}{u}^2}\ du $$

(38)
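Equations (34) to (38) can be evaluated directly with the standard library, using `math.erf` for Φ. In this sketch the second term is written with a²(r), which is the form in Hinkley's result [14] and the one for which the density reduces to a Cauchy density when both means are zero. The function names are illustrative.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x) of Eq. (38), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ratio_density(r: float, mu_u: float, sigma_u: float,
                  mu_w: float, sigma_w: float) -> float:
    """Density D(r) of u/w for independent Normal u and w, per Eq. (34)."""
    a = math.sqrt(r * r / sigma_u ** 2 + 1.0 / sigma_w ** 2)      # Eq. (35)
    b = mu_u / sigma_u ** 2 * r + mu_w / sigma_w ** 2             # Eq. (36)
    k = mu_u ** 2 / sigma_u ** 2 + mu_w ** 2 / sigma_w ** 2
    c = math.exp(0.5 * b * b / (a * a) - 0.5 * k)                 # Eq. (37)
    first = (b * c / a ** 3) / (math.sqrt(2.0 * math.pi) * sigma_u * sigma_w)
    first *= 2.0 * std_normal_cdf(b / a) - 1.0
    second = math.exp(-0.5 * k) / (a * a * math.pi * sigma_u * sigma_w)
    return first + second
```

Because b²/a² never exceeds the constant in the exponent of Eq. (37), the exponential cannot overflow, so the function is safe to call with the large means that arise from long speech frames.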

As a result, Eq. (39) formulates the density of R|1:

$$ {f}_{R\mid 1}\left(r\mid 1\right)={\int}_L^U{\int}_{-\infty}^{\infty }{f}_{R\mid 1,p,q}\left(r\mid 1,p,q\right)\ {f}_P(p)\ {f}_Q(q)\ dp\ dq $$

(39)

Lower and upper bounds are applied to restrict the energy ratio between the sets A and B to lie within L and U, as stated in Eq. (40):

$$ L<\frac{{\displaystyle {\sum}_A{r}_i^4}}{{\displaystyle {\sum}_B{r}_i^4}}<U $$

(40)

While Eq. (22) gives the density of the parameter p, Eq. (41) gives the density of the parameter q, which is the ratio of two independent, normally distributed variables.

$$ {f}_Q(q)=\frac{D(q)}{{\displaystyle {\int}_L^UD(q)\ dq}} $$

(41)

In the same manner as in Eq. (17), the density of r|0 is also computable. Therefore, Eq. (42) estimates the probability of detection error:

$$ {P}_e=\frac{1}{2}{\displaystyle {\int}_T^{\infty }f\left(r\Big|0\right)}\ dr+\frac{1}{2}{\displaystyle {\int}_{-\infty}^Tf\left(r\Big|1\right)}\ dr $$

(42)

The threshold is estimated by minimizing the error as in Eq. (43):

$$ \frac{\partial {P}_e}{\partial T}=0\ \Rightarrow\ {f}_r\left(T\Big|0\right)={f}_r\left(T\Big|1\right) $$

(43)
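Equation (43) says the optimal threshold is the point where the two conditional densities cross. A minimal way to locate it numerically is bisection on their difference, sketched below. The two Normal densities used in the example are stand-ins chosen for illustration, not the densities derived in this appendix, and all names are hypothetical.

```python
import math
from typing import Callable


def find_threshold(f0: Callable[[float], float], f1: Callable[[float], float],
                   lo: float, hi: float, tol: float = 1e-10) -> float:
    """Solve f0(T) = f1(T) on [lo, hi] by bisection, as required by Eq. (43).

    Assumes f1 - f0 changes sign exactly once between lo and hi.
    """
    g_lo = f1(lo) - f0(lo)
    if g_lo * (f1(hi) - f0(hi)) > 0:
        raise ValueError("f1 - f0 must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f1(mid) - f0(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Stand-in conditional density used only for the example below."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Example with equal-variance stand-ins: the crossing is the midpoint of the means.
threshold = find_threshold(lambda t: normal_pdf(t, 0.7, 0.1),
                           lambda t: normal_pdf(t, 1.4, 0.1), 0.7, 1.4)
```

In practice the callables would wrap the densities of r|0 and r|1, and the bracket [lo, hi] would be chosen between their modes.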

About this article

Cite this article

Nematollahi, M.A., Gamboa-Rosales, H., Martinez-Ruiz, F.J. et al. Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition. Multimed Tools Appl 76, 7251–7281 (2017). https://doi.org/10.1007/s11042-016-3350-1

Received: 09 May 2015
Revised: 10 January 2016
Accepted: 09 February 2016
Published: 04 March 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11042-016-3350-1
