Simultaneous speaker identification and watermarking

International Journal of Speech Technology

Abstract

Biometric template protection of speech signals and information hiding in speech signals are two challenging issues. To address them and increase the level of security, our objective is to build multi-level security systems based on speech signals. To this end, speech watermarking is used simultaneously with automatic speaker identification. The watermarking step embeds images into the speech signals that are used for speaker identification. The watermark is extracted for authentication, and the effect of watermark removal on the performance of the speaker identification system in the presence of degradations is then studied. This paper presents an approach for speech watermarking based on empirical mode decomposition (EMD) in different transform domains and singular value decomposition (SVD). The speech signal is decomposed with EMD in each transform domain to yield zero-mean components called intrinsic mode functions (IMFs). The watermark is inserted into one of these IMFs using SVD. The proposed watermarking scheme is compared across transform domains and across IMFs, using the log-likelihood ratio (LLR), correlation coefficient (Cr), signal-to-noise ratio (SNR), and spectral distortion (SD) as metrics. According to the simulation results, watermark embedding in the discrete sine transform domain provides higher SNR and Cr values and lower SD and LLR values than the other transform domains considered. The proposed approach is robust to different attacks.
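The SVD embedding step and two of the metrics named above (SNR and Cr) can be illustrated with a minimal sketch. The abstract does not state the exact embedding rule, so the code below assumes a common additive scheme on the singular values; the function names, the strength parameter `alpha`, the 16 x 16 random binary watermark, and the random host vector standing in for one IMF are all hypothetical. The EMD and transform-domain steps are omitted.

```python
import numpy as np


def embed_svd(component, watermark, alpha=0.01):
    """Embed a square watermark into the singular values of a 1-D component.

    Assumed additive rule (not taken from the paper): D = diag(s) + alpha * W.
    """
    n = watermark.shape[0]
    block = component[: n * n].reshape(n, n)      # host samples as an n x n matrix
    U, s, Vt = np.linalg.svd(block)
    D = np.diag(s) + alpha * watermark            # perturb the singular-value matrix
    Uw, sw, Vwt = np.linalg.svd(D)                # its singular values replace s
    marked = component.astype(float).copy()
    marked[: n * n] = (U @ np.diag(sw) @ Vt).ravel()
    key = (Uw, Vwt, s)                            # side information kept for extraction
    return marked, key


def extract_svd(marked, key, n, alpha=0.01):
    """Recover the watermark from a marked component using the stored key."""
    Uw, Vwt, s = key
    block = marked[: n * n].reshape(n, n)
    sw = np.linalg.svd(block, compute_uv=False)   # singular values of the marked block
    D = Uw @ np.diag(sw) @ Vwt                    # rebuild the perturbed matrix
    return (D - np.diag(s)) / alpha


def snr_db(clean, processed):
    """Signal-to-noise ratio in dB between a clean and a processed signal."""
    noise = clean - processed
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


def corr_coeff(a, b):
    """Correlation coefficient (Cr) between two arrays of equal size."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]


rng = np.random.default_rng(0)
host = rng.standard_normal(1024)                  # stand-in for one IMF
watermark = (rng.random((16, 16)) > 0.5).astype(float)

marked, key = embed_svd(host, watermark, alpha=0.01)
recovered = extract_svd(marked, key, n=16, alpha=0.01)

print("SNR (dB):", snr_db(host, marked))
print("Cr:", corr_coeff(watermark, recovered))
```

In this sketch a larger `alpha` makes the watermark easier to recover after degradation but lowers the SNR of the marked signal, which is the trade-off the SNR and Cr metrics are meant to expose.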

Corresponding author

Correspondence to Fathi E. Abd El-Samie.

Cite this article

Abd El-Wahab, B.S., El-khobby, H.A., Abd Elnaby, M.M. et al. Simultaneous speaker identification and watermarking. Int J Speech Technol 24, 205–218 (2021). https://doi.org/10.1007/s10772-019-09658-x

  • DOI: https://doi.org/10.1007/s10772-019-09658-x
