Abstract
Biometric template protection of speech signals and information hiding in speech signals are two challenging problems. To address them and raise the level of security, we aim to build multi-level security systems based on speech signals, in which speech watermarking is used together with automatic speaker identification. Speech watermarking embeds images into the speech signals that are used for speaker identification. The watermark is extracted for authentication, and the effect of watermark removal on the performance of the speaker identification system in the presence of degradations is then studied. This paper presents a speech watermarking approach based on empirical mode decomposition (EMD) in different transform domains combined with singular value decomposition (SVD). In each transform domain, the speech signal is decomposed with EMD into zero-mean components called intrinsic mode functions (IMFs), and the watermark is embedded into one of these IMFs using SVD. Different transform domains and different IMFs are compared for implementing the proposed scheme, using the log-likelihood ratio (LLR), correlation coefficient (Cr), signal-to-noise ratio (SNR), and spectral distortion (SD) as metrics. The simulation results show that embedding the watermark in the discrete sine transform (DST) domain yields higher SNR and Cr values and lower SD and LLR values than the other transform domains tested. The proposed approach is also robust to various attacks.
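To make the embedding pipeline summarized above concrete, here is a minimal Python sketch, assuming NumPy and SciPy are available. It is an illustration under stated assumptions, not the authors' implementation: the EMD is a heavily simplified sifting loop (a fixed number of passes, natural cubic-spline envelopes, no end-effect handling); the SVD step uses one common rule (add the scaled watermark to the singular-value matrix and keep the resulting singular vectors as a key), which may differ from the paper's rule; extraction is non-blind and assumes the receiver holds the host's non-target DST components; and the function names, `alpha`, `imf_index`, and the synthetic test signal are all hypothetical.

```python
import numpy as np
from scipy.fft import dst, idst
from scipy.interpolate import CubicSpline


def _extrema(h):
    """Indices of interior local maxima and minima of a 1-D sequence."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def emd(x, max_imfs=4, n_sift=10):
    """Hypothetical, simplified EMD: fixed sifting passes, no stopping criterion."""
    t = np.arange(len(x))
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = residue.copy()
        for _ in range(n_sift):
            mx, mn = _extrema(h)
            if len(mx) < 2 or len(mn) < 2:
                break
            # Subtract the mean of the upper and lower spline envelopes.
            upper = CubicSpline(mx, h[mx], bc_type="natural")(t)
            lower = CubicSpline(mn, h[mn], bc_type="natural")(t)
            h = h - (upper + lower) / 2.0
        mx, mn = _extrema(h)
        if len(mx) < 2 or len(mn) < 2:
            break  # residue is (nearly) monotonic: stop decomposing
        imfs.append(h)
        residue = residue - h
    return imfs, residue  # x == sum(imfs) + residue by construction


def embed(speech, watermark, imf_index=0, alpha=0.05):
    """Embed a square watermark into one IMF of the DST coefficients via SVD."""
    coeffs = dst(speech, type=2, norm="ortho")
    imfs, _ = emd(coeffs)
    target = imfs[imf_index]
    n = watermark.shape[0]
    block = target[: n * n].reshape(n, n)
    U, S, Vt = np.linalg.svd(block)
    # Assumed rule: perturb the singular-value matrix, then re-decompose it.
    Uw, Sw, Vwt = np.linalg.svd(np.diag(S) + alpha * watermark)
    target_w = target.copy()
    target_w[: n * n] = (U @ np.diag(Sw) @ Vt).ravel()
    host_rest = coeffs - target  # other IMFs + residue, kept in the (non-blind) key
    key = {"Uw": Uw, "Vwt": Vwt, "S": S, "host_rest": host_rest, "n": n, "alpha": alpha}
    return idst(host_rest + target_w, type=2, norm="ortho"), key


def extract(watermarked, key):
    """Non-blind extraction: isolate the marked IMF, then invert the SVD rule."""
    n, alpha = key["n"], key["alpha"]
    target_w = dst(watermarked, type=2, norm="ortho") - key["host_rest"]
    Sw = np.linalg.svd(target_w[: n * n].reshape(n, n), compute_uv=False)
    D = key["Uw"] @ np.diag(Sw) @ key["Vwt"]
    return (D - np.diag(key["S"])) / alpha


def snr_db(clean, processed):
    """SNR of the watermarked signal relative to the original, in dB."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - processed) ** 2))


def corr_coef(a, b):
    """Correlation coefficient (Cr) between two signals or two images."""
    return np.corrcoef(np.ravel(a), np.ravel(b))[0, 1]


# Usage on a synthetic stand-in for speech (hypothetical: two tones plus noise at 8 kHz).
rng = np.random.default_rng(0)
t = np.arange(4096) / 8000.0
speech = 0.6 * np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 750 * t)
speech += 0.05 * rng.standard_normal(t.size)
logo = rng.integers(0, 2, size=(8, 8)).astype(float)  # toy binary watermark image

marked, key = embed(speech, logo)
recovered = extract(marked, key)
print(f"SNR = {snr_db(speech, marked):.1f} dB, Cr = {corr_coef(speech, marked):.4f}")
print("watermark Cr:", corr_coef(logo, recovered))
```

A small `alpha` keeps the SNR high at the cost of robustness; raising it strengthens the watermark but increases distortion. Replacing `dst`/`idst` with `scipy.fft.dct`/`scipy.fft.idct` gives the corresponding DCT-domain variant, which is one way to reproduce the kind of transform-domain comparison described in the abstract.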
Cite this article
Abd El-Wahab, B.S., El-khobby, H.A., Abd Elnaby, M.M. et al. Simultaneous speaker identification and watermarking. Int J Speech Technol 24, 205–218 (2021). https://doi.org/10.1007/s10772-019-09658-x