DOI: 10.1145/3560905.3568518
research-article

Push the Limit of Adversarial Example Attack on Speaker Recognition in Physical Domain

Published: 24 January 2023

ABSTRACT

Integrating deep learning into Speaker Recognition (SR) has advanced its development and wide deployment, but it has also introduced the emerging threat of adversarial examples. However, only a few existing studies investigate this threat in the physical domain, and they either evaluate feasibility only by directly replaying generated adversarial examples or address only part of the channel interference to improve robustness. In this paper, we propose PhyTalker, a physical adversarial example attack that generates and injects perturbations into live-streamed voices to attack various SR models over different physical channels. Unlike typical adversarial examples for digital attacks, PhyTalker generates a subphoneme-level perturbation dictionary to decouple perturbation optimization from injection. Moreover, we introduce channel augmentation to compensate for both device and environmental distortions, as well as a model ensemble to improve perturbation transferability. Finally, PhyTalker recognizes and localizes the most recently recorded phoneme to determine the corresponding perturbations for real-time broadcasting. Extensive experiments with a large-scale corpus in real physical scenarios show that PhyTalker achieves an overall Attack Success Rate (ASR) of 85.5% against mainstream SR systems and a Mel Cepstral Distortion (MCD) of 2.45 dB as a measure of human audibility.
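
The MCD figure reported above can be made concrete with a minimal sketch of the commonly used mel-cepstral distortion formula, averaged over frames. The abstract does not state how the authors configured the metric, so the function name `mel_cepstral_distortion`, the frame layout, and the assumptions that the two sequences are already time-aligned and that the energy coefficient has been dropped are all illustrative, not the paper's implementation.

```python
import math
from typing import Sequence


def mel_cepstral_distortion(
    ref: Sequence[Sequence[float]],
    test: Sequence[Sequence[float]],
) -> float:
    """Mean MCD in dB between two sequences of mel-cepstral frames.

    Assumptions (not taken from the paper): the sequences are already
    time-aligned, and coefficient 0 (energy) was removed by the caller.
    """
    if not ref or len(ref) != len(test):
        raise ValueError("sequences must be non-empty and time-aligned")

    scale = 10.0 / math.log(10.0)  # converts the distance to decibels
    total = 0.0
    for ref_frame, test_frame in zip(ref, test):
        if len(ref_frame) != len(test_frame):
            raise ValueError("frames must have the same number of coefficients")
        squared_diff = sum((a - b) ** 2 for a, b in zip(ref_frame, test_frame))
        total += scale * math.sqrt(2.0 * squared_diff)
    return total / len(ref)


if __name__ == "__main__":
    # Hypothetical two-frame example with three coefficients per frame.
    clean = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
    perturbed = [[1.1, 0.5, -0.2], [0.9, 0.3, -0.1]]
    print(f"MCD: {mel_cepstral_distortion(clean, perturbed):.3f} dB")
```

Under this definition, a lower value means the perturbed voice is spectrally closer to the original, and identical inputs give 0 dB.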

      • Published in

        ACM Conferences
        SenSys '22: Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems
        November 2022
        1280 pages
        ISBN:9781450398862
        DOI:10.1145/3560905

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        SenSys '22 paper acceptance rate: 52 of 187 submissions, 28%. Overall acceptance rate: 174 of 867 submissions, 20%.
