research-article

Public Access

AdvPulse: Universal, Synchronization-free, and Targeted Audio Adversarial Attacks via Subsecond Perturbations

Authors:
Zhuohang Li

University of Tennessee, Knoxville, Knoxville, TN, USA

University of Tennessee, Knoxville, Knoxville, TN, USA
View Profile

,
Yi Wu

University of Tennessee, Knoxville, Knoxville, TN, USA

University of Tennessee, Knoxville, Knoxville, TN, USA
View Profile

,
Jian Liu

University of Tennessee, Knoxville, Knoxville, TN, USA

University of Tennessee, Knoxville, Knoxville, TN, USA
View Profile

,
Yingying Chen

Rutgers University, New Brunswick, NJ, USA

Rutgers University, New Brunswick, NJ, USA
View Profile

,
Bo Yuan

Rutgers University, New Brunswick, NJ, USA

Rutgers University, New Brunswick, NJ, USA
View Profile

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications SecurityOctober 2020Pages 1121–1134https://doi.org/10.1145/3372297.3423348

Published:02 November 2020Publication History

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

Pages 1121–1134

ABSTRACT

Existing efforts in audio adversarial attacks only focus on the scenarios where an adversary has prior knowledge of the entire speech input so as to generate an adversarial example by aligning and mixing the audio input with corresponding adversarial perturbation. In this work we consider a more practical and challenging attack scenario where the intelligent audio system takes streaming audio inputs (e.g., live human speech) and the adversary can deceive the system by playing adversarial perturbations simultaneously. This change in attack behavior brings great challenges, preventing existing adversarial perturbation generation methods from being applied directly. In practice, (1) the adversary cannot anticipate what the victim will say: the adversary cannot rely on their prior knowledge of the speech signal to guide how to generate adversarial perturbations; and (2) the adversary cannot control when the victim will speak: the synchronization between the adversarial perturbation and the speech cannot be guaranteed. To address these challenges, in this paper we propose AdvPulse, a systematic approach to generate subsecond audio adversarial perturbations, that achieves the capability to alter the recognition results of streaming audio inputs in a targeted and synchronization-free manner. To circumvent the constraints on speech content and time, we exploit penalty-based universal adversarial perturbation generation algorithm and incorporate the varying time delay into the optimization process. We further tailor the adversarial perturbation according to environmental sounds to make it inconspicuous to humans. Additionally, by considering the sources of distortions occurred during the physical playback, we are able to generate more robust audio adversarial perturbations that can remain effective even under over-the-air propagation. Extensive experiments on two representative types of intelligent audio systems (i.e., speaker recognition and speech command recognition) are conducted in various realistic environments. The results show that our attack can achieve an average attack success rate of over 89.6% in indoor environments and 76.0% in inside-vehicle scenarios even with loud engine and road noises.

Supplemental Material

Copy of CCS2020_fpe202_Zhuohang Li - Pat Weeden.mov

mov

285.1 MB

Download

References

Mart'in Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 265--283.Google Scholar
Hadi Abdullah, Washington Garcia, Christian Peeters, Patrick Traynor, Kevin RB Butler, and Joseph Wilson. 2019. Practical hidden voice attacks against speech and speaker recognition systems. arXiv preprint arXiv:1904.05734 (2019).Google Scholar
Moustafa Alzantot, Bharathan Balaji, and Mani Srivastava. 2017. Did you hear that? adversarial examples against automatic speech recognition. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS).Google Scholar
Amazon. 2020. Amazon Echo. https://www.amazon.com/all-new-Echo/dp/B07R1CXKN7Google Scholar
Anish Athalye, Nicholas Carlini, and David Wagner. 2018. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420 (2018).Google Scholar
Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. 2017. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397 (2017).Google Scholar
Chase Bank. 2019. Security as unique as your voice. https://www.chase.com/personal/voice-biometrics.Google Scholar
Karissa Bell. 2015. A smarter Siri learns to recognize the sound of your voice in iOS 9. https://mashable.com/2015/09/11/hey-siri-voice-recognition/Google Scholar
Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy (SP). 39--57.Google ScholarCross Ref
Nicholas Carlini and David Wagner. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. In Proceedings of the IEEE Security and Privacy Workshops (SPW). 1--7.Google ScholarCross Ref
Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, and Yang Liu. 2019. Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems. arXiv preprint arXiv:1911.01840 (2019).Google Scholar
Tao Chen, Longfei Shangguan, Zhenjiang Li, and Kyle Jamieson. 2020. Metamorph: Injecting Inaudible Commands into Over-the-air Voice Controlled Systems. In Proceedings of the Network and Distributed System Security Symposium (NDSS).Google ScholarCross Ref
Moustapha M Cisse, Yossi Adi, Natalia Neverova, and Joseph Keshet. 2017. Houdini: Fooling deep structured visual and speech recognition models with adversarial examples. In Proceedings of Advances in neural information processing systems (NeurIPS). 6977--6987.Google Scholar
Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Li Chen, Michael E Kounavis, and Duen Horng Chau. 2018. Adagio: Interactive experimentation with adversarial attack and defense for audio. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 677--681.Google Scholar
Timothy Dozat. 2016. Incorporating nesterov momentum into adam. In International Conference on Learning Representations (ICLR).Google Scholar
John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, Vol. 12, Jul (2011), 2121--2159.Google ScholarDigital Library
Yuan Gong, Boyang Li, Christian Poellabauer, and Yiyu Shi. 2019. Real-time adversarial attacks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).Google ScholarCross Ref
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning .MIT press.Google ScholarDigital Library
Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).Google Scholar
Google. 2020 a. Google Home. https://store.google.com/us/product/google_homeGoogle Scholar
Google. 2020 b. Speech-to-text Conversion Powered by Machine Learning. https://cloud.google.com/speech-to-textGoogle Scholar
Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. 2006. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd international conference on Machine learning. 369--376.Google ScholarDigital Library
Chris Hall. 2019. Hey BMW: Your intelligent voice assistant is actually pretty good. https://www.pocket-lint.com/cars/news/bmw/148690-hey-bmw-your-intelligent-voice-assistant-is-actually-pretty-good.Google Scholar
Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, et al. 2014. Deep speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567 (2014).Google Scholar
Xuedong Huang. 2017. Microsoft researchers achieve new conversational speech recognition milestone. https://www.microsoft.com/en-us/research/blog/microsoft-researchers-achieve-new-conversational-speech-recognition-milestone/Google Scholar
Marco Jeub, Magnus Schafer, and Peter Vary. 2009. A binaural room impulse response database for the evaluation of dereverberation algorithms. In International Conference on Digital Signal Processing. IEEE, 1--5.Google ScholarCross Ref
Jack Kiefer, Jacob Wolfowitz, et al. 1952. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, Vol. 23, 3 (1952), 462--466.Google ScholarCross Ref
Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).Google Scholar
Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Emanuel Habets, Reinhold Haeb-Umbach, Volker Leutnant, Armin Sehr, Walter Kellermann, Roland Maas, et al. 2013. The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. 1--4.Google ScholarCross Ref
Felix Kreuk, Yossi Adi, Moustapha Cisse, and Joseph Keshet. 2018. Fooling end-to-end speaker verification with adversarial examples. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 1962--1966.Google ScholarCross Ref
Zhuohang Li, Cong Shi, Yi Xie, Jian Liu, Bo Yuan, and Yingying Chen. 2020. Practical Adversarial Attacks Against Speaker Recognition Systems. In Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications (HotMobile). 9--14.Google ScholarDigital Library
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2017. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1765--1773.Google ScholarCross Ref
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2574--2582.Google ScholarCross Ref
Satoshi Nakamura, Kazuo Hiyane, Futoshi Asano, Takanobu Nishiura, and Takeshi Yamada. 2000. Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition. In Language Resources and Evaluation Conference. 965--968.Google Scholar
Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian McAuley, and Farinaz Koushanfar. 2019. Universal adversarial perturbations for speech recognition systems. arXiv preprint arXiv:1905.03828 (2019).Google Scholar
Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. 2015. A time delay neural network architecture for efficient modeling of long temporal contexts. In Annual Conference of the International Speech Communication Association (INTERSPEECH).Google Scholar
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, and Karel Vesely. 2011. The Kaldi Speech Recognition Toolkit. In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE Signal Processing Society. IEEE Catalog No.: CFP11SRW-USB.Google Scholar
Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, and Colin Raffel. 2019. Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition. In Proceedings of the International Conference on Machine Learning (ICLR). 5231--5240.Google Scholar
Tara N Sainath and Carolina Parada. 2015. Convolutional neural networks for small-footprint keyword spotting. In Annual Conference of the International Speech Communication Association (INTERSPEECH).Google Scholar
Samsung. 2020. Unlocks your phone with Bixby Voice. https://www.samsung.com/us/support/answer/ANS00082783/Google Scholar
Lea Schönherr, Katharina Kohls, Steffen Zeiler, Thorsten Holz, and Dorothea Kolossa. 2018. Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding. arXiv preprint arXiv:1808.05665 (2018).Google Scholar
Google Assistant SDK. 2020. Best Practices for Audio. https://developers.google.com/assistant/sdk/guides/service/python/best-practices/audio.Google Scholar
Suwon Shon, Hao Tang, and James Glass. 2018. Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model. In IEEE Spoken Language Technology Workshop (SLT). IEEE, 1007--1013.Google ScholarCross Ref
David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, and Sanjeev Khudanpur. 2018. X-vectors: Robust dnn embeddings for speaker recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 5329--5333.Google ScholarCross Ref
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).Google Scholar
Tesla. 2020. Voice Commands. https://www.tesla.com/support/voice-commands.Google Scholar
Jon Vadillo and Roberto Santana. 2019. Universal adversarial examples in speech command classification. arXiv preprint arXiv:1911.10182 (2019).Google Scholar
Christophe Veaux, Junichi Yamagishi, Kirsten MacDonald, et al. 2017. CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit. University of Edinburgh. The Centre for Speech Technology Research (CSTR) (2017).Google Scholar
Pete Warden. 2018. Speech commands: A dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209 (2018).Google Scholar
Weidi Xie, Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. 2019. Utterance-level aggregation for speaker recognition in the wild. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 5791--5795.Google ScholarCross Ref
Yi Xie, Cong Shi, Zhuohang Li, Jian Liu, Yingying Chen, and Bo Yuan. 2020. Real-time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).Google ScholarCross Ref
Hiromu Yakura and Jun Sakuma. 2018. Robust audio adversarial example for a physical attack. arXiv preprint arXiv:1810.11793 (2018).Google Scholar
Zhuolin Yang, Bo Li, Pin-Yu Chen, and Dawn Song. 2018. Characterizing audio adversarial examples using temporal dependency. arXiv preprint arXiv:1809.10875 (2018).Google Scholar
Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, XiaoFeng Wang, and Carl A Gunter. 2018. Commandersong: A systematic approach for practical adversarial voice recognition. In 27th USENIX Security Symposium (USENIX Security 18). 49--64.Google Scholar
Lei Zhang, Yan Meng, Jiahao Yu, Chong Xiang, Brandon Falk, and Haojin Zhu. 2020. Voiceprint Mimicry Attack Towards Speaker Verification System in Smart Home. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM).Google ScholarDigital Library
Yingke Zhu, Tom Ko, David Snyder, Brian Mak, and Daniel Povey. 2018. Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification. In Interspeech. 3573--3577.Google Scholar

Index Terms

AdvPulse: Universal, Synchronization-free, and Targeted Audio Adversarial Attacks via Subsecond Perturbations
1. Computing methodologies
  1. Machine learning
2. Security and privacy

Recommendations

Robust Detection of Machine-induced Audio Attacks in Intelligent Audio Systems with Microphone Array
CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security

With the popularity of intelligent audio systems in recent years, their vulnerabilities have become an increasing public concern. Existing studies have designed a set of machine-induced audio attacks, such as replay attacks, synthesis attacks, hidden ...
Read More
Double Targeted Universal Adversarial Perturbations
Computer Vision – ACCV 2020
Abstract
Despite their impressive performance, deep neural networks (DNNs) are widely known to be vulnerable to adversarial attacks, which makes it challenging for them to be deployed in security-sensitive applications, such as autonomous driving. Image-...
Read More
Smooth Perturbations for Time Series Adversarial Attacks
Advances in Knowledge Discovery and Data Mining
Abstract
Adversarial attacks represent a threat to every deep neural network. They are particularly effective if they can perturb a given model while remaining undetectable. They have been initially introduced for image classifiers, and are well studied ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security
October 2020
2180 pages
ISBN:9781450370899
DOI:10.1145/3372297
General Chairs:
Jay Ligatti
University of South Florida, USA
,
Xinming Ou
University of South Florida, USA
,
Program Chairs:
Jonathan Katz
University of Maryland, USA
,
Giovanni Vigna
University of California-Santa Barbara, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 November 2020
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
audio adversarial attack
intelligent audio system
synchronization-free
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,261of6,999submissions,18%
Upcoming Conference
CCS '24

Sponsor:

sigsac

ACM SIGSAC Conference on Computer and Communications Security

October 14 - 18, 2024

Salt Lake City , UT , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 36
  Total Citations
  View Citations
- 1,951
  Total Downloads
- Downloads (Last 12 months)534
- Downloads (Last 6 weeks)59
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

AdvPulse: Universal, Synchronization-free, and Targeted Audio Adversarial Attacks via Subsecond Perturbations

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Robust Detection of Machine-induced Audio Attacks in Intelligent Audio Systems with Microphone Array

Double Targeted Universal Adversarial Perturbations

Smooth Perturbations for Time Series Adversarial Attacks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

AdvPulse: Universal, Synchronization-free, and Targeted Audio Adversarial Attacks via Subsecond Perturbations

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Robust Detection of Machine-induced Audio Attacks in Intelligent Audio Systems with Microphone Array

Double Targeted Universal Adversarial Perturbations

Smooth Perturbations for Time Series Adversarial Attacks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media