Abstract
The state-of-the-art performance of deep learning models comes at a high cost for companies and institutions, due to tedious data collection and heavy processing requirements. Recently, Nagai et al. (Int J Multimed Inf Retr 7(1):3–16, 2018) and Uchida et al. (Embedding watermarks into deep neural networks, ICMR, 2017) proposed to watermark convolutional neural networks for image classification by embedding information into their weights. While this is clear progress toward model protection, the technique only allows extracting the watermark from a network that one accesses locally and in its entirety. Instead, we aim at allowing the extraction of the watermark from a neural network (or any other machine learning model) that is operated remotely and available through a service API. To this end, we propose to mark the model's action itself, slightly tweaking its decision frontiers so that a set of specific queries conveys the desired information. In the present paper, we formally introduce the problem and propose a novel zero-bit watermarking algorithm that makes use of adversarial model examples. While limiting the loss of performance of the protected model, this algorithm allows subsequent extraction of the watermark using only a few queries. We experimented with the approach on three neural networks designed for image classification, in the context of the MNIST digit recognition task.
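The extraction step sketched in the abstract can be illustrated with a minimal Python snippet. This is a hedged sketch, not the paper's exact algorithm: it assumes the remote model is exposed as a `query(x) -> label` function, that the owner holds a key set of marked inputs with their expected labels, and that the watermark is deemed present when the number of mismatching answers stays below a threshold `theta` (all names are illustrative).

```python
def extract_watermark(query, key_inputs, key_labels, theta):
    """Zero-bit watermark extraction against a remote model.

    Queries the service API with each key input and counts how many
    answers differ from the expected labels; the watermark is deemed
    present when the mismatch count stays below the threshold theta.
    """
    mismatches = sum(
        1 for x, y in zip(key_inputs, key_labels) if query(x) != y
    )
    return mismatches < theta


# Toy usage with stand-in "remote" models implemented as lookup tables.
if __name__ == "__main__":
    key_inputs = [0, 1, 2, 3]
    key_labels = [7, 7, 3, 3]
    marked_model = {0: 7, 1: 7, 2: 3, 3: 3}.get    # answers every key correctly
    unmarked_model = {0: 1, 1: 2, 2: 0, 3: 1}.get  # answers no key correctly
    print(extract_watermark(marked_model, key_inputs, key_labels, theta=1))    # True
    print(extract_watermark(unmarked_model, key_inputs, key_labels, theta=1))  # False
```

The point of the threshold is robustness: a stolen model whose frontiers were only slightly perturbed will still answer most key queries as expected, while an unrelated model will not.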
Notes
“\({\hat{k}}_w+\varepsilon\)” stands for a small modification of the parameters of \({\hat{k}}_w\) that preserves the value of the model, i.e., that does not significantly deteriorate its performance.
Code will be open-sourced on GitHub upon article acceptance.
This accuracy drop of about \(3.5\%\) is also the one tolerated by a recent work on trojaning neural networks [18].
References
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: large-scale machine learning on heterogeneous systems. https://www.tensorflow.org/. Software available from tensorflow.org
Adi Y, Baum C, Cisse M, Pinkas B, Keshet J (2018) Turning your weakness into a strength: watermarking deep neural networks by backdooring. In: 27th USENIX security symposium (USENIX Security 18), pp 1615–1631
Braudaway GW, Magerlein KA, Mintzer CF (1996) Color correct digital watermarking of images. United States Patent 5530759
Carlini N, Wagner DA (2018) Audio adversarial examples: targeted attacks on speech-to-text. CoRR arXiv:1801.01944
Chang CY, Su SJ (2005) A neural-network-based robust watermarking scheme. SMC, Santa Monica
Chollet F et al. (2015) Keras. https://keras.io
Davchev T, Korres T, Fotiadis S, Antonopoulos N, Ramamoorthy S (2019) An empirical evaluation of adversarial robustness under transfer learning. In: ICML workshop on understanding and improving generalization in deep learning
Duddu V, Samanta D, Rao DV, Balas VE (2018) Stealing neural networks via timing side channels. CoRR arXiv:1812.11720
Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: ICLR
Grosse K, Manoharan P, Papernot N, Backes M, McDaniel PD (2017) On the (statistical) detection of adversarial examples. CoRR arXiv:1702.06280
Guo J, Potkonjak M (2018) Watermarking deep neural networks for embedded systems. In: 2018 IEEE/ACM international conference on computer-aided design (ICCAD), pp 1–8. https://doi.org/10.1145/3240765.3240862
Hartung F, Kutter M (1999) Multimedia watermarking techniques. Proc IEEE 87(7):1079–1107. https://doi.org/10.1109/5.771066
Le QV, Jaitly N, Hinton GE (2015) A simple way to initialize recurrent networks of rectified linear units. CoRR arXiv:1504.00941
Le Merrer E, Perez P, Trédan G (2017) Adversarial frontier stitching for remote neural network watermarking. CoRR arXiv:1711.01894
Le Merrer E, Trédan G (2019) Tampernn: efficient tampering detection of deployed neural nets. CoRR arXiv:1903.00317
LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist
Li S, Neupane A, Paul S, Song C, Krishnamurthy SV, Roy-Chowdhury AK, Swami A (2018) Adversarial perturbations against real-time video classification systems. CoRR arXiv:1807.00458
Liu Y, Ma S, Aafer Y, Lee WC, Zhai J, Wang W, Zhang X (2017) Trojaning attack on neural networks. NDSS, New York
Moosavi-Dezfooli S, Fawzi A, Fawzi O, Frossard P (2017) Universal adversarial perturbations. In: CVPR
Nagai Y, Uchida Y, Sakazawa S, Satoh S (2018) Digital watermarking for deep neural networks. Int J Multimed Inf Retr 7(1):3–16
Oh SJ, Augustin M, Fritz M, Schiele B (2018) Towards reverse-engineering black-box neural networks. In: International conference on learning representations. https://openreview.net/forum?id=BydjJte0-
Papernot N, Carlini N, Goodfellow I, Feinman R, Faghri F, Matyasko A, Hambardzumyan K, Juang YL, Kurakin A, Sheatsley R, Garg A, Lin YC (2017) cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768
Papernot N, McDaniel P, Goodfellow I, Jha S, Celik ZB, Swami A (2017) Practical black-box attacks against machine learning. In: ASIA CCS
Papernot N, McDaniel P, Jha S, Fredrikson M, Berkay Celik Z, Swami A (2015) The limitations of deep learning in adversarial settings. arXiv preprint arXiv:1511.07528
Rouhani BD, Chen H, Koushanfar F (2018) Deepsigns: A generic watermarking framework for IP protection of deep learning models. CoRR arXiv:1804.00750
Rozsa A, Günther M, Boult TE (2016) Are accuracy and robustness correlated? In: ICMLA
Sethi TS, Kantardzic M (2018) Data driven exploratory attacks on black box classifiers in adversarial domains. Neurocomputing 289:129–143. https://doi.org/10.1016/j.neucom.2018.02.007
Shafahi A, Huang WR, Studer C, Feizi S, Goldstein T (2018) Are adversarial examples inevitable? CoRR arXiv:1809.02104
Shin HC, Roth HR, Gao M, Lu L, Xu Z, Nogues I, Yao J, Mollura D, Summers RM (2016) Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging 35(5):1285–1298. https://doi.org/10.1109/TMI.2016.2528162
Tramèr F, Zhang F, Juels A, Reiter MK, Ristenpart T (2016) Stealing machine learning models via prediction apis. In: USENIX security symposium
Tramèr F, Kurakin A, Papernot N, Boneh D, McDaniel P (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204
Uchida Y, Nagai Y, Sakazawa S, Satoh S (2017) Embedding watermarks into deep neural networks. ICMR
van den Berg E (2016) Some insights into the geometry and training of neural networks. arXiv preprint arXiv:1605.00329
Van Schyndel RG, Tirkel AZ, Osborne CF (1994) A digital watermark. In: Proceedings of 1st international conference on image processing, vol 2. IEEE, pp 86–90
Wang B, Gong NZ (2018) Stealing hyperparameters in machine learning. CoRR arXiv:1802.05351
Yuan X, He P, Zhu Q, Li X (2019) Adversarial examples: attacks and defenses for deep learning. IEEE Trans Neural Netw Learn Syst, pp 1–20. https://doi.org/10.1109/TNNLS.2018.2886017
Zhang J, Gu Z, Jang J, Wu H, Stoecklin MP, Huang H, Molloy I (2018) Protecting intellectual property of deep neural networks with watermarking. In: Proceedings of the 2018 on Asia conference on computer and communications security. ACM, pp 159–172
Zhao X, Liu Q, Zheng H, Zhao BY (2015) Towards graph watermarks. In: COSN
Acknowledgements
The authors would like to thank the reviewers for their constructive comments.
Cite this article
Le Merrer, E., Pérez, P. & Trédan, G. Adversarial frontier stitching for remote neural network watermarking. Neural Comput & Applic 32, 9233–9244 (2020). https://doi.org/10.1007/s00521-019-04434-z