An Empirical Study of Artifacts and Security Risks in the Pre-trained Model Supply Chain

Authors:
Wenxin Jiang (Purdue University, West Lafayette, IN, USA)
Nicholas Synovic (Loyola University Chicago, Chicago, IL, USA)
Rohan Sethi (Loyola University Chicago, Chicago, IL, USA)
Aryan Indarapu (University of Illinois-Urbana Champaign, Champaign, IL, USA)
Matt Hyatt (Loyola University Chicago, Chicago, IL, USA)
Taylor R. Schorlemmer (Purdue University, West Lafayette, IN, USA)
George K. Thiruvathukal (Loyola University Chicago, Chicago, IL, USA)
James C. Davis (Purdue University, West Lafayette, IN, USA)

SCORED'22: Proceedings of the 2022 ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses, November 2022, Pages 105–114. https://doi.org/10.1145/3560835.3564547

Published: 08 November 2022

ABSTRACT

Deep neural networks achieve state-of-the-art performance on many tasks, but require increasingly complex architectures and costly training procedures. Engineers can reduce costs by reusing a pre-trained model (PTM) and fine-tuning it for their own tasks. To facilitate software reuse, engineers collaborate around model hubs: collections of PTMs and datasets organized by problem domain. Although model hubs are now comparable in popularity and size to other software ecosystems, the associated PTM supply chain has not yet been examined from a software engineering perspective. We present an empirical study of artifacts and security features in 8 model hubs. We identify potential threat models and show that existing defenses are insufficient to ensure the security of PTMs. We compare the PTM supply chain with traditional software supply chains, and propose directions for further measurements and tools to increase the reliability of the PTM supply chain.

Index Terms

1. Computing methodologies
  1. Machine learning
2. General and reference
  1. Cross-computing tools and techniques
    1. Empirical studies

Published in
SCORED'22: Proceedings of the 2022 ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses
November 2022, 121 pages
ISBN: 9781450398855
DOI: 10.1145/3560835
Program Chairs: Santiago Torres-Arias (Purdue University, USA), Marcela Melara (Intel Corporation, USA), Laurent Simon (Google Inc., USA)
Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
deep neural networks
empirical software engineering
machine learning
model hubs
software reuse
software supply chain