Abstract
Security Orchestration, Automation, and Response (SOAR) platforms integrate and orchestrate a wide variety of security tools to accelerate the operational activities of a Security Operations Center (SOC). Security tools are mostly integrated into a SOAR platform manually, using APIs, plugins, and scripts. SOC teams need to navigate the API calls of different security tools to find a suitable API for defining or updating an incident response action. Automatically identifying the security tool APIs relevant to a particular task requires analyzing many types of API documentation with diverse formats and presentation structures, which raises significant challenges such as data availability, data heterogeneity, and semantic variation. Because these challenges can impair a SOC team's ability to handle security incidents effectively and efficiently, we consider it important to devise suitable automated support to address them. We propose a novel learning-based framework for automated security tool
Index Terms
- APIRO: A Framework for Automated Security Tools API Recommendation