ABSTRACT
Because of rampant security breaches in IoT devices, searching for vulnerabilities in massive IoT ecosystems is more crucial than ever. Recent studies have demonstrated that control-flow graph (CFG) based bug search techniques can be effective and accurate for IoT devices across different architectures. However, these CFG-based approaches are far from scalable enough to handle the enormous number of IoT devices in the wild, because of their expensive graph matching overhead. Inspired by rich experience in image and video search, we propose a new bug search scheme that addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy. Unlike existing techniques that search directly on raw features (CFGs) extracted from the binary code, we convert the CFGs into high-level numeric feature vectors. Compared with the CFG feature, high-level numeric feature vectors are more robust to code variation across different architectures, and can easily achieve real-time search by using state-of-the-art hashing techniques. We have implemented a bug search engine, Genius, and compared it with state-of-the-art bug search approaches. Experimental results show that Genius outperforms baseline approaches for various query loads in terms of speed and accuracy. We also evaluated Genius on a real-world dataset of 33,045 devices, collected from public sources and our system. The experiment showed that Genius can finish a search within 1 second on average when performed over 8,126 firmware images containing 420,558,702 functions. By looking only at the top 50 candidates in the search result, we found 38 potentially vulnerable firmware images across 5 vendors, and confirmed 23 of them by manual analysis. We also found that it took only 0.1 seconds on average to finish searching for all 154 vulnerabilities in the two latest commercial firmware images from D-LINK. Of these vulnerabilities, 103 potentially affect these images, and 16 were confirmed.
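The hashing-based search step can be illustrated with a minimal sketch. The abstract does not say which hashing technique Genius uses, so the random-hyperplane locality-sensitive hashing (LSH) below is an assumption chosen for illustration, and every name in it (`LshIndex`, `lsh_key`, `make_hyperplanes`) is hypothetical. It assumes each function has already been encoded as a fixed-length numeric feature vector, and shows how such vectors can be bucketed so that a query inspects only a few candidates instead of comparing against every indexed function.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (Gaussian) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vector, hyperplanes):
    """Hash a vector to a bit tuple: one sign bit per hyperplane."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LshIndex:
    """Hypothetical multi-table LSH index over function feature vectors."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        # Each table has its own hyperplanes and its own bucket map.
        self.tables = [
            (make_hyperplanes(dim, n_bits, seed + t), defaultdict(list))
            for t in range(n_tables)
        ]
        self.vectors = {}

    def add(self, name, vector):
        """Store the vector and register it in one bucket per table."""
        self.vectors[name] = vector
        for planes, buckets in self.tables:
            buckets[lsh_key(vector, planes)].append(name)

    def query(self, vector, top_k=3):
        """Collect bucket-mates from all tables, then rank them by cosine."""
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(lsh_key(vector, planes), []))
        ranked = sorted(
            candidates,
            key=lambda n: cosine(vector, self.vectors[n]),
            reverse=True,
        )
        return ranked[:top_k]
```

In this sketch, a query vector for a known vulnerable function would be passed to `query`, and only the functions sharing a bucket with it in at least one table are ranked. Using several tables trades memory for recall: a near neighbor that falls on the wrong side of a hyperplane in one table may still collide in another.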
Index Terms
- Scalable Graph-based Bug Search for Firmware Images