DOI: 10.1145/2976749.2978370
research-article

Scalable Graph-based Bug Search for Firmware Images

Published: 24 October 2016

ABSTRACT

Because of rampant security breaches in IoT devices, searching for vulnerabilities in massive IoT ecosystems is more crucial than ever. Recent studies have demonstrated that control-flow graph (CFG) based bug search techniques can be effective and accurate for IoT devices across different architectures. However, these CFG-based approaches do not scale to the enormous number of IoT devices in the wild because of their expensive graph-matching overhead. Inspired by rich experience in image and video search, we propose a new bug search scheme that addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy. Unlike existing techniques that search directly on raw features (CFGs) extracted from the binary code, we convert the CFGs into high-level numeric feature vectors. Compared with raw CFG features, high-level numeric feature vectors are more robust to code variation across different architectures and can easily support real-time search using state-of-the-art hashing techniques. We have implemented a bug search engine, Genius, and compared it with state-of-the-art bug search approaches. Experimental results show that Genius outperforms baseline approaches for various query loads in terms of speed and accuracy. We also evaluated Genius on a real-world dataset of 33,045 devices, which was collected from public sources and our system. The experiment showed that Genius can finish a search within 1 second on average when performed over 8,126 firmware images containing 420,558,702 functions. By looking only at the top 50 candidates in the search results, we found 38 potentially vulnerable firmware images across 5 vendors and confirmed 23 of them by manual analysis. We also found that it took only 0.1 seconds on average to finish searching for all 154 vulnerabilities in the two latest commercial firmware images from D-LINK. Of these vulnerabilities, 103 potentially affect these images, and 16 were confirmed.
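
The abstract names neither the vector encoding nor the hashing scheme. As a minimal sketch only, assuming each function has already been encoded as a fixed-length numeric vector and assuming random-hyperplane locality-sensitive hashing stands in for the hashing step, the index-then-lookup idea could look as follows. The class, function names, and vectors are hypothetical and are not taken from Genius.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def cosine(a, b):
    """Cosine similarity, used to rank candidates that share a bucket."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LshIndex:
    """Hypothetical single-table LSH index over function feature vectors."""

    def __init__(self, dim, n_bits=8, seed=0):
        self.planes = make_hyperplanes(dim, n_bits, seed)
        self.buckets = defaultdict(list)

    def add(self, name, vec):
        self.buckets[hash_vector(vec, self.planes)].append((name, vec))

    def query(self, vec, top_k=5):
        # Only the query's bucket is scanned, instead of every indexed vector.
        candidates = self.buckets.get(hash_vector(vec, self.planes), [])
        ranked = sorted(candidates, key=lambda item: cosine(vec, item[1]), reverse=True)
        return [name for name, _ in ranked[:top_k]]


# Made-up vectors: func_b points in the opposite direction to func_a.
index = LshIndex(dim=4)
index.add("func_a", [1.0, 2.0, 3.0, 4.0])
index.add("func_b", [-1.0, -2.0, -3.0, -4.0])
result = index.query([2.0, 4.0, 6.0, 8.0])  # same direction as func_a
```

A query costs one hash computation plus a scan of a single bucket, which is why lookup time need not grow with the full number of indexed functions. A single hash table can miss near neighbors that fall just across a hyperplane, so practical setups typically use several tables.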


      • Published in

        CCS '16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security
        October 2016
        1924 pages
        ISBN: 9781450341394
        DOI: 10.1145/2976749

        Copyright © 2016 ACM

        © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        CCS '16 Paper Acceptance Rate: 137 of 831 submissions, 16%
        Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
