research-article

Scalable Graph-based Bug Search for Firmware Images

Authors:
Qian Feng, Syracuse University, Syracuse, NY, USA
Rundong Zhou, Syracuse University, Syracuse, NY, USA
Chengcheng Xu, Syracuse University, Syracuse, NY, USA
Yao Cheng, Syracuse University, Syracuse, NY, USA
Brian Testa, Air Force Research Lab, Rome, NY, USA
Heng Yin, University of California, Riverside, Riverside, CA, USA

CCS '16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, October 2016, pages 480–491. https://doi.org/10.1145/2976749.2978370

Published: 24 October 2016

ABSTRACT

Because of rampant security breaches in IoT devices, searching for vulnerabilities in massive IoT ecosystems is more crucial than ever. Recent studies have demonstrated that control-flow graph (CFG) based bug search techniques can be effective and accurate for IoT devices across different architectures. However, these CFG-based approaches are far from scalable enough to handle the enormous number of IoT devices in the wild, because of their expensive graph-matching overhead. Inspired by rich experience in image and video search, we propose a new bug search scheme that addresses the scalability challenge of existing cross-platform bug search techniques and further improves search accuracy. Unlike existing techniques that search directly on raw features (CFGs) extracted from binary code, we convert the CFGs into high-level numeric feature vectors. Compared with raw CFGs, these vectors are more robust to code variation across architectures, and they make real-time search easy to achieve with state-of-the-art hashing techniques. We have implemented a bug search engine, Genius, and compared it with state-of-the-art bug search approaches. Experimental results show that Genius outperforms the baseline approaches in both speed and accuracy under various query loads. We also evaluated Genius on a real-world dataset of 33,045 devices collected from public sources and by our system. In this experiment, Genius finished a search within 1 second on average when searching over 8,126 firmware images containing 420,558,702 functions. By examining only the top 50 candidates in the search results, we found 38 potentially vulnerable firmware images across 5 vendors and confirmed 23 of them through manual analysis. We also found that searching for all 154 vulnerabilities in the two latest commercial firmware images from D-LINK took only 0.1 seconds on average. Of these, 103 are potentially present in these images, and 16 were confirmed.
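The two core steps named in the abstract, turning each CFG into a fixed-length numeric vector and then using hashing for fast search, can be illustrated with a toy sketch. It assumes each function's CFG has already been reduced to a list of numeric basic-block attribute vectors and that a codebook of centroids is available (both hypothetical here). It uses a plain bag-of-features histogram and random-hyperplane LSH as generic stand-ins; it is not Genius's actual encoding or index.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def nearest_centroid(centroids: List[Vec], v: Vec) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centroids[i], v)))


def encode(blocks: List[Vec], centroids: List[Vec]) -> List[float]:
    """Bag-of-features: count how many basic blocks fall nearest to each codebook
    entry, then L2-normalize. The output length is len(centroids), so CFGs of any
    size map to vectors of the same dimension."""
    hist = [0.0] * len(centroids)
    for block in blocks:
        hist[nearest_centroid(centroids, block)] += 1.0
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]


def cosine(u: Vec, v: Vec) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity: each table hashes a vector to
    the sign pattern of its dot products with n_bits random Gaussian hyperplanes."""

    def __init__(self, dim: int, n_bits: int = 8, n_tables: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.tables: List[Tuple[List[List[float]], Dict[tuple, List[str]]]] = []
        for _ in range(n_tables):
            planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            self.tables.append((planes, defaultdict(list)))
        self.vectors: Dict[str, Vec] = {}

    @staticmethod
    def _signature(planes: List[List[float]], v: Vec) -> tuple:
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in planes)

    def add(self, key: str, v: Vec) -> None:
        self.vectors[key] = v
        for planes, buckets in self.tables:
            buckets[self._signature(planes, v)].append(key)

    def query(self, v: Vec, top_k: int = 5) -> List[str]:
        # Union the bucket hits from every table, then re-rank only those
        # candidates by exact cosine similarity instead of scanning the corpus.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(self._signature(planes, v), []))
        return sorted(candidates, key=lambda k: cosine(v, self.vectors[k]),
                      reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical 2-attribute blocks (e.g. scaled instruction and call counts).
    codebook = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    firmware_funcs = {
        "fw1:parse_header": [[1.0, 0.1], [0.9, 0.0], [5.0, 4.8]],
        "fw2:auth_check": [[0.0, 1.0], [0.1, 0.9]],
    }
    index = HyperplaneLSH(dim=len(codebook), n_bits=4, n_tables=3, seed=1)
    for name, blocks in firmware_funcs.items():
        index.add(name, encode(blocks, codebook))
    query_vec = encode([[1.1, 0.0], [1.0, 0.2], [4.9, 5.1]], codebook)
    print(index.query(query_vec, top_k=1))  # prints ['fw1:parse_header']
```

The design point the sketch illustrates is that expensive pairwise graph matching is replaced by vector arithmetic: a query is hashed once per table, and only the functions that share a bucket are re-ranked.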


Index Terms

Scalable Graph-based Bug Search for Firmware Images
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
      1. Firmware
2. Security and privacy
  1. Systems security
    1. Vulnerability management
      1. Vulnerability scanners

Recommendations

A taxonomy of IoT firmware security and principal firmware analysis techniques
Abstract
Internet of Things (IoT) has come a long way since its inception. However, the standardization process in IoT systems for a secure IoT solution is still in its early days. Numerous quality review articles have been contributed by ...
Extracting Conditional Formulas for Cross-Platform Bug Search
ASIA CCS '17: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security

With the recent increase in security breaches in embedded systems and IoT devices, it becomes increasingly important to search for vulnerabilities directly in binary executables in a cross-platform setting. However, very little has been explored in this ...
Effective Bug Triage Based on Historical Bug-Fix Information
ISSRE '14: Proceedings of the 2014 IEEE 25th International Symposium on Software Reliability Engineering

For complex and popular software, project teams could receive a large number of bug reports. It is often tedious and costly to manually assign these bug reports to developers who have the expertise to fix the bugs. Many bug triage techniques have been ...

Published in
CCS '16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security
October 2016
1924 pages
ISBN:9781450341394
DOI:10.1145/2976749
General Chairs:
Edgar Weippl, SBA Research, Austria
Stefan Katzenbeisser, TU Darmstadt, CYSEC, Germany
Program Chairs:
Christopher Kruegel, University of California, Santa Barbara, USA
Andrew Myers, Cornell University, USA
Shai Halevi, IBM Research, USA
Copyright © 2016 ACM
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
firmware security
graph encoding
machine learning
Acceptance Rates
CCS '16 Paper Acceptance Rate: 137 of 831 submissions, 16%
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
Article Metrics
- Total Citations: 244
- Total Downloads: 2,974
- Downloads (Last 12 months): 290
- Downloads (Last 6 weeks): 29