research-article

ASSURE: automatic software self-healing using rescue points

Authors:
Stelios Sidiroglou, Columbia University, New York, USA
Oren Laadan, Columbia University, New York, USA
Carlos Perez, Columbia University, New York, USA
Nicolas Viennot, Columbia University, New York, USA
Jason Nieh, Columbia University, New York, USA
Angelos D. Keromytis, Columbia University, New York, USA

ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, March 2009, Pages 37–48, https://doi.org/10.1145/1508244.1508250

Published: 07 March 2009

ABSTRACT

Software failures in server applications are a significant problem for preserving system availability. We present ASSURE, a system that introduces rescue points that recover software from unknown faults while maintaining both system integrity and availability, by mimicking system behavior under known error conditions. Rescue points are locations in existing application code for handling a given set of programmer-anticipated failures, which are automatically repurposed and tested for safely enabling fault recovery from a larger class of (unanticipated) faults. When a fault occurs at an arbitrary location in the program, ASSURE restores execution to an appropriate rescue point and induces the program to recover execution by virtualizing the program's existing error-handling facilities. Rescue points are identified using fuzzing, implemented using a fast coordinated checkpoint-restart mechanism that handles multi-process and multi-threaded applications, and, after testing, are injected into production code using binary patching. We have implemented an ASSURE Linux prototype that operates without application source code and without base operating system kernel changes. Our experimental results on a set of real-world server applications and bugs show that ASSURE enabled recovery for all of the bugs tested with fast recovery times, has modest performance overhead, and provides automatic self-healing orders of magnitude faster than current human-driven patch deployment methods.
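The recovery idea in the abstract can be pictured with a small toy model. The sketch below is not ASSURE's implementation: ASSURE works on unmodified binaries with process-level checkpoint-restart and binary patching, whereas this example stands in for all of that with an in-memory `copy.deepcopy` snapshot and a Python exception. The names `RescuePoint`, `parse_request`, and the `state` dictionary are hypothetical, chosen only for illustration.

```python
import copy


class RescuePoint:
    """Toy rescue point: wraps a function that already has an error return value.

    Assumption: `state` is a dict holding all mutable program state, so a
    deep copy can play the role of a checkpoint.
    """

    def __init__(self, func, error_value):
        self.func = func
        self.error_value = error_value

    def __call__(self, state, *args):
        snapshot = copy.deepcopy(state)  # stand-in for taking a checkpoint
        try:
            return self.func(state, *args)
        except Exception:
            # Unanticipated fault: roll state back to the checkpoint and
            # force the error value the function uses for anticipated errors.
            state.clear()
            state.update(snapshot)
            return self.error_value


def parse_request(state, raw):
    """Hypothetical server routine with one programmer-anticipated error."""
    if not raw:
        return -1  # anticipated failure: empty request
    state["pending"] = state.get("pending", 0) + 1
    header = raw.split(":")[1]  # raises IndexError when ":" is missing
    state["last_header"] = header
    return 0


handle = RescuePoint(parse_request, error_value=-1)
```

With this wrapper, a malformed request that would have crashed `parse_request` looks to the caller like the empty-request error it already handles, and the partial update to `state` made before the fault is undone.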

Reviews

Reviewer: Rafael Corchuelo

Software crashes due to programming bugs constitute a major problem for systems that need to be available 24 hours a day, seven days a week. Many authors are researching techniques that allow computer applications to detect their own failures and recover from them automatically. "Recovering" means that the application rolls back to a safe state and returns a reasonable error code to the client; recovering in such a way improves system availability, while an appropriate patch is created by a programmer.

Sidiroglou et al. have devised a tool called ASSURE that allows server applications that run on Linux 2.6 systems to recover from their failures. ASSURE is innovative insofar as it can deal with applications that are available in binary form only, run on multiple threads and processes, handle polymorphic or encrypted input, or have deterministic bugs (not necessarily memory leaks); furthermore, it does not require any modifications to the underlying operating system. The authors have tested ASSURE on a number of actual bugs in well-known server applications, such as Apache, Squid, and MySQL. They prove that the tool is very efficient.

ASSURE builds on so-called rescue points, which are functions that return integer error codes or null pointers when a known error is detected. Rescue points and error codes are identified automatically by running the application in a testing environment where it is fed invalid inputs. When a failure is detected for the first time, ASSURE analyzes the problem in a sandbox. First, it determines what function failed. Then, it identifies the closest rescue point to which the application can be rolled back to keep working well. A piece of code is then inserted at this rescue point that returns an error code, thus preventing the application from continuing and failing. Finally, the application is restarted.

The paper is not at all difficult to read, although it does not provide enough details for other scientists to repeat the work. The authors' writing style is didactic and they get to the point very straightforwardly. They also make it very clear what their original contributions are. I recommend this paper and ASSURE for system administrators who need to keep their servers highly available.

Online Computing Reviews Service
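The selection step described in the review, finding the closest rescue point to the function that failed, can be sketched as a walk up the call stack. This is a simplified illustration under stated assumptions, not the procedure from the paper: it assumes the call stack at the time of the fault is available as a list of function names, and that candidate rescue points have already been collected (for example from testing with invalid inputs) as a mapping from function name to observed error return value. It also skips the sandbox testing that the review says precedes deployment. The function and variable names are hypothetical.

```python
def select_rescue_point(call_stack, candidates):
    """Pick the candidate rescue point nearest to the faulting function.

    call_stack: function names, outermost caller first, faulting function last.
    candidates: dict mapping function name -> error value observed in testing
                (an integer error code, or None to model a null pointer).
    Returns (function_name, error_value), or None if no candidate is on the stack.
    """
    # Walk from the faulting function outward so the first hit is the closest.
    for name in reversed(call_stack):
        if name in candidates:
            return name, candidates[name]
    return None


# Hypothetical example data: names are made up for illustration.
stack = ["main", "serve_connection", "handle_request", "parse_header", "copy_field"]
known = {"serve_connection": None, "handle_request": -1}
choice = select_rescue_point(stack, known)
```

Here `copy_field` and `parse_header` have no known error behavior, so the walk settles on `handle_request`, the nearest enclosing function with an observed error return value.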

Published in
ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
March 2009
358 pages
ISBN:9781605584065
DOI:10.1145/1508244
General Chair: Mary Lou Soffa, University of Virginia, USA
Program Chair: Mary Jane Irwin, Penn State University, USA
ACM SIGPLAN Notices Volume 44, Issue 3
ASPLOS 2009
March 2009
346 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1508284
ACM SIGARCH Computer Architecture News Volume 37, Issue 1
ASPLOS 2009
March 2009
346 pages
ISSN:0163-5964
DOI:10.1145/2528521
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
binary patching
checkpoint restart
error recovery
reliable software
software self-healing
Acceptance Rates
Overall Acceptance Rate: 535 of 2,713 submissions, 20%