ABSTRACT
Software failures in server applications are a significant problem for preserving system availability. We present ASSURE, a system that introduces rescue points that recover software from unknown faults while maintaining both system integrity and availability, by mimicking system behavior under known error conditions. Rescue points are locations in existing application code for handling a given set of programmer-anticipated failures, which are automatically repurposed and tested for safely enabling fault recovery from a larger class of (unanticipated) faults. When a fault occurs at an arbitrary location in the program, ASSURE restores execution to an appropriate rescue point and induces the program to recover execution by virtualizing the program's existing error-handling facilities. Rescue points are identified using fuzzing, implemented using a fast coordinated checkpoint-restart mechanism that handles multi-process and multi-threaded applications, and, after testing, are injected into production code using binary patching. We have implemented an ASSURE Linux prototype that operates without application source code and without base operating system kernel changes. Our experimental results on a set of real-world server applications and bugs show that ASSURE enabled recovery for all of the bugs tested with fast recovery times, has modest performance overhead, and provides automatic self-healing orders of magnitude faster than current human-driven patch deployment methods.
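The recovery idea in the abstract can be illustrated with a small in-process sketch. This is only a toy analogy under stated assumptions. ASSURE itself works at the process level with checkpoint-restart and binary patching, whereas the `RescuePoint` wrapper, the `handle_request` function, and the `-1` error code below are hypothetical names invented for illustration. The sketch takes a snapshot on entry to a function that already handles one anticipated error. When an unanticipated fault occurs, it rolls the state back and forces the same error return that the caller already knows how to handle.

```python
import copy


class RescuePoint:
    """Toy analogue of a rescue point (hypothetical helper, not ASSURE's API)."""

    def __init__(self, func, error_value):
        self.func = func                # function that already handles some anticipated errors
        self.error_value = error_value  # value it returns for those anticipated errors

    def __call__(self, state, *args):
        snapshot = copy.deepcopy(state)  # "checkpoint" taken on entry
        try:
            return self.func(state, *args)
        except Exception:
            # Unanticipated fault: roll back side effects made since the checkpoint...
            state.clear()
            state.update(snapshot)
            # ...and force the error return the caller already handles.
            return self.error_value


def handle_request(state, payload):
    """Return -1 for the anticipated error (empty payload), 0 on success."""
    if not payload:
        return -1
    state["in_flight"] = payload  # side effect made before the fault
    # Unanticipated fault: an all-whitespace payload divides by zero here.
    state["total"] += 100 // len(payload.strip())
    del state["in_flight"]
    return 0


state = {"total": 0}
guarded = RescuePoint(handle_request, error_value=-1)
results = [guarded(state, "abcd"), guarded(state, "   "), guarded(state, "ab")]
```

In this sketch the second call faults after modifying `state`, yet the caller sees only the familiar error code and the partial update is discarded, so the third request proceeds normally.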
ASSURE: automatic software self-healing using rescue points
ASPLOS 2009