ABSTRACT
Fault-tolerant design can help autonomous vehicle systems cope with defects, environmental changes, and security attacks. Checkpoint and restoration fault tolerance techniques save a copy of an application's state before a problem occurs and restore that state afterwards. However, traditional Checkpoint/Restore techniques still incur high overhead, may carry along tainted data, and rarely operate in tandem with human-written or automated repairs that modify source code or alter data layout. Thus, it can be difficult to apply traditional Checkpoint/Restore techniques to non-environmental defects, security attacks, or software bugs. To address these challenges, we propose and evaluate a selective checkpoint and restore (SCR) technique that records only critical system state, identified using types and minimal symbolic annotations, to support the deployment of repaired programs. We evaluate SCR using a commodity autonomous vehicle system. We demonstrate that source-level symbolic information allows an application to resume even after its code is modified. We also show that SCR admits both manual and automated software repairs, does not carry tainted data, and has low overhead.
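The abstract describes SCR only at a high level. The following is a minimal Python sketch of the general idea of selective, type-guided checkpointing keyed by source-level names. It is written under our own assumptions and is not the authors' SCR implementation. The names `critical`, `checkpoint`, `restore`, `NavStateV1`, and `NavStateV2` are hypothetical. Dataclass field metadata stands in for the paper's symbolic annotations, and JSON stands in for whatever storage format SCR actually uses.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any, Type, TypeVar

T = TypeVar("T")


def critical(default: Any) -> Any:
    """Hypothetical annotation: mark a field as critical state worth checkpointing."""
    return field(default=default, metadata={"critical": True})


def _type_name(t: Any) -> str:
    # Declared field type as written in the source (e.g. "float").
    return getattr(t, "__name__", str(t))


def checkpoint(obj: Any) -> str:
    """Serialize only critical fields, keyed by source-level name and declared type."""
    record = {}
    for f in fields(obj):
        if f.metadata.get("critical"):
            record[f.name] = {"type": _type_name(f.type), "value": getattr(obj, f.name)}
    return json.dumps(record)


def restore(cls: Type[T], blob: str) -> T:
    """Rebuild a (possibly repaired) class, matching saved fields by name and type."""
    record = json.loads(blob)
    kwargs = {}
    for f in fields(cls):
        entry = record.get(f.name)
        if entry is None or not f.metadata.get("critical"):
            continue  # new or non-critical field: keep the repaired program's default
        if entry["type"] != _type_name(f.type):
            continue  # the repair changed the declared type: do not reuse the old value
        kwargs[f.name] = entry["value"]
    return cls(**kwargs)


@dataclass
class NavStateV1:
    """Original program version (hypothetical)."""
    altitude_m: float = critical(0.0)
    heading_deg: float = critical(0.0)
    rx_buffer: str = ""  # raw telemetry input; may be tainted, so never saved


@dataclass
class NavStateV2:
    """Repaired version: fields reordered and a new field added (hypothetical)."""
    rx_buffer: str = ""
    heading_deg: float = critical(0.0)
    altitude_m: float = critical(0.0)
    geofence_ok: bool = critical(True)


if __name__ == "__main__":
    old = NavStateV1(altitude_m=120.5, heading_deg=90.0, rx_buffer="\x90\x90")
    new = restore(NavStateV2, checkpoint(old))
    print(new)  # prints NavStateV2(rx_buffer='', heading_deg=90.0, altitude_m=120.5, geofence_ok=True)
```

State is matched by source-level field name and declared type rather than by memory offset. This lets the reordered and extended `NavStateV2` still absorb the saved altitude and heading. The unannotated `rx_buffer` is a stand-in for possibly tainted input, and it is never written to the checkpoint.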
Index Terms
- Selective Symbolic Type-Guided Checkpointing and Restoration for Autonomous Vehicle Repair