ABSTRACT
As robotic and autonomy systems become more common in industrial and human-interactive applications, it is increasingly critical that they behave safely in the presence of unexpected inputs. While robustness testing of traditional software systems has long been studied, robustness testing of autonomy systems is relatively uncharted territory. As engineers, testers, and researchers, we have observed that autonomy systems differ from traditional systems in important ways and require novel approaches to test them effectively. We present Automated Stress Testing for Autonomy Architectures (ASTAA), a system that automatically and effectively tests the robustness of autonomy systems by building on classic principles, with important innovations to support this new domain. Over five years, we have used ASTAA to test 17 real-world autonomy systems, robots, and robotics-oriented libraries across commercial and academic applications, discovering hundreds of bugs. We outline the ASTAA approach and analyze more than 150 bugs we found in real systems. We discuss what we learned about testing autonomy systems, focusing on how it differs from and resembles traditional software robustness testing, along with other high-level lessons.