Robustness testing of autonomy software

Authors:
Casidhe Hutchison, Carnegie Mellon University
Milda Zizyte, Carnegie Mellon University
Patrick E. Lanigan, Carnegie Mellon University
David Guttendorf, Carnegie Mellon University
Michael Wagner, Carnegie Mellon University
Claire Le Goues, Carnegie Mellon University
Philip Koopman, Carnegie Mellon University

ICSE-SEIP '18: Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, May 2018, Pages 276–285, https://doi.org/10.1145/3183519.3183534

Published: 27 May 2018

ABSTRACT

As robotic and autonomy systems become progressively more present in industrial and human-interactive applications, it is increasingly critical for them to behave safely in the presence of unexpected inputs. While robustness testing for traditional software systems is long-studied, robustness testing for autonomy systems is relatively uncharted territory. In our role as engineers, testers, and researchers we have observed that autonomy systems are importantly different from traditional systems, requiring novel approaches to effectively test them. We present Automated Stress Testing for Autonomy Architectures (ASTAA), a system that effectively, automatically robustness tests autonomy systems by building on classic principles, with important innovations to support this new domain. Over five years, we have used ASTAA to test 17 real-world autonomy systems, robots, and robotics-oriented libraries, across commercial and academic applications, discovering hundreds of bugs. We outline the ASTAA approach and analyze more than 150 bugs we found in real systems. We discuss what we discovered about testing autonomy systems, specifically focusing on how doing so differs from and is similar to traditional software robustness testing and other high-level lessons.

Published in
ICSE-SEIP '18: Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice
May 2018
336 pages
ISBN: 9781450356596
DOI: 10.1145/3183519
Conference Chairs:
Frances Paulisch, Siemens Healthineers, Germany
Jan Bosch, Chalmers University of Technology
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
autonomy
dependability
robustness testing
safety-critical systems
Qualifiers
- research-article