skip to main content
research-article
Open Access
Artifacts Available
Artifacts Evaluated & Reusable

Decision procedures for path feasibility of string-manipulating programs with complex operations

Published:02 January 2019Publication History
Skip Abstract Section

Abstract

The design and implementation of decision procedures for checking path feasibility in string-manipulating programs is an important problem, with such applications as symbolic execution of programs with strings and automated detection of cross-site scripting (XSS) vulnerabilities in web applications. A (symbolic) path is given as a finite sequence of assignments and assertions (i.e. without loops), and checking its feasibility amounts to determining the existence of inputs that yield a successful execution. Modern programming languages (e.g. JavaScript, PHP, and Python) support many complex string operations, and strings are also often implicitly modified during a computation in some intricate fashion (e.g. by some autoescaping mechanisms).

In this paper we provide two general semantic conditions which together ensure the decidability of path feasibility: (1) each assertion admits regular monadic decomposition (i.e. is an effectively recognisable relation), and (2) each assignment uses a (possibly nondeterministic) function whose inverse relation preserves regularity. We show that the semantic conditions are expressive since they are satisfied by a multitude of string operations including concatenation, one-way and two-way finite-state transducers, replaceall functions (where the replacement string could contain variables), string-reverse functions, regular-expression matching, and some (restricted) forms of letter-counting/length functions. The semantic conditions also strictly subsume existing decidable string theories (e.g. straight-line fragments, and acyclic logics), and most existing benchmarks (e.g. most of Kaluza’s, and all of SLOG’s, Stranger’s, and SLOTH’s benchmarks). Our semantic conditions also yield a conceptually simple decision procedure, as well as an extensible architecture of a string solver in that a user may easily incorporate his/her own string functions into the solver by simply providing code for the pre-image computation without worrying about other parts of the solver. Despite these, the semantic conditions are unfortunately too general to provide a fast and complete decision procedure. We provide strong theoretical evidence for this in the form of complexity results. To rectify this problem, we propose two solutions. Our main solution is to allow only partial string functions (i.e., prohibit nondeterminism) in condition (2). This restriction is satisfied in many cases in practice, and yields decision procedures that are effective in both theory and practice. Whenever nondeterministic functions are still needed (e.g. the string function split), our second solution is to provide a syntactic fragment that provides a support of nondeterministic functions, and operations like one-way transducers, replaceall (with constant replacement string), the string-reverse function, concatenation, and regular-expression matching. We show that this fragment can be reduced to an existing solver SLOTH that exploits fast model checking algorithms like IC3.

We provide an efficient implementation of our decision procedure (assuming our first solution above, i.e., deterministic partial string functions) in a new string solver OSTRICH. Our implementation provides built-in support for concatenation, reverse, functional transducers (FFT), and replaceall and provides a framework for extensibility to support further string functions. We demonstrate the efficacy of our new solver against other competitive solvers.

Skip Supplemental Material Section

Supplemental Material

a49-wu.webm

webm

58.2 MB

References

  1. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukás Holík, Ahmed Rezine, and Philipp Rümmer. 2017. Flatten and conquer: a framework for efficient analysis of string constraints. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. ACM, New York, NY, USA, 602–617. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukás Holík, Ahmed Rezine, and Philipp Rümmer. 2018. TRAU: SMT Solver for String Constraints. In Formal Methods in Computer Aided Design, FMCAD 2018. To appear.Google ScholarGoogle Scholar
  3. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Lukás Holík, Ahmed Rezine, Philipp Rümmer, and Jari Stenman. 2014. String Constraints for Verification. In Computer Aided Verification - 26th International Conference, CAV 2014. Springer, 150–166. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Rajeev Alur and Jyotirmoy V. Deshmukh. 2011. Nondeterministic Streaming String Transducers. In Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part II. Springer, 1–20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Christel Baier and Joost-Pieter Katoen. 2008. Principles of Model Checking (Representation and Mind Series). The MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Pablo Barceló, Diego Figueira, and Leonid Libkin. 2013. Graph Logics with Rational Relations. Logical Methods in Computer Science 9, 3 (2013).Google ScholarGoogle Scholar
  7. Clark W. Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. 2009. Satisfiability Modulo Theories. In Handbook of Satisfiability. IOS Press, 825–885.Google ScholarGoogle Scholar
  8. Michael Benedikt, Leonid Libkin, Thomas Schwentick, and Luc Segoufin. 2003. Definable relations and first-order query languages over strings. J. ACM 50, 5 (2003), 694–751. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Murphy Berzish, Vijay Ganesh, and Yunhui Zheng. 2017. Z3str3: A string solver with theory-aware heuristics. In 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, October 2-6, 2017. IEEE, 55–59. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Nikolaj Bjørner, Nikolai Tillmann, and Andrei Voronkov. 2009. Path feasibility analysis for string-manipulating programs. In Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2009. Springer, 307–321. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J Richard Büchi and Steven Senger. 1990. Definability in the existential theory of concatenation and undecidable extensions of this theory. In The Collected Works of J. Richard Büchi. Springer, 671–683.Google ScholarGoogle Scholar
  12. Thierry Cachat and Igor Walukiewicz. 2007. The Complexity of Games on Higher Order Pushdown Automata. CoRR abs/0705.0262 (2007). arXiv: 0705.0262 http://arxiv.org/abs/0705.0262Google ScholarGoogle Scholar
  13. Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI’08). USENIX Association, Berkeley, CA, USA, 209–224. http://dl.acm.org/citation.cfm?id=1855741. 1855756 Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2006. EXE: Automatically Generating Inputs of Death. In Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS ’06). ACM, New York, NY, USA, 322–335. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Cristian Cadar and Koushik Sen. 2013. Symbolic Execution for Software Testing: Three Decades Later. Commun. ACM 56, 2 (Feb. 2013), 82–90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Olivier Carton, Christian Choffrut, and Serge Grigorieff. 2006. Decision problems among the main subfamilies of rational relations. ITA 40, 2 (2006), 255–275.Google ScholarGoogle Scholar
  17. Taolue Chen, Yan Chen, Matthew Hague, Anthony W. Lin, and Zhilin Wu. 2018a. What is decidable about string constraints with the ReplaceAll function. PACMPL 2, POPL (2018), 3:1–3:29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Taolue Chen, Matthew Hague, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. 2018b. Decision Procedures for Path Feasibility of String-Manipulating Programs with Complex Operations. CoRR abs/1811.03167 (2018). arXiv: 1811.03167 https://arxiv.org/abs/1811.03167Google ScholarGoogle Scholar
  19. Yan Chen. 2018a. Solving String Constraints with ReplaceAll Function. Master’s thesis. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China.Google ScholarGoogle Scholar
  20. Yan Chen. 2018b. Z3-replaceall. https://github.com/TinyYan/z3- replaceAll . Referred in Jan 2018.Google ScholarGoogle Scholar
  21. Christian Choffrut. 2006. Relations over Words and Logic: A Chronology. Bulletin of the EATCS 89 (2006), 159–163.Google ScholarGoogle Scholar
  22. Loris D’Antoni and Margus Veanes. 2013. Static Analysis of String Encoders and Decoders. In Verification, Model Checking, and Abstract Interpretation, VMCAI. Springer, 209–228. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Leonardo De Moura and Nikolaj Bjørner. 2011. Satisfiability modulo theories: introduction and applications. Commun. ACM 54, 9 (2011), 69–77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Dénes. 1967. Connections between transformation semigroups and graphs. In Theory of Graphs. Gordon & Breach.Google ScholarGoogle Scholar
  25. Volker Diekert. 2002. Makanin’s Algorithm. In Algebraic Combinatorics on Words, M. Lothaire (Ed.). Encyclopedia of Mathematics and its Applications, Vol. 90. Cambridge University Press, Chapter 12, 387–442.Google ScholarGoogle Scholar
  26. Joost Engelfriet and Hendrik Jan Hoogeboom. 2001. MSO definable string transductions and two-way finite-state transducers. ACM Trans. Comput. Log. 2, 2 (2001), 216–254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Emmanuel Filiot, Olivier Gauwin, Pierre-Alain Reynier, and Frédéric Servais. 2013. From Two-Way to One-Way Finite State Transducers. In 28th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2013, New Orleans, LA, USA, June 25-28, 2013. IEEE Computer Society, 468–477. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Vijay Ganesh, Mia Minnes, Armando Solar-Lezama, and Martin C. Rinard. 2012. Word Equations with Length Constraints: What’s Decidable?. In Hardware and Software: Verification and Testing - 8th International Haifa Verification Conference, HVC 2012, Haifa, Israel, November 6-8, 2012. Revised Selected Papers. Springer, 209–226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed Automated Random Testing. SIGPLAN Not. 40, 6 (June 2005), 213–223. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Lukás Holík, Petr Janku, Anthony W. Lin, Philipp Rümmer, and Tomás Vojnar. 2018. String constraints with concatenation and transducers solved efficiently. PACMPL 2, POPL (2018), 4:1–4:32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes. 2011. Fast and Precise Sanitizer Analysis with BEK. In 20th USENIX Security Symposium, San Francisco, CA, USA, August 8-12, 2011, Proceedings. USENIX Association. http://static.usenix.org/events/sec11/tech/full_papers/Hooimeijer.pdf Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. John E. Hopcroft and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Artur Jez. 2016. Recompression: A Simple and Powerful Technique for Word Equations. J. ACM 63, 1 (2016), 4:1–4:51. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Neil D. Jones. 2001. The expressive power of higher-order types or, life without CONS. Journal of Functional Programming 11, 1 (2001), 55–94. http://journals.cambridge.org/action/displayAbstract?aid=68581 Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Christoph Kern. 2014. Securing the tangled web. Commun. ACM 57, 9 (2014), 38–47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Adam Kiezun, Vijay Ganesh, Shay Artzi, Philip J. Guo, Pieter Hooimeijer, and Michael D. Ernst. 2012. HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars. ACM Trans. Softw. Eng. Methodol. 21, 4 (2012), 25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. James C. King. 1976. Symbolic Execution and Program Testing. Commun. ACM 19, 7 (1976), 385–394. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Dexter Kozen. 1997. Automata and Computability. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Daniel Kroening and Ofer Strichman. 2008. Decision Procedures. Springer.Google ScholarGoogle Scholar
  40. Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2014. A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions. In Computer Aided Verification - 26th International Conference, CAV 2014. Springer, 646–662. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Leonid Libkin. 2003. Variable independence for first-order definable constraints. ACM Trans. Comput. Log. 4, 4 (2003), 431–451. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Anthony W. Lin and Pablo Barceló. 2016. String Solving with Word Equations and Transducers: Towards a Logic for Analysing Mutation XSS. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16). Springer, 123–136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. K. L. McMillan. 1993. Symbolic model checking. Kluwer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Yasuhiko Minamide. 2005. Static approximation of dynamically generated Web pages. In Proceedings of the 14th international conference on World Wide Web, WWW 2005. ACM, 432–441. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Christophe Morvan. 2000. On Rational Graphs. In Foundations of Software Science and Computation Structures, Third International Conference, FOSSACS 2000. Springer, 252–266. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2004. Abstract DPLL and Abstract DPLL Modulo Theories. In Logic for Programming, Artificial Intelligence, and Reasoning, 11th International Conference, LPAR 2004 (LNCS), Vol. 3452. Springer, 36–50.Google ScholarGoogle Scholar
  47. Nicholas Pippenger. 2010. Theories of Computability. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Wojciech Plandowski. 2004. Satisfiability of word equations with constants is in PSPACE. J. ACM 51, 3 (2004), 483–496. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Philipp Rümmer. 2008. A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic. In Logic for Programming, Artificial Intelligence, and Reasoning, 15th International Conference, LPAR 2008, LNCS 5330. Springer, 274–289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. 2010. A Symbolic Execution Framework for JavaScript. In 31st IEEE Symposium on Security and Privacy, S&P 2010, 16-19 May 2010, Berleley/Oakland, California, USA. IEEE, 513–528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Koushik Sen, Swaroop Kalasapur, Tasneem G. Brutch, and Simon Gibbs. 2013. Jalangi: a selective record-replay and dynamic analysis framework for JavaScript. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013. ACM, 488–498. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: a concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, ESEC/SIGSOFT FSE 2005. ACM, 263–272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2014. S3: A Symbolic String Solver for Vulnerability Detection in Web Applications. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS 2014. ACM, 1232–1243. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2016. Progressive Reasoning over Recursively-Defined Strings. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I. Springer, 218–240.Google ScholarGoogle Scholar
  55. Andrew van der Stock, Brian Glas, Neil Smithline, and Torsten Gigler. 2017. OWASP Top 10 – 2017. https://www.owasp. org/index.php/Top_10- 2017_Top_10 . Referred January 2018.Google ScholarGoogle Scholar
  56. Margus Veanes, Nikolaj Bjørner, Lev Nachmanson, and Sergey Bereg. 2017. Monadic Decomposition. J. ACM 64, 2 (2017), 14:1–14:28. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Hung-En Wang, Tzung-Lin Tsai, Chun-Han Lin, Fang Yu, and Jie-Hong R. Jiang. 2016. String Analysis via Automata Manipulation with Logic Circuit Representation. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I (Lecture Notes in Computer Science), Vol. 9779. Springer, 241–260.Google ScholarGoogle Scholar
  58. Joel Weinberger, Prateek Saxena, Devdatta Akhawe, Matthew Finifter, Eui Chul Richard Shin, and Dawn Song. 2011. A Systematic Analysis of XSS Sanitization in Web Application Frameworks. In Computer Security - ESORICS 2011 - 16th European Symposium on Research in Computer Security, Leuven, Belgium, September 12-14, 2011. Proceedings. Springer, 150–171. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Fang Yu, Muath Alkhalaf, and Tevfik Bultan. 2010. Stranger: An Automata-Based String Analysis Tool for PHP. In Tools and Algorithms for the Construction and Analysis of Systems, 16th International Conference, TACAS 2010. Springer, 154–157. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Fang Yu, Muath Alkhalaf, Tevfik Bultan, and Oscar H. Ibarra. 2014. Automata-based Symbolic String Analysis for Vulnerability Detection. Form. Methods Syst. Des. 44, 1 (2014), 44–70. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Bohdan Zelinka. 1981. Graphs of semigroups. Časopis pro pěstování matematiky 106, 4 (1981), 407–408. http://eudml.org/ doc/19323Google ScholarGoogle Scholar
  62. Yunhui Zheng, Vijay Ganesh, Sanu Subramanian, Omer Tripp, Julian Dolby, and Xiangyu Zhang. 2015. Effective Search-Space Pruning for Solvers of String Equations, Regular Expressions and Length Constraints. In Computer Aided Verification -27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I. Springer, 235–254.Google ScholarGoogle Scholar
  63. Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh. 2013. Z3-str: a Z3-based string solver for web application analysis. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013, Saint Petersburg, Russian Federation, August 18-26, 2013. ACM, 114–124. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Decision procedures for path feasibility of string-manipulating programs with complex operations

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader