Abstract
The design and implementation of decision procedures for checking path feasibility in string-manipulating programs is an important problem, with such applications as symbolic execution of programs with strings and automated detection of cross-site scripting (XSS) vulnerabilities in web applications. A (symbolic) path is given as a finite sequence of assignments and assertions (i.e. without loops), and checking its feasibility amounts to determining the existence of inputs that yield a successful execution. Modern programming languages (e.g. JavaScript, PHP, and Python) support many complex string operations, and strings are also often implicitly modified during a computation in some intricate fashion (e.g. by some autoescaping mechanisms).
In this paper we provide two general semantic conditions which together ensure the decidability of path feasibility: (1) each assertion admits regular monadic decomposition (i.e. is an effectively recognisable relation), and (2) each assignment uses a (possibly nondeterministic) function whose inverse relation preserves regularity. We show that the semantic conditions are expressive since they are satisfied by a multitude of string operations including concatenation, one-way and two-way finite-state transducers, replaceall functions (where the replacement string could contain variables), string-reverse functions, regular-expression matching, and some (restricted) forms of letter-counting/length functions. The semantic conditions also strictly subsume existing decidable string theories (e.g. straight-line fragments, and acyclic logics), and most existing benchmarks (e.g. most of Kaluza’s, and all of SLOG’s, Stranger’s, and SLOTH’s benchmarks). Our semantic conditions also yield a conceptually simple decision procedure, as well as an extensible architecture of a string solver in that a user may easily incorporate his/her own string functions into the solver by simply providing code for the pre-image computation without worrying about other parts of the solver. Despite these, the semantic conditions are unfortunately too general to provide a fast and complete decision procedure. We provide strong theoretical evidence for this in the form of complexity results. To rectify this problem, we propose two solutions. Our main solution is to allow only partial string functions (i.e., prohibit nondeterminism) in condition (2). This restriction is satisfied in many cases in practice, and yields decision procedures that are effective in both theory and practice. Whenever nondeterministic functions are still needed (e.g. the string function split), our second solution is to provide a syntactic fragment that provides a support of nondeterministic functions, and operations like one-way transducers, replaceall (with constant replacement string), the string-reverse function, concatenation, and regular-expression matching. We show that this fragment can be reduced to an existing solver SLOTH that exploits fast model checking algorithms like IC3.
We provide an efficient implementation of our decision procedure (assuming our first solution above, i.e., deterministic partial string functions) in a new string solver OSTRICH. Our implementation provides built-in support for concatenation, reverse, functional transducers (FFT), and replaceall and provides a framework for extensibility to support further string functions. We demonstrate the efficacy of our new solver against other competitive solvers.
Supplemental Material
- Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukás Holík, Ahmed Rezine, and Philipp Rümmer. 2017. Flatten and conquer: a framework for efficient analysis of string constraints. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. ACM, New York, NY, USA, 602–617. Google ScholarDigital Library
- Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukás Holík, Ahmed Rezine, and Philipp Rümmer. 2018. TRAU: SMT Solver for String Constraints. In Formal Methods in Computer Aided Design, FMCAD 2018. To appear.Google Scholar
- Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Lukás Holík, Ahmed Rezine, Philipp Rümmer, and Jari Stenman. 2014. String Constraints for Verification. In Computer Aided Verification - 26th International Conference, CAV 2014. Springer, 150–166. Google ScholarDigital Library
- Rajeev Alur and Jyotirmoy V. Deshmukh. 2011. Nondeterministic Streaming String Transducers. In Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part II. Springer, 1–20. Google ScholarDigital Library
- Christel Baier and Joost-Pieter Katoen. 2008. Principles of Model Checking (Representation and Mind Series). The MIT Press. Google ScholarDigital Library
- Pablo Barceló, Diego Figueira, and Leonid Libkin. 2013. Graph Logics with Rational Relations. Logical Methods in Computer Science 9, 3 (2013).Google Scholar
- Clark W. Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. 2009. Satisfiability Modulo Theories. In Handbook of Satisfiability. IOS Press, 825–885.Google Scholar
- Michael Benedikt, Leonid Libkin, Thomas Schwentick, and Luc Segoufin. 2003. Definable relations and first-order query languages over strings. J. ACM 50, 5 (2003), 694–751. Google ScholarDigital Library
- Murphy Berzish, Vijay Ganesh, and Yunhui Zheng. 2017. Z3str3: A string solver with theory-aware heuristics. In 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, October 2-6, 2017. IEEE, 55–59. Google ScholarDigital Library
- Nikolaj Bjørner, Nikolai Tillmann, and Andrei Voronkov. 2009. Path feasibility analysis for string-manipulating programs. In Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2009. Springer, 307–321. Google ScholarDigital Library
- J Richard Büchi and Steven Senger. 1990. Definability in the existential theory of concatenation and undecidable extensions of this theory. In The Collected Works of J. Richard Büchi. Springer, 671–683.Google Scholar
- Thierry Cachat and Igor Walukiewicz. 2007. The Complexity of Games on Higher Order Pushdown Automata. CoRR abs/0705.0262 (2007). arXiv: 0705.0262 http://arxiv.org/abs/0705.0262Google Scholar
- Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI’08). USENIX Association, Berkeley, CA, USA, 209–224. http://dl.acm.org/citation.cfm?id=1855741. 1855756 Google ScholarDigital Library
- Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2006. EXE: Automatically Generating Inputs of Death. In Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS ’06). ACM, New York, NY, USA, 322–335. Google ScholarDigital Library
- Cristian Cadar and Koushik Sen. 2013. Symbolic Execution for Software Testing: Three Decades Later. Commun. ACM 56, 2 (Feb. 2013), 82–90. Google ScholarDigital Library
- Olivier Carton, Christian Choffrut, and Serge Grigorieff. 2006. Decision problems among the main subfamilies of rational relations. ITA 40, 2 (2006), 255–275.Google Scholar
- Taolue Chen, Yan Chen, Matthew Hague, Anthony W. Lin, and Zhilin Wu. 2018a. What is decidable about string constraints with the ReplaceAll function. PACMPL 2, POPL (2018), 3:1–3:29. Google ScholarDigital Library
- Taolue Chen, Matthew Hague, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. 2018b. Decision Procedures for Path Feasibility of String-Manipulating Programs with Complex Operations. CoRR abs/1811.03167 (2018). arXiv: 1811.03167 https://arxiv.org/abs/1811.03167Google Scholar
- Yan Chen. 2018a. Solving String Constraints with ReplaceAll Function. Master’s thesis. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China.Google Scholar
- Yan Chen. 2018b. Z3-replaceall. https://github.com/TinyYan/z3- replaceAll . Referred in Jan 2018.Google Scholar
- Christian Choffrut. 2006. Relations over Words and Logic: A Chronology. Bulletin of the EATCS 89 (2006), 159–163.Google Scholar
- Loris D’Antoni and Margus Veanes. 2013. Static Analysis of String Encoders and Decoders. In Verification, Model Checking, and Abstract Interpretation, VMCAI. Springer, 209–228. Google ScholarDigital Library
- Leonardo De Moura and Nikolaj Bjørner. 2011. Satisfiability modulo theories: introduction and applications. Commun. ACM 54, 9 (2011), 69–77. Google ScholarDigital Library
- J. Dénes. 1967. Connections between transformation semigroups and graphs. In Theory of Graphs. Gordon & Breach.Google Scholar
- Volker Diekert. 2002. Makanin’s Algorithm. In Algebraic Combinatorics on Words, M. Lothaire (Ed.). Encyclopedia of Mathematics and its Applications, Vol. 90. Cambridge University Press, Chapter 12, 387–442.Google Scholar
- Joost Engelfriet and Hendrik Jan Hoogeboom. 2001. MSO definable string transductions and two-way finite-state transducers. ACM Trans. Comput. Log. 2, 2 (2001), 216–254. Google ScholarDigital Library
- Emmanuel Filiot, Olivier Gauwin, Pierre-Alain Reynier, and Frédéric Servais. 2013. From Two-Way to One-Way Finite State Transducers. In 28th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2013, New Orleans, LA, USA, June 25-28, 2013. IEEE Computer Society, 468–477. Google ScholarDigital Library
- Vijay Ganesh, Mia Minnes, Armando Solar-Lezama, and Martin C. Rinard. 2012. Word Equations with Length Constraints: What’s Decidable?. In Hardware and Software: Verification and Testing - 8th International Haifa Verification Conference, HVC 2012, Haifa, Israel, November 6-8, 2012. Revised Selected Papers. Springer, 209–226. Google ScholarDigital Library
- Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed Automated Random Testing. SIGPLAN Not. 40, 6 (June 2005), 213–223. Google ScholarDigital Library
- Lukás Holík, Petr Janku, Anthony W. Lin, Philipp Rümmer, and Tomás Vojnar. 2018. String constraints with concatenation and transducers solved efficiently. PACMPL 2, POPL (2018), 4:1–4:32. Google ScholarDigital Library
- Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes. 2011. Fast and Precise Sanitizer Analysis with BEK. In 20th USENIX Security Symposium, San Francisco, CA, USA, August 8-12, 2011, Proceedings. USENIX Association. http://static.usenix.org/events/sec11/tech/full_papers/Hooimeijer.pdf Google ScholarDigital Library
- John E. Hopcroft and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison-Wesley. Google ScholarDigital Library
- Artur Jez. 2016. Recompression: A Simple and Powerful Technique for Word Equations. J. ACM 63, 1 (2016), 4:1–4:51. Google ScholarDigital Library
- Neil D. Jones. 2001. The expressive power of higher-order types or, life without CONS. Journal of Functional Programming 11, 1 (2001), 55–94. http://journals.cambridge.org/action/displayAbstract?aid=68581 Google ScholarDigital Library
- Christoph Kern. 2014. Securing the tangled web. Commun. ACM 57, 9 (2014), 38–47. Google ScholarDigital Library
- Adam Kiezun, Vijay Ganesh, Shay Artzi, Philip J. Guo, Pieter Hooimeijer, and Michael D. Ernst. 2012. HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars. ACM Trans. Softw. Eng. Methodol. 21, 4 (2012), 25. Google ScholarDigital Library
- James C. King. 1976. Symbolic Execution and Program Testing. Commun. ACM 19, 7 (1976), 385–394. Google ScholarDigital Library
- Dexter Kozen. 1997. Automata and Computability. Springer. Google ScholarDigital Library
- Daniel Kroening and Ofer Strichman. 2008. Decision Procedures. Springer.Google Scholar
- Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2014. A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions. In Computer Aided Verification - 26th International Conference, CAV 2014. Springer, 646–662. Google ScholarDigital Library
- Leonid Libkin. 2003. Variable independence for first-order definable constraints. ACM Trans. Comput. Log. 4, 4 (2003), 431–451. Google ScholarDigital Library
- Anthony W. Lin and Pablo Barceló. 2016. String Solving with Word Equations and Transducers: Towards a Logic for Analysing Mutation XSS. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16). Springer, 123–136. Google ScholarDigital Library
- K. L. McMillan. 1993. Symbolic model checking. Kluwer. Google ScholarDigital Library
- Yasuhiko Minamide. 2005. Static approximation of dynamically generated Web pages. In Proceedings of the 14th international conference on World Wide Web, WWW 2005. ACM, 432–441. Google ScholarDigital Library
- Christophe Morvan. 2000. On Rational Graphs. In Foundations of Software Science and Computation Structures, Third International Conference, FOSSACS 2000. Springer, 252–266. Google ScholarDigital Library
- Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2004. Abstract DPLL and Abstract DPLL Modulo Theories. In Logic for Programming, Artificial Intelligence, and Reasoning, 11th International Conference, LPAR 2004 (LNCS), Vol. 3452. Springer, 36–50.Google Scholar
- Nicholas Pippenger. 2010. Theories of Computability. Cambridge University Press. Google ScholarDigital Library
- Wojciech Plandowski. 2004. Satisfiability of word equations with constants is in PSPACE. J. ACM 51, 3 (2004), 483–496. Google ScholarDigital Library
- Philipp Rümmer. 2008. A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic. In Logic for Programming, Artificial Intelligence, and Reasoning, 15th International Conference, LPAR 2008, LNCS 5330. Springer, 274–289. Google ScholarDigital Library
- Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. 2010. A Symbolic Execution Framework for JavaScript. In 31st IEEE Symposium on Security and Privacy, S&P 2010, 16-19 May 2010, Berleley/Oakland, California, USA. IEEE, 513–528. Google ScholarDigital Library
- Koushik Sen, Swaroop Kalasapur, Tasneem G. Brutch, and Simon Gibbs. 2013. Jalangi: a selective record-replay and dynamic analysis framework for JavaScript. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013. ACM, 488–498. Google ScholarDigital Library
- Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: a concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, ESEC/SIGSOFT FSE 2005. ACM, 263–272. Google ScholarDigital Library
- Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2014. S3: A Symbolic String Solver for Vulnerability Detection in Web Applications. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS 2014. ACM, 1232–1243. Google ScholarDigital Library
- Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2016. Progressive Reasoning over Recursively-Defined Strings. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I. Springer, 218–240.Google Scholar
- Andrew van der Stock, Brian Glas, Neil Smithline, and Torsten Gigler. 2017. OWASP Top 10 – 2017. https://www.owasp. org/index.php/Top_10- 2017_Top_10 . Referred January 2018.Google Scholar
- Margus Veanes, Nikolaj Bjørner, Lev Nachmanson, and Sergey Bereg. 2017. Monadic Decomposition. J. ACM 64, 2 (2017), 14:1–14:28. Google ScholarDigital Library
- Hung-En Wang, Tzung-Lin Tsai, Chun-Han Lin, Fang Yu, and Jie-Hong R. Jiang. 2016. String Analysis via Automata Manipulation with Logic Circuit Representation. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I (Lecture Notes in Computer Science), Vol. 9779. Springer, 241–260.Google Scholar
- Joel Weinberger, Prateek Saxena, Devdatta Akhawe, Matthew Finifter, Eui Chul Richard Shin, and Dawn Song. 2011. A Systematic Analysis of XSS Sanitization in Web Application Frameworks. In Computer Security - ESORICS 2011 - 16th European Symposium on Research in Computer Security, Leuven, Belgium, September 12-14, 2011. Proceedings. Springer, 150–171. Google ScholarDigital Library
- Fang Yu, Muath Alkhalaf, and Tevfik Bultan. 2010. Stranger: An Automata-Based String Analysis Tool for PHP. In Tools and Algorithms for the Construction and Analysis of Systems, 16th International Conference, TACAS 2010. Springer, 154–157. Google ScholarDigital Library
- Fang Yu, Muath Alkhalaf, Tevfik Bultan, and Oscar H. Ibarra. 2014. Automata-based Symbolic String Analysis for Vulnerability Detection. Form. Methods Syst. Des. 44, 1 (2014), 44–70. Google ScholarDigital Library
- Bohdan Zelinka. 1981. Graphs of semigroups. Časopis pro pěstování matematiky 106, 4 (1981), 407–408. http://eudml.org/ doc/19323Google Scholar
- Yunhui Zheng, Vijay Ganesh, Sanu Subramanian, Omer Tripp, Julian Dolby, and Xiangyu Zhang. 2015. Effective Search-Space Pruning for Solvers of String Equations, Regular Expressions and Length Constraints. In Computer Aided Verification -27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I. Springer, 235–254.Google Scholar
- Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh. 2013. Z3-str: a Z3-based string solver for web application analysis. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013, Saint Petersburg, Russian Federation, August 18-26, 2013. ACM, 114–124. Google ScholarDigital Library
Index Terms
- Decision procedures for path feasibility of string-manipulating programs with complex operations
Recommendations
What is decidable about string constraints with the ReplaceAll function
The theory of strings with concatenation has been widely argued as the basis of constraint solving for verifying string-manipulating programs. However, this theory is far from adequate for expressing many string constraints that are also needed in ...
String solving with word equations and transducers: towards a logic for analysing mutation XSS
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming LanguagesWe study the fundamental issue of decidability of satisfiability over string logics with concatenations and finite-state transducers as atomic operations. Although restricting to one type of operations yields decidability, little is known about the ...
Parikh’s Theorem Made Symbolic
Parikh’s Theorem is a fundamental result in automata theory with numerous applications in computer science. These include software verification (e.g. infinite-state verification, string constraints, and theory of arrays), verification of cryptographic ...
Comments