research-article

Open Access

Decision procedures for path feasibility of string-manipulating programs with complex operations

Authors:
Taolue Chen

Birkbeck University of London, UK

Birkbeck University of London, UK
View Profile

,
Matthew Hague

Royal Holloway University of London, UK

Royal Holloway University of London, UK
View Profile

,
Anthony W. Lin

University of Oxford, UK

University of Oxford, UK
View Profile

,
Philipp Rümmer

Uppsala University, Sweden

Uppsala University, Sweden
View Profile

,
Zhilin Wu

Institute of Software at Chinese Academy of Sciences, China

Institute of Software at Chinese Academy of Sciences, China
View Profile

Proceedings of the ACM on Programming Languages Volume 3 Issue POPLArticle No.: 49pp 1–30https://doi.org/10.1145/3290362

Published:02 January 2019Publication History

Related Artifact: OSTRICH String Constraint Solver and Results November 2018 software https://doi.org/10.1145/3291630

Proceedings of the ACM on Programming Languages

Abstract

The design and implementation of decision procedures for checking path feasibility in string-manipulating programs is an important problem, with such applications as symbolic execution of programs with strings and automated detection of cross-site scripting (XSS) vulnerabilities in web applications. A (symbolic) path is given as a finite sequence of assignments and assertions (i.e. without loops), and checking its feasibility amounts to determining the existence of inputs that yield a successful execution. Modern programming languages (e.g. JavaScript, PHP, and Python) support many complex string operations, and strings are also often implicitly modified during a computation in some intricate fashion (e.g. by some autoescaping mechanisms).

In this paper we provide two general semantic conditions which together ensure the decidability of path feasibility: (1) each assertion admits regular monadic decomposition (i.e. is an effectively recognisable relation), and (2) each assignment uses a (possibly nondeterministic) function whose inverse relation preserves regularity. We show that the semantic conditions are expressive since they are satisfied by a multitude of string operations including concatenation, one-way and two-way finite-state transducers, replaceall functions (where the replacement string could contain variables), string-reverse functions, regular-expression matching, and some (restricted) forms of letter-counting/length functions. The semantic conditions also strictly subsume existing decidable string theories (e.g. straight-line fragments, and acyclic logics), and most existing benchmarks (e.g. most of Kaluza’s, and all of SLOG’s, Stranger’s, and SLOTH’s benchmarks). Our semantic conditions also yield a conceptually simple decision procedure, as well as an extensible architecture of a string solver in that a user may easily incorporate his/her own string functions into the solver by simply providing code for the pre-image computation without worrying about other parts of the solver. Despite these, the semantic conditions are unfortunately too general to provide a fast and complete decision procedure. We provide strong theoretical evidence for this in the form of complexity results. To rectify this problem, we propose two solutions. Our main solution is to allow only partial string functions (i.e., prohibit nondeterminism) in condition (2). This restriction is satisfied in many cases in practice, and yields decision procedures that are effective in both theory and practice. Whenever nondeterministic functions are still needed (e.g. the string function split), our second solution is to provide a syntactic fragment that provides a support of nondeterministic functions, and operations like one-way transducers, replaceall (with constant replacement string), the string-reverse function, concatenation, and regular-expression matching. We show that this fragment can be reduced to an existing solver SLOTH that exploits fast model checking algorithms like IC3.

We provide an efficient implementation of our decision procedure (assuming our first solution above, i.e., deterministic partial string functions) in a new string solver OSTRICH. Our implementation provides built-in support for concatenation, reverse, functional transducers (FFT), and replaceall and provides a framework for extensibility to support further string functions. We demonstrate the efficacy of our new solver against other competitive solvers.

Supplemental Material

a49-wu.webm

webm

58.2 MB

Download

References

Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukás Holík, Ahmed Rezine, and Philipp Rümmer. 2017. Flatten and conquer: a framework for efficient analysis of string constraints. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. ACM, New York, NY, USA, 602–617. Google ScholarDigital Library
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukás Holík, Ahmed Rezine, and Philipp Rümmer. 2018. TRAU: SMT Solver for String Constraints. In Formal Methods in Computer Aided Design, FMCAD 2018. To appear.Google Scholar
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Lukás Holík, Ahmed Rezine, Philipp Rümmer, and Jari Stenman. 2014. String Constraints for Verification. In Computer Aided Verification - 26th International Conference, CAV 2014. Springer, 150–166. Google ScholarDigital Library
Rajeev Alur and Jyotirmoy V. Deshmukh. 2011. Nondeterministic Streaming String Transducers. In Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part II. Springer, 1–20. Google ScholarDigital Library
Christel Baier and Joost-Pieter Katoen. 2008. Principles of Model Checking (Representation and Mind Series). The MIT Press. Google ScholarDigital Library
Pablo Barceló, Diego Figueira, and Leonid Libkin. 2013. Graph Logics with Rational Relations. Logical Methods in Computer Science 9, 3 (2013).Google Scholar
Clark W. Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. 2009. Satisfiability Modulo Theories. In Handbook of Satisfiability. IOS Press, 825–885.Google Scholar
Michael Benedikt, Leonid Libkin, Thomas Schwentick, and Luc Segoufin. 2003. Definable relations and first-order query languages over strings. J. ACM 50, 5 (2003), 694–751. Google ScholarDigital Library
Murphy Berzish, Vijay Ganesh, and Yunhui Zheng. 2017. Z3str3: A string solver with theory-aware heuristics. In 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, October 2-6, 2017. IEEE, 55–59. Google ScholarDigital Library
Nikolaj Bjørner, Nikolai Tillmann, and Andrei Voronkov. 2009. Path feasibility analysis for string-manipulating programs. In Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2009. Springer, 307–321. Google ScholarDigital Library
J Richard Büchi and Steven Senger. 1990. Definability in the existential theory of concatenation and undecidable extensions of this theory. In The Collected Works of J. Richard Büchi. Springer, 671–683.Google Scholar
Thierry Cachat and Igor Walukiewicz. 2007. The Complexity of Games on Higher Order Pushdown Automata. CoRR abs/0705.0262 (2007). arXiv: 0705.0262 http://arxiv.org/abs/0705.0262Google Scholar
Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI’08). USENIX Association, Berkeley, CA, USA, 209–224. http://dl.acm.org/citation.cfm?id=1855741. 1855756 Google ScholarDigital Library
Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2006. EXE: Automatically Generating Inputs of Death. In Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS ’06). ACM, New York, NY, USA, 322–335. Google ScholarDigital Library
Cristian Cadar and Koushik Sen. 2013. Symbolic Execution for Software Testing: Three Decades Later. Commun. ACM 56, 2 (Feb. 2013), 82–90. Google ScholarDigital Library
Olivier Carton, Christian Choffrut, and Serge Grigorieff. 2006. Decision problems among the main subfamilies of rational relations. ITA 40, 2 (2006), 255–275.Google Scholar
Taolue Chen, Yan Chen, Matthew Hague, Anthony W. Lin, and Zhilin Wu. 2018a. What is decidable about string constraints with the ReplaceAll function. PACMPL 2, POPL (2018), 3:1–3:29. Google ScholarDigital Library
Taolue Chen, Matthew Hague, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. 2018b. Decision Procedures for Path Feasibility of String-Manipulating Programs with Complex Operations. CoRR abs/1811.03167 (2018). arXiv: 1811.03167 https://arxiv.org/abs/1811.03167Google Scholar
Yan Chen. 2018a. Solving String Constraints with ReplaceAll Function. Master’s thesis. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China.Google Scholar
Yan Chen. 2018b. Z3-replaceall. https://github.com/TinyYan/z3- replaceAll . Referred in Jan 2018.Google Scholar
Christian Choffrut. 2006. Relations over Words and Logic: A Chronology. Bulletin of the EATCS 89 (2006), 159–163.Google Scholar
Loris D’Antoni and Margus Veanes. 2013. Static Analysis of String Encoders and Decoders. In Verification, Model Checking, and Abstract Interpretation, VMCAI. Springer, 209–228. Google ScholarDigital Library
Leonardo De Moura and Nikolaj Bjørner. 2011. Satisfiability modulo theories: introduction and applications. Commun. ACM 54, 9 (2011), 69–77. Google ScholarDigital Library
J. Dénes. 1967. Connections between transformation semigroups and graphs. In Theory of Graphs. Gordon & Breach.Google Scholar
Volker Diekert. 2002. Makanin’s Algorithm. In Algebraic Combinatorics on Words, M. Lothaire (Ed.). Encyclopedia of Mathematics and its Applications, Vol. 90. Cambridge University Press, Chapter 12, 387–442.Google Scholar
Joost Engelfriet and Hendrik Jan Hoogeboom. 2001. MSO definable string transductions and two-way finite-state transducers. ACM Trans. Comput. Log. 2, 2 (2001), 216–254. Google ScholarDigital Library
Emmanuel Filiot, Olivier Gauwin, Pierre-Alain Reynier, and Frédéric Servais. 2013. From Two-Way to One-Way Finite State Transducers. In 28th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2013, New Orleans, LA, USA, June 25-28, 2013. IEEE Computer Society, 468–477. Google ScholarDigital Library
Vijay Ganesh, Mia Minnes, Armando Solar-Lezama, and Martin C. Rinard. 2012. Word Equations with Length Constraints: What’s Decidable?. In Hardware and Software: Verification and Testing - 8th International Haifa Verification Conference, HVC 2012, Haifa, Israel, November 6-8, 2012. Revised Selected Papers. Springer, 209–226. Google ScholarDigital Library
Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed Automated Random Testing. SIGPLAN Not. 40, 6 (June 2005), 213–223. Google ScholarDigital Library
Lukás Holík, Petr Janku, Anthony W. Lin, Philipp Rümmer, and Tomás Vojnar. 2018. String constraints with concatenation and transducers solved efficiently. PACMPL 2, POPL (2018), 4:1–4:32. Google ScholarDigital Library
Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes. 2011. Fast and Precise Sanitizer Analysis with BEK. In 20th USENIX Security Symposium, San Francisco, CA, USA, August 8-12, 2011, Proceedings. USENIX Association. http://static.usenix.org/events/sec11/tech/full_papers/Hooimeijer.pdf Google ScholarDigital Library
John E. Hopcroft and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison-Wesley. Google ScholarDigital Library
Artur Jez. 2016. Recompression: A Simple and Powerful Technique for Word Equations. J. ACM 63, 1 (2016), 4:1–4:51. Google ScholarDigital Library
Neil D. Jones. 2001. The expressive power of higher-order types or, life without CONS. Journal of Functional Programming 11, 1 (2001), 55–94. http://journals.cambridge.org/action/displayAbstract?aid=68581 Google ScholarDigital Library
Christoph Kern. 2014. Securing the tangled web. Commun. ACM 57, 9 (2014), 38–47. Google ScholarDigital Library
Adam Kiezun, Vijay Ganesh, Shay Artzi, Philip J. Guo, Pieter Hooimeijer, and Michael D. Ernst. 2012. HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars. ACM Trans. Softw. Eng. Methodol. 21, 4 (2012), 25. Google ScholarDigital Library
James C. King. 1976. Symbolic Execution and Program Testing. Commun. ACM 19, 7 (1976), 385–394. Google ScholarDigital Library
Dexter Kozen. 1997. Automata and Computability. Springer. Google ScholarDigital Library
Daniel Kroening and Ofer Strichman. 2008. Decision Procedures. Springer.Google Scholar
Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2014. A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions. In Computer Aided Verification - 26th International Conference, CAV 2014. Springer, 646–662. Google ScholarDigital Library
Leonid Libkin. 2003. Variable independence for first-order definable constraints. ACM Trans. Comput. Log. 4, 4 (2003), 431–451. Google ScholarDigital Library
Anthony W. Lin and Pablo Barceló. 2016. String Solving with Word Equations and Transducers: Towards a Logic for Analysing Mutation XSS. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16). Springer, 123–136. Google ScholarDigital Library
K. L. McMillan. 1993. Symbolic model checking. Kluwer. Google ScholarDigital Library
Yasuhiko Minamide. 2005. Static approximation of dynamically generated Web pages. In Proceedings of the 14th international conference on World Wide Web, WWW 2005. ACM, 432–441. Google ScholarDigital Library
Christophe Morvan. 2000. On Rational Graphs. In Foundations of Software Science and Computation Structures, Third International Conference, FOSSACS 2000. Springer, 252–266. Google ScholarDigital Library
Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2004. Abstract DPLL and Abstract DPLL Modulo Theories. In Logic for Programming, Artificial Intelligence, and Reasoning, 11th International Conference, LPAR 2004 (LNCS), Vol. 3452. Springer, 36–50.Google Scholar
Nicholas Pippenger. 2010. Theories of Computability. Cambridge University Press. Google ScholarDigital Library
Wojciech Plandowski. 2004. Satisfiability of word equations with constants is in PSPACE. J. ACM 51, 3 (2004), 483–496. Google ScholarDigital Library
Philipp Rümmer. 2008. A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic. In Logic for Programming, Artificial Intelligence, and Reasoning, 15th International Conference, LPAR 2008, LNCS 5330. Springer, 274–289. Google ScholarDigital Library
Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. 2010. A Symbolic Execution Framework for JavaScript. In 31st IEEE Symposium on Security and Privacy, S&P 2010, 16-19 May 2010, Berleley/Oakland, California, USA. IEEE, 513–528. Google ScholarDigital Library
Koushik Sen, Swaroop Kalasapur, Tasneem G. Brutch, and Simon Gibbs. 2013. Jalangi: a selective record-replay and dynamic analysis framework for JavaScript. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013. ACM, 488–498. Google ScholarDigital Library
Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: a concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, ESEC/SIGSOFT FSE 2005. ACM, 263–272. Google ScholarDigital Library
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2014. S3: A Symbolic String Solver for Vulnerability Detection in Web Applications. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS 2014. ACM, 1232–1243. Google ScholarDigital Library
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2016. Progressive Reasoning over Recursively-Defined Strings. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I. Springer, 218–240.Google Scholar
Andrew van der Stock, Brian Glas, Neil Smithline, and Torsten Gigler. 2017. OWASP Top 10 – 2017. https://www.owasp. org/index.php/Top_10- 2017_Top_10 . Referred January 2018.Google Scholar
Margus Veanes, Nikolaj Bjørner, Lev Nachmanson, and Sergey Bereg. 2017. Monadic Decomposition. J. ACM 64, 2 (2017), 14:1–14:28. Google ScholarDigital Library
Hung-En Wang, Tzung-Lin Tsai, Chun-Han Lin, Fang Yu, and Jie-Hong R. Jiang. 2016. String Analysis via Automata Manipulation with Logic Circuit Representation. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I (Lecture Notes in Computer Science), Vol. 9779. Springer, 241–260.Google Scholar
Joel Weinberger, Prateek Saxena, Devdatta Akhawe, Matthew Finifter, Eui Chul Richard Shin, and Dawn Song. 2011. A Systematic Analysis of XSS Sanitization in Web Application Frameworks. In Computer Security - ESORICS 2011 - 16th European Symposium on Research in Computer Security, Leuven, Belgium, September 12-14, 2011. Proceedings. Springer, 150–171. Google ScholarDigital Library
Fang Yu, Muath Alkhalaf, and Tevfik Bultan. 2010. Stranger: An Automata-Based String Analysis Tool for PHP. In Tools and Algorithms for the Construction and Analysis of Systems, 16th International Conference, TACAS 2010. Springer, 154–157. Google ScholarDigital Library
Fang Yu, Muath Alkhalaf, Tevfik Bultan, and Oscar H. Ibarra. 2014. Automata-based Symbolic String Analysis for Vulnerability Detection. Form. Methods Syst. Des. 44, 1 (2014), 44–70. Google ScholarDigital Library
Bohdan Zelinka. 1981. Graphs of semigroups. Časopis pro pěstování matematiky 106, 4 (1981), 407–408. http://eudml.org/ doc/19323Google Scholar
Yunhui Zheng, Vijay Ganesh, Sanu Subramanian, Omer Tripp, Julian Dolby, and Xiangyu Zhang. 2015. Effective Search-Space Pruning for Solvers of String Equations, Regular Expressions and Length Constraints. In Computer Aided Verification -27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I. Springer, 235–254.Google Scholar
Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh. 2013. Z3-str: a Z3-based string solver for web application analysis. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013, Saint Petersburg, Russian Federation, August 18-26, 2013. ACM, 114–124. Google ScholarDigital Library

Index Terms

Decision procedures for path feasibility of string-manipulating programs with complex operations
1. Theory of computation

Recommendations

What is decidable about string constraints with the ReplaceAll function

The theory of strings with concatenation has been widely argued as the basis of constraint solving for verifying string-manipulating programs. However, this theory is far from adequate for expressing many string constraints that are also needed in ...
Read More
String solving with word equations and transducers: towards a logic for analysing mutation XSS
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

We study the fundamental issue of decidability of satisfiability over string logics with concatenations and finite-state transducers as atomic operations. Although restricting to one type of operations yields decidability, little is known about the ...
Read More
Parikh’s Theorem Made Symbolic

Parikh’s Theorem is a fundamental result in automata theory with numerous applications in computer science. These include software verification (e.g. infinite-state verification, string constraints, and theory of arrays), verification of cryptographic ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the ACM on Programming Languages Volume 3, Issue POPL
January 2019
2275 pages
EISSN:2475-1421
DOI:10.1145/3302515
Issue’s Table of Contents

Copyright © 2019 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 January 2019
Published in pacmpl Volume 3, Issue POPL

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Badges
- Artifacts Available
- Artifacts Evaluated & Reusable
Author Tags
Decision Procedures
ReplaceAll
Reverse
Straight-Line Programs
String Constraints
Transducers
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 41
  Total Citations
  View Citations
- 1,006
  Total Downloads
- Downloads (Last 12 months)176
- Downloads (Last 6 weeks)17
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Decision procedures for path feasibility of string-manipulating programs with complex operations

Proceedings of the ACM on Programming Languages

Abstract

Supplemental Material

References

Cited By

Index Terms

Recommendations

What is decidable about string constraints with the ReplaceAll function

String solving with word equations and transducers: towards a logic for analysing mutation XSS

Parikh’s Theorem Made Symbolic