Abstract
While editing code, it is common for developers to make multiple related repeated edits that are all instances of a more general program transformation. Since this process can be tedious and error-prone, we study the problem of automatically learning program transformations from past edits, which can then be used to predict future edits. We take a novel view of the problem as a semi-supervised learning problem: apart from the concrete edits that are instances of the general transformation, the learning procedure also exploits access to additional inputs (program subtrees) that are marked as positive or negative depending on whether the transformation applies on those inputs. We present a procedure to solve the semi-supervised transformation learning problem using anti-unification and programming-by-example synthesis technology. To eliminate reliance on access to marked additional inputs, we generalize the semi-supervised learning procedure to a feedback-driven procedure that also generates the marked additional inputs in an iterative loop. We apply these ideas to build and evaluate three applications that use different mechanisms for generating feedback. Compared to existing tools that learn program transformations from edits, our feedback-driven semi-supervised approach is vastly more effective in successfully predicting edits with significantly lesser amounts of past edit data.
Supplemental Material
- R. Alur, R. Bodik, G. Juniwal, M. M. K. Martin, M. Raghothaman, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. Syntax-guided synthesis. In 2013 Formal Methods in Computer-Aided Design, pages 1-8, 2013.Google ScholarCross Ref
- R. Alur, M. M. K. Martin, M. Raghothaman, C. Stergiou, S. Tripakis, and A. Udupa. Synthesizing finite-state protocols from scenarios and requirements. In E. Yahav, editor, 10th International Haifa Verification Conference, volume 8855 of Lecture Notes in Computer Science, pages 75-91. Springer, 2014. doi: 10.1007/978-3-319-13338-6_7. URL https: //doi.org/10.1007/978-3-319-13338-6_7. Google ScholarCross Ref
- R. Alur, P. Černy`, and A. Radhakrishna. Synthesis through unification. In International Conference on Computer Aided Verification, pages 163-179. Springer, 2015.Google ScholarCross Ref
- R. Alur, A. Radhakrishna, and A. Udupa. Scaling enumerative program synthesis via divide and conquer. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 319-336. Springer, 2017.Google ScholarCross Ref
- S. An, R. Singh, S. Misailovic, and R. Samanta. Augmented example-based synthesis using relational perturbation properties. Proceedings of the ACM on Programming Languages, 4 (POPL): 1-24, 2019.Google Scholar
- J. Bader, A. Scott, M. Pradel, and S. Chandra. Getafix: Learning to fix bugs automatically. Proc. ACM Program. Lang., 3 (OOPSLA), Oct. 2019. doi: 10.1145/3360585. URL https://doi.org/10.1145/3360585. Google ScholarDigital Library
- A. Bessey, K. Block, B. Chelf, A. Chou, B. Fulton, S. Hallem, C. Henri-Gros, A. Kamsky, S. McPeak, and D. Engler. A few billion lines of code later. Commun. ACM, 53 ( 2 ), 2010. ISSN 0001-0782. doi: 10.1145/1646353.1646374. URL https://doi.org/10.1145/1646353.1646374. Google ScholarDigital Library
- J. R. Buchi and L. H. Landweber. Solving sequential conditions by finite-state strategies. Transactions of the American Mathematical Society, 138 : 295-311, 1969. ISSN 00029947. URL http://www.jstor.org/stable/1994916.Google ScholarCross Ref
- P. Cerný, K. Chatterjee, T. A. Henzinger, A. Radhakrishna, and R. Singh. Quantitative synthesis for concurrent programs. In G. Gopalakrishnan and S. Qadeer, editors, 23rd International Conference on Computer Aided Verification (CAV), volume 6806 of Lecture Notes in Computer Science, pages 243-259. Springer, 2011. doi: 10.1007/978-3-642-22110-1_20. URL https://doi.org/10.1007/978-3-642-22110-1_20. Google ScholarCross Ref
- P. Cerný, T. A. Henzinger, A. Radhakrishna, L. Ryzhyk, and T. Tarrach. Eficient synthesis for concurrency by semanticspreserving transformations. In N. Sharygina and H. Veith, editors, 25th International Conference on Computer Aided Verification (CAV), volume 8044 of Lecture Notes in Computer Science, pages 951-967. Springer, 2013. doi: 10.1007/978-3-642-39799-8_68. URL https://doi.org/10.1007/978-3-642-39799-8_68. Google ScholarCross Ref
- Eclipse Foundation. Eclipse. At https://www.eclipse.org/, 2020.Google Scholar
- W. S. Evans, C. W. Fraser, and F. Ma. Clone detection via structural abstraction. Software Quality Journal, 17 ( 4 ): 309-330, 2009.Google ScholarDigital Library
- J. K. Feser, S. Chaudhuri, and I. Dillig. Synthesizing data structure transformations from input-output examples. In D. Grove and S. Blackburn, editors, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pages 229-239. ACM, 2015. doi: 10.1145/2737924.2737977. URL https://doi.org/10.1145/2737924.2737977. Google ScholarDigital Library
- J. Frankle, P. Osera, D. Walker, and S. Zdancewic. Example-directed synthesis: a type-theoretic interpretation. In R. Bodík and R. Majumdar, editors, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20-22, 2016, pages 802-815. ACM, 2016. doi: 10.1145/2837614. 2837629. URL https://doi.org/10.1145/2837614.2837629. Google ScholarDigital Library
- S. Gulwani. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '11). ACM New York, NY, USA, 2011.Google ScholarDigital Library
- T. Gvero, V. Kuncak, I. Kuraj, and R. Piskac. Complete completion using types and weights. In Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation, pages 27-38, 2013.Google ScholarDigital Library
- K. Huang, X. Qiu, P. Shen, and Y. Wang. Reconciling enumerative and deductive program synthesis. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 1159-1174, 2020.Google ScholarDigital Library
- JetBrains. IntelliJ. At https://www.jetbrains.com/idea/, 2020a.Google Scholar
- JetBrains. ReSharper. At https://www.jetbrains.com/resharper/, 2020b.Google Scholar
- S. Jha, S. Gulwani, S. A. Seshia, and A. Tiwari. Oracle-guided component-based program synthesis. In 2010 ACM/IEEE 32nd International Conference on Software Engineering, volume 1, pages 215-224. IEEE, 2010.Google ScholarDigital Library
- L. Jiang, G. Misherghi, Z. Su, and S. Glondu. Deckard: Scalable and accurate tree-based detection of code clones. In 29th International Conference on Software Engineering (ICSE'07), pages 96-105. IEEE, 2007.Google ScholarDigital Library
- M. Kim, T. Zimmermann, and N. Nagappan. A field study of refactoring challenges and benefits. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE '12, pages 50 : 1-50 : 11, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1614-9. doi: 10.1145/2393596.2393655. URL http://doi.acm.org/10.1145/2393596.2393655. Google ScholarDigital Library
- V. Le, D. Perelman, O. Polozov, M. Raza, A. Udupa, and S. Gulwani. Interactive program synthesis. arXiv preprint arXiv:1703.03539, 2017.Google Scholar
- Z. Manna and R. J. Waldinger. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems, 2 ( 1 ): 90-121, 1980. doi: 10.1145/357084.357090. URL https://doi.org/10.1145/357084.357090. Google ScholarDigital Library
- M. Mayer, G. Soares, M. Grechkin, V. Le, M. Marron, O. Polozov, R. Singh, B. Zorn, and S. Gulwani. User interaction models for disambiguation in programming by example. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, pages 291-301, 2015.Google ScholarDigital Library
- N. Meng, M. Kim, and K. S. McKinley. Systematic editing: generating program transformations from an example. ACM SIGPLAN Notices, 46 ( 6 ): 329-342, 2011.Google Scholar
- N. Meng, M. Kim, and K. S. McKinley. Lase: locating and applying systematic edits by learning from examples. In 2013 35th International Conference on Software Engineering (ICSE), pages 502-511. IEEE, 2013.Google ScholarDigital Library
- T. Mens and T. Tourwe. A survey of software refactoring. IEEE Transactions on Software Engineering, 30 ( 2 ): 126-139, Feb 2004. ISSN 0098-5589. doi: 10.1109/TSE. 2004. 1265817.Google ScholarDigital Library
- Microsoft. Visual Studio. At https://www.visualstudio.com, 2019.Google Scholar
- Microsoft. Intellicode suggestions. At https://docs.microsoft.com/en-us/visualstudio/intellicode/intellicode-suggestions, 2020.Google Scholar
- A. Miltner, S. Gulwani, V. Le, A. Leung, A. Radhakrishna, G. Soares, A. Tiwari, and A. Udupa. On the fly synthesis of edit suggestions. Proceedings of the ACM on Programming Languages, 3 (OOPSLA): 1-29, 2019.Google ScholarDigital Library
- T. M. Mitchell. Generalization as search. Artificial intelligence, 18 ( 2 ): 203-226, 1982.Google Scholar
- H. A. Nguyen, A. T. Nguyen, T. T. Nguyen, T. N. Nguyen, and H. Rajan. A study of repetitiveness of code changes in software evolution. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 180-190. IEEE, 2013.Google ScholarDigital Library
- W. F. Opdyke. Refactoring Object-oriented Frameworks. PhD thesis, Champaign, IL, USA, 1992. UMI Order No. GAX93-05645.Google Scholar
- D. Perelman, S. Gulwani, T. Ball, and D. Grossman. Type-directed completion of partial expressions. In Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation, pages 275-286, 2012.Google ScholarDigital Library
- A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 179-190, 1989.Google ScholarDigital Library
- O. Polozov and S. Gulwani. Flashmeta: a framework for inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 107-126, 2015.Google ScholarDigital Library
- V. Raychev, M. Vechev, and E. Yahav. Code completion with statistical language models. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 419-428, 2014.Google ScholarDigital Library
- A. Reynolds, M. Deters, V. Kuncak, C. Tinelli, and C. Barrett. Counterexample-guided quantifier instantiation for synthesis in smt. In International Conference on Computer Aided Verification, pages 198-216. Springer, 2015.Google ScholarCross Ref
- R. Rolim, G. Soares, L. D'Antoni, O. Polozov, S. Gulwani, R. Gheyi, R. Suzuki, and B. Hartmann. Learning syntactic program transformations from examples. In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pages 404-415. IEEE, 2017.Google ScholarDigital Library
- R. Rolim, G. Soares, R. Gheyi, T. Barik, and L. D'Antoni. Learning quick fixes from code repositories, 2018.Google Scholar
- R. Singh. Blinkfill: Semi-supervised programming by example for syntactic string transformations. Proc. VLDB Endow., 9 ( 10 ): 816-827, June 2016. ISSN 2150-8097. doi: 10.14778/2977797.2977807. URL https://doi.org/10.14778/2977797.2977807. Google ScholarDigital Library
- R. Singh and A. Solar-Lezama. Synthesizing data structure manipulations from storyboards. In T. Gyimóthy and A. Zeller, editors, SIGSOFT/FSE'11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC'11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011, pages 289-299. ACM, 2011. doi: 10.1145/2025113.2025153. URL https://doi.org/10.1145/2025113.2025153. Google ScholarDigital Library
- A. Solar-Lezama, R. M. Rabbah, R. Bodík, and K. Ebcioglu. Programming by sketching for bit-streaming programs. In V. Sarkar and M. W. Hall, editors, Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005, pages 281-294. ACM, 2005. doi: 10.1145/1065010.1065045. URL https://doi.org/10.1145/1065010.1065045. Google ScholarDigital Library
- A. Solar-Lezama, L. Tancau, R. Bodik, S. Seshia, and V. Saraswat. Combinatorial sketching for finite programs. In Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, pages 404-415, 2006.Google ScholarDigital Library
- A. Solar-Lezama, L. Tancau, R. Bodík, S. A. Seshia, and V. A. Saraswat. Combinatorial sketching for finite programs. In J. P. Shen and M. Martonosi, editors, Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006, San Jose, CA, USA, October 21-25, 2006, pages 404-415. ACM, 2006. doi: 10.1145/1168857.1168907. URL https://doi.org/10.1145/1168857.1168907. Google ScholarDigital Library
- A. Solar-Lezama, G. Arnold, L. Tancau, R. Bodík, V. A. Saraswat, and S. A. Seshia. Sketching stencils. In J. Ferrante and K. S. McKinley, editors, Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10-13, 2007, pages 167-178. ACM, 2007. doi: 10.1145/1250734.1250754. URL https://doi.org/10.1145/1250734.1250754. Google ScholarDigital Library
- A. Solar-Lezama, C. G. Jones, and R. Bodík. Sketching concurrent data structures. In R. Gupta and S. P. Amarasinghe, editors, Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, AZ, USA, June 7-13, 2008, pages 136-148. ACM, 2008. doi: 10.1145/1375581.1375599. URL https://doi.org/10.1145/1375581.1375599. Google ScholarDigital Library
- F. Steimann and J. von Pilgrim. Refactorings without names. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, ASE 2012, pages 290-293, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1204-2. doi: 10.1145/2351676.2351726. URL http://doi.acm.org/10.1145/2351676.2351726. Google ScholarDigital Library
- A. Udupa, A. Raghavan, J. V. Deshmukh, S. Mador-Haim, M. M. Martin, and R. Alur. Transit: specifying protocols with concolic snippets. ACM SIGPLAN Notices, 48 ( 6 ): 287-296, 2013.Google Scholar
- M. Vakilian, N. Chen, S. Negara, B. A. Rajkumar, B. P. Bailey, and R. E. Johnson. Use, disuse, and misuse of automated refactorings. In 2012 34th International Conference on Software Engineering (ICSE), pages 233-243. IEEE, 2012.Google ScholarCross Ref
- M. T. Vechev, E. Yahav, and G. Yorsh. Abstraction-guided synthesis of synchronization. In M. V. Hermenegildo and J. Palsberg, editors, Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2010, Madrid, Spain, January 17-23, 2010, pages 327-338. ACM, 2010. doi: 10.1145/1706299.1706338. URL https: //doi.org/10.1145/1706299.1706338. Google ScholarDigital Library
- N. Yaghmazadeh, X. Wang, and I. Dillig. Automated migration of hierarchical data to relational tables using programmingby-example. Proc. VLDB Endow., 11 ( 5 ): 580-593, 2018. doi: 10.1145/3187009.3177735. URL http://www.vldb.org/pvldb/ vol11/p580-yaghmazadeh.pdf.Google ScholarDigital Library
- K. Yessenov, Z. Xu, and A. Solar-Lezama. Data-driven synthesis for object-oriented frameworks. ACM SIGPLAN Notices, 46 ( 10 ): 65-82, 2011.Google Scholar
- H. Zhang, A. Jain, G. Khandelwal, C. Kaushik, S. Ge, and W. Hu. Bing developer assistant: improving developer productivity by recommending sample code. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 956-961, 2016.Google ScholarDigital Library
- X. Zhu and A. B. Goldberg. Introduction to semi-supervised learning. Synthesis lectures on artificial intelligence and machine learning, 3 ( 1 ): 1-130, 2009.Google Scholar
- X. J. Zhu. Semi-supervised learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences, 2005.Google Scholar
Index Terms
- Feedback-driven semi-supervised synthesis of program transformations
Recommendations
Learning syntactic program transformations from examples
ICSE '17: Proceedings of the 39th International Conference on Software EngineeringAutomatic program transformation tools can be valuable for programmers to help them with refactoring tasks, and for Computer Science students in the form of tutoring systems that suggest repairs to programming assignments. However, manually creating ...
Spreadsheet table transformations from examples
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and ImplementationEvery day, millions of computer end-users need to perform tasks over large, tabular data, yet lack the programming knowledge to do such tasks automatically. In this work, we present an automatic technique that takes from a user an example of how the ...
Combinatorially efficient exploration of program transformations for automatic programming
ICCOMP'05: Proceedings of the 9th WSEAS International Conference on ComputersProgram induction, where one or more parts of a potentially huge software system are automatically synthesized, is an emerging technology that will become more and more industrially useful as more computing power becomes available, for example in the ...
Comments