DOI: 10.1145/2884781.2884808
research-article

SWIM: Synthesizing What I Mean: Code Search and Idiomatic Snippet Synthesis

Published: 14 May 2016

ABSTRACT

Modern programming frameworks come with large libraries that serve diverse purposes, such as matching regular expressions, parsing XML files, and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool that suggests code snippets given API-related natural language queries such as "generate md5 hash code". The query does not need to contain framework-specific trivia such as the type names or methods of interest.
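To make the example query concrete, the sketch below shows the kind of short snippet a programmer issuing "generate md5 hash code" is typically looking for. It is an illustration only: SWIM targets C# APIs, whereas this uses Python's standard `hashlib` module, and the helper name `md5_hex` is an assumption introduced here, not something from the paper.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hexadecimal MD5 digest of a UTF-8 encoded string.

    Illustrative helper (hypothetical name), not output produced by SWIM.
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(md5_hex("hello"))
```

Note that the query names neither the hashing type nor the digest method; mapping the query to those API names is the part SWIM is described as automating.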

We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code describing the use of these APIs. We introduce structured call sequences to capture API-usage patterns. Structured call sequences are a generalized form of method call sequences, with if-branches and while-loops to represent conditional and repeated API usage patterns, and are simple to extract and amenable to synthesis.
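One possible way to picture a structured call sequence is as a small tree of method calls with if and while nodes. The sketch below is a minimal model under that assumption; the class names (`Call`, `If`, `While`), the `render` function, and the XML-reading pattern are all hypothetical illustrations and are not taken from the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Call:
    """A single API method call on a receiver, e.g. reader.Read()."""
    receiver: str
    method: str


@dataclass
class If:
    """Calls that happen only when the condition call holds."""
    condition: Call
    body: List["Node"] = field(default_factory=list)


@dataclass
class While:
    """Calls that repeat for as long as the condition call holds."""
    condition: Call
    body: List["Node"] = field(default_factory=list)


Node = Union[Call, If, While]


def render(nodes, indent=0):
    """Turn a structured call sequence into indented pseudo-code lines."""
    pad = "    " * indent
    lines = []
    for node in nodes:
        if isinstance(node, Call):
            lines.append(f"{pad}{node.receiver}.{node.method}()")
        else:
            keyword = "if" if isinstance(node, If) else "while"
            cond = f"{node.condition.receiver}.{node.condition.method}()"
            lines.append(f"{pad}{keyword} ({cond}) {{")
            lines.extend(render(node.body, indent + 1))
            lines.append(f"{pad}}}")
    return lines


# Hypothetical usage pattern: loop over an XML reader, read text
# conditionally, then close the reader.
sequence = [
    While(Call("reader", "Read"), [
        If(Call("reader", "IsStartElement"), [Call("reader", "ReadString")]),
    ]),
    Call("reader", "Close"),
]

if __name__ == "__main__":
    print("\n".join(render(sequence)))
```

A flat method call sequence is the special case in which every node is a `Call`; the two extra node types are what let the pattern record conditional and repeated usage.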

We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.
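The two reported figures are top-k hit rates. The sketch below shows how such rates could be computed from the rank of the first relevant snippet per query. The per-query ranks are placeholder values chosen only to be consistent with the aggregate numbers above (21 of 30 queries at rank 1, all 30 within the top 10); they are not data from the paper, and `top_k_rate` is a hypothetical helper name.

```python
def top_k_rate(ranks, k):
    """Fraction of queries whose first relevant result has rank <= k.

    `ranks` holds the 1-based rank of the first relevant snippet per query,
    or None when no relevant snippet was returned.
    """
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


# Placeholder ranks (NOT from the paper): 21 queries answered at rank 1,
# the remaining 9 answered somewhere between rank 2 and rank 10.
ranks = [1] * 21 + [2, 2, 3, 3, 4, 5, 6, 8, 10]

if __name__ == "__main__":
    print(top_k_rate(ranks, 1))
    print(top_k_rate(ranks, 10))
```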

  • Published in

    ICSE '16: Proceedings of the 38th International Conference on Software Engineering
    May 2016
    1235 pages
    ISBN: 9781450339001
    DOI: 10.1145/2884781

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 276 of 1,856 submissions, 15%
