ABSTRACT
Modern programming frameworks come with large libraries that cover diverse tasks such as matching regular expressions, parsing XML files, and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool that suggests code snippets given API-related natural language queries such as "generate md5 hash code". The query does not need to contain framework-specific trivia such as the type names or methods of interest.
We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code describing the use of these APIs. We introduce structured call sequences to capture API-usage patterns. Structured call sequences are a generalized form of method call sequences, with if-branches and while-loops to represent conditional and repeated API usage patterns, and are simple to extract and amenable to synthesis.
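As an illustration of the idea, a structured call sequence can be modeled as a small tree whose nodes are plain calls, if-branches, or while-loops. The following is only a minimal sketch under our own assumptions: the names `Call`, `If`, `While`, `render`, and `flatten` are hypothetical and are not taken from SWIM, and the C# file-reading pattern is an example chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Call:
    """A single API call expression, stored as text (hypothetical representation)."""
    text: str


@dataclass
class If:
    """Conditional API usage: a guard expression and the calls made when it holds."""
    cond: Call
    body: List["Node"]


@dataclass
class While:
    """Repeated API usage: a guard expression and the calls made while it holds."""
    cond: Call
    body: List["Node"]


Node = Union[Call, If, While]


def render(seq: List[Node], indent: int = 0) -> List[str]:
    """Print a structured call sequence as C#-like snippet lines."""
    pad = "    " * indent
    lines: List[str] = []
    for node in seq:
        if isinstance(node, Call):
            lines.append(f"{pad}{node.text};")
        else:
            keyword = "if" if isinstance(node, If) else "while"
            lines.append(f"{pad}{keyword} ({node.cond.text}) {{")
            lines.extend(render(node.body, indent + 1))
            lines.append(f"{pad}}}")
    return lines


def flatten(seq: List[Node]) -> List[str]:
    """Drop the control structure to recover a plain method call sequence."""
    out: List[str] = []
    for node in seq:
        if isinstance(node, Call):
            out.append(node.text)
        else:
            out.append(node.cond.text)
            out.extend(flatten(node.body))
    return out


# Illustrative pattern (our assumption): reading a file line by line in C#.
example: List[Node] = [
    Call("var reader = new StreamReader(path)"),
    While(Call("reader.Peek() >= 0"), [Call("var line = reader.ReadLine()")]),
    Call("reader.Close()"),
]

if __name__ == "__main__":
    print("\n".join(render(example)))
```

Here `flatten` shows the sense in which the structured form generalizes a method call sequence: discarding the branch and loop nodes leaves the ordinary flat sequence, while `render` keeps the control structure needed to emit a readable snippet.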
We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.