SWIM: Synthesizing What I Mean: Code Search and Idiomatic Snippet Synthesis

Authors:
Mukund Raghothaman (University of Pennsylvania),
Yi Wei (Microsoft Research, Cambridge),
Youssef Hamadi (Laboratoire d'Informatique, École Polytechnique, Palaiseau)

ICSE '16: Proceedings of the 38th International Conference on Software Engineering, May 2016, Pages 357–367. https://doi.org/10.1145/2884781.2884808

Published: 14 May 2016

ABSTRACT

Modern programming frameworks come with large libraries that serve diverse purposes, such as matching regular expressions, parsing XML files, and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool that suggests code snippets given API-related natural language queries such as "generate md5 hash code". The query does not need to contain framework-specific trivia such as the type names or methods of interest.
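As an illustration of the kind of snippet a query like "generate md5 hash code" is after, here is a minimal sketch. It is written in Python with the standard `hashlib` module rather than the C# targeted in the paper, and the function name `md5_hex` is hypothetical. It is not SWIM's output; it only shows the framework-specific names (`hashlib.md5`, `hexdigest`) that the query itself never mentions.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` as a lowercase hex string."""
    # hashlib.md5 works on bytes, so the string is encoded first.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(md5_hex("hello"))
```

The user only has to know the intent ("md5 hash"), while the type and method names appear in the suggested code.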

We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code describing the use of these APIs. We introduce structured call sequences to capture API-usage patterns. Structured call sequences are a generalized form of method call sequences, with if-branches and while-loops to represent conditional and repeated API usage patterns, and are simple to extract and amenable to synthesis.
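One way to picture a structured call sequence is as a small tree whose nodes are method calls, if-branches, and while-loops. The sketch below is an assumption made for illustration, not the representation used in the paper: the class names `Call`, `If`, `While` and the helpers `flatten` and `render` are hypothetical, and the .NET method names are plain strings chosen as an example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Call:
    method: str  # illustrative label, e.g. "StreamReader.ReadLine"


@dataclass
class If:
    cond: Call
    then: List["Node"] = field(default_factory=list)
    orelse: List["Node"] = field(default_factory=list)


@dataclass
class While:
    cond: Call
    body: List["Node"] = field(default_factory=list)


Node = Union[Call, If, While]


def flatten(seq: List[Node]) -> List[str]:
    """Drop the control structure and return the plain method call sequence."""
    out: List[str] = []
    for node in seq:
        if isinstance(node, Call):
            out.append(node.method)
        elif isinstance(node, If):
            out.append(node.cond.method)
            out.extend(flatten(node.then))
            out.extend(flatten(node.orelse))
        else:  # While
            out.append(node.cond.method)
            out.extend(flatten(node.body))
    return out


def render(seq: List[Node], indent: int = 0) -> List[str]:
    """Pretty-print the sequence as indented pseudocode lines."""
    pad = "  " * indent
    lines: List[str] = []
    for node in seq:
        if isinstance(node, Call):
            lines.append(f"{pad}{node.method}()")
        elif isinstance(node, If):
            lines.append(f"{pad}if {node.cond.method}():")
            lines.extend(render(node.then, indent + 1))
            if node.orelse:
                lines.append(f"{pad}else:")
                lines.extend(render(node.orelse, indent + 1))
        else:  # While
            lines.append(f"{pad}while {node.cond.method}():")
            lines.extend(render(node.body, indent + 1))
    return lines


# Hypothetical usage pattern: open a reader, loop over lines, close it.
read_lines = [
    Call("new StreamReader"),
    While(Call("StreamReader.ReadLine"), [Call("Console.WriteLine")]),
    Call("StreamReader.Close"),
]

if __name__ == "__main__":
    print("\n".join(render(read_lines)))
```

Calling `flatten` recovers an ordinary method call sequence, which is the sense in which the structured form generalizes it: the loop and branch nodes add only the information about conditional and repeated usage.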

We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.


Published in
ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016, 1235 pages
ISBN: 9781450339001
DOI: 10.1145/2884781
General Chair: Laura Dillon (Michigan State University)
Program Chairs: Willem Visser (Stellenbosch University, South Africa), Laurie Williams (North Carolina State University)
Copyright © 2016 ACM