DOI: 10.1145/2597073.2597087

Thesaurus-based automatic query expansion for interface-driven code search

Published: 31 May 2014

ABSTRACT

Software engineers often resort to code search practices to support software maintenance and evolution tasks, in particular code reuse. An issue that affects code search is the vocabulary mismatch problem: while searching for a particular function, users have to guess the exact words that were chosen by original developers to name code entities. In this paper we present an automatic query expansion (AQE) approach that uses word relations to increase the chances of finding relevant code. The approach is applied on top of Test-Driven Code Search (TDCS), a promising code retrieval technique that uses test cases as inputs to formulate the search query, but can also be used with other techniques that handle interface definitions to produce queries (interface-driven code search). Since these techniques rely on keywords and types, the vocabulary mismatch problem is also relevant. AQE is carried out by leveraging WordNet, a type thesaurus for expanding types, and another thesaurus containing only software-related word relations. Our approach is general but was specifically designed for non-native English speakers, who are frequently unaware of the most common terms used to name functions in software. Our evaluation with 36 non-native subjects - including developers and senior Computer Science students - provides evidence that our approach can improve the chances of finding relevant functions by 41% (recall improvement of 30%, on average), without hurting precision.
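The expansion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two small dictionaries stand in for WordNet, the type thesaurus, and the software-specific thesaurus, and the names InterfaceQuery, split_identifier, expand_query, and matches are hypothetical. The matching rule is also an assumption, chosen only to show how expansion can recover a function whose name uses a different word than the query.

```python
import re
from dataclasses import dataclass

# Hypothetical toy thesauri (assumptions); the real approach relies on WordNet,
# a type thesaurus, and a software-specific thesaurus.
WORD_THESAURUS = {
    "remove": {"delete", "strip"},
    "whitespace": {"blank", "space"},
}
TYPE_THESAURUS = {
    "String": {"CharSequence", "StringBuilder"},
    "int": {"Integer", "long"},
}


@dataclass(frozen=True)
class InterfaceQuery:
    """An interface definition: method name, return type, parameter types."""
    method_name: str
    return_type: str
    param_types: tuple


def split_identifier(name):
    """Split a camelCase identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def expand_query(query):
    """Attach thesaurus alternatives to every name term and type.

    The original term or type is always kept among its alternatives.
    """
    def expand_type(type_name):
        return {type_name} | TYPE_THESAURUS.get(type_name, set())

    return {
        "name_terms": {
            term: {term} | WORD_THESAURUS.get(term, set())
            for term in split_identifier(query.method_name)
        },
        "return_type": expand_type(query.return_type),
        "param_types": [expand_type(t) for t in query.param_types],
    }


def matches(expanded, candidate):
    """Assumed matching rule: each query term (or one of its alternatives)
    appears in the candidate name, and all types agree up to expansion."""
    cand_terms = set(split_identifier(candidate.method_name))
    names_ok = all(alts & cand_terms for alts in expanded["name_terms"].values())
    return_ok = candidate.return_type in expanded["return_type"]
    params_ok = len(candidate.param_types) == len(expanded["param_types"]) and all(
        ctype in alts
        for ctype, alts in zip(candidate.param_types, expanded["param_types"])
    )
    return names_ok and return_ok and params_ok
```

Under these assumptions, a query for removeWhitespace(String): String would also match a candidate named deleteWhitespace, which an exact keyword match would miss. That is the vocabulary mismatch the abstract describes.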

        Published in

        MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories
        May 2014
        427 pages
        ISBN: 9781450328630
        DOI: 10.1145/2597073
        General Chair: Premkumar Devanbu
        Program Chairs: Sung Kim, Martin Pinzger

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States
