Abstract
Engineers in large-scale software development have to manage large amounts of information, spread across many artifacts. Several researchers have proposed expressing retrieval of trace links among artifacts, i.e. trace recovery, as an Information Retrieval (IR) problem. The objective of this study is to produce a map of work on IR-based trace recovery, with a particular focus on previous evaluations and strength of evidence. We conducted a systematic mapping of IR-based trace recovery. Of the 79 publications classified, a majority applied algebraic IR models. While a set of studies on students indicates that IR-based trace recovery tools support certain work tasks, most previous studies do not go beyond reporting precision and recall of candidate trace links from evaluations using datasets containing fewer than 500 artifacts. Our review identified a need for industrial case studies. Furthermore, we conclude that the overall quality of reporting should be improved regarding context and tool details, measures reported, and use of IR terminology. Finally, based on our empirical findings, we present suggestions on how to advance research on IR-based trace recovery.
Notes
We use an asterisk (‘*’) to distinguish primary publications in the systematic mapping from general references.
The gold standard was not considered the end goal of our study, but was the target during the iterative development of the search string described next.
References
Abadi A, Nisenson M, Simionovici Y (2008*) A traceability technique for specifications. In: Proceedings of the 16th international conference on program comprehension, pp 103–112
Aitchison J, Bawden D, Gilchrist A (2000) Thesaurus construction and use: a practical manual, 4th edn. Routledge
Ali N, Guéhéneuc Y, Antoniol G (2011*a) Requirements traceability for object oriented systems by partitioning source code. In: Proceedings of the 18th working conference on reverse engineering, pp 45–54
Ali N, Guéhéneuc Y, Antoniol G (2011*b) Trust-Based requirements traceability. In: Proceedings of the 19th international conference on program comprehension, pp 111–120
Ali N, Guéhéneuc Y, Antoniol G (2012) Factors impacting the inputs of traceability recovery approaches. In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability, Springer
Antoniol G, Potrich A, Tonella P, Fiutem R (1999) Evolving object oriented design to improve code traceability. In: Proceedings of the 7th international workshop on program comprehension, pp 151–160
Antoniol G, Canfora G, De Lucia A, Merlo E (1999*) Recovering code to documentation links in OO systems. In: Proceedings of the 6th working conference on reverse engineering, pp 136–144
Antoniol G, Canfora G, Casazza G, De Lucia A (2000) Information retrieval models for recovering traceability links between code and documentation. In: Conference on software maintenance, pp 40–49
Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2000*) Tracing object-oriented code into functional requirements. In: Proceedings of the 8th international workshop on program comprehension, pp 79–86
Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002*) Recovering traceability links between code and documentation. In: Transactions on software engineering, vol 28, pp 970–983
Assawamekin N, Sunetnanta T, Pluempitiwiriyawej C (2010) Ontology-based multiperspective requirements traceability framework. Knowl Inf Syst 25(3):493–522
Asuncion H, Asuncion A, Taylor R (2010*) Software traceability with topic modeling. In: Proceedings of the international conference on software engineering, pp 95–104
Ayari K, Meshkinfam P, Antoniol G, Di Penta M (2007) Threats on building models from CVS and bugzilla repositories: the mozilla case study. In: Proceedings of the conference of the center for advanced studies on collaborative research, pp 215–228
Bacchelli A, Lanza M, Robbes R (2010) Linking e-mails and source code artifacts. In: Proceedings of the 32nd international conference on software engineering, pp 375–384
Baeza-Yates R, Ribeiro-Neto B (2011) Modern information retrieval: the concepts and technology behind search. Addison-Wesley
Banko M, Brill E (2001) Scaling to very very large corpora for natural language disambiguation. In: Proceedings of the 39th annual meeting on association for computational linguistics, pp 26–33
Ben Charrada E, Caspar D, Jeanneret C, Glinz M (2011*) Towards a benchmark for traceability. In: Proceedings of the 12th international workshop on principles of software evolution, pp 21–30
Bianchi A, Fasolino A, Visaggio G (2000) An exploratory case study of the maintenance effectiveness of traceability models. In: Proceedings of the 8th international workshop on program comprehension, pp 149–158
Binkley D, Lawrie D (2010) Information retrieval applications in software maintenance and evolution. In: Marciniak J (ed) Encyclopedia of software engineering, 2nd edn, Taylor & Francis
Blei D, Lafferty J (2007) A correlated topic model of science. Ann Appl Stat 1(1):17–35
Blei D, Ng A, Jordan M (2003) Latent dirichlet allocation. J Mach Learn Res 3(4–5):993–1022
Borg M, Pfahl D (2011*) Do better IR tools improve the accuracy of engineers’ traceability recovery? In: Proceedings of the international workshop on machine learning technologies in software engineering, pp 27–34
Borg M, Runeson P, Brodén L (2012a) Evaluation of traceability recovery in context: a taxonomy for information retrieval tools. In: Proceedings of the 16th international conference on evaluation & assessment in software engineering
Borg M, Wnuk K, Pfahl D (2012b) Industrial comparability of student artifacts in traceability recovery research - an exploratory survey. In: Proceedings of the 16th european conference on software maintenance and reengineering
Borillo M, Borillo A, Castell N, Latour D, Toussaint Y, Felisa Verdejo M (1992) Applying linguistic engineering to spatial software engineering: the traceability problem. In: Proceedings of the 10th european conference on artificial intelligence, pp 593–595
Bras M, Toussaint Y (1993) Artificial intelligence tools for software engineering: Processing natural language requirements. In: Applications of artificial intelligence in engineering, pp 275–290
Brereton P, Kitchenham B, Budgen D, Turner M, Khalil M (2007) Lessons from applying the systematic literature review process within the software engineering domain. J Syst Software 80(4):571–583
Canfora G, Cerulo L (2006*) Fine grained indexing of software repositories to support impact analysis. In: Proceedings of the international workshop on mining software repositories, pp 105–111
Capobianco G, De Lucia A, Oliveto R, Panichella A, Panichella S (2009*a) On the role of the nouns in IR-based traceability recovery. In: Proceedings of the 17th international conference on program comprehension, pp 148–157
Capobianco G, De Lucia A, Oliveto R, Panichella A, Panichella S (2009*b) Traceability recovery using numerical analysis. In: Proceedings of the 16th working conference on reverse engineering, pp 195–204
Carnegie Mellon Software Engineering Institute (2010) CMMI for development, version 1.3
Castell N, Slavkova O, Toussaint Y, Tuells A (1994) Quality control of software specifications written in natural language. In: Proceedings of the 7th international conference on industrial and engineering applications of artificial intelligence and expert systems, pp 37–44
Chang J, Blei D (2010) Hierarchical relational models for document networks. Ann Appl Stat 4(1):124–150
Charikar M, Chekuri C, Feder T, Motwani R (1997) Incremental clustering and dynamic information retrieval. In: Proceedings of the 29th annual ACM symposium on theory of computing, pp 626–635
Chen X (2010*) Extraction and visualization of traceability relationships between documents and source code. In: Proceedings of the international conference on automated software engineering, pp 505–509
Chen X, Grundy J (2011*) Improving automated documentation to code traceability by combining retrieval techniques. In: Proceedings of the 26th international conference on automated software engineering, pp 223–232
Chen X, Hosking J, Grundy J (2011*) A combination approach for enhancing automated traceability. In: Proceedings of the 33rd international conference on software engineering, (NIER track), pp 912–915
Cleland-Huang J, Chang CK, Christensen M (2003) Event-based traceability for managing evolutionary change. Trans Software Eng 29(9):796–810
Cleland-Huang J, Settimi R, Duan C, Zou XC (2005*) Utilizing supporting evidence to improve dynamic requirements traceability. In: Proceedings of the 13th international conference on requirements engineering, pp 135–144
Cleland-Huang J, Huffman Hayes J, Dekhtyar A (2006) Center of excellence for traceability: problem statement and grand challenges in traceability (v0.1). Technical Report COET-GCT-06-01-0.9
Cleland-Huang J, Settimi R, Romanova E, Berenbach B, Clark S (2007*) Best practices for automated traceability. Computer 40(6):27–35
Cleland-Huang J, Marrero W, Berenbach B (2008) Goal-Centric traceability: Using virtual plumblines to maintain critical systemic qualities. Trans Software Eng 34(5):685–699
Cleland-Huang J, Czauderna A, Gibiec M, Emenecker J (2010*) A machine learning approach for tracing regulatory codes to product specific requirements. In: Proceedings of the international conference on software engineering, pp 155–164
Cleland-Huang J, Czauderna A, Dekhtyar A, Gotel O, Huffman Hayes J, Keenan E, Maletic J, Poshyvanyk D, Shin Y, Zisman A, Antoniol G, Berenbach B, Egyed A, Maeder P (2011) Grand challenges, benchmarks, and TraceLab: developing infrastructure for the software traceability research community. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering
Cleland-Huang J, Gotel O, Zisman A (eds) (2012) Software and systems traceability. Springer
Cleverdon C (1991) The significance of the cranfield tests on index languages. In: Proceedings of the 14th annual international SIGIR conference on research and development in information retrieval, pp 3–12
Croft B, Turtle H, Lewis D (1991) The use of phrases and structured queries in information retrieval. In: Proceedings of the 14th annual international ACM SIGIR conference on research and development in information retrieval, pp 32–45
Cuddeback D, Dekhtyar A, Huffman Hayes J (2010*) Automated requirements traceability: the study of human analysts. In: Proceedings of the 18th international requirements engineering conference, pp 231–240
Czauderna A, Gibiec M, Leach G, Li Y, Shin Y, Keenan E, Cleland-Huang J (2011*) Traceability challenge 2011: using TraceLab to evaluate the impact of local versus global idf on trace retrieval. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering, pp 75–78
De Lucia A, Fasano F, Oliveto R, Tortora G (2004*) Enhancing an artefact management system with traceability recovery features. In: Proceedings of the 20th international conference on software maintenance, pp 306–315
De Lucia A, Fasano F, Oliveto R, Tortora G (2005*) ADAMS re-trace: A traceability recovery tool. In: Proceedings of the 9th European conference on software maintenance and reengineering, pp 32–41
De Lucia A, Di Penta M, Oliveto R, Zurolo F (2006a) COCONUT: COde COmprehension nurturant using traceability. In: Proceedings of the 22nd international conference on software maintenance, pp 274–275
De Lucia A, Di Penta M, Oliveto R, Zurolo F (2006b) Improving comprehensibility of source code via traceability information: A controlled experiment. In: Proceedings of the 14th international conference on program comprehension, pp 317–326
De Lucia A, Fasano F, Oliveto R, Tortora G (2006*a) Can information retrieval techniques effectively support traceability link recovery? In: Proceedings of the 14th international conference on program comprehension, pp 307–316
De Lucia A, Oliveto R, Sgueglia P (2006*b) Incremental approach and user feedbacks: A silver bullet for traceability recovery? In: Proceedings of the international conference on software maintenance, pp 299–308
De Lucia A, Fasano F, Oliveto R, Tortora G (2007*) Recovering traceability links in software artifact management systems using information retrieval methods. Trans Softw Eng Methodol 16(4)
De Lucia A, Fasano F, Oliveto R (2008) Traceability management for impact analysis. In: Frontiers of software maintenance, pp 21–30
De Lucia A, Oliveto R, Tortora G (2008*) IR-based traceability recovery processes: An empirical comparison of “one-shot” and incremental processes. In: Proceedings of the 23rd international conference on automated software engineering, pp 39–48
De Lucia A, Oliveto R, Tortora G (2009*a) Assessing IR-based traceability recovery tools through controlled experiments. Empir Software Eng 14(1):57–92
De Lucia A, Oliveto R, Tortora G (2009*b) The role of the coverage analysis during IR-based traceability recovery: a controlled experiment. In: Proceedings of the 25th international conference on software maintenance, pp 371–380
De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S (2011*) Improving IR-based traceability recovery using smoothing filters. In: Proceedings of the 19th international conference on program comprehension, pp 21–30
De Lucia A, Marcus A, Oliveto R, Poshyvanyk D (2012) Information retrieval methods for automated traceability recovery. In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability, Springer
Deerwester S, Dumais S, Furnas G, Landauer T, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407
Dekhtyar A, Huffman Hayes J (2006) Good benchmarks are hard to find: Toward the benchmark for information retrieval applications in software engineering. In: Proceedings of the 22nd international conference on software maintenance
Dekhtyar A, Huffman Hayes J, Antoniol G (2007) Benchmarks for traceability? In: Proceedings of the international symposium on grand challenges in traceability
Dekhtyar A, Huffman Hayes J, Larsen J (2007*a) Make the most of your time: how should the analyst work with automated traceability tools? In: Proceedings of the 3rd international workshop on predictor models in software engineering
Dekhtyar A, Huffman Hayes J, Sundaram S, Holbrook A, Dekhtyar O (2007*b) Technique integration for requirements assessment. In: Proceedings of the 15th international requirements engineering conference, pp 141–152
Dekhtyar A, Dekhtyar O, Holden J, Huffman Hayes J, Cuddeback D, Kong W (2011*) On human analyst performance in assisted requirements tracing: statistical analysis. In: Proceedings of the 19th international requirements engineering conference, pp 111–120
Di F, Zhang M (2009*) An improving approach for recovering requirements-to-design traceability links. In: Proceedings of the international conference on computational intelligence and software engineering, pp 1–6
Di Penta M, Gradara S, Antoniol G (2002*) Traceability recovery in RAD software systems. In: Proceedings of the 10th international workshop on program comprehension, pp 207–216
Dit B, Revelle M, Gethers M, Poshyvanyk D (2011) Feature location in source code: a taxonomy and survey. J Softw Main Evol 25(1):53–95
Dömges R, Pohl K (1998) Adapting traceability environments to project-specific needs. Commun ACM 41(12):54–62
Duan C, Cleland-Huang J (2007*) Clustering support for automated tracing. In: Proceedings of the international conference on automated software engineering, pp 244–253
Egyed A, Grünbacher P (2002) Automating requirements traceability: beyond the record replay paradigm. In: Proceedings of the 17th international conference on automated software engineering, pp 163–171
Eisenbarth T, Koschke R, Simon D (2003) Locating features in source code. Trans Software Eng 29(3):210–224
Falessi D, Cantone G, Canfora G (2010) A comprehensive characterization of NLP techniques for identifying equivalent requirements. In: Proceedings of the 4th international symposium on empirical software engineering and measurement
Felizardo KR, Salleh N, Martins RM, Mendes E, MacDonell SG, Maldonado JC (2011) Using visual text mining to support the study selection activity in systematic literature reviews. In: Proceedings of the 5th international symposium on empirical software engineering and measurement, pp 77–86
Fiutem R, Antoniol G (1998) Identifying design-code inconsistencies in object-oriented software: a case study. In: Proceedings of the international conference on software maintenance, pp 94–102
Gay G, Haiduc S, Marcus A, Menzies T (2009) On the use of relevance feedback in IR-based concept location. In: Proceedings of the 25th international conference on software maintenance, pp 351–360
Gethers M, Kagdi H, Dit B, Poshyvanyk D (2011) An adaptive approach to impact analysis from change requests to source code. In: Proceedings of the 26th international conference on automated software engineering, pp 540–543
Gethers M, Oliveto R, Poshyvanyk D, De Lucia A (2011*) On integrating orthogonal information retrieval methods to improve traceability recovery. In: Proceedings of the 27th international conference on software maintenance, pp 133–142
Gibiec M, Czauderna A, Cleland-Huang J (2010*) Towards mining replacement queries for hard-to-retrieve traces. In: Proceedings of the international conference on automated software engineering, pp 245–254
Gotel O, Finkelstein C (1994) An analysis of the requirements traceability problem. In: Proceedings of the first international conference on requirements engineering, pp 94–101
Gotel O, Cleland-Huang J, Huffman Hayes J, Zisman A, Egyed A, Grünbacher P, Dekhtyar A, Antoniol G, Maletic J (2012) The grand challenge of traceability (v1.0). In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability, Springer
Heindl M, Biffl S (2005) A case study on value-based requirements tracing. In: Proceedings of the 10th European software engineering conference held jointly with the 13th SIGSOFT international symposium on foundations of software engineering, pp 60–69
Hofmann T (2001) Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 42(1–2):177–196
Huffman Hayes J, Dekhtyar A (2005a) A framework for comparing requirements tracing experiments. Int J Softw Eng Knowl Eng 15(5):751–781
Huffman Hayes J, Dekhtyar A (2005b) Humans in the traceability loop: can’t live with ’em, can’t live without ’em. In: Proceedings of the 3rd international workshop on traceability in emerging forms of software engineering, pp 20–23
Huffman Hayes J, Dekhtyar A, Osborne J (2003*) Improving requirements tracing via information retrieval. In: Proceedings of the 11th international requirements engineering conference, pp 138–147
Huffman Hayes J, Dekhtyar A, Sundaram S, Howard S (2004*) Helping analysts trace requirements: An objective look. In: Proceedings of the 12th international conference on requirements engineering, pp 249–259
Huffman Hayes J, Dekhtyar A, Sundaram S (2005*) Text mining for software engineering: how analyst feedback impacts final results. In: Proceedings of the international workshop on mining software repositories, pp 1–5
Huffman Hayes J, Dekhtyar A, Sundaram S (2006*) Advancing candidate link generation for requirements tracing: the study of methods. Trans Softw Eng 32(1):4–19
Huffman Hayes J, Dekhtyar A, Sundaram S, Holbrook A, Vadlamudi S, April A (2007*) REquirements TRacing on target (RETRO): improving software maintenance through traceability recovery. Innov Syst Softw Eng 3(3):193–202
Huffman Hayes J, Antoniol G, Guéhéneuc Y (2008) PREREQIR: recovering Pre-Requirements via cluster analysis. In: Proceedings of the 15th working conference on reverse engineering, pp 165–174
Huffman Hayes J, Sultanov H, Kong W, Li W (2011*) Software verification and validation research laboratory (SVVRL) of the university of kentucky: traceability challenge 2011: language translation. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering, ACM, pp 50–53
Ingwersen P, Järvelin K (2005) The turn: integration of information seeking and retrieval in context. Springer
International Electrotechnical Commission (2003) IEC 61511-1 ed 1.0, safety instrumented systems for the process industry sector
International Organization for Standardization (2011) ISO 26262-1:2011 road vehicles – functional safety –
Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, pp 41–48
Jedlitschka A, Ciolkowski M, Pfahl D (2008) Reporting experiments in software engineering. In: Shull F, Singer J, Sjoberg D (eds) Guide to advanced empirical software engineering, Springer, London, pp 201–228
Jiang H, Nguyen T, Chen I, Jaygarl H, Chang C (2008*) Incremental latent semantic indexing for automatic traceability link evolution management. In: Proceedings of the 23rd international conference on automated software engineering, pp 59–68
Katta V, Stålhane T (2011) A conceptual model of traceability for safety systems. In: Proceedings of the complex systems design & management conference
Kaushik N, Tahvildari L, Moore M (2011*) Reconstructing traceability between bugs and test cases: an experimental study. In: Proceedings of the 18th working conference on reverse engineering, pp 411–414
Kekäläinen J, Järvelin K (2002) Evaluating information retrieval systems under the challenges of interaction and multidimensional dynamic relevance. In: Proceedings of the COLIS 4 conference, pp 253–270
Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering. EBSE Technical Report
Kitchenham B, Pfleeger S, Pickard L, Jones P, Hoaglin D, El Emam K, Rosenberg J (2002) Preliminary guidelines for empirical research in software engineering. Trans Softw Eng 28(8):721–734
Kitchenham B, Budgen D, Brereton P (2011) Using mapping studies as the basis for further research—a participant-observer case study. Inform Softw Technol 53(6):638–651
Klock S, Gethers M, Dit B, Poshyvanyk D (2011*) Traceclipse: an eclipse plug-in for traceability link recovery and management. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering, pp 24–30
Kong L, Li J, Li Y, Yang Y, Wang Q (2009*) A requirement traceability refinement method based on relevance feedback. In: Proceedings of the 21st international conference on software engineering and knowledge engineering
Kong W, Huffman Hayes J (2011*) Proximity-based traceability: an empirical validation using ranked retrieval and set-based measures. In: Proceedings of the 1st international workshop on empirical requirements engineering, pp 45–52
Kong W, Huffman Hayes J, Dekhtyar A, Holden J (2011*) How do we trace requirements: an initial study of analyst behavior in trace validation tasks. In: Proceedings of the 4th international workshop on cooperative and human aspects of software engineering, pp 32–39
Kruchten P (2004) The rational unified process: an introduction. Addison-Wesley Professional
Leuser J (2009*) Challenges for semi-automatic trace recovery in the automotive domain. In: Proceedings of the international workshop on traceability in emerging forms of software engineering, pp 31–35
Leuser J, Ott D (2010*) Tackling semi-automatic trace recovery for large specifications. In: Requirements engineering: foundation for software quality, pp 203–217
Lewis D (1998) Naive (Bayes) at forty: The independence assumption in information retrieval. In: Machine learning: ECML-98, vol 1398, Springer, pp 4–15
Li Y, Li J, Yang Y, Li M (2008*) Requirement-centric traceability for change impact analysis: a case study. In: International conference on software process, pp 100–111
Liddy E (2001) Natural language processing, 2nd edn. Encyclopedia of Library and Information Science, Marcel Dekker
Lin J, Chan L, Cleland-Huang J, Settimi R, Amaya J, Bedford G, Berenbach B, Khadra OB, Chuan D, Zou X (2006) Poirot: A distributed tool supporting Enterprise-Wide automated traceability. In: Proceedings of the 14th international conference on requirements engineering, pp 363–364
Lindvall M, Feldmann R, Karabatis G, Chen Z, Janeja V (2009) Searching for relevant software change artifacts using semantic networks. In: Proceedings of the symposium on applied computing, pp 496–500
Lormans M, van Deursen A (2006*) Can LSI help reconstructing requirements traceability in design and test? In: Proceedings of the 10th European conference on software maintenance and reengineering, pp 45–54
Lormans M, Gross H, van Deursen A, van Solingen R, Stehouwer A (2006*) Monitoring requirements coverage using reconstructed views: An industrial case study. In: Proceedings of the 13th working conference on reverse engineering, pp 275–284
Lormans M, Van Deursen A, Gross H (2008*) An industrial case study in reconstructing requirements views. Empir Software Eng 13(6):727–760
Mahmoud A, Niu N (2010*) Using semantics-enabled information retrieval in requirements tracing: An ongoing experimental investigation. In: Proceedings of the international computer software and applications conference, pp 246–247
Mahmoud A, Niu N (2011*) Source code indexing for automated tracing. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering, pp 3–9
Manning C, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press
Marcus A, Maletic J (2003) Recovering documentation-to-source-code traceability links using latent semantic indexing. In: Proceedings of the 25th international conference on software engineering, pp 125–135
Marcus A, Sergeyev A, Rajlich V, Maletic JI (2004) An information retrieval approach to concept location in source code. In: Proceedings of the 11th working conference on reverse engineering, pp 214–223
Marcus A, Maletic J, Sergeyev A (2005*) Recovery of traceability links between software documentation and source code. Int J Softw Eng Knowl Eng 15(5):811–836
Maron M, Kuhns J (1960) On relevance, probabilistic indexing and information retrieval. J ACM 7(3):216–244
McMillan C, Poshyvanyk D, Revelle M (2009*) Combining textual and structural analysis of software artifacts for traceability link recovery. In: Proceedings of the international workshop on traceability in emerging forms of software engineering, pp 41–48
Natt och Dag J, Regnell B, Carlshamre P, Andersson M, Karlsson J (2002*) A feasibility study of automated natural language requirements analysis in market-driven development. Requirements Eng 7(1):20–33
Natt och Dag J, Gervasi V, Brinkkemper S, Regnell B (2004*) Speeding up requirements management in a product software company: linking customer wishes to product requirements through linguistic engineering. In: Proceedings of the 12th international requirements engineering conference, pp 283–294
Natt och Dag J, Thelin T, Regnell B (2006*) An experiment on linguistic tool support for consolidation of requirements from multiple sources in market-driven product development. Empir Software Eng 11(2):303–329
Oliveto R (2008) Traceability management meets information retrieval methods: strengths and limitations. PhD thesis, University of Salerno
Oliveto R, Gethers M, Poshyvanyk D, De Lucia A (2010*) On the equivalence of information retrieval methods for automated traceability link recovery. In: Proceedings of the 18th international conference on program comprehension, pp 68–71
Olsson T (2002) Software information management in requirements and test documentation. Licentiate thesis, Lund University
Park S, Kim H, Ko Y, Seo J (2000*) Implementation of an efficient requirements analysis supporting system using similarity measure techniques. Inform Softw Technol 42(6):429–438
Parvathy AG, Vasudevan BG, Balakrishnan R (2008*) A comparative study of document correlation techniques for traceability analysis. In: Proceedings of the 10th international conference on enterprise information systems, information systems analysis and specification, pp 64–69
Petersen K, Wohlin C (2009) Context in industrial software engineering research. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement, pp 401–404
Petersen K, Feldt R, Mujtaba S, Mattsson M (2008) Systematic mapping studies in software engineering. In: Proceedings of the 12th international conference on evaluation and assessment in software engineering, pp 71–80
Pohl K, Böckle G, van der Linden F (2005) Software product line engineering: foundations, principles, and techniques. Birkhäuser
Ponte J, Croft B (1998) A language modeling approach to information retrieval. In: Proceedings of the 21st annual international SIGIR conference on research and development in information retrieval, pp 275–281
Port D, Nikora A, Hihn J, Huang L (2011*) Experiences with text mining large collections of unstructured systems development artifacts at JPL. In: Proceedings of the 33rd international conference on software engineering, pp 701–710
Randolph J (2005) Free-Marginal multirater kappa (multirater k[free]): an alternative to fleiss’ Fixed-Marginal multirater kappa. In: Joensuu learning and instruction symposium
Robertson S (1977) The probability ranking principle in IR. J Doc 33(4):294–304
Robertson S, Robertson J (1999) Mastering the requirements process. Addison-Wesley Professional
Robertson S, Zaragoza H (2009) The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval 3(4):333–389
Robertson SE, Jones S (1976) Relevance weighting of search terms. J Am Soc Inform Sci 27(3):129–146
Rocchio J (1971) Relevance feedback in information retrieval. In: Salton G (ed) The SMART retrieval system: experiments in automatic document processing. Prentice-Hall, pp 313–323
Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of the 29th international conference on software engineering, pp 499–510
Runeson P, Höst M, Rainer A, Regnell B (2012) Case study research in software engineering. Guidelines and examples. Wiley
Sabaliauskaite G, Loconsole A, Engström E, Unterkalmsteiner M, Regnell B, Runeson P, Gorschek T, Feldt R (2010) Challenges in aligning requirements engineering and verification in a large-scale industrial context. In: Requirements engineering: foundation for software quality, pp 128–142
Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manage 24(5):513–523
Salton G, Wong A, Yang C (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620
Scacchi W (2002) Understanding the requirements for developing open source software systems. IEE Proc Softw 149(1):24–39
Settimi R, Cleland-Huang J, Ben Khadra O, Mody J, Lukasik W, DePalma C (2004*) Supporting software evolution through dynamically retrieving traces to UML artifacts. In: Proceedings of the 7th international workshop on principles of software evolution, pp 49–54
Shull F, Carver J, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Software Eng 13(2):211–218
Singhal A (2001) Modern information retrieval: a brief overview. Data Eng Bull 24(2):1–9
Smeaton A, Harman D (1997) The TREC experiments and their impact on europe. J Inf Sci 23(2):169–174
Spanoudakis G, d’Avila-Garcez A, Zisman A (2003) Revising rules to capture requirements traceability relations: A machine learning approach. In: Proceedings of the 15th international conference in software engineering and knowledge engineering
Spanoudakis G, Zisman A, Perez-Minana E, Krause P (2004) Rule-based generation of requirements traceability relations. J Syst Softw 72(2):105–127
Spärck Jones K, Walker S, Robertson SE (2000) A probabilistic model of information retrieval: development and comparative experiments. Inf Process Manage 36(6):779–808
Stone A, Sawyer P (2006) Using pre-requirements tracing to investigate requirements based on tacit knowledge. In: Proceedings of the 1st international conference on software and data technologies, pp 139–144
Sultanov H, Huffman Hayes J (2010*) Application of swarm techniques to requirements engineering: Requirements tracing. In: Proceedings of the 18th international requirements engineering conference, pp 211–220
Sundaram S, Huffman Hayes J, Dekhtyar A (2005*) Baselines in requirements tracing. In: Proceedings of the workshop on predictor models in software engineering, pp 1–6
Sundaram S, Huffman Hayes J, Dekhtyar A, Holbrook A (2010*) Assessing traceability of software engineering artifacts. Requirements Eng 15(3):313–335
Torchiano M, Ricca F (2010) Impact analysis by means of unstructured knowledge in the context of bug repositories. In: Proceedings of the 4th international symposium on empirical software engineering and measurement, pp 47:1–47:4
Turtle H, Croft B (1991) Evaluation of an inference network-based retrieval model. Trans Inf Syst 9(3):187–222
Van Rompaey B, Demeyer S (2009) Establishing traceability links between unit test cases and units under test. In: Proceedings of the 13th European conference on software maintenance and reengineering, pp 209–218
Voorhees E (2005) TREC: Experiment and evaluation in information retrieval. MIT Press
Wang X, Lai G, Liu C (2009*) Recovering relationships between documentation and source code based on the characteristics of software engineering. Electron Notes Theor Comput Sci 243:121–137
Winkler S (2009*) Trace retrieval for evolving artifacts. In: Proceedings of the international workshop on traceability in emerging forms of software engineering, pp 49–56
Winkler S, Pilgrim J (2010) A survey of traceability in requirements engineering and model-driven development. Softw Syst Model 9(4):529–565
Wohlin C, Runeson P, Höst M, Ohlsson M, Regnell B, Wesslén A (2012) Experimentation in software engineering: a practical guide. Springer
Yadla S, Huffman Hayes J, Dekhtyar A (2005*) Tracing requirements to defect reports: an application of information retrieval techniques. Innov Syst Softw Eng 1:116–124
Zhai C (2007) A brief review of information retrieval models. Technical report, University of Illinois at Urbana-Champaign
Zhai C (2008) Statistical language models for information retrieval: a critical review. Foundations and Trends in Information Retrieval 2(3):137–213
Zhai C, Lafferty J (2001) Model-based feedback in the language modeling approach to information retrieval. In: Proceedings of the 10th international conference on information and knowledge management, pp 403–410
Zhao W, Zhang L, Liu Y, Luo J, Sun JS (2003*) Understanding how the requirements are implemented in source code. In: Proceedings of the 10th Asia-Pacific software engineering conference, pp 68–77
Zhou X, Yu H (2007*) A clustering-based approach for tracing object-oriented design to requirement. In: Proceedings of the 10th international conference on fundamental approaches to software engineering, pp 412–422
Zou X, Settimi R, Cleland-Huang J (2006*) Phrasing in dynamic requirements trace retrieval. In: Proceedings of the 30th international computer software and applications conference, pp 265–272
Zou X, Settimi R, Cleland-Huang J (2008*) Evaluating the use of project glossaries in automated trace retrieval. In: Proceedings of the international conference on software engineering research and practice, pp 157–163
Zou X, Settimi R, Cleland-Huang J (2010*) Improving automated requirements trace retrieval: A study of term-based enhancement methods. Empir Software Eng 15(2):119–146
Acknowledgements
This work was funded by the Industrial Excellence Center EASE – Embedded Applications Software Engineering. Thanks go to our librarian Mats Berglund for working on the search strings, and Lorand Dali for excellent comments on IR details.
Additional information
Communicated by: Giulio Antoniol
Appendix: Classification of Primary Publications
Table 8 presents our classification of the primary publications, sorted by number of citations according to Google Scholar (July 1, 2012). Note that the well-cited works by Marcus and Maletic (2003) (354 citations) and Antoniol et al. (2000) (85 citations) are not listed. Applied IR models are reported in the fourth column. For LSI, the number of dimensions (k) in the reduced term-document space is reported in parentheses, divided per dataset when possible. The number of dimensions is reported as a fixed number, an interval, a dimensionality reduction in percent, or ‘N/A’ when the information is not available. A bold number marks the best choice, as concluded by the original authors. For LDA, the number of topics (t) is reported. Datasets are classified according to origin: proprietary (Ind), open source (OS), university (Univ), student (Stud), not clearly reported (Unclear), and mixed origin (Mixed). Numbers in parentheses show the number of artifacts studied, i.e. the total number of artifacts in the dataset; ‘N/A’ is used when this number is not reported. Unless the full dataset name is presented, the following abbreviations are used: IBS (Ice Breaker System), EBT (Event-Based Traceability), LC (Light Control system), TM (Transient Meter). Evaluation, the rightmost column, maps primary publications to the context taxonomy described in Section 3 (Level 1–4 = retrieval context, seeking context, work task context, project context). Finally, Table 9 shows the distinctly most productive authors and affiliations, based upon our primary publications.
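For readers who want to handle the classification programmatically, the table layout described above could be encoded roughly as follows. This is a minimal sketch and not part of the original study: the class names, field names and the two example rows are hypothetical, and only the origin codes and the four context levels are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Origin(Enum):
    """Dataset origin codes as listed in the appendix text."""
    IND = "Ind"          # proprietary
    OS = "OS"            # open source
    UNIV = "Univ"        # university
    STUD = "Stud"        # student
    UNCLEAR = "Unclear"  # not clearly reported
    MIXED = "Mixed"      # mixed origin


# Context taxonomy levels 1-4 as named in the appendix text.
CONTEXT_LEVELS = {
    1: "retrieval context",
    2: "seeking context",
    3: "work task context",
    4: "project context",
}


@dataclass
class Dataset:
    """One dataset cell; field names are assumptions made for this sketch."""
    name: str
    origin: Origin
    artifacts: Optional[int] = None  # None stands for 'N/A'

    def label(self) -> str:
        size = "N/A" if self.artifacts is None else str(self.artifacts)
        return f"{self.name} ({self.origin.value}, {size})"


@dataclass
class Classification:
    """One table row; field names are assumptions made for this sketch."""
    publication: str
    citations: int
    ir_models: List[str]
    dataset: Dataset
    evaluation_level: int  # 1-4, see CONTEXT_LEVELS

    def evaluation(self) -> str:
        name = CONTEXT_LEVELS[self.evaluation_level]
        return f"Level {self.evaluation_level}: {name}"


def sort_by_citations(rows: List[Classification]) -> List[Classification]:
    """Order rows by citation count, most cited first, as in the table."""
    return sorted(rows, key=lambda row: row.citations, reverse=True)


# Hypothetical rows for illustration only; they are not entries from Table 8.
rows = [
    Classification("Hypothetical study A", 12, ["VSM"],
                   Dataset("Hypothetical dataset 1", Origin.OS, 300), 1),
    Classification("Hypothetical study B", 40, ["LSI"],
                   Dataset("Hypothetical dataset 2", Origin.STUD), 3),
]

for row in sort_by_citations(rows):
    print(row.publication, "|", row.dataset.label(), "|", row.evaluation())
```

Representing a missing artifact count as `None` keeps ‘N/A’ distinct from zero, which matters if dataset sizes are later aggregated.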
Cite this article
Borg, M., Runeson, P. & Ardö, A. Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empir Software Eng 19, 1565–1616 (2014). https://doi.org/10.1007/s10664-013-9255-y
DOI: https://doi.org/10.1007/s10664-013-9255-y