Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability

Empirical Software Engineering

Abstract

Engineers in large-scale software development have to manage large amounts of information, spread across many artifacts. Several researchers have proposed expressing retrieval of trace links among artifacts, i.e. trace recovery, as an Information Retrieval (IR) problem. The objective of this study is to produce a map of work on IR-based trace recovery, with a particular focus on previous evaluations and strength of evidence. We conducted a systematic mapping of IR-based trace recovery. Of the 79 publications classified, a majority applied algebraic IR models. While a set of studies on students indicates that IR-based trace recovery tools support certain work tasks, most previous studies do not go beyond reporting precision and recall of candidate trace links from evaluations using datasets containing fewer than 500 artifacts. Our review identified a need for industrial case studies. Furthermore, we conclude that the overall quality of reporting should be improved regarding context and tool details, measures reported, and use of IR terminology. Finally, based on our empirical findings, we present suggestions on how to advance research on IR-based trace recovery.
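
To illustrate what treating trace recovery as an IR problem can look like, the following is a minimal sketch of an algebraic (vector space) approach that scores candidate trace links and reports precision and recall against a gold standard. It is not taken from any of the mapped publications: the artifact texts, the identifiers (`R1`, `C1`, ...), the function names, the smoothed idf weighting and the 0.1 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vectors(docs):
    """Map each artifact id to a sparse TF-IDF vector (term -> weight)."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: number of artifacts that contain each term.
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed idf (an assumption) so that no term gets a zero weight.
    return {
        doc_id: {t: tf * math.log(1 + n_docs / df[t]) for t, tf in c.items()}
        for doc_id, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recover_links(sources, targets, threshold=0.1):
    """Return candidate links (source, target, score), best first.

    Assumes artifact ids are unique across sources and targets.
    """
    vectors = tfidf_vectors({**sources, **targets})
    links = [
        (s, t, cosine(vectors[s], vectors[t]))
        for s in sources
        for t in targets
    ]
    links = [link for link in links if link[2] >= threshold]
    return sorted(links, key=lambda link: link[2], reverse=True)


def precision_recall(candidates, gold):
    """Set-based precision and recall of candidate links."""
    true_positives = len(candidates & gold)
    precision = true_positives / len(candidates) if candidates else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy artifacts and gold standard, for illustration only.
requirements = {
    "R1": "user login with password",
    "R2": "export report as pdf",
}
code_units = {
    "C1": "class login checks user password",
    "C2": "function export pdf report writer",
    "C3": "logging utility for debug output",
}
gold = {("R1", "C1"), ("R2", "C2"), ("R2", "C3")}

links = recover_links(requirements, code_units)
candidates = {(s, t) for s, t, _ in links}
precision, recall = precision_recall(candidates, gold)
print(links)
print(precision, recall)
```

In this toy setting the link from `R2` to `C3` shares no vocabulary with its source, so a purely term-based model cannot retrieve it. Set-based precision and recall on a small dataset capture that miss but say little about how an engineer would work with the ranked candidates.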

Notes

  1. We use an asterisk (‘*’) to distinguish primary publications in the systematic mapping from general references.

  2. www.coest.org.

  3. www.zotero.org.

  4. The gold standard was not considered the end goal of our study, but was the target during the iterative development of the search string described next.

  5. coest.org.

  6. http://ease.cs.lth.se.

References

  • Abadi A, Nisenson M, Simionovici Y (2008*) A traceability technique for specifications. In: Proceedings of the 16th international conference on program comprehension, pp 103–112

  • Aitchison J, Bawden D, Gilchrist A (2000) Thesaurus construction and use: a practical manual, 4th edn. Routledge

  • Ali N, Guéhéneuc Y, Antoniol G (2011*a) Requirements traceability for object oriented systems by partitioning source code. In: Proceedings of the 18th working conference on reverse engineering, pp 45–54

  • Ali N, Guéhéneuc Y, Antoniol G (2011*b) Trust-Based requirements traceability. In: Proceedings of the 19th international conference on program comprehension, pp 111–120

  • Ali N, Guéhéneuc Y, Antoniol G (2012) Factors impacting the inputs of traceability recovery approaches. In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability, Springer

  • Antoniol G, Potrich A, Tonella P, Fiutem R (1999) Evolving object oriented design to improve code traceability. In: Proceedings of the 7th international workshop on program comprehension, pp 151–160

  • Antoniol G, Canfora G, De Lucia A, Merlo E (1999*) Recovering code to documentation links in OO systems. In: Proceedings of the 6th working conference on reverse engineering, pp 136–144

  • Antoniol G, Canfora G, Casazza G, De Lucia A (2000) Information retrieval models for recovering traceability links between code and documentation. In: Conference on software maintenance, pp 40–49

  • Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2000*) Tracing object-oriented code into functional requirements. In: Proceedings of the 8th international workshop on program comprehension, pp 79–86

  • Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002*) Recovering traceability links between code and documentation. In: Transactions on software engineering, vol 28, pp 970–983

  • Assawamekin N, Sunetnanta T, Pluempitiwiriyawej C (2010) Ontology-based multiperspective requirements traceability framework. Knowl Inf Syst 25(3):493–522

  • Asuncion H, Asuncion A, Taylor R (2010*) Software traceability with topic modeling. In: Proceedings of the international conference on software engineering, pp 95–104

  • Ayari K, Meshkinfam P, Antoniol G, Di Penta M (2007) Threats on building models from CVS and bugzilla repositories: the mozilla case study. In: Proceedings of the conference of the center for advanced studies on collaborative research, pp 215–228

  • Bacchelli A, Lanza M, Robbes R (2010) Linking e-mails and source code artifacts. In: Proceedings of the 32nd international conference on software engineering, pp 375–384

  • Baeza-Yates R, Ribeiro-Neto B (2011) Modern information retrieval: the concepts and technology behind search. Addison-Wesley

  • Banko M, Brill E (2001) Scaling to very very large corpora for natural language disambiguation. In: Proceedings of the 39th annual meeting on association for computational linguistics, pp 26–33

  • Ben Charrada E, Caspar D, Jeanneret C, Glinz M (2011*) Towards a benchmark for traceability. In: Proceedings of the 12th international workshop on principles of software evolution, pp 21–30

  • Bianchi A, Fasolino A, Visaggio G (2000) An exploratory case study of the maintenance effectiveness of traceability models. In: Proceedings of the 8th international workshop on program comprehension, pp 149–158

  • Binkley D, Lawrie D (2010) Information retrieval applications in software maintenance and evolution. In: Marciniak J (ed) Encyclopedia of software engineering, 2nd edn, Taylor & Francis

  • Blei D, Lafferty J (2007) A correlated topic model of science. Ann Appl Stat 1(1):17–35

  • Blei D, Ng A, Jordan M (2003) Latent dirichlet allocation. J Mach Learn Res 3(4–5):993–1022

  • Borg M, Pfahl D (2011*) Do better IR tools improve the accuracy of engineers’ traceability recovery? In: Proceedings of the international workshop on machine learning technologies in software engineering, pp 27–34

  • Borg M, Runeson P, Brodén L (2012a) Evaluation of traceability recovery in context: a taxonomy for information retrieval tools. In: Proceedings of the 16th international conference on evaluation & assessment in software engineering

  • Borg M, Wnuk K, Pfahl D (2012b) Industrial comparability of student artifacts in traceability recovery research - an exploratory survey. In: Proceedings of the 16th european conference on software maintenance and reengineering

  • Borillo M, Borillo A, Castell N, Latour D, Toussaint Y, Felisa Verdejo M (1992) Applying linguistic engineering to spatial software engineering: the traceability problem. In: Proceedings of the 10th european conference on artificial intelligence, pp 593–595

  • Bras M, Toussaint Y (1993) Artificial intelligence tools for software engineering: Processing natural language requirements. In: Applications of artificial intelligence in engineering, pp 275–290

  • Brereton P, Kitchenham B, Budgen D, Turner M, Khalil M (2007) Lessons from applying the systematic literature review process within the software engineering domain. J Syst Software 80(4):571–583

  • Canfora G, Cerulo L (2006*) Fine grained indexing of software repositories to support impact analysis. In: Proceedings of the international workshop on mining software repositories, pp 105–111

  • Capobianco G, De Lucia A, Oliveto R, Panichella A, Panichella S (2009*a) On the role of the nouns in IR-based traceability recovery. In: Proceedings of the 17th international conference on program comprehension, pp 148–157

  • Capobianco G, De Lucia A, Oliveto R, Panichella A, Panichella S (2009*b) Traceability recovery using numerical analysis. In: Proceedings of the 16th working conference on reverse engineering, pp 195–204

  • Carnegie Mellon Software Engineering Institute (2010) CMMI for development, version 1.3

  • Castell N, Slavkova O, Toussaint Y, Tuells A (1994) Quality control of software specifications written in natural language. In: Proceedings of the 7th international conference on industrial and engineering applications of artificial intelligence and expert systems, pp 37–44

  • Chang J, Blei D (2010) Hierarchical relational models for document networks. Ann Appl Stat 4(1):124–150

  • Charikar M, Chekuri C, Feder T, Motwani R (1997) Incremental clustering and dynamic information retrieval. In: Proceedings of the 29th annual ACM symposium on theory of computing, pp 626–635

  • Chen X (2010*) Extraction and visualization of traceability relationships between documents and source code. In: Proceedings of the international conference on automated software engineering, pp 505–509

  • Chen X, Grundy J (2011*) Improving automated documentation to code traceability by combining retrieval techniques. In: Proceedings of the 26th international conference on automated software engineering, pp 223–232

  • Chen X, Hosking J, Grundy J (2011*) A combination approach for enhancing automated traceability. In: Proceedings of the 33rd international conference on software engineering (NIER track), pp 912–915

  • Cleland-Huang J, Chang CK, Christensen M (2003) Event-based traceability for managing evolutionary change. Trans Software Eng 29(9):796–810

  • Cleland-Huang J, Settimi R, Duan C, Zou XC (2005*) Utilizing supporting evidence to improve dynamic requirements traceability. In: Proceedings of the 13th international conference on requirements engineering, pp 135–144

  • Cleland-Huang J, Huffman Hayes J, Dekhtyar A (2006) Center of excellence for traceability: problem statement and grand challenges in traceability (v0.1). Technical Report COET-GCT-06-01-0.9

  • Cleland-Huang J, Settimi R, Romanova E, Berenbach B, Clark S (2007*) Best practices for automated traceability. Computer 40(6):27–35

  • Cleland-Huang J, Marrero W, Berenbach B (2008) Goal-Centric traceability: Using virtual plumblines to maintain critical systemic qualities. Trans Software Eng 34(5):685–699

  • Cleland-Huang J, Czauderna A, Gibiec M, Emenecker J (2010*) A machine learning approach for tracing regulatory codes to product specific requirements. In: Proceedings of the international conference on software engineering, pp 155–164

  • Cleland-Huang J, Czauderna A, Dekhtyar A, Gotel O, Huffman Hayes J, Keenan E, Maletic J, Poshyvanyk D, Shin Y, Zisman A, Antoniol G, Berenbach B, Egyed A, Maeder P (2011) Grand challenges, benchmarks, and TraceLab: developing infrastructure for the software traceability research community. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering

  • Cleland-Huang J, Gotel O, Zisman A (eds) (2012) Software and systems traceability. Springer

  • Cleverdon C (1991) The significance of the cranfield tests on index languages. In: Proceedings of the 14th annual international SIGIR conference on research and development in information retrieval, pp 3–12

  • Croft B, Turtle H, Lewis D (1991) The use of phrases and structured queries in information retrieval. In: Proceedings of the 14th annual international ACM SIGIR conference on research and development in information retrieval, pp 32–45

  • Cuddeback D, Dekhtyar A, Huffman Hayes J (2010*) Automated requirements traceability: the study of human analysts. In: Proceedings of the 18th international requirements engineering conference, pp 231–240

  • Czauderna A, Gibiec M, Leach G, Li Y, Shin Y, Keenan E, Cleland-Huang J (2011*) Traceability challenge 2011: using TraceLab to evaluate the impact of local versus global idf on trace retrieval. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering, pp 75–78

  • De Lucia A, Fasano F, Oliveto R, Tortora G (2004*) Enhancing an artefact management system with traceability recovery features. In: Proceedings of the 20th international conference on software maintenance, pp 306–315

  • De Lucia A, Fasano F, Oliveto R, Tortora G (2005*) ADAMS re-trace: A traceability recovery tool. In: Proceedings of the 9th European conference on software maintenance and reengineering, pp 32–41

  • De Lucia A, Di Penta M, Oliveto R, Zurolo F (2006a) COCONUT: COde COmprehension nurturant using traceability. In: Proceedings of the 22nd international conference on software maintenance, pp 274–275

  • De Lucia A, Di Penta M, Oliveto R, Zurolo F (2006b) Improving comprehensibility of source code via traceability information: A controlled experiment. In: Proceedings of the 14th international conference on program comprehension, pp 317–326

  • De Lucia A, Fasano F, Oliveto R, Tortora G (2006*a) Can information retrieval techniques effectively support traceability link recovery? In: Proceedings of the 14th international conference on program comprehension, pp 307–316

  • De Lucia A, Oliveto R, Sgueglia P (2006*b) Incremental approach and user feedbacks: A silver bullet for traceability recovery? In: Proceedings of the international conference on software maintenance, pp 299–308

  • De Lucia A, Fasano F, Oliveto R, Tortora G (2007*) Recovering traceability links in software artifact management systems using information retrieval methods. Trans Softw Eng Methodol 16(4)

  • De Lucia A, Fasano F, Oliveto R (2008) Traceability management for impact analysis. In: Frontiers of software maintenance, pp 21–30

  • De Lucia A, Oliveto R, Tortora G (2008*) IR-based traceability recovery processes: An empirical comparison of “one-shot” and incremental processes. In: Proceedings of the 23rd international conference on automated software engineering, pp 39–48

  • De Lucia A, Oliveto R, Tortora G (2009*a) Assessing IR-based traceability recovery tools through controlled experiments. Empir Software Eng 14(1):57–92

  • De Lucia A, Oliveto R, Tortora G (2009*b) The role of the coverage analysis during IR-based traceability recovery: a controlled experiment. In: Proceedings of the 25th international conference on software maintenance, pp 371–380

  • De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S (2011*) Improving IR-based traceability recovery using smoothing filters. In: Proceedings of the 19th international conference on program comprehension, pp 21–30

  • De Lucia A, Marcus A, Oliveto R, Poshyvanyk D (2012) Information retrieval methods for automated traceability recovery. In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability, Springer

  • Deerwester S, Dumais S, Furnas G, Landauer T, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

  • Dekhtyar A, Huffman Hayes J (2006) Good benchmarks are hard to find: Toward the benchmark for information retrieval applications in software engineering. In: Proceedings of the 22nd international conference on software maintenance

  • Dekhtyar A, Huffman Hayes J, Antoniol G (2007) Benchmarks for traceability? In: Proceedings of the international symposium on grand challenges in traceability

  • Dekhtyar A, Huffman Hayes J, Larsen J (2007*a) Make the most of your time: how should the analyst work with automated traceability tools? In: Proceedings of the 3rd international workshop on predictor models in software engineering

  • Dekhtyar A, Huffman Hayes J, Sundaram S, Holbrook A, Dekhtyar O (2007*b) Technique integration for requirements assessment. In: Proceedings of the 15th international requirements engineering conference, pp 141–152

  • Dekhtyar A, Dekhtyar O, Holden J, Huffman Hayes J, Cuddeback D, Kong W (2011*) On human analyst performance in assisted requirements tracing: statistical analysis. In: Proceedings of the 19th international requirements engineering conference, pp 111–120

  • Di F, Zhang M (2009*) An improving approach for recovering requirements-to-design traceability links. In: Proceedings of the international conference on computational intelligence and software engineering, pp 1–6

  • Di Penta M, Gradara S, Antoniol G (2002*) Traceability recovery in RAD software systems. In: Proceedings of the 10th international workshop on program comprehension, pp 207–216

  • Dit B, Revelle M, Gethers M, Poshyvanyk D (2011) Feature location in source code: a taxonomy and survey. J Softw Maint Evol 25(1):53–95

  • Dömges R, Pohl K (1998) Adapting traceability environments to project-specific needs. Commun ACM 41(12):54–62

  • Duan C, Cleland-Huang J (2007*) Clustering support for automated tracing. In: Proceedings of the international conference on automated software engineering, pp 244–253

  • Egyed A, Grunbacher P (2002) Automating requirements traceability: beyond the record replay paradigm. In: Proceedings of the 17th international conference on automated software engineering, pp 163–171

  • Eisenbarth T, Koschke R, Simon D (2003) Locating features in source code. Trans Software Eng 29(3):210–224

  • Falessi D, Cantone G, Canfora G (2010) A comprehensive characterization of NLP techniques for identifying equivalent requirements. In: Proceedings of the 4th international symposium on empirical software engineering and measurement

  • Felizardo KR, Salleh N, Martins RM, Mendes E, MacDonell SG, Maldonado JC (2011) Using visual text mining to support the study selection activity in systematic literature reviews. In: Proceedings of the 5th international symposium on empirical software engineering and measurement, pp 77–86

  • Fiutem R, Antoniol G (1998) Identifying design-code inconsistencies in object-oriented software: a case study. In: Proceedings of the international conference on software maintenance, pp 94–102

  • Gay G, Haiduc S, Marcus A, Menzies T (2009) On the use of relevance feedback in IR-based concept location. In: Proceedings of the 25th international conference on software maintenance, pp 351–360

  • Gethers M, Kagdi H, Dit B, Poshyvanyk D (2011) An adaptive approach to impact analysis from change requests to source code. In: Proceedings of the 26th international conference on automated software engineering, pp 540–543

  • Gethers M, Oliveto R, Poshyvanyk D, De Lucia A (2011*) On integrating orthogonal information retrieval methods to improve traceability recovery. In: Proceedings of the 27th international conference on software maintenance, pp 133–142

  • Gibiec M, Czauderna A, Cleland-Huang J (2010*) Towards mining replacement queries for hard-to-retrieve traces. In: Proceedings of the international conference on automated software engineering, pp 245–254

  • Gotel O, Finkelstein C (1994) An analysis of the requirements traceability problem. In: Proceedings of the first international conference on requirements engineering, pp 94–101

  • Gotel O, Cleland-Huang J, Huffman Hayes J, Zisman A, Egyed A, Grünbacher P, Dekhtyar A, Antoniol G, Maletic J (2012) The grand challenge of traceability (v1.0). In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability, Springer

  • Heindl M, Biffl S (2005) A case study on value-based requirements tracing. In: Proceedings of the 10th European software engineering conference held jointly with the 13th SIGSOFT international symposium on foundations of software engineering, pp 60–69

  • Hofmann T (2001) Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 42(1–2):177–196

  • Huffman Hayes J, Dekhtyar A (2005a) A framework for comparing requirements tracing experiments. Int J Softw Eng Knowl Eng 15(5):751–781

  • Huffman Hayes J, Dekhtyar A (2005b) Humans in the traceability loop: can’t live with ’em, can’t live without ’em. In: Proceedings of the 3rd international workshop on traceability in emerging forms of software engineering, pp 20–23

  • Huffman Hayes J, Dekhtyar A, Osborne J (2003*) Improving requirements tracing via information retrieval. In: Proceedings of the 11th international requirements engineering conference, pp 138–147

  • Huffman Hayes J, Dekhtyar A, Sundaram S, Howard S (2004*) Helping analysts trace requirements: An objective look. In: Proceedings of the 12th international conference on requirements engineering, pp 249–259

  • Huffman Hayes J, Dekhtyar A, Sundaram S (2005*) Text mining for software engineering: how analyst feedback impacts final results. In: Proceedings of the international workshop on mining software repositories, pp 1–5

  • Huffman Hayes J, Dekhtyar A, Sundaram S (2006*) Advancing candidate link generation for requirements tracing: the study of methods. Trans Softw Eng 32(1):4–19

  • Huffman Hayes J, Dekhtyar A, Sundaram S, Holbrook A, Vadlamudi S, April A (2007*) REquirements TRacing on target (RETRO): improving software maintenance through traceability recovery. Innov Syst Softw Eng 3(3):193–202

  • Huffman Hayes J, Antoniol G, Guéhéneuc Y (2008) PREREQIR: recovering Pre-Requirements via cluster analysis. In: Proceedings of the 15th working conference on reverse engineering, pp 165–174

  • Huffman Hayes J, Sultanov H, Kong W, Li W (2011*) Software verification and validation research laboratory (SVVRL) of the university of kentucky: traceability challenge 2011: language translation. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering, ACM, pp 50–53

  • Ingwersen P, Järvelin K (2005) The turn: integration of information seeking and retrieval in context. Springer

  • International Electrotechnical Commission (2003) IEC 61511-1 ed 1.0, safety instrumented systems for the process industry sector

  • International Organization for Standardization (2011) ISO 26262-1:2011 road vehicles – functional safety –

  • Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, pp 41–48

  • Jedlitschka A, Ciolkowski M, Pfahl D (2008) Reporting experiments in software engineering. In: Shull F, Singer J, Sjoberg D (eds) Guide to advanced empirical software engineering, Springer, London, pp 201–228

  • Jiang H, Nguyen T, Chen I, Jaygarl H, Chang C (2008*) Incremental latent semantic indexing for automatic traceability link evolution management. In: Proceedings of the 23rd international conference on automated software engineering, pp 59–68

  • Katta V, Stålhane T (2011) A conceptual model of traceability for safety systems. In: Proceedings of the complex systems design & management conference

  • Kaushik N, Tahvildari L, Moore M (2011*) Reconstructing traceability between bugs and test cases: an experimental study. In: Proceedings of the 18th working conference on reverse engineering, pp 411–414

  • Kekäläinen J, Järvelin K (2002) Evaluating information retrieval systems under the challenges of interaction and multidimensional dynamic relevance. In: Proceedings of the COLIS 4 conference pp 253–270

  • Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering. EBSE Technical Report

  • Kitchenham B, Pfleeger S, Pickard L, Jones P, Hoaglin D, El Emam K, Rosenberg J (2002) Preliminary guidelines for empirical research in software engineering. Trans Softw Eng Methodol 28(8):721–734

  • Kitchenham B, Budgen D, Brereton P (2011) Using mapping studies as the basis for further research—a participant-observer case study. Inform Softw Technol 53(6):638–651

  • Klock S, Gethers M, Dit B, Poshyvanyk D (2011*) Traceclipse: an eclipse plug-in for traceability link recovery and management. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering, pp 24–30

  • Kong L, Li J, Li Y, Yang Y, Wang Q (2009*) A requirement traceability refinement method based on relevance feedback. In: Proceedings of the 21st international conference on software engineering and knowledge engineering

  • Kong W, Huffman Hayes J (2011*) Proximity-based traceability: an empirical validation using ranked retrieval and set-based measures. In: Proceedings of the 1st international workshop on empirical requirements engineering, pp 45–52

  • Kong W, Huffman Hayes J, Dekhtyar A, Holden J (2011*) How do we trace requirements: an initial study of analyst behavior in trace validation tasks. In: Proceedings of the 4th international workshop on cooperative and human aspects of software engineering, pp 32–39

  • Kruchten P (2004) The rational unified process: an introduction. Addison-Wesley Professional

  • Leuser J (2009*) Challenges for semi-automatic trace recovery in the automotive domain. In: Proceedings of the international workshop on traceability in emerging forms of software engineering, pp 31–35

  • Leuser J, Ott D (2010*) Tackling semi-automatic trace recovery for large specifications. In: Requirements engineering: foundation for software quality, pp 203–217

  • Lewis D (1998) Naive (Bayes) at forty: The independence assumption in information retrieval. In: Machine learning: ECML-98, vol 1398, Springer, pp 4–15

  • Li Y, Li J, Yang Y, Li M (2008*) Requirement-centric traceability for change impact analysis: a case study. In: International conference on software process, pp 100–111

  • Liddy E (2001) Natural language processing, 2nd edn. Encyclopedia of Library and Information Science, Marcel Dekker

  • Lin J, Chan L, Cleland-Huang J, Settimi R, Amaya J, Bedford G, Berenbach B, Khadra OB, Chuan D, Zou X (2006) Poirot: A distributed tool supporting Enterprise-Wide automated traceability. In: Proceedings of the 14th international conference on requirements engineering, pp 363–364

  • Lindvall M, Feldmann R, Karabatis G, Chen Z, Janeja V (2009) Searching for relevant software change artifacts using semantic networks. In: Proceedings of the symposium on applied computing, pp 496–500

  • Lormans M, van Deursen A (2006*) Can LSI help reconstructing requirements traceability in design and test? In: Proceedings of the 10th European conference on software maintenance and reengineering, pp 45–54

  • Lormans M, Gross H, van Deursen A, van Solingen R, Stehouwer A (2006*) Monitoring requirements coverage using reconstructed views: An industrial case study. In: Proceedings of the 13th working conference on reverse engineering, pp 275–284

  • Lormans M, Van Deursen A, Gross H (2008*) An industrial case study in reconstructing requirements views. Empir Software Eng 13(6):727–760

  • Mahmoud A, Niu N (2010*) Using semantics-enabled information retrieval in requirements tracing: An ongoing experimental investigation. In: Proceedings of the international computer software and applications conference, pp 246–247

  • Mahmoud A, Niu N (2011*) Source code indexing for automated tracing. In: Proceedings of the 6th international workshop on traceability in emerging forms of software engineering, pp 3–9

  • Manning C, Raghavan P, Schutze H (2008) Introduction to information retrieval. Cambridge University Press

  • Marcus A, Maletic J (2003) Recovering documentation-to-source-code traceability links using latent semantic indexing. In: Proceedings of the 25th international conference on software engineering, pp 125–135

  • Marcus A, Sergeyev A, Rajlich V, Maletic JI (2004) An information retrieval approach to concept location in source code. In: Proceedings of the 11th working conference on reverse engineering, pp 214–223

  • Marcus A, Maletic J, Sergeyev A (2005*) Recovery of traceability links between software documentation and source code. Int J Softw Eng Knowl Eng 15(5):811–836

  • Maron M, Kuhns J (1960) On relevance, probabilistic indexing and information retrieval. J ACM 7(3):216–244

  • McMillan C, Poshyvanyk D, Revelle M (2009*) Combining textual and structural analysis of software artifacts for traceability link recovery. In: Proceedings of the international workshop on traceability in emerging forms of software engineering, pp 41–48

  • Natt och Dag J, Regnell B, Carlshamre P, Andersson M, Karlsson J (2002*) A feasibility study of automated natural language requirements analysis in market-driven development. Requirements Eng 7(1):20–33

  • Natt och Dag J, Gervasi V, Brinkkemper S, Regnell B (2004*) Speeding up requirements management in a product software company: linking customer wishes to product requirements through linguistic engineering. In: Proceedings of the 12th international requirements engineering conference, pp 283–294

  • Natt och Dag J, Thelin T, Regnell B (2006*) An experiment on linguistic tool support for consolidation of requirements from multiple sources in market-driven product development. Empir Software Eng 11(2):303–329

  • Oliveto R (2008) Traceability management meets information retrieval methods: strengths and limitations. PhD thesis, University of Salerno

  • Oliveto R, Gethers M, Poshyvanyk D, De Lucia A (2010*) On the equivalence of information retrieval methods for automated traceability link recovery. In: Proceedings of the 18th international conference on program comprehension, pp 68–71

  • Olsson T (2002) Software information management in requirements and test documentation. Licentiate thesis, Lund University

  • Park S, Kim H, Ko Y, Seo J (2000*) Implementation of an efficient requirements analysis supporting system using similarity measure techniques. Inform Softw Technol 42(6):429–438

  • Parvathy AG, Vasudevan BG, Balakrishnan R (2008*) A comparative study of document correlation techniques for traceability analysis. In: Proceedings of the 10th international conference on enterprise information systems, information systems analysis and specification, pp 64–69

  • Petersen K, Wohlin C (2009) Context in industrial software engineering research. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement, pp 401–404

  • Petersen K, Feldt R, Mujtaba S, Mattsson M (2008) Systematic mapping studies in software engineering. In: Proceedings of the 12th international conference on evaluation and assessment in software engineering, pp 71–80

  • Pohl K, Bockle G, van der Linden F (2005) Software product line engineering: foundations, principles, and techniques. Birkhäuser

  • Ponte J, Croft B (1998) A language modeling approach to information retrieval. In: Proceedings of the 21st annual international SIGIR conference on research and development in information retrieval, pp 275–281

  • Port D, Nikora A, Hihn J, Huang L (2011*) Experiences with text mining large collections of unstructured systems development artifacts at JPL. In: Proceedings of the 33rd international conference on software engineering, pp 701–710

  • Randolph J (2005) Free-Marginal multirater kappa (multirater k[free]): an alternative to fleiss’ Fixed-Marginal multirater kappa. In: Joensuu learning and instruction symposium

  • Robertson S (1977) The probability ranking principle in IR. J Doc 33(4):294–304

  • Robertson S, Robertson J (1999) Mastering the requirements process. Addison-Wesley Professional

  • Robertson S, Zaragoza H (2009) The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval 3(4):333–389

  • Robertson SE, Jones S (1976) Relevance weighting of search terms. J Am Soc Inform Sci 27(3):129–146

  • Rocchio J (1971) Relevance feedback in information retrieval. In: Salton G (ed) The SMART retrieval system: experiments in automatic document processing. Prentice-Hall, pp 313–323

  • Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of the 29th international conference on software engineering, pp 499–510

  • Runeson P, Höst M, Rainer A, Regnell B (2012) Case study research in software engineering. Guidelines and examples. Wiley

  • Sabaliauskaite G, Loconsole A, Engström E, Unterkalmsteiner M, Regnell B, Runeson P, Gorschek T, Feldt R (2010) Challenges in aligning requirements engineering and verification in a large-scale industrial context. In: Requirements engineering: foundation for software quality, pp 128–142

  • Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manage 24(5):513–523

  • Salton G, Wong A, Yang C (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620

  • Scacchi W (2002) Understanding the requirements for developing open source software systems. IEE Proceedings - Software 149(1):24–39

  • Settimi R, Cleland-Huang J, Ben Khadra O, Mody J, Lukasik W, DePalma C (2004*) Supporting software evolution through dynamically retrieving traces to UML artifacts. In: Proceedings of the 7th international workshop on principles of software evolution, pp 49–54

  • Shull F, Carver J, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Software Eng 13(2):211–218

  • Singhal A (2001) Modern information retrieval: a brief overview. Data Eng Bull 24(2):1–9

  • Smeaton A, Harman D (1997) The TREC experiments and their impact on Europe. J Inf Sci 23(2):169–174

  • Spanoudakis G, d’Avila-Garcez A, Zisman A (2003) Revising rules to capture requirements traceability relations: A machine learning approach. In: Proceedings of the 15th international conference in software engineering and knowledge engineering

  • Spanoudakis G, Zisman A, Perez-Minana E, Krause P (2004) Rule-based generation of requirements traceability relations. J Syst Softw 72(2):105–127

  • Spärck Jones K, Walker S, Robertson SE (2000) A probabilistic model of information retrieval: development and comparative experiments. Inf Process Manage 36(6):779–808

  • Stone A, Sawyer P (2006) Using pre-requirements tracing to investigate requirements based on tacit knowledge. In: Proceedings of the 1st international conference on software and data technologies, pp 139–144

  • Sultanov H, Huffman Hayes J (2010*) Application of swarm techniques to requirements engineering: Requirements tracing. In: Proceedings of the 18th international requirements engineering conference, pp 211–220

  • Sundaram S, Huffman Hayes J, Dekhtyar A (2005*) Baselines in requirements tracing. In: Proceedings of the workshop on predictor models in software engineering, pp 1–6

  • Sundaram S, Huffman Hayes J, Dekhtyar A, Holbrook A (2010*) Assessing traceability of software engineering artifacts. Requirements Eng 15(3):313–335

  • Torchiano M, Ricca F (2010) Impact analysis by means of unstructured knowledge in the context of bug repositories. In: Proceedings of the 4th international symposium on empirical software engineering and measurement, pp 47:1–47:4

  • Turtle H, Croft B (1991) Evaluation of an inference network-based retrieval model. Trans Inf Syst 9(3):187–222

  • Van Rompaey B, Demeyer S (2009) Establishing traceability links between unit test cases and units under test. In: Proceedings of the 13th European conference on software maintenance and reengineering, pp 209–218

  • Voorhees E (2005) TREC: Experiment and evaluation in information retrieval. MIT Press

  • Wang X, Lai G, Liu C (2009*) Recovering relationships between documentation and source code based on the characteristics of software engineering. Electron Notes Theor Comput Sci 243:121–137

  • Winkler S (2009*) Trace retrieval for evolving artifacts. In: Proceedings of the international workshop on traceability in emerging forms of software engineering, pp 49–56

  • Winkler S, Pilgrim J (2010) A survey of traceability in requirements engineering and model-driven development. Softw Syst Model 9(4):529–565

  • Wohlin C, Runeson P, Höst M, Ohlsson M, Regnell B, Wesslén A (2012) Experimentation in software engineering: a practical guide. Springer

  • Yadla S, Huffman Hayes J, Dekhtyar A (2005*) Tracing requirements to defect reports: an application of information retrieval techniques. Innov Syst Softw Eng 1:116–124

  • Zhai C (2007) A brief review of information retrieval models. Technical report, University of Illinois at Urbana-Champaign

  • Zhai C (2008) Statistical language models for information retrieval: a critical review. Foundations and Trends in Information Retrieval 2(3):137–213

  • Zhai C, Lafferty J (2001) Model-based feedback in the language modeling approach to information retrieval. In: Proceedings of the 10th international conference on information and knowledge management, pp 403–410

  • Zhao W, Zhang L, Liu Y, Luo J, Sun JS (2003*) Understanding how the requirements are implemented in source code. In: Proceedings of the 10th Asia-Pacific software engineering conference, pp 68–77

  • Zhou X, Yu H (2007*) A clustering-based approach for tracing object-oriented design to requirement. In: Proceedings of the 10th international conference on fundamental approaches to software engineering, pp 412–422

  • Zou X, Settimi R, Cleland-Huang J (2006*) Phrasing in dynamic requirements trace retrieval. In: Proceedings of the 30th international computer software and applications conference, pp 265–272

  • Zou X, Settimi R, Cleland-Huang J (2008*) Evaluating the use of project glossaries in automated trace retrieval. In: Proceedings of the international conference on software engineering research and practice, pp 157–163

  • Zou X, Settimi R, Cleland-Huang J (2010*) Improving automated requirements trace retrieval: A study of term-based enhancement methods. Empir Software Eng 15(2):119–146

Acknowledgements

This work was funded by the Industrial Excellence Center EASE – Embedded Applications Software Engineering (see footnote 6). Thanks go to our librarian Mats Berglund for working on the search strings, and Lorand Dali for excellent comments on IR details.

Author information

Corresponding author

Correspondence to Markus Borg.

Additional information

Communicated by: Giulio Antoniol

Appendix: Classification of Primary Publications

Table 8 presents our classification of the primary publications, sorted by number of citations according to Google Scholar (July 1, 2012). Note that the well-cited works by Marcus and Maletic (2003) (354 citations) and Antoniol et al. (2000) (85 citations) are not listed. Applied IR models are reported in the fourth column. For LSI, the number of dimensions (k) in the reduced term-document space is reported in parentheses, divided per dataset when possible. It is given as a fixed number of dimensions, an interval of dimensions, a dimensionality reduction in percent, or ‘N/A’ when the information is not available. A bold number marks the best choice, as concluded by the original authors. For LDA, the number of topics (t) is reported. Datasets are classified according to origin: proprietary (Ind), open source (OS), university (Univ), student (Stud), not clearly reported (Unclear), and mixed origin (Mixed). Numbers in parentheses show the number of artifacts studied, i.e. the total number of artifacts in the dataset; ‘N/A’ is used when this number is not reported. Unless the full dataset name is presented, the following abbreviations are used: IBS (Ice Breaker System), EBT (Event-Based Traceability), LC (Light Control system), TM (Transient Meter). Evaluation, the rightmost column, maps primary publications to the context taxonomy described in Section 3 (Levels 1–4 = retrieval context, seeking context, work task context, project context). Finally, Table 9 shows the clearly most productive authors and affiliations, based upon our primary publications.

Table 8 Classification of primary publications, part I
Table 9 Most productive authors and affiliations

About this article

Cite this article

Borg, M., Runeson, P. & Ardö, A. Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empir Software Eng 19, 1565–1616 (2014). https://doi.org/10.1007/s10664-013-9255-y

  • DOI: https://doi.org/10.1007/s10664-013-9255-y
