Neural networks for predicting the duration of new software projects

https://doi.org/10.1016/j.jss.2014.12.002

Highlights

  • Two neural networks are applied for predicting the development duration of new software projects.

  • The software projects are obtained from the ISBSG dataset release 11.

  • Adjusted function points and team size are used as independent variables.

  • Prediction accuracy is calculated from the absolute residuals.

  • The prediction accuracy of the neural networks was statistically better than that of a statistical regression.

Abstract

The duration of software development projects has become a competitive issue: only 39% of them are finished on time relative to the duration planned originally. The techniques for predicting project duration are most often based on expert judgment and mathematical models, such as statistical regression or machine learning. The contribution of this study is to investigate whether or not the duration prediction accuracy obtained with a multilayer feedforward neural network model, also called a multilayer perceptron (MLP), and with a radial basis function neural network (RBFNN) model is statistically better than that obtained by a multiple linear regression (MLR) model when functional size and the maximum size of the team of developers are used as the independent variables. The three models mentioned above are trained and tested by predicting the duration of new software development projects with a set of projects from the International Software Benchmarking Standards Group (ISBSG) release 11. Results based on absolute residuals, Pred(l) and a Friedman statistical test show that prediction accuracy with the MLP and the RBFNN is statistically better than with the MLR model.

Introduction

Software engineering (SE) involves a number of product development activities, including engineering management. Software project planning (SPP) addresses the activities undertaken to prepare for a successful SE project from a management perspective. SPP involves process planning, determination of deliverables, software prediction (also referred to as software estimation), risk management, quality management, and plan management (Abran and Moore, 2004).

The term development is frequently used with reference to three generic stages: design, construction, and testing; and the term maintenance refers to anomalies uncovered, operating environments changed, and additional user requirements put forward after the software product has been delivered (Abran and Moore, 2004). A maintained software project is also referred to as an enhancement development project (ISBSG, 2011).

Both development and maintenance are typically accompanied by other activities, such as documentation, risk analysis, verification, validation, and measurement. For both development and maintenance, the following types of predictions can be made once the software requirements have been specified (Abran and Moore, 2004):

  • a) The number of person-hours required to complete the development or maintenance (effort).

  • b) The duration of tasks with projected start times, individual duration by task, and end times. In SPP, the critical path is usually called duration (Berlin et al., 2009), schedule (Alyahya et al., 2009), or cycle time (Agrawal and Chari, 2007).

  • c) The cost of the project based on the resource requirements, such as people or tools.

Duration prediction is necessary for budgeting, which software development organizations typically handle on a monthly basis (e.g. building rental, employee health or life insurance, and so on). The duration of projects has also been used as a reference for the maturity of processes in software enterprises (Agrawal and Chari, 2007, Alyahya et al., 2009, Harter, Krishnan and Slaughter, 2000), since the underprediction or overprediction of project duration at the planning stage can negatively impact budgets.

A 2013 study based on an analysis of 50,000 projects developed between 2003 and 2012 in real environments in the USA (60%), Europe (25%), and the rest of the world (15%) reports that only 39% of projects were delivered on time, on budget, and with the required features and functions; 43% of them were challenged (late, over budget, and/or with less than the required features and functions); and 18% failed (cancelled prior to completion or delivered and never used) (Chaos Report, 2013).

We found only a few studies on duration prediction in the literature published in the past 15 years: 2002 (Kitchenham et al., 2002), 2007 (Bourque et al., 2007), 2009 (Berlin et al., 2009), 2012 (Wang et al., 2012), and 2013 (López-Martín, Chavoya and Meda-Campaña, 2013, Zapata and Chaudron, 2013). Bourque et al. (2007) report that duration studies prior to 2002 were published in the late 1970s and during the 1980s. However, a 2013 study analyzing 171 projects developed by 1000 practitioners working in 50 countries (Zapata and Chaudron, 2013) notes that, while most of the studies on accuracy have focused on effort prediction, the main prediction issue in practice is duration. Indeed, development duration has become a competitive issue in many industries (Alyahya et al., 2009, Harter, Krishnan and Slaughter, 2000, Zapata and Chaudron, 2013). Therefore, this study investigates the prediction of the duration of software development projects.

The techniques for predicting the duration have been based upon expert judgment (Kitchenham et al., 2002, Zapata and Chaudron, 2013), statistical regression (Berlin et al., 2009, Bourque et al., 2007, Kitchenham et al., 2002, López-Martín, Chavoya and Meda-Campaña, 2013, Oligny, Bourque and Abran, 1997, Wang, Yu and Chan, 2012), artificial neural networks (Berlin et al., 2009, López-Martín, Chavoya and Meda-Campaña, 2013, Wang, Yu and Chan, 2012) and support vector machines (Wang et al., 2012).

A neural network can learn complex (nonlinear) functions (Anderson, 1995), and nonlinear relationships are common among dependent and independent variables in software projects (Chao-Jung and Chin-Yu, 2011). The kind of neural network used for predicting the duration of software projects has been the multilayer feedforward neural network, also termed multilayer perceptron (MLP) (Berlin et al., 2009, López-Martín, Chavoya and Meda-Campaña, 2013, Wang, Yu and Chan, 2012). In this study, another kind of neural network referred to as radial basis function neural network (RBFNN) is proposed.

In the software prediction field, it is common practice to use software size as the independent variable for predicting project effort, and the predicted effort as the independent variable for predicting the duration of the project (Ahmed, Ahmad and AlGhamdi, 2013, Berlin et al., 2009, Boehm et al., 2000, Bourque et al., 2007). The size of a software product is mainly measured in function points or source lines of code (Sheetz et al., 2009), while the duration of a software project is usually measured in months.

Taking into consideration that actual effort is not known at the start of the project and cannot be used as an independent variable in a duration prediction model at prediction time (Bourque et al., 2007, Kitchenham et al., 2002), and that the duration of a software project also depends on the number of developers it involves (Kitchenham et al., 2002), the models proposed in this study are trained and tested using the size of the projects and the number of developers as independent variables instead of using development effort as the independent variable. Sixteen publicly available datasets of software projects were analyzed to check the availability of these two independent variables for our research purposes: 15 datasets from the PROMISE repository (PROMISE, 2014) and release 11 of the International Software Benchmarking Standards Group (ISBSG) (ISBSG, 2011). In the PROMISE repository, all of the 15 datasets had an attribute related to the size of the projects (either in function points or lines of code), while 8 of them (China, COCOMO 81, COSMIC, ISBSG release 10, Kemerer, Kitchenham, Maxwell, and Nasa93) included an attribute related to duration (reported either in months or days). Excluding the 2 datasets (COSMIC and ISBSG release 10) corresponding to a subset of the ISBSG dataset, none of the remaining 13 PROMISE datasets had an attribute related to the number of developers. In release 11 of the ISBSG dataset, the required attributes related to the size of projects, the number of developers, and the duration of software projects were available. Therefore, only the ISBSG dataset could be used for our research purposes. In addition, the ISBSG dataset also allowed us to identify the software projects based upon their type of development (new and enhancement), development platform, and programming language type (ISBSG, 2011).

The contribution of this study is to investigate whether or not the duration prediction accuracy obtained with an MLP and with a RBFNN model is better than that obtained by a multiple linear regression (MLR) model when functional size and the maximum size of the team of developers are used as the independent variables.

The prediction accuracies of the MLP, RBFNN, and MLR models were compared using absolute residuals (AR) and Pred(l) as the accuracy criteria.

Specifically, the hypothesis investigated in this research is the following:

H1

The accuracy of duration prediction with an MLP and a RBFNN is statistically better than the accuracy obtained by MLR when adjusted function point data and the maximum team size of developers are used as the independent variables.

The rest of this study is organized as follows: Section 2 presents related works on duration prediction for software projects. Section 3 describes and compares the MLP and RBFNN models. Section 4 presents the criteria for evaluating the accuracy of the models, as well as for selecting the sample data from the ISBSG dataset. Section 5 describes the training and testing for the three models. Section 6 compares the accuracy results obtained for the models. Finally, Section 7 presents a discussion, including our conclusions, the limitations of our study, and directions for future work.

Section snippets

Related work

In the software project prediction field, techniques have mainly been used for predicting software product size, project effort, and project duration. These prediction techniques have been based on informal models, such as expert judgment, or on mathematical models, such as statistical and machine learning techniques.

Regarding software product size prediction, the techniques reported in the literature are either expert judgment (Wilkie et al., 2011) or mathematical models, such as statistical

MLP and RBF neural networks

A neural network (NN) is a model inspired by the processing performed by a network of biological neurons. The basis for the construction of an NN is the artificial neuron. The input of an artificial neuron is a vector of numeric values. The neuron receives the vector and perceives each value, or component, of the vector with a particular independent sensitivity called weight. Upon receiving the input vector, the neuron first calculates its internal state and then its output value. The internal
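The two-step behaviour described above (internal state first, then output value) can be illustrated with a minimal sketch. The snippet is truncated before the paper defines its functions, so everything concrete here is an assumption made for illustration: the weighted-sum internal state with a bias term, the logistic sigmoid output, and the Gaussian form of the radial basis unit are common textbook choices, not necessarily the ones used by the authors. The names `neuron_output` and `rbf_unit_output` are hypothetical.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Weighted-sum neuron: compute the internal state, then the output."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Internal state: each input component is scaled by its own weight.
    state = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Output value: logistic sigmoid of the state (assumed activation).
    return 1.0 / (1.0 + math.exp(-state))


def rbf_unit_output(inputs, center, width):
    """Gaussian radial basis unit (assumed form): the output depends only
    on the distance between the input vector and the unit's center."""
    if len(inputs) != len(center):
        raise ValueError("inputs and center must have the same length")
    sq_dist = sum((x - c) ** 2 for x, c in zip(inputs, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))
```

In this sketch, an input vector could hold a project's adjusted function points and maximum team size, typically after scaling; the weights, center, and width would be set during training.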

Accuracy criteria

The accuracy criteria for evaluating the models of this study are based on absolute residuals (AR) and Pred(l). The AR is defined as follows:

$$AR_i = \left|\text{actual duration}_i - \text{predicted duration}_i\right|$$

It is calculated for each observation i, the duration of which is predicted. The aggregation of the AR over multiple observations (N) can be obtained by the mean (MAR) as follows:

$$MAR = \frac{1}{N}\sum_{i=1}^{N} AR_i$$

The second criterion is calculated as follows: Pred(l) = k/N, where k is the number of software projects for which the AR is
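As a rough illustration of these criteria, the sketch below computes the AR for each project, the mean (MAR) and median (MdAR) of the ARs, and a Pred-style ratio k/N. The sentence defining k is cut off in this excerpt, so the counting rule used in `pred` (AR less than or equal to a caller-supplied threshold) is an assumption, not the paper's confirmed definition. The function names and the duration values are hypothetical.

```python
from statistics import mean, median


def absolute_residuals(actual, predicted):
    """AR_i = |actual duration_i - predicted duration_i| for each project."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [abs(a - p) for a, p in zip(actual, predicted)]


def mar(ars):
    """Mean of the absolute residuals."""
    return mean(ars)


def mdar(ars):
    """Median of the absolute residuals."""
    return median(ars)


def pred(ars, threshold):
    """Ratio k/N, where k counts projects with AR <= threshold.
    The comparison rule is an assumption (the source text is truncated)."""
    k = sum(1 for ar in ars if ar <= threshold)
    return k / len(ars)


# Hypothetical durations in months, for illustration only.
actual = [6.0, 12.0, 9.0, 20.0]
predicted = [7.5, 10.0, 9.0, 14.0]
ars = absolute_residuals(actual, predicted)
print(ars, mar(ars), mdar(ars), pred(ars, 2.0))
```

Lower MAR and MdAR values and a higher Pred value indicate a more accurate model.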

Training and testing the models

The methods most frequently used to evaluate the generalization level of an NN are the holdout, leave-one-out cross-validation, and k-fold cross-validation (k > 1) methods (Bishop, 1995). In the holdout method, the sample is partitioned into two mutually exclusive subsamples, termed training and test. In the k-fold cross-validation (k > 1) method, the sample is divided into k mutually exclusive subsamples: k – 1 of them are used for training, and the kth subsample is used for testing.
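The partitioning schemes named above can be sketched as index splits. This is an illustrative sketch only: it assigns contiguous blocks of indices to folds without shuffling, which is an assumption, and the function names are hypothetical. Leave-one-out cross-validation is treated as the special case in which k equals the number of projects, so each test subsample holds exactly one project.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each of the k mutually exclusive subsamples is used once for testing,
    while the remaining k - 1 subsamples are used for training.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so that fold sizes
    # differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_splits(n_samples):
    """Leave-one-out: every project is the single test case exactly once."""
    return k_fold_splits(n_samples, n_samples)


for train, test in leave_one_out_splits(3):
    print(train, test)
```

A model would be retrained on `train` for every split and its AR recorded on `test`, giving one AR per project under leave-one-out.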

Results

Table 4 shows the MAR, MdAR and Pred(25) by model after applying the leave-one-out cross-validation (LOOCV) method to the three models. A statistical test for comparing the three sets of ARs by model should be selected taking into account the assumptions of data dependence, normality, and variance (Ross, 2004):

  • a) Dependence: Software project data can be described by n sets of three dimensions (Xi, Yi, Zi), i = 1, …, n, where i is the ith project; n is the number of projects; Xi, Yi, and Zi are the ARs obtained from the
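The abstract names a Friedman test for comparing the three dependent sets of ARs. The sketch below shows how that statistic could be computed for n projects and k models, under stated assumptions: it uses the standard chi-square form of the Friedman statistic without a tie correction, and the p-value helper is valid only for three models (2 degrees of freedom, where the chi-square survival function reduces to exp(-x/2)). The function names are hypothetical, and this is not the authors' code.

```python
import math


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for j in order[pos:end + 1]:
            ranks[j] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square for n rows (projects) by k columns (models).

    Each row holds the ARs of one project, e.g. (X_i, Y_i, Z_i).
    No tie correction is applied (simplifying assumption).
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


def p_value_three_models(chi2):
    """Chi-square survival function for 2 degrees of freedom (k = 3 only)."""
    return math.exp(-chi2 / 2.0)
```

The chi-square approximation is usually considered reasonable only for larger samples; for a handful of projects, exact tables would be the safer choice.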

Discussion

An inaccurate duration prediction on a software project could mean late delivery of a software product or service (Zapata and Chaudron, 2013). A variety of techniques have been used to predict duration, such as expert judgment, statistical regression, neural networks, and support vector machines. In the study reported here, a multilayer feedforward neural network (MLP) and a radial basis function neural network (RBFNN) are proposed. The sample for training and testing the MLP and RBFNN was

Acknowledgments

The authors thank the CUCEA of Universidad de Guadalajara, Jalisco, México, Programa de Mejoramiento del Profesorado (PROMEP), and the Consejo Nacional de Ciencia y Tecnología (Conacyt).

References (49)

  • Park, H. et al. An empirical validation of a neural network model for software effort estimation. J. Expert Syst. Appl. (2008)
  • Sheetz, S.D. et al. Understanding developer and manager perceptions of function points and source lines of code. J. Syst. Softw. (2009)
  • Shepperd, M. et al. Evaluating prediction systems in software project estimation. Inform. Softw. Technol. (2012)
  • Wang, Y.R. et al. Predicting construction cost and schedule success using artificial neural network ensemble and support vector machine classification models. Int. J. Project Manage. (2012)
  • Wen, J. et al. Systematic literature review of machine learning based software development effort estimation models. Inform. Softw. Technol. (2012)
  • Wilkie, F.G. et al. The value of software sizing. Inform. Softw. Technol. (2011)
  • Yang, Y. et al. Analyzing and handling local bias for calibrating parametric cost estimation models. Inform. Softw. Technol. (2013)
  • Abran, A. et al. The Guide to the Software Engineering Body of Knowledge (2004)
  • Agrawal, M. et al. Software effort, quality, and cycle time: a study of CMM Level 5 projects. IEEE Trans. Softw. Eng. (2007)
  • Alyahya, M.A. et al. Effect of CMMI-based software process maturity on software schedule estimation. Malays. J. Comput. Sci. (2009)
  • Anderson, J.A. An Introduction to Neural Networks (1995)
  • Bishop, C.M. Neural Networks for Pattern Recognition (1995)
  • Boehm, B. et al. COCOMO II (2000)
  • Bourque, P. et al. Developing project duration models in software engineering. J. Comput. Sci. Technol. (2007)
    Cuauhtémoc López-Martín is a researcher with the Information Systems Department at the Universidad de Guadalajara, Jalisco, Mexico. He received his Ph.D. in Computer Science from the Center for Computing Research of the National Polytechnic Institute of Mexico in 2007. His research is related to software prediction techniques, software processes and statistics applied to software engineering. He is a member of the Mexican National System of Researchers. Dr. López-Martín has more than 15 years of industry and higher education experience in information systems development and software engineering. Web page: http://dti.cucea.udg.mx/?q=directorio/dr-cuauht-moc-l-pez-mart-n.

    Alain Abran is a professor and the director of the Software Engineering Research Laboratory at the École de Technologie Supérieure (ETS) – Université du Québec. He is currently a co-executive editor of the Guide to the Software Engineering Body of Knowledge project. He is also actively involved in international software engineering standards and is a co-chair of the Common Software Metrics International Consortium (COSMIC). Dr. Abran has more than 20 years of industry experience in information systems development and software engineering. The maintenance measurement program he developed and implemented at Montreal Trust, Canada, received one of the 1993 best of the best awards from the Quality Assurance Institute. Web page: http://profs.etsmtl.ca/aabran/English/index.html.
