
Assessing the Predictive Utility of Logistic Regression, Classification and Regression Tree, Chi-Squared Automatic Interaction Detection, and Neural Network Models in Predicting Inmate Misconduct

Published in the American Journal of Criminal Justice.

Abstract

This study assesses the relative utility of a traditional regression approach, logistic regression (LR), and three classification techniques, classification and regression tree (CART), chi-squared automatic interaction detection (CHAID), and multi-layer perceptron neural network (MLPNN), in predicting inmate misconduct. The four models were tested using a sample of inmates held in state and federal prisons and predictors derived from the importation model of inmate adaptation. A multi-validation procedure and multiple evaluation indicators were used to evaluate and report predictive accuracy. The overall accuracy of the four models ranged from 0.60 to 0.66, with an overall AUC range of 0.60 to 0.70. The LR and MLPNN methods performed significantly better than the CART and CHAID techniques at identifying misbehaving inmates, and the CHAID method outperformed the CART approach in classifying inmates who violated prison rules. The MLPNN method also performed significantly better than the LR technique in predicting inmate misconduct among the training samples.


Notes

  1. Clinical approaches to risk assessment can be further dichotomized into unstructured and structured clinical judgment. With unstructured clinical judgment, a clinician relies solely on his/her professional experience for accuracy in predicting an individual’s risk. With structured clinical judgment, the clinician utilizes empirically-based risk factors to guide his/her prediction of an individual’s risk (for further descriptions of these two types of risk assessment methods, see Aegisdottir et al., 2006; Hanson, 2005; Singh and Fazel, 2010; Singh, Grann, and Fazel, 2011).

  2. A false positive is a positive result on a diagnostic test for a condition in an individual who does not actually have that condition; a false negative is a negative result on a diagnostic test for a condition in an individual who actually does have that condition.

  3. A classification tree starts with the top decision branch, sometimes called the root or parent node, and the top branch is split into subsequent branches known as child nodes. Terminal nodes are branches on the tree beyond which no further decisions are made.

  4. The three measures of impurity generally used for classification problems are the Gini measure, the generalized Chi-square measure, and the generalized G-square measure. The Chi-square measure is similar to the standard Chi-square value computed for the expected and observed classifications, and the G-square measure is similar to the maximum-likelihood Chi-square. The Gini measure is the index most often used for measuring impurity in the context of classification problems and is the method advocated by the developers of CART (Breiman et al., 1984).
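As an illustration of the impurity criterion, the following minimal Python sketch (not part of the original study; the class counts are invented) computes the Gini measure for a hypothetical parent node and the impurity reduction achieved by a candidate split.

```python
import numpy as np

def gini(counts):
    """Gini impurity of a node given its class counts: 1 - sum(p_k^2)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

# Hypothetical parent node: 54 inmates with misconduct, 46 without.
parent = [54, 46]
# Candidate split into two child nodes (illustrative counts only).
left, right = [40, 15], [14, 31]

n = sum(parent)
weighted_children = (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
print("parent Gini:", round(gini(parent), 3))
print("impurity reduction from the split:", round(gini(parent) - weighted_children, 3))
```

At each node, CART chooses the split that yields the largest such reduction in impurity.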

  5. The size of a tree in classification and regression tree analysis is an important issue, since an unreasonably large tree only makes the interpretation of results more difficult. There are two recommended strategies for selecting the “right-sized” tree. One strategy is to grow the tree to just the right size, where the right size is determined by the researcher based on knowledge from previous research, diagnostic information from previous analyses, or even intuition. The other strategy is to use the set of well-documented, structured procedures developed by Breiman et al. (1984) for selecting the “right-sized” tree.
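Breiman et al.'s structured procedure is commonly implemented as minimal cost-complexity pruning. The sketch below shows one way to select a pruned tree with scikit-learn's ccp_alpha parameter on placeholder data; it illustrates the general technique only and is not the software or settings used in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                           # hypothetical predictors
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)    # hypothetical outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Examine the cost-complexity pruning path of a fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Refit one pruned tree per candidate alpha and keep the one that validates best.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_te, y_te),
)
print("leaves in the selected tree:", best.get_n_leaves())
```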

  6. A classification tree model encompasses at least two samples, a training sample and a testing sample: the training sample is used to build the model and the testing sample is employed to validate its performance.

  7. For a detailed description of the different neural network models see Abdi et al. (1999), Carpenter and Grossberg (1991), Gurney (1997), and Rumelhart and McClelland (1988).

  8. Our review of prior research includes only studies that explicitly compare conventional regression models with classification tree models and/or neural network models. Prior studies that attempted to validate the actuarial model developed in the MacArthur Violence Risk Assessment Study or that examined the development of the Classification of Violence Risk (COVR) software are not included in our review of the prior literature.

  9. The four leading theoretical perspectives on inmates’ adjustment to prison are the deprivation, importation, situational, and administrative control models (Clemmer, 1940; Irwin and Cressey, 1962; DiIulio, 1987; Sykes, 1958; Steinke, 1991). In this paper, we chose to focus on the importation model and defer the examination of the other models to subsequent papers, since including measures from all four models would prove too cumbersome.

  10. Since reviews of the importation model are readily available elsewhere (see, for example, Byrne and Hummer, 2007; Cao, Zhao, and Van Dine, 1997; Goodstein and Wright, 1989; Paterline and Petersen, 1999; Wooldredge, 1991; Wright, 1991) and due to page limitations, we forego a thorough review of the literature on this perspective.

  11. The combinations of the five sub-datasets are as follows, with the letter on the left side representing the testing sample and the letters on the right side representing the training sample: Sample 1 = A/BCDE; Sample 2 = B/CDEA; Sample 3 = C/DEAB; Sample 4 = D/EABC; and Sample 5 = E/ABCD.
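The rotation can be mimicked with a standard five-fold split; the sketch below uses scikit-learn's KFold on five placeholder sub-datasets to reproduce the A–E scheme described above (it is not the authors' actual partitioning code).

```python
import numpy as np
from sklearn.model_selection import KFold

labels = np.array(list("ABCDE"))
X = np.arange(5).reshape(-1, 1)   # stand-ins for the five sub-datasets

for train_idx, test_idx in KFold(n_splits=5).split(X):
    print("testing:", labels[test_idx][0], "| training:", "".join(labels[train_idx]))
# testing: A | training: BCDE
# testing: B | training: ACDE
# ... and so on for C, D, and E
```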

  12. We elected to employ a general measure of any prison misconduct because prior comparative research has utilized similar measures such as delinquency, recidivism, or violence (see for example, Caulkins et al., 1996; Rosenfeld and Lewis, 2005; Thomas et al., 2005).

  13. It is noteworthy that prior research on the importation model tends to involve selective coding of the variable race. Some studies compare Black inmates with non-Black inmates, other studies compare White inmates with non-White inmates, and some studies even encompass several dichotomous measures of race (i.e., Black vs. other racial groups, White vs. other racial groups, Hispanic vs. other racial groups, etc.). We elected to code our race variable as Black vs. Non-Black because the importation model emphasizes the effect of pre-prison characteristics on prison adjustment and there is evidence that outside prison, Blacks have higher crime rates than Whites and other racial groups (see for example, Snyder, 2011).

  14. The true positive rate = (the # of true positives)/(the # of all positives)

  15. The false positive rate = 1 − [(the # of true negatives)/(the # of all negatives)]

  16. Sensitivity = (the # of true positives)/(the # of all positives)

  17. Specificity = (the # of true negatives)/(the # of all negatives)

  18. Overall Accuracy = (the # of true positives and true negatives)/(the # of all positives and all negatives)
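For concreteness, the following minimal Python sketch (with illustrative labels, not the study's data) computes the quantities defined in notes 14–18 from a confusion matrix.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])   # 1 = any misconduct (observed)
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])   # hypothetical model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                        # true positive rate (notes 14, 16)
specificity = tn / (tn + fp)                        # note 17
false_positive_rate = 1 - specificity               # note 15
overall_accuracy = (tp + tn) / (tp + tn + fp + fn)  # note 18

print(sensitivity, specificity, false_positive_rate, overall_accuracy)
```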

  19. We selected 0.50 as the cut-off probability because between 52 % and 54 % of the inmates in the five sub-samples were found guilty of breaking any rule.

  20. Given that the overall accuracy is a proportion, we constructed the confidence intervals using standard methods for proportions (see for example, Gardner and Altman, 1989).
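A minimal sketch of such a standard interval for a proportion, assuming the usual normal approximation; the accuracy value and sample size are illustrative, not the study's figures.

```python
import math

p_hat = 0.63   # e.g., an overall accuracy of 0.63
n = 2000       # hypothetical size of a testing sample
z = 1.96       # 95 % confidence level

se = math.sqrt(p_hat * (1 - p_hat) / n)
print(f"95% CI: ({p_hat - z * se:.3f}, {p_hat + z * se:.3f})")
```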

  21. Shrinkage or over-fitting occurs when a statistical model demonstrates poor predictive performance or when the predictive accuracy of a model decreases from the training sample to the testing sample.

  22. Total value = [(8,000 × AUC of the training sample + 2,000 × AUC of the testing sample) / 10,000].

  23. We also performed an analysis of variance (ANOVA) test of the differences in mean classification accuracy and conducted pairwise t-tests for the pairs formed between the four classification methods. We found that the MLPNN and LR techniques performed significantly better than the CART and CHAID methods in predicting inmate misconduct (p-value < 0.001), that the CHAID technique outperformed the CART method (p-value < 0.001) in classifying inmates who violated prison rules, and that the MLPNN approach performed significantly better than the LR method (p-value < 0.01) in predicting inmate misconduct (results are not shown but available upon request).
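The following sketch illustrates the kind of ANOVA and pairwise t-tests described in note 23; the per-sample accuracy values are placeholders, not the study's results.

```python
from itertools import combinations
from scipy import stats

accuracy = {                        # hypothetical accuracies over the five samples
    "LR":    [0.64, 0.65, 0.63, 0.66, 0.64],
    "CART":  [0.60, 0.61, 0.60, 0.62, 0.61],
    "CHAID": [0.62, 0.63, 0.61, 0.63, 0.62],
    "MLPNN": [0.65, 0.66, 0.64, 0.66, 0.65],
}

print(stats.f_oneway(*accuracy.values()))   # one-way ANOVA across the four methods
for a, b in combinations(accuracy, 2):      # pairwise comparisons
    t, p = stats.ttest_ind(accuracy[a], accuracy[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}")
```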

  24. To the best of our knowledge, only one study to date has compared the predictive utility of CHAID and LR (see Steadman et al., 2000).

  25. One issue inherent in all NN models is model transparency. Unlike the LR method, it is not possible to determine which variables contribute most to a particular output in a NN model (for further discussion of model transparency, see Bigi et al., 2005; Grann and Langstrom, 2007; Guerriere and Detsky, 1991; Ning et al., 2006).

References

  • Abdi, H., Valendin, D., & Edelman, B. (1999). Neural networks (Vol. 124). Newbury Park: Sage.


  • Aegisdottir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., et al. (2006). The meta-analysis of clinical judgment project: fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34, 341–382.


  • Aldrich, J. H., & Nelson, F. D. (1984). Linear probability, logit, and probit models (Vol. 45). Newbury Park: Sage.


  • Berk, R. A., & Bleich, J. (2013). Statistical procedures for forecasting criminal behavior: a comparative assessment. Criminology and Public Policy, 12, 513–544.


  • Bigi, R., Gregori, D., Cortigiani, L., Desideri, A., Chiarotto, F. A., & Toffolo, G. M. (2005). Artificial neural networks and robust Bayesian classifiers for risk stratification following uncomplicated myocardial infarction. International Journal of Cardiology, 101, 481–487.

  • Bishop, C. (1995). Neural networks for pattern recognition. New York: Oxford University Press.


  • Bonta, J. (1996). Risk-needs assessment and treatment. In A. T. Harland (Ed.), Choosing correctional options that work: defining the demand and evaluating the supply (pp. 18–22). Thousand Oaks: Sage.


  • Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.

  • Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Monterey: Wadsworth and Brooks/Cole.


  • Brodzinski, J. D., Crable, E. A., & Scherer, R. F. (1994). Using artificial intelligence to model juvenile recidivism patterns. Computers in Human Services, 10, 1–18.


  • Bushway, S. D. (2013). Is there any logic to using logit: finding the right tool for the increasingly important job of risk prediction. Criminology and Public Policy, 12, 563–567.


  • Byrne, J. M., & Hummer, D. (2007). Myths and realities of prison violence: a review of the evidence. Victims and Offenders: An International Journal of Evidence-based Research, Policy, and Practice, 2, 77–90.

  • Cao, L., Zhao, J., & Van Dine, S. (1997). Prison disciplinary tickets: a test of the deprivation and importation models. Journal of Crime and Justice, 25, 103–113.


  • Carpenter, & Grossberg. (1991). Causal attributions in expert parole decisions. Journal of Personality and Social Psychology, 36, 1501–1511.


  • Caulkins, J., Cohen, J., Gorr, W., & Wei, J. (1996). Predicting criminal recidivism: a comparison of neural network models with statistical methods. Journal of Crime and Justice, 24, 227–240.


  • Clemmer, D. (1940). The prison community. Boston: Christopher.


  • Coid, J., Yang, M., Ullrich, S., et al. (2007). Predicting and understanding risk of reoffending: The prisoner cohort study. Research Summary 6, Ministry of Justice.

  • Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–1674.


  • Dhami, M. K., Ayton, P., & Lowenstein, G. (2007). Adaption to imprisonment: indigenous or imported? Criminal Justice and Behavior, 34, 1085–1100.


  • DiIulio, J. J., Jr. (1987). Governing prisons: a comparative study of correctional management. New York: Free Press.


  • Egan, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic.


  • Florio, T., Einfeld, S., & Levy, F. (1994). Neural network and psychiatry: candidate applications in clinical decision making. Australian and New Zealand Journal of Psychiatry, 28, 651–666.

  • Friedman, J. H. (1999). Stochastic gradient boosting. Stanford: Stanford University.


  • Gardner, M. J., & Altman, D. G. (1989). Estimating with confidence. In M. J. Gardner & D. G. Altman (Eds.), Statistics with confidence (pp. 6–19). London: British Medical Journal.


  • Gardner, W., Lidz, C. W., Mulvey, E. P., & Shaw, E. C. (1996). A comparison of actuarial methods for identifying repetitively violent patients with mental illnesses. Law and Human Behavior, 20, 35–48.


  • Gendreau, P., Goggin, C. E., & Law, M. A. (1997). Predicting prison misconducts. Criminal Justice and Behavior, 24, 414–431.


  • Gendreau, P., Goggin, C. E., & Smith, P. (2002). Is the PCL-R really the “unparalleled” measure of offender risk? A lesson in knowledge accumulation. Criminal Justice and Behavior, 29, 397–426.

  • Glover, A., Nicholson, D., Hemmati, T., Bernfeld, G., & Quinsey, V. (2002). A comparison of predictors of general and violent recidivism among high risk federal offenders. Criminal Justice & Behavior, 29, 235–249.

  • Goodstein, L., & Wright, K. N. (1989). Inmate adjustment to prison. In L. Goodstein & D. L. MacKenzie (Eds.), The American prison: issues in research and policy (pp. 229–251). NY: Plenum.


  • Gottfredson, S. D., & Gottfredson, D. M. (1986). Accuracy of prediction models. In A. Blumstein, J. Cohen, J. Roth, & C. A. Visher (Eds.), Criminal careers and “Career Criminals” (pp. 212–290). Washington: National Academy of Sciences Press.


  • Gottfredson, S. D., & Moriarty, L. J. (2006). Statistical risk assessment: old problems and new applications. Crime and Delinquency, 52, 178–200.


  • Grann, M., & Langstrom, N. (2007). Actuarial assessment of violence risk: to weigh or not to weigh? Criminal Justice and Behavior, 34, 22–36.


  • Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithm) prediction procedures: the clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293–323.


  • Guerriere, M. R., & Detsky, A. S. (1991). Neural networks: what are they? Annals of Internal Medicine, 115, 906–907.


  • Gurney, K. (1997). An Introduction to neural networks. New York: UCL Press.


  • Hanson, R. K. (2005). Twenty years of progress in violence risk assessment. Journal of Interpersonal Violence, 20, 212–217.


  • Hanson, R. K., & Morton-Bourgon, K. E. (2007). The accuracy of recidivism risk assessments for sexual offenders: A meta-analysis. Public Safety and Emergency Preparedness Canada.

  • Hanson, R. K., & Morton-Bourgon, K. E. (2009). The accuracy of recidivism risk assessments for sexual offenders: a meta-analysis of 118 prediction studies. Psychological Assessment, 21, 1–21.


  • Harer, M. D., & Steffensmeier, D. J. (1996). Race and prison violence. Criminology, 34, 323–355.


  • Hill, T., & Lewicki, P. (2006). Statistics, methods and application: a comprehensive reference for science, industry, and data mining. Tulsa: StatSoft, Inc.


  • Hilton, N. Z., Harris, G. T., & Rice, M. E. (2006). Sixty-six years of research on the clinical versus actuarial prediction of violence. The Counseling Psychologist, 34, 400–409.


  • Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression. New York: Wiley.


  • Irwin, J. K. (1981). Sociological studies of the impact of long term confinement. In D. A. Ward & K. F. Schoen (Eds.), Confinement in maximum custody (pp. 33–68). Lexington: D.C. Heath.

  • Irwin, J. K., & Cressey, D. (1962). Thieves, convicts, and the inmate culture. Social Problems, 10, 142–155.


  • Jiang, S., & Fisher-Giorlando, M. (2002). Inmate misconduct: a test of the deprivation, importation, and situational models. The Prison Journal, 82, 335–358.


  • Jones, P. R. (1996). Risk prediction in criminal justice. In A. T. Harland (Ed.), Choosing correctional options that work: defining the demand and evaluating the supply (pp. 33–68). Thousand Oaks: Sage.


  • Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29, 119–127.


  • Kroner, D. G., & Mills, J. F. (2001). The accuracy of five appraisal risk instruments in predicting institutional misconduct and new convictions. Criminal Justice and Behavior, 28, 471–489.


  • Liu, Y. Y., Yang, M., Ramsay, M., Li, X. S., & Coid, J. W. (2011). A comparison of logistic regression, classification and regression tree, and neural networks models in predicting violent re-offending. Journal of Quantitative Criminology, 27, 547–573.


  • Loh, W. Y., & Shih, Y. S. (1997). Split selection methods for classification trees. Statistica Sinica, 7, 815–840.


  • Menzies, R., Webster, S. D., McMain, S., Staley, S., & Scaglione, R. (1994). The dimensions of dangerousness revisited. Law and Human Behavior, 18, 1–28.


  • Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge: MIT Press.


  • Mossman, D. (1994). Assessing prediction of violence: being accurate about accuracy. Journal of Consulting and Clinical Psychology, 62, 783–792.


  • Neuilly, M., Zgoba, K. M., Tita, G. E., & Lee, S. S. (2011). Predicting recidivism in homicide offenders using classification tree analysis. Homicide Studies, 15, 154–176.

  • Ning, G. M., Su, J., Li, Y. Q., Wang, X. Y., Li, C. H., & Yan, W. M. (2006). Artificial neural network base model for cardiovascular risk stratification in hypertension. Medical and Biological Engineering and Computing, 44, 202–208.


  • Palocsay, S. W., Wang, P., & Brookshire, R. G. (2000). Predicting criminal recidivism using neural networks. Socio-Economic Planning Sciences, 34, 271–284.


  • Paterline, B. A., & Petersen, D. M. (1999). Structural and social psychological determinants of prisonization. Journal of Crime and Justice, 27, 427–441.


  • Perlich, C., Provost, F., & Simonof, J. (2003). Tree induction vs. logistic regression: a learning curve analysis. Journal of Machine Learning Research, 4, 211–255.


  • Price, R. K., Spitznagel, E. L., Downey, T. J., Meyer, D. J., Risk, N. K., & El-Ghazzawy, O. G. (2000). Applying artificial neural network models to clinical decision making. Psychological Assessment, 12, 40–51.


  • Rice, M. E., & Harris, G. T. (1995). Violent recidivism: assessing predictive validity. Journal of Consulting and Clinical Psychology, 63, 737–748.


  • Ridgeway, G. (2013). Linking prediction and prevention. Criminology and Public Policy, 12, 545–550.


  • Ripley, B. D. (1996). Pattern recognition and neural networks. Cambridge: Cambridge University Press.


  • Rokach, L., & Maimon, O. (2008). Data mining with decision trees: theory and application. Hackensack: World Scientific Publishing.


  • Rosenfeld, B., & Lewis, C. (2005). Assessing violent risk in stalking cases: a regression tree approach. Law and Human Behavior, 29, 343–357.


  • Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing (Vol. 1). Cambridge: MIT Press.


  • Rumelhart, D. E., & McClelland, J. L. (1988). Parallel distributed processing (Vol. 1 and 2). Cambridge: MIT Press.


  • Seymour, J. (1977). Niches in prisons. In H. Toch (Ed.), Living in prison: the ecology of survival (pp. 18–22). New York: Free Press.


  • Silver, E., Smith, W. R., & Banks, S. (2000). Constructing actuarial devices for predicting recidivism: a comparison of methods. Criminal Justice and Behavior, 27, 733–764.


  • Singh, J. P., & Fazel, S. (2010). Forensic risk assessment: a metareview. Criminal Justice and Behavior, 37, 965–988.


  • Singh, J. P., Grann, M., & Fazel, S. (2011). A comparative study of violence risk assessment tools: a systematic review and metaregression analysis of 68 studies and 25,980 participants. Clinical Psychology Review. doi:10.1016/j.cpr.2010.11.009.

  • Snyder, H. N. (2011). Patterns & trends: Arrests in the United States, 1980–2009. U.S. Department of Justice, Bureau of Justice Statistics.

  • Sorensen, J., Wrinkle, R., & Gutierrez, A. (1998). Patterns of rule-violating behaviors and adjustment to incarceration among murderers. The Prison Journal, 78, 222–231.


  • Stalans, L. J., Yarnold, P. R., Seng, M., Olson, D. E., & Repp, M. (2004). Identifying three types of violent offenders and predicting violent recidivism while on probation: a classification tree analysis. Law and Human Behavior, 28, 253–271.


  • StatSoft, Inc. (2008). Data mining, predictive analytics, statistics, StatSoft electronic textbook. http://www.statsoft.com/textbook/.

  • Steadman, H. J., Silver, E., Monahan, J., Appelbaum, P. S., Robbins, P. C., & Mulvey, E. P. (2000). A classification tree approach to the development of actuarial violence risk assessment tools. Law and Human Behavior, 24, 83–100.


  • Steinke, P. (1991). Using situational factors to predict types of prison violence. Journal of Offender Rehabilitation, 17, 119–132.


  • Sykes, G. M. (1958). The society of captives. Princeton: Princeton University Press.


  • Thomas, S., & Leese, M. (2003). A green-fingered approach can improve the clinical utility of violence risk assessment tools. Criminal Behavior and Mental Health, 13, 153–158.


  • Thomas, S., Leese, M., Walsh, E., McCrone, P., Moran, P., & Burns, T. (2005). A comparison of statistical methods in predicting violence in psychotic illness. Comprehensive Psychiatry, 46, 296–303.


  • Toch, H. (1977). Living in prison: the ecology of survival. New York: Free Press.


  • Toch, H., & Adams, K. (1986). Pathology and disruptiveness among prison inmates. Journal of Research in Crime and Delinquency, 23, 7–21.


  • Tu, J. V. (1996). Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. Journal of Clinical Epidemiology, 9, 1225–1231.


  • Wallace, J. M., & Bachman, J. G. (1991). Explaining racial/ethnic differences in adolescent drug use: the impact of background and lifestyles. Social Forces, 38, 333–354.


  • White, H. (1989). Some asymptotic results for learning in single hidden-layer feedforward network models. Journal of the American Statistical Association, 84, 1003–1013.

  • Wooldredge, J. D. (1991). Correlates of deviant behavior among inmates of U.S. correctional facilities. Journal of Crime and Justice, 14, 1–25.


  • Wright, K. N. (1991). A study of individual, environmental, and interactive effects in explaining adjustment to prison. Justice Quarterly, 8, 217–242.


  • Yan, L., Dodier, R., Mozer, M. C., & Wolniewicz, R. (2003). Optimizing classifier performance via the Wilcoxon-Mann-Whitney statistics. Proceedings of the International Conference on Machine Learning.

  • Yang, M., Liu, Y. Y., & Coid, J. W. (2010). Applying neural networks and classification tree models to the classification of serious offenders and the prediction of recidivism. Research summary, Ministry of Justice, UK. Available online at www.justice.gov.uk/publications/research.htm.


Author information


Correspondence to Fawn T. Ngo.

Appendices

Appendix 1

Fig. 1. A one-factor decision tree diagram

Decision tree diagrams are a common way to visually display classification schemes. A decision tree consists of nodes that split into two or more branches, creating further nodes. The diagram starts with a root node, which is split into two or more nodes based on some splitting rule. The splitting rule is based on the values of a particular variable. A node that splits into multiple nodes is called a parent node, and the resulting nodes are called child nodes. The child nodes, in turn, become parent nodes when they are split according to another splitting rule. A node that does not split any further is called a leaf node or terminal node; a branch ends with a terminal node. The terminal node shows the probability of the class to which a case belongs.

The decision tree diagram shown above displays the probability of an offender recidivating based on marital status. In this example, the root node shows the percentage of cases involving “recidivism”. The root node is split into two branches based on the value of the variable “marital status”. The first child node includes all the cases with a marital status of single and the second node includes all the cases with a marital status of married. The corresponding data indicate that from the sample of 100 offenders, 70 recidivated and 30 did not. Further, among the 75 offenders who were single, 68 recidivated and among the 25 offenders who were married, only 2 recidivated. Accordingly, the probability of an offender who is single recidivating is 91 % and the probability of an offender who is married recidivating is 8 %.
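The same one-factor tree can be reproduced programmatically. The sketch below is an illustration rather than the original analysis: it rebuilds the Fig. 1 example in scikit-learn from the counts given above and recovers the 91 % and 8 % leaf probabilities.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

single, married = 1, 0
X = np.array([[single]] * 75 + [[married]] * 25)          # marital status
y = np.array([1] * 68 + [0] * 7 + [1] * 2 + [0] * 23)     # 1 = recidivated

tree = DecisionTreeClassifier(max_depth=1).fit(X, y)      # one splitting rule
print(tree.predict_proba([[single]])[0, 1])    # ~0.91, matching the 91 % above
print(tree.predict_proba([[married]])[0, 1])   # 0.08, matching the 8 % above
```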

Appendix 2

Fig. 2. A simple multilayer feed-forward neural network architecture with backpropagation

The simple multilayer neural network architecture shown above has three layers: an input layer, a hidden layer, and an output layer. Each layer consists of a number of processing elements (PEs, or neurons). In feed-forward neural network architectures, information is fed into each PE, processed, and then passed on to each PE in the layer above. In the case of the output PE, information is simply passed out of the network.

Each PE in the input layer corresponds to a feature or characteristic that the researcher is interested in using as an independent variable. The goal of the network is to map the input units to a desired output similar to the way in which the dependent variable is a function of the independent variables in regression analysis.

The PEs are interlinked by a set of connections which are characterized by weights. In feed-forward networks with backpropagation, the networks “learn” to map the input units to the output units by adjusting the weights on the connections in response to error signals transmitted back through the network. The difference between the output of the network and the target mapping constitutes the error signal. The error signal is propagated back through the network via the PEs and their connections and the weights are updated. This process continues until the sum of all error signals is minimized.
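To make the feed-forward and backpropagation cycle concrete, the following minimal NumPy sketch trains a single-hidden-layer network with sigmoid units on synthetic data. The layer sizes, learning rate, and data are illustrative assumptions and do not reflect the MLPNN configuration used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                        # hypothetical inputs
y = (X[:, 0] - X[:, 1] > 0).astype(float)[:, None]   # hypothetical target mapping

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(scale=0.5, size=(4, 6))   # input -> hidden connection weights
W2 = rng.normal(scale=0.5, size=(6, 1))   # hidden -> output connection weights
lr = 0.5

for epoch in range(500):
    # Feed-forward pass: inputs flow through the hidden layer to the output PE.
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # Error signal: difference between the network output and the target mapping.
    error = output - y

    # Backpropagation: send the error signal back and adjust both weight matrices.
    delta_out = error * output * (1 - output)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * hidden.T @ delta_out / len(X)
    W1 -= lr * X.T @ delta_hidden / len(X)

print("training accuracy:", ((output > 0.5) == y).mean())
```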

Appendix 3

Fig. 3. The ROC curve

A receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of all positives (sensitivity) on the y-axis against the fraction of false positives out of all negatives (1 - specificity) on the x-axis at various threshold settings. The location of a point in the ROC space depicts the classification accuracy of a classification instrument. For example, a point at coordinate (0,1) indicates that the instrument has a sensitivity of 100 % and a specificity of 100 % (i.e., perfect classification). Classification instruments that perform no better than chance (e.g., 50 % sensitivity and 50 % specificity) fall on the diagonal determined by coordinates (0,0) and (1,1); a point that falls in the area above the diagonal represents a good prediction, whereas a point that falls in the area below the diagonal represents a poor prediction. Theoretically, a random guess would give a point on the diagonal.

The ROC curve depicts the tradeoff between the true positive rate (TPR) and the false positive rate (FPR) for different cut-points of a classification instrument. Its interpretation is similar to that of a single point in the ROC space: the closer the points on the ROC curve are to the ideal coordinate (0,1), the more accurate the classification instrument; conversely, the closer the points are to the diagonal, the less accurate the instrument. In the diagram above, a typical ROC curve looks like the curved line, and the area under that curve is the AUC. The higher the AUC value, the better the classifier.
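As an illustration, the short sketch below computes an ROC curve and its AUC from synthetic labels and scores using scikit-learn; it is not based on the study's data.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                    # synthetic outcomes
y_score = y_true + rng.normal(scale=1.0, size=1000)       # noisy predicted scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)         # points of the ROC curve
print("AUC:", round(roc_auc_score(y_true, y_score), 3))   # area under that curve
```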


Cite this article

Ngo, F.T., Govindu, R. & Agarwal, A. Assessing the Predictive Utility of Logistic Regression, Classification and Regression Tree, Chi-Squared Automatic Interaction Detection, and Neural Network Models in Predicting Inmate Misconduct. Am J Crim Just 40, 47–74 (2015). https://doi.org/10.1007/s12103-014-9246-6
