Abstract
This study assesses the relative utility of a traditional regression approach, logistic regression (LR), and three classification techniques: classification and regression tree (CART), chi-squared automatic interaction detection (CHAID), and multi-layer perceptron neural network (MLPNN), in predicting inmate misconduct. The four models were tested using a sample of inmates held in state and federal prisons and predictors derived from the importation model of inmate adaptation. A multi-validation procedure and multiple evaluation indicators were used to assess and report predictive accuracy. The overall accuracy of the four models ranged from 0.60 to 0.66, with an overall AUC range of 0.60–0.70. The LR and MLPNN methods performed significantly better than the CART and CHAID techniques at identifying misbehaving inmates, and the CHAID method outperformed the CART approach in classifying these inmates. The MLPNN method also performed significantly better than the LR technique in predicting inmate misconduct in the training samples.
Notes
Clinical approaches to risk assessment can be further dichotomized into unstructured and structured clinical judgment. With unstructured clinical judgment, a clinician relies solely on his/her professional experience for accuracy in predicting an individual’s risk. With structured clinical judgment, the clinician utilizes empirically-based risk factors to guide his/her prediction of an individual’s risk (for further descriptions of these two types of risk assessment methods, see Aegisdottir et al., 2006; Hanson, 2005; Singh and Fazel, 2010; Singh, Grann, and Fazel, 2011).
A false positive is defined as a positive result on a diagnostic test for a condition in an individual who does not actually have that condition; a false negative is defined as a negative result on a diagnostic test for a condition in an individual who actually does have that condition.
A classification tree starts with the top decision branch, sometimes called the root or parent node, and the top branch is split into subsequent branches known as child nodes. Terminal nodes are branches on the tree beyond which no further decisions are made.
The three measures of impurity generally used for classification problems are the Gini measure, the generalized Chi-square measure, and the generalized G-square measure. The Chi-square measure is similar to the standard Chi-square value computed for the expected and observed classifications, and the G-square measure is similar to the maximum-likelihood Chi-square. The Gini measure is the index most often used for measuring purity in the context of classification problems and the method advocated by the developers of CART (Breiman et al., 1984).
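As an illustration, the Gini measure for a binary outcome can be sketched in a few lines; the class counts below are hypothetical, not drawn from the study's data:

```python
# Hypothetical illustration of the Gini impurity used by CART to score splits.
def gini(counts):
    """Gini impurity for a node with class counts, e.g. [misconduct, no misconduct]."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_impurity(left, right):
    """Weighted average impurity of the two child nodes produced by a split."""
    n = sum(left) + sum(right)
    return (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)

# A pure node has impurity 0; an evenly mixed binary node has the maximum of 0.5.
print(gini([10, 0]))                    # 0.0
print(gini([5, 5]))                     # 0.5
print(split_impurity([8, 2], [1, 9]))   # 0.25
```

CART evaluates candidate splits by the reduction in impurity from parent to children and chooses the split with the largest reduction.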
The size of a tree in classification and regression trees analysis is an important issue, since an unreasonably large tree makes the results more difficult to interpret. There are two recommended strategies for selecting the “right-sized” tree. One strategy is to grow the tree to just the right size, where the right size is determined by the researcher based on knowledge from previous research, diagnostic information from previous analyses, or even intuition. The other strategy is to use a set of well-documented, structured procedures developed by Breiman et al. (1984) for selecting the “right-sized” tree.
A classification tree model involves at least two samples, training and testing: the training sample is used to build the model and the testing sample is used to validate its performance.
Our review of prior research only includes studies that explicitly compare conventional regression models with classification tree models and/or neural network models. Prior studies that attempted to validate the actuarial model developed in the MacArthur Violence Risk Assessment Study, or that examined the development of the Classification of Violence Risk (COVR) software, are not included in our review of the literature.
The four leading theoretical perspectives on inmates’ adjustment to prison are the deprivation, importation, situational, and administrative control models (Clemmer, 1940; Irwin and Cressey, 1962; DiIulio 1987; Sykes, 1958; Steinke, 1991). In this paper, we chose to focus on the importation model and defer examination of the other models to subsequent papers, since including measures from all four models would prove too cumbersome.
Since reviews of the importation model are readily available elsewhere (see, for example, Byrne and Hummer, 2007; Cao, Zhao, and Dine 1997; Goodstein and Wright, 1989; Paterline and Petersen, 1999; Wooldredge, 1991; Wright, 1991) and due to page limitations, we forego a thorough review of the literature on this perspective.
The combinations of the five sub-datasets are as follows, with the letter on the left side representing the testing sample and the letters on the right side representing the training sample: Sample 1 = A/BCDE; Sample 2 = B/CDEA; Sample 3 = C/DEAB; Sample 4 = D/EABC; and Sample 5 = E/ABCD.
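This rotation scheme, in which each sub-dataset serves once as the testing sample while the remaining four form the training sample, can be sketched as follows (the subset labels mirror the note above):

```python
# Sketch of the five train/test rotations described in the note.
subsets = ["A", "B", "C", "D", "E"]

def rotations(parts):
    """Each part serves once as the testing sample; the rest form the training sample."""
    out = []
    for i, test in enumerate(parts):
        train = parts[i + 1:] + parts[:i]   # e.g. testing on B -> training on CDEA
        out.append((test, "".join(train)))
    return out

for test, train in rotations(subsets):
    print(f"{test}/{train}")   # A/BCDE, B/CDEA, C/DEAB, D/EABC, E/ABCD
```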
It is noteworthy that prior research on the importation model tends to involve selective coding of the variable race. Some studies compare Black inmates with non-Black inmates, other studies compare White inmates with non-White inmates, and some studies even encompass several dichotomous measures of race (i.e., Black vs. other racial groups, White vs. other racial groups, Hispanic vs. other racial groups, etc.). We elected to code our race variable as Black vs. Non-Black because the importation model emphasizes the effect of pre-prison characteristics on prison adjustment and there is evidence that outside prison, Blacks have higher crime rates than Whites and other racial groups (see for example, Snyder, 2011).
The true positive rate = (the # of true positives)/(the # of all positives)
The false positive rate = 1 − [(the # of true negatives)/(the # of all negatives)]
Sensitivity = (the # of true positives)/(the # of all positives)
Specificity = (the # of true negatives)/(the # of all negatives)
Overall Accuracy = (the # of true positives and true negatives)/(the # of all positives and all negatives)
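The five indicators above can be computed directly from a confusion matrix; the counts below are hypothetical, chosen only to make the arithmetic visible:

```python
# Hypothetical confusion-matrix counts to illustrate the formulas above.
tp, fn = 40, 20   # actual misconduct cases:    40 flagged, 20 missed
tn, fp = 30, 10   # actual no-misconduct cases: 30 cleared, 10 flagged

sensitivity = tp / (tp + fn)               # = true positive rate
specificity = tn / (tn + fp)
false_positive_rate = 1 - specificity
overall_accuracy = (tp + tn) / (tp + fn + tn + fp)

print(sensitivity, specificity, overall_accuracy)   # ~0.667, 0.75, 0.7
```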
We selected 0.50 as the cut-off probability based on the fact that between 52 % and 54 % of the inmates in the five sub-samples were found guilty of breaking at least one rule.
Given that the overall accuracy is a proportion, we constructed the confidence intervals using standard methods for proportions (see for example, Gardner and Altman, 1989).
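One standard normal-approximation interval for a proportion (of the kind described in Gardner and Altman, 1989) can be sketched as follows; the accuracy value and sample size below are illustrative assumptions, not the study's figures:

```python
import math

# Standard normal-approximation (Wald) interval for a proportion p observed on n cases.
def proportion_ci(p, n, z=1.96):
    """95 % confidence interval for a proportion, using the normal approximation."""
    se = math.sqrt(p * (1 - p) / n)
    return (p - z * se, p + z * se)

# e.g. an overall accuracy of 0.65 on a hypothetical n = 2,000 testing sample
lo, hi = proportion_ci(0.65, 2000)
print(round(lo, 3), round(hi, 3))   # 0.629 0.671
```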
Shrinkage, or over-fitting, occurs when the predictive accuracy of a model decreases from the training sample to the testing sample, indicating that the model has captured idiosyncrasies of the training data rather than generalizable patterns.
Total value = [(8,000 × AUC of the training sample + 2,000 × AUC of the testing sample)/10,000].
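The weighting above is simply a sample-size-weighted average of the two AUCs; with hypothetical AUC values:

```python
# The total-value weighting from the note, with illustrative AUC inputs.
def total_value(auc_train, auc_test, n_train=8000, n_test=2000):
    """Sample-size-weighted average of training and testing AUC."""
    return (n_train * auc_train + n_test * auc_test) / (n_train + n_test)

# (8,000 x 0.70 + 2,000 x 0.65) / 10,000
print(total_value(0.70, 0.65))   # 0.69
```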
We also performed an Analysis of Variance (ANOVA) test of the differences in mean classification accuracy and conducted pairwise t-tests for the pairs formed between the four classification methods. We found that the MLPNN and LR techniques performed significantly better than the CART and CHAID methods in predicting inmate misconduct (p < 0.001), that the CHAID technique outperformed the CART method in classifying misbehaving inmates (p < 0.001), and that the MLPNN approach performed significantly better than the LR method in predicting inmate misconduct (p < 0.01) (results are not shown but available upon request).
To the best of our knowledge, to date, only one study has compared the predictive utility of CHAID and LR (see Steadman et al., 2000).
One issue inherent in all NN models is model transparency. Unlike the LR method, it is not possible to determine which variables contribute most to a particular output in a NN model (for further discussion of the issue of model transparency, see Bigi et al., 2005; Grann and Langstrom, 2007; Guerriere and Detsky, 1991; Ning et al., 2006).
References
Abdi, H., Valendin, D., & Edelman, B. (1999). Neural networks (Vol. 124). Newbury Park: Sage.
Aegisdottir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., et al. (2006). The meta-analysis of clinical judgment project: fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34, 341–382.
Aldrich, J. H., & Nelson, F. D. (1984). Linear probability, logit, and probit models (Vol. 45). Newbury Park: Sage.
Berk, R. A., & Bleich, J. (2013). Statistical procedures for forecasting criminal behavior: a comparative assessment. Criminology and Public Policy, 12, 513–544.
Bigi, R., Gregori, D., Cortigiani, L., Desideri, A., Chiarotto, F. A., & Toffolo, G. M. (2005). Artificial neural networks and robust Bayesian classifiers for risk stratification following uncomplicated myocardial infarction. International Journal of Cardiology, 101, 481–487.
Bishop, C. (1995). Neural networks for pattern recognition. New York: Oxford University Press.
Bonta, J. (1996). Risk-needs assessment and treatment. In A. T. Harland (Ed.), Choosing correctional options that work: defining the demand and evaluating the supply (pp. 18–22). Thousand Oaks: Sage.
Breiman, L. (2001). Decision tree forest. Machine Learning, 45, 5–32.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Monterey: Wadsworth and Brooks/Cole.
Brodzinski, J. D., Crable, E. A., & Scherer, R. F. (1994). Using artificial intelligence to model juvenile recidivism patterns. Computers in Human Services, 10, 1–18.
Bushway, S. D. (2013). Is there any logic to using logit: finding the right tool for the increasingly important job of risk prediction. Criminology and Public Policy, 12, 563–567.
Byrne, J. M., & Hummer, D. (2007). Myths and realities of prison violence: a review of the evidence. Victims and Offenders: An International Journal of Evidence-Based Research, Policy, and Practice, 2, 77–90.
Cao, L., Zhao, J., & Van Dine, S. (1997). Prison disciplinary tickets: a test of the deprivation and importation models. Journal of Crime and Justice, 25, 103–113.
Carroll, J. S. (1978). Causal attributions in expert parole decisions. Journal of Personality and Social Psychology, 36, 1501–1511.
Caulkins, J., Cohen, J., Gorr, W., & Wei, J. (1996). Predicting criminal recidivism: a comparison of neural network models with statistical methods. Journal of Crime and Justice, 24, 227–240.
Clemmer, D. (1940). The prison community. Boston: Christopher.
Coid, J., Yang, M., Ullrich, S., et al. (2007). Predicting and understanding risk of reoffending: The prisoner cohort study. Research Summary 6, Ministry of Justice.
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–1674.
Dhami, M. K., Ayton, P., & Lowenstein, G. (2007). Adaption to imprisonment: indigenous or imported? Criminal Justice and Behavior, 34, 1085–1100.
DiIulio, J. J., Jr. (1987). Governing prisons: a comparative study of correctional management. New York: Free Press.
Egan, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic.
Florio, T., Einfeld, S., & Levy, F. (1994). Neural network and psychiatry: candidate applications in clinical decision making. Australian and New Zealand Journal of Psychiatry, 28, 651–666.
Friedman, J. H. (1999). Stochastic gradient boosting. Stanford: Stanford University.
Gardner, M. J., & Altman, D. G. (1989). Estimating with confidence. In M. J. Gardner & D. G. Altman (Eds.), Statistics with confidence (pp. 6–19). London: British Medical Journal.
Gardner, W., Lidz, C. W., Mulvey, E. P., & Shaw, E. C. (1996). A comparison of actuarial methods for identifying repetitively violent patients with mental illnesses. Law and Human Behavior, 20, 35–48.
Gendreau, P., Goggin, C. E., & Law, M. A. (1997). Predicting prison misconducts. Criminal Justice and Behavior, 24, 414–431.
Gendreau, P., Goggin, C. E., & Smith, P. (2002). Is the PCL-R really the “unparalleled” measure of offender-risk? A lesson in knowledge accumulation. Criminal Justice and Behavior, 29, 397–426.
Glover, A., Nicholson, D., Hemmati, T., Bernfeld, G., & Quinsey, V. (2002). A comparison of predictors of general and violent recidivism among high risk federal offenders. Criminal Justice & Behavior, 29, 235–249.
Goodstein, L., & Wright, K. N. (1989). Inmate adjustment to prison. In L. Goodstein & D. L. MacKenzie (Eds.), The American prison: issues in research and policy (pp. 229–251). NY: Plenum.
Gottfredson, S. D., & Gottfredson, D. M. (1986). Accuracy of prediction models. In A. Blumstein, J. Cohen, J. Roth, & C. A. Visher (Eds.), Criminal careers and “Career Criminals” (pp. 212–290). Washington: National Academy of Sciences Press.
Gottfredson, S. D., & Moriarty, L. J. (2006). Statistical risk assessment: old problems and new applications. Crime and Delinquency, 52, 178–200.
Grann, M., & Langstrom, N. (2007). Actuarial assessment of violence risk: to weigh or not to weigh? Criminal Justice and Behavior, 34, 22–36.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithm) prediction procedures: the clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293–323.
Guerriere, M. R., & Detsky, A. S. (1991). Neural networks: what are they? Annals of Internal Medicine, 115, 906–907.
Gurney, K. (1997). An Introduction to neural networks. New York: UCL Press.
Hanson, R. K. (2005). Twenty years of progress in violence risk assessment. Journal of Interpersonal Violence, 20, 212–217.
Hanson, R. K., & Morton-Bourgon, K. E. (2007). The accuracy of recidivism risk assessments for sexual offenders: A meta-analysis. Public Safety and Emergency Preparedness Canada.
Hanson, R. K., & Morton-Bourgon, K. E. (2009). The accuracy of recidivism risk assessments for sexual offenders: a meta-analysis of 118 prediction studies. Psychological Assessment, 21, 1–21.
Harer, M. D., & Steffensmeier, D. J. (1996). Race and prison violence. Criminology, 34, 323–355.
Hill, T., & Lewicki, P. (2006). Statistics, methods and application: a comprehensive reference for science, industry, and data mining. Tulsa: StatSoft, Inc.
Hilton, N. Z., Harris, G. T., & Rice, M. E. (2006). Sixty-six years of research on the clinical versus actuarial prediction of violence. The Counseling Psychologist, 34, 400–409.
Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression. New York: Wiley.
Irwin, J. K. (1981). Sociological studies of the impact of long term confinement. In D. A. Ward & K. F. Schoen (Eds.), Confinement in maximum custody (pp. 33–68). Lexington: D.C. Health.
Irwin, J. K., & Cressey, D. (1962). Thieves, convicts, and the inmate culture. Social Problems, 10, 142–155.
Jiang, S., & Fisher-Giorlando, M. (2002). Inmate misconduct: a test of the deprivation, importation, and situational models. The Prison Journal, 82, 335–358.
Jones, P. R. (1996). Risk prediction in criminal justice. In A. T. Harland (Ed.), Choosing correctional options that work: defining the demand and evaluating the supply (pp. 33–68). Thousand Oaks: Sage.
Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29, 119–127.
Kroner, D. G., & Mills, J. F. (2001). The accuracy of five appraisal risk instruments in predicting institutional misconduct and new convictions. Criminal Justice and Behavior, 28, 471–489.
Liu, Y. Y., Yang, M., Ramsay, M., Li, X. S., & Coid, J. W. (2011). A comparison of logistic regression, classification and regression tree, and neural networks models in predicting violent re-offending. Journal of Quantitative Criminology, 27, 547–573.
Loh, W. Y., & Shih, Y. S. (1997). Split selection methods for classification trees. Statistica Sinica, 7, 815–840.
Menzies, R., Webster, S. D., McMain, S., Staley, S., & Scaglione, R. (1994). The dimensions of dangerousness revisited. Law and Human Behavior, 18, 1–28.
Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge: MIT Press.
Mossman, D. (1994). Assessing prediction of violence: being accurate about accuracy. Journal of Consulting and Clinical Psychology, 62, 783–792.
Neuilly, M., Zgoba, K. M., Tita, G. E., & Lee, S. S. (2011). Predicting recidivism in homicide offenders using classification tree analysis. Homicide Studies, 15, 154–176.
Ning, G. M., Su, J., Li, Y. Q., Wang, X. Y., Li, C. H., & Yan, W. M. (2006). Artificial neural network base model for cardiovascular risk stratification in hypertension. Medical and Biological Engineering and Computing, 44, 202–208.
Palocsay, S. W., Wang, P., & Brookshire, R. G. (2000). Predicting criminal recidivism using neural networks. Socio-Economic Planning Sciences, 34, 271–284.
Paterline, B. A., & Petersen, D. M. (1999). Structural and social psychological determinants of prisonization. Journal of Crime and Justice, 27, 427–441.
Perlich, C., Provost, F., & Simonof, J. (2003). Tree induction vs. logistic regression: a learning curve analysis. Journal of Machine Learning Research, 4, 211–255.
Price, R. K., Spitznagel, E. L., Downey, T. J., Meyer, D. J., Risk, N. K., & El-Ghazzawy, O. G. (2000). Applying artificial neural network models to clinical decision making. Psychological Assessment, 12, 40–51.
Rice, M. E., & Harris, G. T. (1995). Violent recidivism: assessing predictive validity. Journal of Consulting and Clinical Psychology, 63, 737–748.
Ridgeway, G. (2013). Linking prediction and prevention. Criminology and Public Policy, 12, 545–550.
Ripley, B. D. (1996). Pattern recognition and neural networks. Cambridge: Cambridge University Press.
Rokach, L., & Maimon, O. (2008). Data mining with decision trees: theory and application. Hackensack: World Scientific Publishing.
Rosenfeld, B., & Lewis, C. (2005). Assessing violent risk in stalking cases: a regression tree approach. Law and Human Behavior, 29, 343–357.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing (Vol. 1). Cambridge: MIT Press.
Rumelhart, D. E., & McClelland, J. L. (1988). Parallel distributed processing (Vol. 1 and 2). Cambridge: MIT Press.
Seymour, J. (1977). Niches in prisons. In H. Toch (Ed.), Living in prison: the ecology of survival (pp. 18–22). New York: Free Press.
Silver, E., Smith, W. R., & Banks, S. (2000). Constructing actuarial devices for predicting recidivism: a comparison of methods. Criminal Justice and Behavior, 27, 733–764.
Singh, J. P., & Fazel, S. (2010). Forensic risk assessment: a metareview. Criminal Justice and Behavior, 37, 965–988.
Singh, J. P., Grann, M., & Fazel, S. (2011). A comparative study of violence risk assessment tools: a systematic review and metaregression analysis of 68 studies and 25,980 participants. Clinical Psychology Review. doi:10.1016/j.cpr.2010.11.009.
Snyder, H. N. (2011). Patterns & Trends: Arrests in the United States, 1980–2009. Bureau of Justice Statistics: U.S. Department of Justice.
Sorensen, J., Wrinkle, R., & Gutierrez, A. (1998). Patterns of rule-violating behaviors and adjustment to incarceration among murderers. The Prison Journal, 78, 222–231.
Stalans, L. J., Yarnold, P. R., Seng, M., Olson, D. E., & Repp, M. (2004). Identifying three types of violent offenders and predicting violent recidivism while on probation: a classification tree analysis. Law and Human Behavior, 28, 253–271.
StatSoft, Inc. (2008). Data mining, predictive analytics, statistics: StatSoft electronic textbook. http://www.statsoft.com/textbook/.
Steadman, H. J., Silver, E., Monahan, J., Appelbaum, P. S., Robbins, P. C., & Mulvey, E. P. (2000). A classification tree approach to the development of actuarial violence risk assessment tools. Law and Human Behavior, 24, 83–100.
Steinke, P. (1991). Using situational factors to predict types of prison violence. Journal of Offender Rehabilitation, 17, 119–132.
Sykes, G. M. (1958). The society of captives. Princeton: Princeton University Press.
Thomas, S., & Leese, M. (2003). A green-fingered approach can improve the clinical utility of violence risk assessment tools. Criminal Behavior and Mental Health, 13, 153–158.
Thomas, S., Leese, M., Walsh, E., McCrone, P., Moran, P., & Burns, T. (2005). A comparison of statistical methods in predicting violence in psychotic illness. Comprehensive Psychiatry, 46, 296–303.
Toch, H. (1977). Living in prison: the ecology of survival. New York: Free Press.
Toch, H., & Adams, K. (1986). Pathology and disruptiveness among prison inmates. Journal of Research in Crime and Delinquency, 23, 7–21.
Tu, J. V. (1996). Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. Journal of Clinical Epidemiology, 49, 1225–1231.
Wallace, J. M., & Bachman, J. G. (1991). Explaining racial/ethnic differences in adolescent drug use: the impact of background and lifestyles. Social Forces, 38, 333–354.
White, H. (1989). Some asymptotic results for learning in single hidden-layer feedforward network models. Journal of the American Statistical Association, 84, 1003–1013.
Wooldredge, J. D. (1991). Correlates of deviant behavior among inmates of U.S. correctional facilities. Journal of Crime and Justice, 14, 1–25.
Wright, K. N. (1991). A study of individual, environmental, and interactive effects in explaining adjustment to prison. Justice Quarterly, 8, 217–242.
Yan, L., Dodier, R., Mozer, M. C., & Wolniewicz, R. (2003). Optimizing classifier performance via the Wilcoxon-Mann-Whitney statistic. Proceedings of the International Conference on Machine Learning.
Yang, M., Liu, Y. Y., & Coid, J. W. (2010). Applying neural networks and classification tree models to the classification of serious offenders and the prediction of recidivism. Research summary, Ministry of Justice, UK. Available online at www.justice.gov.uk/publications/research.htm.
Appendices
Appendix 1
Decision tree diagrams are a common way to visually display classification schemes. A decision tree consists of nodes that split into two or more branches, creating more nodes. The diagram starts with a root node, which is split into two or more nodes based on some splitting rule. The splitting rule is based on the values of a certain variable. The node that splits into multiple nodes is called the parent node and the split nodes are called child nodes. The child nodes, in turn, become parent nodes when they are split based on another splitting rule. When a node does not split any further, we call that node a leaf node or a terminal node. A branch ends with a terminal node. The terminal node shows the probability of the class to which a case belongs.
The decision tree diagram shown above displays the probability of an offender recidivating based on marital status. In this example, the root node shows the percentage of cases involving “recidivism”. The root node is split into two branches based on the value of the variable “marital status”. The first child node includes all the cases with a marital status of single and the second node includes all the cases with a marital status of married. The corresponding data indicate that from the sample of 100 offenders, 70 recidivated and 30 did not. Further, among the 75 offenders who were single, 68 recidivated and among the 25 offenders who were married, only 2 recidivated. Accordingly, the probability of an offender who is single recidivating is 91 % and the probability of an offender who is married recidivating is 8 %.
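The arithmetic of this worked example can be checked directly; the counts below are the hypothetical ones from the paragraph above:

```python
# Reproducing the worked example: 100 offenders split on marital status.
single  = {"recidivated": 68, "not": 7}    # 75 single offenders
married = {"recidivated": 2,  "not": 23}   # 25 married offenders

def node_probability(node):
    """Probability of recidivism among the cases reaching this terminal node."""
    return node["recidivated"] / (node["recidivated"] + node["not"])

print(round(node_probability(single) * 100))    # 91 (i.e., 91 %)
print(round(node_probability(married) * 100))   # 8  (i.e., 8 %)
```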
Appendix 2
The simple multilayer neural network architecture shown above has three layers: input, output and hidden. Each layer consists of a number of processing elements (PEs or neurons). In feed-forward neural network architectures, information is input into each PE, processed, and then passed on to each PE in the layer above. In the case of the output PE, information is simply passed out of the network.
Each PE in the input layer corresponds to a feature or characteristic that the researcher is interested in using as an independent variable. The goal of the network is to map the input units to a desired output similar to the way in which the dependent variable is a function of the independent variables in regression analysis.
The PEs are interlinked by a set of connections which are characterized by weights. In feed-forward networks with backpropagation, the networks “learn” to map the input units to the output units by adjusting the weights on the connections in response to error signals transmitted back through the network. The difference between the output of the network and the target mapping constitutes the error signal. The error signal is propagated back through the network via the PEs and their connections and the weights are updated. This process continues until the sum of all error signals is minimized.
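The forward/backward cycle described above can be sketched on a toy problem. Everything here, the XOR data, the layer size, and the learning rate, is an illustrative assumption and not the paper's MLPNN configuration:

```python
import math
import random

# A minimal one-hidden-layer feed-forward network trained by backpropagation on
# the classic XOR problem (a hypothetical sketch, not the paper's MLPNN).
random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # inputs, target

H = 4                                                   # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.5

def forward(x1, x2):
    """Feed information from the input PEs through the hidden layer to the output PE."""
    h = [sigmoid(w1[j][0] * x1 + w1[j][1] * x2 + b1[j]) for j in range(H)]
    return h, sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)

def total_error():
    return sum((forward(x1, x2)[1] - t) ** 2 for (x1, x2), t in data)

err_before = total_error()
for _ in range(5000):
    for (x1, x2), t in data:
        h, y = forward(x1, x2)                          # forward pass
        d_out = (y - t) * y * (1 - y)                   # error signal at the output
        for j in range(H):
            d_hid = d_out * w2[j] * h[j] * (1 - h[j])   # signal propagated back
            w2[j] -= lr * d_out * h[j]                  # weight updates
            w1[j][0] -= lr * d_hid * x1
            w1[j][1] -= lr * d_hid * x2
            b1[j] -= lr * d_hid
        b2 -= lr * d_out
err_after = total_error()
print(err_before, "->", err_after)   # the error shrinks as the weights adapt
```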
Appendix 3
A receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of all positives (sensitivity) on the y-axis against the fraction of false positives out of all negatives (1 − specificity) on the x-axis at various threshold settings. The location of a point in the ROC space depicts the classification accuracy of a classification instrument. For example, the point at coordinate (0,1) indicates that the classification instrument has a sensitivity of 100 % and a specificity of 100 % (i.e., perfect classification). Classification instruments with 50 % sensitivity and 50 % specificity fall on the diagonal running from coordinate (0,0) to coordinate (1,1); a point that falls in the area above the diagonal represents a good prediction, while a point that falls in the area below the diagonal represents a bad prediction. Theoretically, a random guess would give a point on the diagonal.
The ROC curve depicts the tradeoff between the true positive rate (TPR) and false positive rate (FPR) for different cut-points of a classification instrument. The interpretation of the ROC curve is similar to that of a single point in the ROC space: the closer the points on the ROC curve are to the ideal coordinate (0,1), the more accurate the classification instrument is. Conversely, the closer the points on the ROC curve are to the diagonal, the less accurate the classification instrument is. In the above diagram, a typical ROC curve looks like the curved line, and the area under that curve is called the AUC under the ROC. The higher the AUC value, the better the classifier.
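The AUC can also be computed without drawing the curve, as the Wilcoxon-Mann-Whitney statistic: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the connection exploited by Yan et al., 2003). The scores and labels below are hypothetical:

```python
# AUC as the Wilcoxon-Mann-Whitney statistic, on hypothetical scores and labels.
def auc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
print(auc(scores, labels))   # 0.8125
```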
Ngo, F.T., Govindu, R. & Agarwal, A. Assessing the Predictive Utility of Logistic Regression, Classification and Regression Tree, Chi-Squared Automatic Interaction Detection, and Neural Network Models in Predicting Inmate Misconduct. Am J Crim Just 40, 47–74 (2015). https://doi.org/10.1007/s12103-014-9246-6