Using Response Times to Model Not-Reached Items due to Time Limits

Psychometrika

Abstract

Missing values at the end of a test typically result from test takers running out of time and can thus be understood by studying test takers’ working speed. As testing moves to computer-based assessment, response times become available, allowing speed and ability to be modeled simultaneously. Integrating research on response time modeling with research on modeling missing responses, we propose using response times to model missing values due to time limits. We identify similarities between approaches used to account for not-reached items (Rose et al. in ETS Res Rep Ser 2010:i–53, 2010) and the speed-accuracy (SA) model for the joint modeling of effective speed and effective ability proposed by van der Linden (Psychometrika 72(3):287–308, 2007). In a simulation, we show (a) that the SA model can recover parameters in the presence of missing values due to time limits and (b) that the response time model, which uses item-level timing information rather than a count of not-reached items, yields person parameter estimates that differ from those of missing data IRT models applied to not-reached items. We propose using the SA model to model the missing data process and using both ability and speed to describe the performance of test takers. We illustrate the application of the model in an empirical analysis.
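The setting described in the abstract can be sketched in a small simulation. The snippet below is an illustrative sketch, not the authors' code: it draws correlated person parameters (ability and speed), generates responses and lognormal response times in the spirit of van der Linden's (2006, 2007) hierarchical framework, and marks items as not reached once a total time limit is exceeded. All numeric values (the ability-speed correlation, the time limit, the item-parameter distributions) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
N, I = 1000, 30            # persons, items (matching the simulation condition in note 4)
time_limit = 60.0          # hypothetical total testing time

# Person parameters: ability theta and speed tau from a bivariate normal
rho = 0.3                  # hypothetical ability-speed correlation
theta, tau = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=N).T

# Item parameters: difficulty b, time intensity beta, time discrimination alpha
b = rng.normal(0, 1, I)
beta = rng.normal(1, 0.3, I)
alpha = np.full(I, 2.0)

# Lognormal RT model (van der Linden, 2006): log t_ij = beta_i - tau_j + eps_ij
log_t = beta[None, :] - tau[:, None] + rng.normal(0, 1 / alpha[None, :], (N, I))
rt = np.exp(log_t)

# Rasch-type accuracy model for the responses
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
y = (rng.random((N, I)) < p).astype(float)

# Items whose cumulative RT exceeds the limit are not reached: set to missing
not_reached = np.cumsum(rt, axis=1) > time_limit
y[not_reached] = np.nan
rt[not_reached] = np.nan

print(f"share of not-reached responses: {np.isnan(y).mean():.2f}")
```

Under this data-generating process, slower simulees (lower tau) accumulate time faster and therefore leave more items unreached, which is exactly the missingness mechanism the SA model exploits through the observed RTs.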


Notes

  1. This is different in the study by Goldhammer (2015), who imposed item-level time limits to reduce the heterogeneity in RTs across persons. Note, however, that this only reduces heterogeneity in RTs across persons; it does not eliminate it. Furthermore, item-level time limits may result in guessing and item omission (Kuhn & Ranger, 2015; Pohl & von Davier, 2018).

  2. This approach uses item parameters estimated with missing data ignored to score data in which missing responses are coded as wrong. Hence, the item parameters do not fit the observed rates of wrong responses. This procedure was abandoned in PISA 2015.

  3. See Eq. (04) in their paper.

  4. These results refer to a condition with 30 items and \(N = 1000\) persons.
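The mismatch described in note 2 can be seen in a toy calculation: calibrating item parameters with missings ignored implies a higher proportion correct than the data in which not-reached responses are subsequently scored as wrong. All numbers below are hypothetical.

```python
import numpy as np

# Hypothetical responses of six test takers to one item; NaN marks not reached
y = np.array([1, 0, 1, 1, np.nan, np.nan])

# Calibration treats missings as ignored: proportion correct among observed
p_ignored = np.nanmean(y)             # 3 correct out of 4 observed = 0.75

# Scoring treats missings as wrong: proportion correct among all test takers
p_as_wrong = np.nan_to_num(y).mean()  # 3 correct out of 6 = 0.50

print(p_ignored, p_as_wrong)
```

An item calibrated to a 0.75 success rate cannot simultaneously fit data in which only half of the responses are scored correct, which is the inconsistency the note points to.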

References

  • Allen, N. L., Donoghue, J. R., & Schoeps, T. L. (2001). The NAEP 1998 technical report (NCES 2001–509). Washington, DC: National Center for Education Statistics.


  • Bolsinova, M., & Tijmstra, J. (2016). Modeling conditional dependence between response time and accuracy. Psychometrika, 82(4), 1126–1148. https://doi.org/10.1007/s11336-016-9537-6.


  • Bolsinova, M., Tijmstra, J., & Molenaar, D. (2017). Response moderation models for conditional dependence between response time and response accuracy. British Journal of Mathematical and Statistical Psychology, 70(2), 257–279. https://doi.org/10.1111/bmsp.12076.


  • Cosgrove, J., & Cartwright, F. (2014). Changes in achievement on PISA: The case of Ireland and implications for international assessment practice. Large-Scale Assessments in Education, 2(1), 2. https://doi.org/10.1186/2196-0739-2-2.


  • Culbertson, M. (2011). Is it wrong? Handling missing responses in IRT. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

  • De Ayala, R. J., Plake, B. S., & Impara, J. C. (2001). The impact of omitted responses on the accuracy of ability estimation in item response theory. Journal of Educational Measurement, 38, 213–234. https://doi.org/10.1111/j.1745-3984.2001.tb01124.x.

  • Drummond, A. J., Nicholls, G. K., Rodrigo, A. G., & Solomon, W. (2002). Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics, 161(3), 1307–1320.


  • Duchhardt, C. & Gerdes, A. (2012). NEPS technical report for mathematics—scaling results of starting cohort 3 in fifth grade (NEPS Working Paper No. 19). Bamberg: Otto-Friedrich-Universität, Nationales Bildungspanel.

  • Finch, H. (2008). Estimation of item response theory parameters in the presence of missing data. Journal of Educational Measurement, 45, 225–245. https://doi.org/10.1111/j.1745-3984.2008.00062.x.


  • Fox, J. P. (2010). Bayesian item response modeling: Theory and applications. Berlin: Springer.


  • Fox, J. P., & Marianti, S. (2016). Joint modeling of ability and differential speed using responses and response times. Multivariate Behavioral Research, 51(4), 540–553. https://doi.org/10.1080/00273171.2016.1171128.


  • Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

  • Gelman, A., & Shirley, K. (2011). Inference from simulations and monitoring convergence. In S. Brooks, A. Gelman, G. L. Jones, & X. L. Meng (Eds.), Handbook of Markov Chain Monte Carlo (pp. 163–174). London: Chapman and Hall/CRC.

  • Goegebeur, Y., De Boeck, P., Wollack, J., & Cohen, A. (2008). A speeded item response model with gradual process change. Psychometrika, 73, 65–87. https://doi.org/10.1007/s11336-007-9031-2.


  • Glas, C. A. W., Pimentel, J. L., & Lamers, S. M. A. (2015). Nonignorable data in IRT models: Polytomous responses and response propensity models with covariates. Psychological Test and Assessment Modeling, 57(4), 523–541.
  • Goldhammer, F. (2015). Measuring ability, speed, or both? Challenges, psychometric solutions, and what can be gained from experimental control. Measurement: Interdisciplinary Research and Perspectives, 13(3–4), 133–164. https://doi.org/10.1080/15366367.2015.1100020.

  • Goldhammer, F., & Kroehne, U. (2014). Controlling individuals’ time spent on task in speeded performance measures: Experimental time limits, posterior time limits, and response time modeling. Applied Psychological Measurement, 38(4), 255–267. https://doi.org/10.1177/0146621613517164.


  • Holman, R., & Glas, C. A. W. (2005). Modelling non-ignorable missing-data mechanisms with item response theory models. British Journal of Mathematical and Statistical Psychology, 58, 1–17. https://doi.org/10.1111/j.2044-8317.2005.tb00312.x.


  • Johnson, E. G., & Allen, N. L. (1992). The NAEP 1990 technical report (Rep. No. 21-TR-20). Princeton, NJ.

  • Klein Entink, R. H., Fox, J. P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74(1), 21. https://doi.org/10.1007/s11336-008-9075-y.


  • Köhler, C., Pohl, S., & Carstensen, C. H. (2014). Taking the missing propensity into account when estimating competence scores: Evaluation of IRT models for non-ignorable omissions. Educational and Psychological Measurement, 1, 1–25. https://doi.org/10.1177/0013164414561785.


  • Köhler, C., Pohl, S., & Carstensen, C. H. (2015). Investigating mechanisms for missing responses in competence tests. Psychological Test and Assessment Modeling, 57(4), 499–522.


  • Köhler, C., Pohl, S., & Carstensen, C. H. (2017). Dealing with item nonresponse in large-scale cognitive assessments: The impact of missing data methods on estimated explanatory relationships. Journal of Educational Measurement, 54(4), 397–419.


  • Kruschke, J. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. New York: Academic Press.


  • Kuhn, J.-T., & Ranger, J. (2015). Measuring speed, ability, or motivation: A commentary on Goldhammer (2015). Measurement: Interdisciplinary Research and Perspectives, 13(3–4), 173–176. https://doi.org/10.1080/15366367.2015.1105065.

  • Lee, Y.-H., & Chen, H. (2011). A review of recent response-time analyses in educational testing. Psychological Test and Assessment Modeling, 53(3), 359–379.


  • Lee, Y.-H., & Jia, Y. (2014). Using response time to investigate students’ test-taking behaviors in a NAEP computer-based study. Large Scale Assessments in Education, 2(8), 1–24. https://doi.org/10.1186/s40536-014-0008-1.


  • Lord, F. M. (1974). Estimation of latent ability and item parameters when there are omitted responses. Psychometrika, 39, 247–264. https://doi.org/10.1007/BF02291471.


  • Meng, X. B., Tao, J., & Chang, H. H. (2015). A conditional joint modeling approach for locally dependent item responses and response times. Journal of Educational Measurement, 52, 1–27. https://doi.org/10.1111/jedm.12060.


  • Mislevy, R. J., & Wu, P.-K. (1996). Missing responses and IRT ability estimation: Omits, choice, time limits, and adaptive testing. ETS Research Report Series, 1996, i–36. https://doi.org/10.1002/j.2333-8504.1996.tb01708.x.


  • Molenaar, D., Oberski, D., Vermunt, J., & De Boeck, P. (2016). Hidden Markov item response theory models for responses and response times. Multivariate Behavioral Research, 51(5), 606–626. https://doi.org/10.1080/00273171.2016.1192983.

  • Molenaar, D., Tuerlinckx, F., & van der Maas, H. L. (2015). A generalized linear factor model approach to the hierarchical framework for responses and response times. British Journal of Mathematical and Statistical Psychology, 68(2), 197–219. https://doi.org/10.1111/bmsp.12042.


  • Moustaki, I., & Knott, M. (2000). Weighting for item non-response in attitude scales by using latent variable models with covariates. Journal of the Royal Statistical Society: Series A (Statistics in Society), 163(3), 445–459. https://doi.org/10.1111/1467-985X.00177.


  • OECD. (2009). PISA 2006 technical report. Paris: OECD.

  • OECD. (2017). PISA 2015 technical report. Paris: OECD.

  • O’Muircheartaigh, C., & Moustaki, I. (1999). Symmetric pattern models: A latent variable approach to item non-response in attitude scales. Journal of the Royal Statistical Society: Series A (Statistics in Society), 162(2), 177–194.


  • Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd international workshop on distributed statistical computing (Vol.124).

  • Plummer, M. (2016). rjags: Bayesian graphical models using MCMC. R package version 4-6. Retrieved from https://CRAN.R-project.org/package=rjags.

  • Pohl, S. & Carstensen, C. (2012). NEPS technical report—scaling the data of the competence tests (NEPS Working Paper No. 14). Bamberg: Otto-Friedrich-University, Nationales Bildungspanel.

  • Pohl, S., Gräfe, L., & Rose, N. (2014). Dealing with omitted and not-reached items in competence tests: Evaluating approaches accounting for missing responses in item response theory models. Educational and Psychological Measurement, 74(3), 423–452. https://doi.org/10.1177/0013164413504926.


  • Pohl, S., Haberkorn, K., Hardt, K., & Wiegand, E. (2012). NEPS technical report for reading—scaling results of starting cohort 3 in fifth grade (NEPS Working Paper No. 15). Bamberg: Otto-Friedrich-Universität, Nationales Bildungspanel.

  • Pohl, S., & von Davier, M. (2018). Commentary: "On the importance of the speed-ability trade-off when dealing with not reached items" by Jesper Tijmstra and Maria Bolsinova. Frontiers in Psychology, 9, 1988. https://doi.org/10.3389/fpsyg.2018.01988.


  • R Development Core Team. (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from http://www.r-project.org.

  • Ranger, J., & Kuhn, J.-T. (2012). A flexible latent trait model for response times in tests. Psychometrika, 77(1), 31–47. https://doi.org/10.1007/s11336-011-9231-7.


  • Ranger, J., & Orthner, T. (2012). The case of dependency of responses and response times: A modeling approach based on standard latent trait models. Psychological Test and Assessment Modeling, 54(2), 128–148.


  • Rose, N. (2013). Item nonresponses in educational and psychological measurement (Unpublished doctoral dissertation). Friedrich-Schiller-University of Jena, Germany.

  • Rose, N., von Davier, M., & Xu, X. (2010). Modeling nonignorable missing data with item response theory (IRT). ETS Research Report Series, 2010, i–53. https://doi.org/10.1002/j.2333-8504.2010.tb02218.x.


  • Sachse, K., Mahler, N., & Pohl, S. (2019). When nonresponse mechanisms change: Effects on trends and group comparisons in international large-scale assessments. Educational and Psychological Measurement. https://doi.org/10.1177/0013164419829196.

  • Schnipke, D. L., & Scrams, D. J. (1997). Modeling item response times with a two-state mixture model: A new method of measuring speededness. Journal of Educational Measurement, 34(3), 213–232.


  • Schnipke, D. L., & Scrams, D. J. (2002). Exploring issues of examinee behavior: Insights gained from response-time analyses. In C. N. Mills, M. Potenza, J. J. Fremer, & W. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 237–266). Hillsdale, NJ: Lawrence Erlbaum Associates.


  • Semmes, R., Davison, M. L., & Close, C. (2011). Modeling individual differences in numerical reasoning speed as a random effect of response time limits. Applied Psychological Measurement, 35(6), 433–446. https://doi.org/10.1177/0146621611407305.


  • Senkbeil, M. & Ihme, J. M. (2012). NEPS Technical report for computer literacy—scaling results of starting cohort 4 in ninth grade (NEPS Working Paper No. 17). Bamberg: Otto-Friedrich-Universität, Nationales Bildungspanel.

  • Tijmstra, J., & Bolsinova, M. (2018). On the importance of the speed-ability trade-off when dealing with not reached items. Frontiers in Psychology, 9, 964. https://doi.org/10.3389/fpsyg.2018.00964.


  • van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31, 181–204. https://doi.org/10.3102/10769986031002181.


  • van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287–308. https://doi.org/10.1007/s11336-006-1478-z.


  • van der Linden, W. J. (2008). Using response times for item selection in adaptive testing. Journal of Educational and Behavioral Statistics, 33, 5–20. https://doi.org/10.3102/1076998607302626.


  • van der Linden, W. J., Breithaupt, K., Chuah, S. C., & Zhang, Y. (2007). Detecting differential speededness in multistage testing. Journal of Educational Measurement, 44(2), 117–130. https://doi.org/10.1111/j.1745-3984.2007.00030.x.


  • van der Linden, W. J., & Glas, C. A. (2010). Statistical tests of conditional independence between responses and/or response times on test items. Psychometrika, 75(1), 120–139. https://doi.org/10.1007/s11336-009-9129-9.


  • van der Linden, W. J., & Guo, F. (2008). Bayesian Procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73(3), 365–384. https://doi.org/10.1007/s11336-007-9046-8.


  • van der Linden, W. J., Scrams, D. J., & Schnipke, D. L. (1999). Using response-time constraints to control for differential speededness in computerized adaptive testing. Applied Psychological Measurement, 23(3), 195–210. https://doi.org/10.1177/01466219922031329.

  • Weeks, J. P., von Davier, M., & Yamamoto, K. (2016). Using response time data to inform the coding of omitted responses. Special issue: Current methodological issues in large-scale assessments. Psychological Test and Assessment Modeling, 58(4), 671–701.


  • Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1–17. https://doi.org/10.1207/s15326977ea1001_1.


  • Yamamoto, K., & Everson, H. (1997). Modeling the effects of test length and test time on parameter estimation using the hybrid model. In J. Rost (Ed.), Applications of latent trait and latent class models in the social sciences (pp. 89–98). Münster, Germany: Waxmann.

  • Yamamoto, K., Khorramdel, L., & von Davier, M. (2013). Scaling PIAAC Cognitive Data. In Organisation for Economic Cooperation and Development (2013), Technical Report of the Survey of Adult Skills (PIAAC) (pp. 406–438). OECD Publishing. Available at: http://www.oecd.org/site/piaac/_Technical%20Report_17OCT13.pdf.


Author information

Correspondence to Steffi Pohl.

Additional information


This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft), Grant No. PO1655/3-1. We thank Wim van der Linden for helpful comments on the manuscript as well as the HPC service of Freie Universität Berlin for support and computing time.

Appendices

Appendix A

Prior settings

Speed-accuracy model

\(\varSigma _{P} \sim \textit{IW}_{2+1} \left( {{I}_2 } \right) \)

\(\varSigma _{I} \sim \textit{IW}_{2+1} \left( {{I}_2 } \right) \)

\(\mu _{b} \sim {N}\left( {0,1000^{2}} \right) \)

\(\mu _{\beta } \sim {N}\left( {1,1000^{2}} \right) \)

\(\alpha ^{2}\sim \varGamma \left( {0.01,0.001} \right) \)

Manifest missing response model

\(\sigma _{\theta }^2 \sim \textit{IG}\left( {0.01,0.001} \right) \)

\(\gamma \sim {N}\left( {0,1000^{2}} \right) \)

\(\mu _{\beta } \sim {N}\left( {1,1000^{2}} \right) \)

\(\sigma _{\beta }^2 \sim \textit{IG}\left( {0.01,0.001} \right) \)

  1. \(IW_{2+1}(\cdot)\): inverse Wishart prior with 2+1 degrees of freedom; \(N(\cdot,\cdot)\): normal prior; \(IG(\cdot,\cdot)\): inverse gamma prior; \(\Gamma(\cdot,\cdot)\): gamma prior; \(I_2\) represents an identity matrix of size 2.
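As a quick sanity check of how diffuse these priors are, one can draw from them directly. The sketch below uses SciPy's distributions; reading \(\Gamma(0.01, 0.001)\) and \(IG(0.01, 0.001)\) in the shape/rate convention (as JAGS's dgamma does) is an assumption about the intended parameterization.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Inverse-Wishart prior IW_{2+1}(I_2) for the 2x2 person covariance matrix
# (the item covariance Sigma_I uses the same prior)
Sigma_P = stats.invwishart(df=3, scale=np.eye(2)).rvs(random_state=rng)

# Diffuse normal priors on the means of the item parameters
mu_b = stats.norm(0, 1000).rvs(random_state=rng)
mu_beta = stats.norm(1, 1000).rvs(random_state=rng)

# Gamma prior on the squared time discrimination, shape=0.01, rate=0.001
alpha_sq = stats.gamma(a=0.01, scale=1 / 0.001).rvs(random_state=rng)

# Inverse-gamma priors on the variances in the manifest missing response model
sigma_theta_sq = stats.invgamma(a=0.01, scale=0.001).rvs(random_state=rng)

print(Sigma_P.shape, alpha_sq > 0, sigma_theta_sq > 0)
```

All draws are valid covariance matrices or positive scalars, confirming that the priors are proper but only weakly informative.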

Appendix B

1.1 Difference in Speed Estimates

(see Figs. 14, 15, 16, 17.)

Fig. 14

Difference in speed estimates using the SA model for complete data compared to the true speed values as a function of true ability. White circles represent simulees without missing values and filled circles persons with missing values. The number of missing values is given by the circles’ color, with darker colors denoting a higher number of not-reached items.

Fig. 15

Difference in speed estimates using the SA model for complete data compared to the true speed values as a function of true speed. White circles represent simulees without missing values and filled circles persons with missing values. The number of missing values is given by the circles’ color, with darker colors denoting a higher number of not-reached items.

Fig. 16

Difference in speed estimates between the SA model for incomplete data (SAinc) and the SA model for complete data (SAcomp) as a function of true ability. White circles represent simulees without missing values and filled circles persons with missing values. The number of missing values is given by the circles’ color, with darker colors denoting a higher number of not-reached items.

Fig. 17

Difference in speed estimates between the SA model for incomplete data (SAinc) and the SA model for complete data (SAcomp) as a function of true speed. White circles represent simulees without missing values and filled circles persons with missing values. The number of missing values is given by the circles’ color, with darker colors denoting a higher number of not-reached items.

Appendix C

1.1 Subsequent Analyses

(see Figs. 18, 19.)

Fig. 18

Difference in ability estimates using the SA model for complete data (SAcomp) compared to the true ability values as a function of true ability. White circles represent simulees without missing values and filled circles persons with missing values. The number of missing values is given by the circles’ color, with darker colors denoting a higher number of not-reached items.

Fig. 19

Difference in ability estimates between the SA model for incomplete data (SAinc) and the SA model for complete data (SAcomp) as a function of true ability. White circles represent simulees without missing values and filled circles persons with missing values. The number of missing values is given by the circles’ color, with darker colors denoting a higher number of not-reached items.


Cite this article

Pohl, S., Ulitzsch, E. & von Davier, M. Using Response Times to Model Not-Reached Items due to Time Limits. Psychometrika 84, 892–920 (2019). https://doi.org/10.1007/s11336-019-09669-2
