
A test-theoretic approach to observed-score equating

Psychometrika

Abstract

Observed-score equating using the marginal distributions of two tests is not necessarily the universally best approach it has been claimed to be. On the other hand, equating using the conditional distributions given the ability level of the examinee is theoretically ideal. Possible ways of dealing with the requirement of known ability are discussed, including such methods as conditional observed-score equating at point estimates or posterior expected conditional equating. The methods are generalized to the problem of observed-score equating with a multivariate ability structure underlying the scores.
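The abstract contrasts equating through the marginal score distributions with equating through the conditional distributions given ability. Below is a minimal sketch of the conditional version evaluated at a point estimate θ̂: e(x; θ̂) = G⁻¹(F(x | θ̂) | θ̂). Here F and G are the number-correct distribution functions of forms X and Y at the same ability. The following are assumptions made for illustration and do not come from the source: a 2PL response model, the hypothetical item parameters, the Lord-Wingersky recursion for the conditional score distributions, the percentile-rank continuization common in equipercentile equating, and every function name. This is not the author's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical 2PL item parameters: (discrimination a, difficulty b).
Item = Tuple[float, float]


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response at ability theta (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def score_distribution(theta: float, items: Sequence[Item]) -> List[float]:
    """Number-correct distribution P(X = x | theta) via the Lord-Wingersky recursion."""
    dist = [1.0]  # before any items: score 0 with probability 1
    for a, b in items:
        p = p_correct(theta, a, b)
        new = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1.0 - p)  # item answered incorrectly
            new[x + 1] += prob * p      # item answered correctly
        dist = new
    return dist


def cumulative(dist: Sequence[float]) -> List[float]:
    """Running sums F(0), F(1), ..., F(n)."""
    out, acc = [], 0.0
    for f in dist:
        acc += f
        out.append(acc)
    return out


def percentile_rank(dist: Sequence[float], x: int) -> float:
    """Continuized CDF at integer score x: F(x - 1) + f(x) / 2."""
    below = cumulative(dist)[x - 1] if x > 0 else 0.0
    return below + dist[x] / 2.0


def inverse_percentile_rank(dist: Sequence[float], p: float) -> float:
    """Score y* on the continuized scale whose continuized CDF equals p."""
    cdf = cumulative(dist)
    below = 0.0
    for y, f in enumerate(dist):
        if cdf[y] >= p and f > 0.0:
            # Linear interpolation within the interval [y - 0.5, y + 0.5].
            return y - 0.5 + (p - below) / f
        below = cdf[y]
    return len(dist) - 0.5  # p at (or numerically above) the top of the scale


def conditional_equate(x: int, theta: float,
                       items_x: Sequence[Item], items_y: Sequence[Item]) -> float:
    """e(x; theta) = G^{-1}(F(x | theta) | theta), both distributions taken at theta."""
    f_x = score_distribution(theta, items_x)
    f_y = score_distribution(theta, items_y)
    return inverse_percentile_rank(f_y, percentile_rank(f_x, x))


if __name__ == "__main__":
    items_x = [(1.0, 0.5)] * 10   # hypothetical harder form X
    items_y = [(1.0, -0.5)] * 10  # hypothetical easier form Y
    theta_hat = 0.0               # hypothetical point estimate of ability
    print([round(conditional_equate(x, theta_hat, items_x, items_y), 2) for x in range(11)])
```

Because both distributions are computed at the same θ̂, equating a form to itself returns every score unchanged, and equating a harder form to an easier one maps scores upward. Replacing θ̂ with a different estimate changes the transformation, which is the dependence on ability that the abstract raises.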



Author information


Correspondence to Wim J. van der Linden.

Additional information

This article is based on the author's Presidential Address given on July 7, 2000 at the 65th Annual Meeting of the Psychometric Society held at the University of British Columbia, Vancouver, Canada.

The author is most indebted to Wim M.M. Tielen for his computational assistance and Cees A.W. Glas for his comments on a draft of this paper.


About this article

Cite this article

van der Linden, W.J. A test-theoretic approach to observed-score equating. Psychometrika 65, 437–456 (2000). https://doi.org/10.1007/BF02296337


  • DOI: https://doi.org/10.1007/BF02296337
