
Part of the book series: Kluwer International Handbooks of Education (volume 9)

Abstract

“Validity, reliability, comparability, and fairness are not just measurement issues, but social values that have meaning and force outside of measurement wherever evaluative judgments and decisions are made” (Messick, 1994, p. 2).

Endnotes

This work draws in part on the authors’ work on the National Research Council’s Committee on the Foundations of Assessment. The first author received support under the Educational Research and Development Centers Program, PR/Award Number R305B60002, as administered by the Office of Educational Research and Improvement, U.S. Department of Education. The second author received support from the National Science Foundation under grant No. ESI-9910154. The findings and opinions expressed in this report do not reflect the positions or policies of the National Research Council, the National Institute on Student Achievement, Curriculum, and Assessment, the Office of Educational Research and Improvement, the National Science Foundation, or the U.S. Department of Education.


References

  • Adams, R., Wilson, M.R., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.

  • Almond, R.G., & Mislevy, R.J. (1999). Graphical models and computerized adaptive testing. Applied Psychological Measurement, 23, 223–237.

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

  • Anderson, J.R., Boyle, C.F., & Corbett, A.T. (1990). Cognitive modeling and intelligent tutoring. Artificial Intelligence, 42, 7–49.

  • Bennett, R.E. (2001). How the internet will help large-scale assessment reinvent itself. Education Policy Analysis Archives, 9(5). Retrieved from http://epaa.asu.edu/epaa/v9n5.html

  • Bradlow, E.T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.

  • Brennan, R.L. (1983). The elements of generalizability theory. Iowa City, IA: American College Testing Program.

  • Brennan, R.L. (2001). An essay on the history and future of reliability from the perspective of replications. Journal of Educational Measurement, 38(4), 295–317.

  • Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296–322.

  • Bryk, A.S., & Raudenbush, S. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

  • Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.

  • Cohen, J.A. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.

  • Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

  • Cronbach, L.J. (1989). Construct validation after thirty years. In R.L. Linn (Ed.), Intelligence: Measurement, theory, and public policy (pp. 147–171). Urbana, IL: University of Illinois Press.

  • Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.

  • Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

  • Dayton, C.M. (1999). Latent class scaling analysis. Thousand Oaks, CA: Sage.

  • DiBello, L.V., Stout, W.F., & Roussos, L.A. (1995). Unified cognitive/psychometric diagnostic assessment likelihood-based classification techniques. In P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively diagnostic assessment (pp. 361–389). Hillsdale, NJ: Erlbaum.

  • Embretson, S. (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93, 179–197.

  • Embretson, S.E. (1998). A cognitive design systems approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3, 380–396.

  • Ercikan, K. (1998). Translation effects in international assessments. International Journal of Educational Research, 29, 543–553.

  • Ercikan, K., & Julian, M. (2002). Classification accuracy of assigning student performance to proficiency levels: Guidelines for assessment design. Applied Measurement in Education, 15, 269–294.

  • Falmagne, J.-C., & Doignon, J.-P. (1988). A class of stochastic procedures for the assessment of knowledge. British Journal of Mathematical and Statistical Psychology, 41, 1–23.

  • Fischer, G.H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374.

  • Gelman, A., Carlin, J., Stern, H., & Rubin, D.B. (1995). Bayesian data analysis. London: Chapman & Hall.

  • Greeno, J.G., Collins, A.M., & Resnick, L.B. (1996). Cognition and learning. In D.C. Berliner & R.C. Calfee (Eds.), Handbook of educational psychology (pp. 15–46). New York: Macmillan.

  • Gulliksen, H. (1950/1987). Theory of mental tests. New York: John Wiley/Hillsdale, NJ: Lawrence Erlbaum.

  • Haertel, E.H. (1989). Using restricted latent class models to map the skill structure of achievement test items. Journal of Educational Measurement, 26, 301–321.

  • Haertel, E.H., & Wiley, D.E. (1993). Representations of ability structures: Implications for testing. In N. Frederiksen, R.J. Mislevy, & I.I. Bejar (Eds.), Test theory for a new generation of tests. Hillsdale, NJ: Lawrence Erlbaum.

  • Hambleton, R.K. (1989). Principles and selected applications of item response theory. In R.L. Linn (Ed.), Educational measurement (3rd ed.) (pp. 147–200). Phoenix, AZ: American Council on Education/Oryx Press.

  • Hambleton, R.K., & Slater, S.C. (1997). Reliability of credentialing examinations and the impact of scoring models and standard-setting policies. Applied Measurement in Education, 10, 19–39.

  • Holland, P.W., & Thayer, D.T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Lawrence Erlbaum.

  • Holland, P.W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum.

  • Jöreskog, K.G., & Sörbom, D. (1979). Advances in factor analysis and structural equation models. Cambridge, MA: Abt Books.

  • Kadane, J.B., & Schum, D.A. (1996). A probabilistic analysis of the Sacco and Vanzetti evidence. New York: Wiley.

  • Kane, M.T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.

  • Kelley, T.L. (1927). Interpretation of educational measurements. New York: World Book.

  • Kuder, G.F., & Richardson, M.W. (1937). The theory of estimation of test reliability. Psychometrika, 2, 151–160.

  • Lane, S., Wang, N., & Magone, M. (1996). Gender-related differential item functioning on a middle-school mathematics performance assessment. Educational Measurement: Issues and Practice, 15(4), 21–27, 31.

  • Lazarsfeld, P.F. (1950). The logical and mathematical foundation of latent structure analysis. In S.A. Stouffer, L. Guttman, E.A. Suchman, P.F. Lazarsfeld, S.A. Star, & J.A. Clausen (Eds.), Measurement and prediction (pp. 362–412). Princeton, NJ: Princeton University Press.

  • Levine, M., & Drasgow, F. (1982). Appropriateness measurement: Review, critique, and validating studies. British Journal of Mathematical and Statistical Psychology, 35, 42–56.

  • Linacre, J.M. (1989). Many-faceted Rasch measurement. Doctoral dissertation, University of Chicago.

  • Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.

  • Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

  • Martin, J.D., & VanLehn, K. (1995). A Bayesian approach to cognitive assessment. In P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively diagnostic assessment (pp. 141–165). Hillsdale, NJ: Erlbaum.

  • Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed.) (pp. 13–103). New York: American Council on Education/Macmillan.

  • Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23.

  • Messick, S., Beaton, A.E., & Lord, F.M. (1983). National Assessment of Educational Progress reconsidered: A new design for a new era. NAEP Report 83-1. Princeton, NJ: National Assessment of Educational Progress.

  • Mislevy, R.J., Steinberg, L.S., & Almond, R.G. (in press). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives. In S. Irvine & P. Kyllonen (Eds.), Generating items for cognitive tests: Theory and practice. Hillsdale, NJ: Erlbaum.

  • Mislevy, R.J., Steinberg, L.S., Almond, R.G., Haertel, G., & Penuel, W. (in press). Leverage points for improving educational assessment. In B. Means & G. Haertel (Eds.), Evaluating the effects of technology in education. Hillsdale, NJ: Erlbaum.

  • Mislevy, R.J., Steinberg, L.S., Breyer, F.J., Almond, R.G., & Johnson, L. (1999). A cognitive task analysis, with implications for designing a simulation-based assessment system. Computers in Human Behavior, 15, 335–374.

  • Mislevy, R.J., Steinberg, L.S., Breyer, F.J., Almond, R.G., & Johnson, L. (in press). Making sense of data from complex assessments. Applied Measurement in Education.

  • Myford, C.M., & Mislevy, R.J. (1995). Monitoring and improving a portfolio assessment system (Center for Performance Assessment Research Report). Princeton, NJ: Educational Testing Service.

  • National Research Council (1999). How people learn: Brain, mind, experience, and school. Committee on Developments in the Science of Learning. Bransford, J.D., Brown, A.L., & Cocking, R.R. (Eds.). Washington, DC: National Academy Press.

  • National Research Council (2001). Knowing what students know: The science and design of educational assessment. Committee on the Foundations of Assessment. Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). Washington, DC: National Academy Press.

  • O'Neil, K.A., & McPeek, W.M. (1993). Item and test characteristics that are associated with differential item functioning. In P.W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 255–276). Hillsdale, NJ: Erlbaum.

  • Patz, R.J., & Junker, B.W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24, 342–366.

  • Petersen, N.S., Kolen, M.J., & Hoover, H.D. (1989). Scaling, norming, and equating. In R.L. Linn (Ed.), Educational measurement (3rd ed.) (pp. 221–262). New York: American Council on Education/Macmillan.

  • Pirolli, P., & Wilson, M. (1998). A theory of the measurement of knowledge content, access, and learning. Psychological Review, 105, 58–82.

  • Rasch, G. (1960/1980). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research/Chicago: University of Chicago Press (reprint).

  • Reckase, M. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401–412.

  • Rogosa, D.R., & Ghandour, G.A. (1991). Statistical models for behavioral observations (with discussion). Journal of Educational Statistics, 16, 157–252.

  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph No. 17, 34, (No. 4, Part 2).

  • Samejima, F. (1973). Homogeneous case of the continuous response level. Psychometrika, 38, 203–219.

  • Schum, D.A. (1987). Evidence and inference for the intelligence analyst. Lanham, MD: University Press of America.

  • Schum, D.A. (1994). The evidential foundations of probabilistic reasoning. New York: Wiley.

  • SEPUP (1995). Issues, evidence, and you: Teacher's guide. Berkeley: Lawrence Hall of Science.

  • Shavelson, R.J., & Webb, N.M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.

  • Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72–101.

  • Spearman, C. (1910). Correlation calculated with faulty data. British Journal of Psychology, 3, 271–295.

  • Spiegelhalter, D.J., Thomas, A., Best, N.G., & Gilks, W.R. (1995). BUGS: Bayesian inference using Gibbs sampling, Version 0.50. Cambridge: MRC Biostatistics Unit.

  • Tatsuoka, K.K. (1990). Toward an integration of item response theory and cognitive error diagnosis. In N. Frederiksen, R. Glaser, A. Lesgold, & M.G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. 453–488). Hillsdale, NJ: Erlbaum.

  • Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51, 567–577.

  • Toulmin, S. (1958). The uses of argument. Cambridge, England: Cambridge University Press.

  • Traub, R.E., & Rowley, G.L. (1980). Reliability of test scores and decisions. Applied Psychological Measurement, 4, 517–545.

  • van der Linden, W.J. (1998). Optimal test assembly. Applied Psychological Measurement, 22, 195–202.

  • van der Linden, W.J., & Hambleton, R.K. (1997). Handbook of modern item response theory. New York: Springer.

  • Wainer, H., Dorans, N.J., Flaugher, R., Green, B.F., Mislevy, R.J., Steinberg, L., & Thissen, D. (2000). Computerized adaptive testing: A primer (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

  • Wainer, H., & Kiely, G.L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 195–201.

  • Wiley, D.E. (1991). Test validity and invalidity reconsidered. In R.E. Snow & D.E. Wiley (Eds.), Improving inquiry in social science (pp. 75–107). Hillsdale, NJ: Erlbaum.

  • Willingham, W.W., & Cole, N.S. (1997). Gender and fair assessment. Mahwah, NJ: Lawrence Erlbaum.

  • Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13, 181–208.

  • Wolf, D., Bixby, J., Glenn, J., & Gardner, H. (1991). To use their minds well: Investigating new forms of student assessment. In G. Grant (Ed.), Review of Research in Education, Vol. 17 (pp. 31–74). Washington, DC: American Educational Research Association.

  • Wright, B.D., & Masters, G.N. (1982). Rating scale analysis. Chicago: MESA Press.

  • Yen, W.M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187–213.


Copyright information

© 2003 Kluwer Academic Publishers

About this chapter

Cite this chapter

Mislevy, R.J., Wilson, M.R., Ercikan, K., Chudowsky, N. (2003). Psychometric Principles in Student Assessment. In: Kellaghan, T., Stufflebeam, D.L. (eds) International Handbook of Educational Evaluation. Kluwer International Handbooks of Education, vol 9. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0309-4_31


  • DOI: https://doi.org/10.1007/978-94-010-0309-4_31

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-0849-8

  • Online ISBN: 978-94-010-0309-4

  • eBook Packages: Springer Book Archive
