Instruktionssensitivität von Tests und Items

Naumann, Alexander; Musow, Stephanie; Aichele, Christine; Hochweber, Jan; Hartig, Johannes

doi:10.1007/s11618-018-0832-0

Instructional sensitivity of tests and items

General Section
Published: 18 June 2018

Zeitschrift für Erziehungswissenschaft, Volume 22, pages 181–202 (2019)

Alexander Naumann^1,3,
Stephanie Musow^2,3,
Christine Aichele^1,
Jan Hochweber^2,3 &
Johannes Hartig^1,3

Summary

Students' test results regularly serve as a central criterion for judging the effectiveness of schools and instruction. Valid inferences about schools and instruction presuppose that the test instruments used can capture possible effects of instruction, that is, that they are instructionally sensitive. However, this prerequisite is rarely tested empirically. As a result, it sometimes remains unclear whether a test was not instructionally sensitive or the instruction was not effective. Resolving this question requires empirical investigation of the instructional sensitivity of the tests and items used.

While instructional sensitivity has long been discussed in the USA, the concept has so far received little attention in the German-language discourse. Our work therefore aims to embed the concept of instructional sensitivity in the German-language discourse on the measurement of school achievement. To this end, three topics are addressed: (a) the theoretical background of the concept of instructional sensitivity, (b) the measurement of instructional sensitivity, and (c) the identification of further research needs.

Abstract

Student performance on assessments is regularly used as a criterion for judging the effectiveness of schools and instruction. Valid interpretation requires that outcomes are affected by instruction to a significant degree. Hence, instruments need to be capable of detecting effects of instruction; that is, they need to be instructionally sensitive. However, the instructional sensitivity of tests and items is rarely investigated empirically. As a consequence, it often remains unclear whether teaching was ineffective or the instrument was insensitive.

While there is a lively discussion of the instructional sensitivity of tests and items in the USA, the concept is largely unknown in German-speaking countries. Thus, the present study aims at (a) introducing the concept of instructional sensitivity, (b) providing an overview of current approaches to measuring instructional sensitivity, and (c) identifying directions for further research.

Author information

Authors and Affiliations

Deutsches Institut für Internationale Pädagogische Forschung (DIPF), Schloßstraße 29, 60486 Frankfurt am Main, Germany
Alexander Naumann, Christine Aichele & Johannes Hartig
Pädagogische Hochschule St. Gallen (PHSG), Notkerstraße 27, 9000 St. Gallen, Switzerland
Stephanie Musow & Jan Hochweber
IDeA Forschungszentrum, Schloßstraße 29, 60486 Frankfurt am Main, Germany
Alexander Naumann, Stephanie Musow, Jan Hochweber & Johannes Hartig

Corresponding author

Correspondence to Alexander Naumann.

About this article

Cite this article

Naumann, A., Musow, S., Aichele, C. et al. Instruktionssensitivität von Tests und Items. Z Erziehungswiss 22, 181–202 (2019). https://doi.org/10.1007/s11618-018-0832-0

Published: 18 June 2018
Issue Date: 05 February 2019
DOI: https://doi.org/10.1007/s11618-018-0832-0
