Early Childhood Research Quarterly

Volume 61, 4th Quarter 2022, Pages 132–144

Psychometric properties of parent- and staff-reported measures and observational measures of infant and toddler development in Early Head Start

https://doi.org/10.1016/j.ecresq.2022.06.003

Highlights

  • The results suggest adequate internal consistency for most of the parent- and staff-reported measures.

  • We found stronger rater effects for staff-reported than observational measures.

  • The rater effects in the staff-reported measures varied by staff characteristics.

  • There was some evidence of validity at ages 2 and 3, but little evidence at age 1.

  • The estimates of sensitivity were low for all of the measures examined.

Abstract

Using longitudinal data from the Early Head Start Family and Child Experiences Survey 2009 (Baby FACES 2009), this study examines the psychometric properties of infant and toddler measures of language and cognitive development and social-emotional competence collected from multiple sources, including parent and Early Head Start staff reports and direct assessments from trained assessors. We examined the validity and reliability of parent/staff ratings and observational measures derived from adult/child play-based interactions using independently administered standardized measures from direct assessments as the criteria or comparison measures. Participants included 846 children and families with low incomes enrolled in Early Head Start in spring 2009 who were followed from ages 1 to 3, drawn from a nationally representative sample of programs. The results suggest adequate internal consistency reliability for most of the measures. We found stronger rater effects for staff-reported measures compared with assessor effects for direct child assessment and observational measures. Moreover, the rater effects in the staff-reported measures varied by staff characteristics. There was some evidence of validity for the staff- and parent-reported measures at ages 2 and 3, but little evidence of validity for the age 1 measures. We discuss the implications of these findings for research on infants and toddlers from families with low incomes. When selecting assessment tools, it is important to consider the purpose of the assessments and their psychometric properties for the particular group of children with whom the assessments will be used.

Introduction

In the past 20 years, a dramatic growth in investments in early childhood education (ECE) and intervention has been accompanied by an increase in the number of children being served by programs designed to enhance their learning and healthy development (Barnett, Carolan, Fitzgerald, & Squires, 2012). As one example, Early Head Start—a two-generation program designed for expectant mothers and families with children from birth to age 3 who have low incomes—has grown from the initial 68 programs funded in 1995 to nearly 1200 programs today, serving more than 166,000 children and families throughout the nation (Office of Head Start [OHS], 2019). To meet quality standards, ECE programs are often required to collect data about children and families to inform program planning and guide a range of decisions aimed at supporting children's learning and development (Boller, Atkins-Burnett, Malone, Baxter, & West, 2010; Malone et al., 2010; National Research Council [NRC], 2008). The information garnered from such assessments is expected to inform individualization of instruction and services for children and families, and guide other aspects of program operation, such as deployment of staff training and technical assistance. The Survey of Early Head Start Programs (SEHSP), designed to provide information to support program improvement in Early Head Start, found that staff use information from ongoing assessments in lesson planning for individual children, making referrals for additional services, and planning home visit activities (Vogel et al., 2006). Accordingly, there is growing recognition of the need for reliable and valid infant/toddler measures that capture the actual skills, knowledge, and behaviors critical to children's development and later school success.

Different types of infant/toddler measures can serve different purposes: determining the functioning of individual children, guiding instruction and tracking children's progress over time, measuring program performance, and advancing knowledge of infant/toddler development (NRC, 2008). The rise of “data-driven decision-making” in early childhood practice provides an opportunity to address important questions about the psychometric integrity of measures typically used to describe and monitor children's development (Boller et al., 2010; Malone et al., 2010). Questions about the reliability and validity of early childhood measures likewise arise when these measures are used in research-based settings: Is the evidence of reliability and validity adequate for describing samples of children followed longitudinally or for assessing program impacts?

Examination of reliability and validity evidence provides information about the extent to which a measure yields trustworthy and meaningful results for its intended purpose. Evidence of reliability and validity is not inherent in a measure itself but rather depends on the characteristics of the sample, the context in which the measure is used, and its purpose or planned use. Both reliability and validity inform the inferences that can be drawn from assessment results; the higher the stakes of the assessment, the stronger the evidence of the instrument's psychometric soundness should be (NRC, 2008).

Reliability is the characteristic of a measure related to the amount of random error from the measurement process (National Council on Measurement in Education [NCME], 2021); it reflects whether the measure is producing consistent results across different circumstances. Numerous factors can introduce error and diminish the accuracy with which early childhood outcomes are measured, thereby capturing only part of the child's actual competencies or skill level (NRC, 2008). These sources of error may be attributable to underlying characteristics of the instrument itself or to other factors extraneous to the trait or ability being measured. One type of reliability—internal consistency reliability—demonstrates the extent to which the individual items in the measure are related to one another.
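
Internal consistency is most commonly summarized with Cronbach's alpha, the coefficient reported for the measures in this study. As a minimal illustration of the computation, the Python sketch below derives alpha from a respondents-by-items score matrix; the item names and simulated data are hypothetical and are not drawn from Baby FACES 2009.

```python
# Minimal sketch of Cronbach's alpha (internal consistency), assuming a
# respondents-by-items matrix of numeric item scores. The item names and the
# simulated data below are hypothetical, not taken from Baby FACES 2009.
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulate 200 children responding to a 6-item scale that taps one latent trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=200)
items = pd.DataFrame(
    {f"item_{i}": trait + rng.normal(scale=1.0, size=200) for i in range(1, 7)}
)
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")  # roughly 0.85 here
```

Values of 0.70 or higher are conventionally treated as adequate, which is the threshold applied to the Baby FACES 2009 measures later in this article.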

Rater or assessor effects are another source of error (Waterman, McDermott, Fantuzzo, & Gadsden, 2012). In the case of a direct assessment of a child's abilities, assessor effects occur when variation in resulting scores reflects differences in the assessors’ performance and not solely differences in the children's performance. In the same way, parent and teacher reports may be influenced by the characteristics or attitudes of the individual providing the rating. For example, among Head Start and kindergarten children, prior studies have indicated that a large proportion of the variance in children's scores on teacher-administered measures was attributable to teachers rather than children. Specifically, an average of 28% of score variation from Head Start teachers’ reports of language, literacy, and mathematics, and 31% from kindergarten teachers’ reports of reading ability, were unrelated to actual child differences (Waterman et al., 2012). In addition, a meta-analysis of 79 studies of rater effects in psychological research found that more than one-third of the variance in scores was due to rater effects (Hoyt & Kerns, 1999). Thus, a child's score may reflect (1) the characteristics of the individual providing the rating, (2) differences in the performance of the assessors themselves, (3) the reliability of the measure for that child, and (4) the child's actual level on the trait being measured. Although training raters in using a measure and rating consistently across children can reduce measurement error and improve the measure's reliability, even good training will not eliminate rater effects when the measure requires raters’ subjective judgment (Hoyt & Kerns, 1999; Raudenbush, Martinez, Bloom, Zhu, & Lin, 2008).
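
One common way to quantify such rater effects is to fit a random-intercept model with ratings grouped by rater and to report the share of score variance located at the rater level (an intraclass-correlation-style statistic). The sketch below illustrates this idea with simulated data; the sample sizes, variable names, and effect magnitudes are assumptions for illustration and do not represent the models or estimates used in this study.

```python
# Minimal sketch of estimating rater effects with a random-intercept model:
# children's scores are grouped by the staff member who rated them, and the
# between-rater variance share approximates how much of the score variation
# reflects raters rather than children. All names and numbers are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_raters, n_children_per_rater = 40, 10
rater_bias = rng.normal(scale=6.0, size=n_raters)   # systematic leniency/severity per rater
records = []
for r in range(n_raters):
    scores = 50 + rater_bias[r] + rng.normal(scale=10.0, size=n_children_per_rater)
    records += [{"rater": r, "score": s} for s in scores]
df = pd.DataFrame(records)

result = smf.mixedlm("score ~ 1", df, groups=df["rater"]).fit()
rater_var = float(result.cov_re.iloc[0, 0])          # between-rater variance
resid_var = result.scale                             # within-rater (child + error) variance
print(f"Share of variance attributable to raters: {rater_var / (rater_var + resid_var):.2f}")
```

In this simulated example the rater share comes out near one-quarter, in the same general range as the teacher-effect estimates cited above, but that is purely a consequence of the chosen simulation parameters rather than a study finding.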

The concurrent and predictive validity of a measure are usually examined in relation to another measure at the same or a later time, respectively, which serves as the criterion. The strength of the relationship between measures depends on many different factors, including the reliability of the measures, the similarity in the mode of assessment over time (e.g., moving from parent-reported to direct assessments of language comprehension), the amount of variance and distribution of scores on each measure, and the similarity in the dimensions and constructs being assessed.
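
Because the correlation that can be observed between two measures is bounded by their reliabilities, validity coefficients are sometimes accompanied by a disattenuated estimate based on Spearman's classical correction. The short sketch below shows that arithmetic; the observed correlation and reliability values are made-up numbers used only for illustration.

```python
# Minimal sketch of how measure unreliability attenuates a validity coefficient,
# using Spearman's classical correction. The observed correlation and the
# reliability (alpha) values below are hypothetical, not study estimates.
import math

def disattenuated_r(r_observed: float, reliability_x: float, reliability_y: float) -> float:
    """Estimated correlation between true scores, given observed r and reliabilities."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

r_obs = 0.35                 # e.g., a rating at one wave vs. a direct assessment at a later wave
alpha_x, alpha_y = 0.75, 0.85
print(f"Observed validity coefficient: {r_obs:.2f}")
print(f"Disattenuated estimate:        {disattenuated_r(r_obs, alpha_x, alpha_y):.2f}")
```

The correction assumes the reliability estimates themselves are accurate, so it is best treated as a rough adjustment rather than a precise estimate of the true-score correlation.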

Evidence for the predictive validity of early childhood measures typically takes the form of low to moderate correlation coefficients. For example, measures of cognitive and language ability within the first 18 months of life only weakly predict children's school-age achievement (Burchinal, 2008; Colombo, 1993; McCall, 1983; Neisser et al., 1996). Cognitive abilities measured during the preschool years account for less than 25% of the variance in academic performance in kindergarten and first grade (La Paro & Pianta, 2000). In addition, predictive associations are more robust for measures of school-age children (5–6 years) than for younger children (2–3 years), and for standardized measures of cognitive and language ability than for teacher ratings of social-emotional and problem behavior (Belsky et al., 2007; National Institute of Child Health and Human Development [NICHD] Early Child Care Research Network, 2005). Social-emotional competencies are particularly difficult to measure with precision, given that they typically rely on observation or caregiver reports, which are themselves influenced by rater effects, particularly among very young children (Lambert, Nelson, Brewer, & Burchinal, 2006).
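
As a point of reference for the figure above, variance explained and the correlation coefficient are related by r-squared, so "less than 25% of the variance" corresponds to correlations below about 0.50; the short conversion below makes that explicit for the low-to-moderate range.

```python
# Quick reference: "variance explained" figures are r**2 values, so less than
# 25% of variance explained implies a simple correlation below 0.50.
for r in (0.20, 0.30, 0.40, 0.50):
    print(f"r = {r:.2f}  ->  variance explained = {r**2:.0%}")
```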

The paucity of predictive validity evidence of early childhood measures may be due in part to the rapid developmental changes that unfold during the infant and toddler years (Alfonso, Bracken, & Nagle, 2020; NRC, 2008). To measure the same underlying attribute as children develop, the corresponding skill being assessed may need to be different across time. For example, measures that assess children's developing language may need to focus on the use of gestures and receptive language during preverbal stages of language acquisition. As children become more expressive and begin to use words in speech, measurement may shift to capturing productive language use. Consequently, the measure of language ability at age 1 (foundational and rudimentary communication skills) may reflect something different from a measure of language obtained during a later developmental period (use of single- and multiple-word phrases). Thus, the predictive validity of early childhood measures may be limited by the challenges associated with measuring change over time. When the attributes of a given skill manifest differently over time, the measures used to adequately capture those skills during different developmental periods likewise must change.

Large-scale longitudinal research studies provide an important opportunity to examine measurement in early childhood. Designed in part to address how infants and toddlers are faring in key areas of development and well-being, the Early Head Start Family and Child Experiences Survey 2009 (Baby FACES 2009) is a descriptive study that includes a comprehensive set of measures designed to assess children's progress across a range of domains, including cognitive, language, and social-emotional competence (Vogel et al., 2011). In Baby FACES 2009, researchers obtained measures of children's abilities at ages 1, 2, and 3 across these developmental areas from varied respondents and administration modes. Data sources included a combination of parent, program staff, and assessor reports; direct child assessments; and coded adult/child play-based interactions—spanning those commonly used in research-based settings to those used by practitioners for monitoring child progress. The psychometric properties of most of the selected measures have not been examined with children from families with low incomes, however. Thus, Baby FACES 2009 provides a unique opportunity to compare the reliability and validity of lower burden and relatively lower cost measures (such as parent and Early Head Start staff reports) with more in-depth direct child assessments and observational ratings in a national sample of Early Head Start children.

Using data from Baby FACES 2009, the current study adds to the knowledge base about the measurement of infant and toddler development by examining the psychometric properties of Early Head Start staff- and parent-reported measures of language, cognitive development, and social-emotional competence alongside observational data from adult/child play-based interactions. We used independently administered standardized measures from direct assessments as the comparison measures, given their established reliability and validity. Specifically, we asked the following research questions:

  1. What are the reliability estimates of the parent-reported, staff-reported, and observational measures of children's language and cognitive development and social-emotional competence?

  2. Is there evidence of concurrent and predictive validity of the parent and staff reports and observational measures in relation to direct assessments of children's abilities and assessor ratings of children's behavior?

Section snippets

Participants

We drew study participants from Baby FACES 2009, a nationally representative sample of 89 Early Head Start programs located in 38 states (94% of programs approached consented to participate in the study). The study selected all children receiving center- and/or home-based services who fell within the study-defined windows based on date of birth, or due date for expectant mothers, as of the first data collection visit in spring 2009: a newborn cohort and a 1-year-old cohort. …

Internal consistency reliability

Overall, reliability estimates (Cronbach's alpha coefficients) for the parent/staff ratings and observational measures were adequate (0.70 or higher; Kline, 2000) (see Tables S.1 and S.2 of the supplementary tables available online). The Cronbach's alpha coefficients for all of the staff-reported measures and parent-reported CDI and BITSEA scores were above 0.70, with the exception of the parent-reported BITSEA Competence subscale at age 1, for which the coefficient was 0.66. …

Discussion

The present study examined the psychometric properties of parent- and staff-reported measures (indirect measures; the ASQ-3, CDI, and BITSEA) and observational measures (the ECI-Adapted and PCI Child Rating Scales) of children's development in a nationally representative sample of children in Early Head Start programs. The findings have implications for research on infants and toddlers from families with low incomes and for programs that serve these children, as well as for other uses of these …

CRediT author statement

Yange Xue: Conceptualization, Data analysis, Writing—original draft preparation; Eileen Bandel: Writing—original draft preparation; Cheri Vogel: Conceptualization, Writing—reviewing; Kimberly Boller: Conceptualization, Writing—reviewing

Author note

This study is sponsored by the Office of Planning, Research and Evaluation, Administration for Children and Families. The views expressed in this study do not necessarily reflect the views or policies of the Office of Planning, Research and Evaluation, the Administration for Children and Families, or the U.S. Department of Health and Human Services.

The authors would like to express their appreciation to Sally Atkins-Burnett, Margaret Burchinal, Judith Carta, Charles Greenwood, Virginia …

References (46)

  • C. Waterman et al. (2012). The matter of assessor variance in early childhood education—Or whose score is it anyway? Early Childhood Research Quarterly.

  • R.R. Abidin (1995). Parenting stress index.

  • Head Start FACES: Longitudinal findings on program performance: Third progress report (2001).

  • V.C. Alfonso et al. (2020). Psychoeducational assessment of preschool children.

  • C. Andreassen et al. (2007). Early childhood longitudinal study, birth cohort (ECLS–B) psychometric report for the 2-year data collection (NCES 2007–084).

  • W.S. Barnett et al. (2012). The state of preschool 2012: State preschool yearbook.

  • N. Bayley (1993). Bayley scales of infant development.

  • J. Belsky et al. (2007). Are there long-term effects of early child care? Child Development.

  • K. Boller et al. (2010). Compendium of student, teacher, and classroom measures used in NCEE evaluations of educational interventions. Volume I: Measures selection approaches and compendium development methods.

  • C. Brady-Smith et al. (2000). Child-parent interaction rating scales for the Three-Bag assessment.

  • C. Brady-Smith et al. (2005). Background and psychometric information for the child-parent interaction rating scales for the Three-Bag assessment: 14-, 24-, and 36-month waves.

  • M.J. Briggs-Gowan et al. (2006). The Brief Infant–Toddler Social and Emotional Assessment (BITSEA).

  • M.R. Burchinal (2008). How measurement error affects the interpretation and understanding of effect sizes. Child Development Perspectives.

  • J.J. Carta et al. (2010). Using IGDIs: Monitoring progress and improving intervention for infants and young children.

  • J.J. Carta et al. (2002). Individual growth and development indicators (IGDIs): Assessment that guides intervention for young children. Young Exceptional Children Monograph Series.

  • A.S. Carter et al. (2006). ITSEA Infant-Toddler Social and Emotional Assessment examiner's manual.

  • J. Clifford et al. (2018). Examining the technical adequacy of the Ages & Stages Questionnaires: Inventory. Infants & Young Children.

  • J. Colombo (1993). Infant cognition: Predicting later intellectual functioning.

  • M. Cox (1997). Qualitative ratings for parent-child interaction at 24–36 months of age. Unpublished manuscript.

  • L.M. Dunn et al. (2007). Peabody picture vocabulary test.

  • L. Fenson et al. (2000). Short-form versions of the MacArthur communicative development inventories. Applied Psycholinguistics.

  • E.C. Furnari et al. (2016). Factors associated with accuracy in prekindergarten teacher ratings of students’ mathematics skills. Journal of Psychoeducational Assessment.

  • C.R. Greenwood et al. (2006). Preliminary investigations of the application of the Early Communication Indicator (ECI) for infants and toddlers. Journal of Early Intervention.