Simplifying quality rating systems in early childhood education

https://doi.org/10.1016/j.childyouth.2020.104947

Highlights

  • Quality Rating and Improvement Systems aim to improve early care and education quality.

  • Many QRIS have not been linked to either quality or children’s developmental gains.

  • In response to calls for second generation QRIS, we explore a novel, single-indicator QRIS.

  • We find that a QRIS based on this single observational measure is as predictive as more complex models.

  • Implications for updating QRIS policy to include fewer, more powerful indicators are discussed.

Abstract

High-quality preschool experiences can promote children’s short- and long-term development, yet many children in the United States still lack access to high-quality care. Many states have turned to Quality Rating and Improvement Systems (QRIS) in an attempt to solve this problem. Unfortunately, recent empirical work has shown that most QRIS ratings are not linked to either program quality or children’s learning gains, leading to calls for reform. In particular, in recent years there have been calls to simplify QRIS and focus accountability on a smaller set of quality measures that are linked to children’s development. This exploratory study probes the potential of such an approach, comparing a single measure of quality to a set of commonly used QRIS indicators (e.g., teacher education, an observational measure of teacher-child interactions, class size/ratio, and opportunities for family involvement). Our findings—that a QRIS based on an observational measure of the quality of teacher-child interactions alone is as effective at predicting children’s learning gains as a more complex set of indicators—align with recent calls for second generation QRIS that use fewer, but more powerful, indicators. Implications for research and policy are discussed.

Introduction

Though there is ample evidence that high-quality preschool experiences can promote children’s short- and long-term development across multiple domains (Campbell et al., 2002, Deming, 2009, Phillips et al., 2017), there is significant variability in the quality of early care and education (ECE) settings in the United States (Bassok and Galdo, 2016, Burchinal et al., 2010, Dowsett et al., 2008). To improve the quality of ECE settings, many states have increasingly turned to accountability systems known as Quality Rating and Improvement Systems (QRIS). QRIS aim to improve the quality of ECE programs system-wide through incentives, supports for quality improvement, and informational campaigns designed to shape parent choice (Zellman & Perlman, 2008). While in 1997 there was just one QRIS in operation, forty-nine states now either operate a QRIS or are in planning or pilot phases (Build Initiative and Child Trends, 2019).

Much like K-12 accountability systems, QRIS aim to measure quality, incentivize improvement, and increase transparency. However, unlike K-12 systems, which typically rate schools based on children’s learning gains, no QRIS rates programs based on developmental gains. This is because assessing very young children is costly, time-consuming, and often unreliable (e.g., National Association for the Education of Young Children, 2003, National Early Childhood Accountability Task Force, 2007, Snow, 2011, Waterman et al., 2012). Instead, QRIS rate programs on a variety of classroom inputs that are either theoretically or empirically linked to children’s learning. QRIS quality indicators are wide-ranging, including staff qualifications, teacher-child ratios, assessment use, observed measures of classroom quality, and opportunities for family involvement.

Because the ultimate goal of QRIS is to support young children’s early development, it is important to understand the extent to which the broad set of quality measures collected by QRIS succeeds in identifying programs that effectively support children’s learning. Indeed, there are now QRIS validation studies from more than 12 states exploring these issues. Unfortunately, findings from these studies have been largely discouraging, showing weak, inconsistent associations between QRIS ratings and children’s development, leading researchers and policymakers to reconsider QRIS design (Cannon et al., 2017, Karoly, 2014). In particular, there have been recent calls for the creation of what Cannon and colleagues (2017) call “second generation” QRIS, which would be far more streamlined than those currently in place. These second generation QRIS would include fewer quality indicators and retain only those that have been robustly linked to child development (Cannon et al., 2017, Sabol et al., 2013).

Louisiana’s QRIS follows the suggested second generation approach, and thus provides a unique opportunity to study a system consistent with Cannon and colleagues’ recommendations. Specifically, unlike any other state, Louisiana has developed a QRIS that uses a single indicator of quality: an observed measure of teacher-child interactions, the Classroom Assessment Scoring System (CLASS; Pianta, La Paro, & Hamre, 2008). Although the focus on a single measure may be viewed by some as narrow (that is, as ignoring other important aspects of ECE quality), Louisiana chose to focus on this observational measure because of the growing consensus in the ECE literature that classroom-level processes, and specifically the provision of responsive teacher-child interactions, are among the most important elements of quality ECE programs, particularly with respect to promoting school readiness (Phillips et al., 2017, Yoshikawa et al., 2013). Louisiana identified process quality as the main developmental force in an ECE classroom and chose to make it the sole focus of its rating. The state does collect information on many of the quality measures included in other states’ QRIS (e.g., teacher-child ratios, teacher education levels), but these measures, though made public, do not factor into programs’ QRIS ratings.

Louisiana’s approach makes it an important case study both for understanding the validity of one second generation QRIS and for exploring how second generation QRIS may compare to more typical QRIS ratings, which combine many measures. That is, Louisiana provides a novel context for exploring the tradeoffs between a tightly focused definition of quality and a more comprehensive approach, which can inform other states’ efforts to refine their QRIS. Understanding these tradeoffs is critical: if QRIS ratings measure quality in a way that is not systematically linked to children’s development, then these systems are unlikely to foster improvements that ultimately benefit children.

This study combines CLASS observation data collected in Louisiana classrooms during the 2014–15 academic year, program structural quality measures (teacher education and experience, class size, parental involvement, health screenings, and developmental assessment use), and a set of direct child assessments to answer two key research questions. First, we explore the relationships between CLASS and children’s learning gains in math, language, and literacy as measured by the Woodcock-Johnson, the Peabody Picture Vocabulary Test, and the Test of Preschool Early Literacy (see Method for more information). We describe the extent to which categorical ratings of quality based only on the CLASS are linked to measures of children’s learning in those classrooms. Second, we examine whether these ratings would provide stronger predictions of learning gains if other commonly used QRIS indicators were included. We do this using both regression and chi-square analyses. This study provides new evidence about the predictive validity of one state’s approach to measuring quality within a QRIS; specifically, we find that CLASS scores predict children’s learning gains, and that adding measures of quality commonly used in other QRIS does not meaningfully improve predictive power. Findings from this analysis can inform policymakers looking to balance reliable quality measurement with concerns about program resources.

Quality Rating and Improvement Systems reflect a policy effort to move educational accountability into the early childhood sector. Like K-12 accountability systems, QRIS aim to define and measure quality so that states can publicize these ratings for parents and caregivers; moreover, some QRIS include additional incentives and sanctions designed to promote quality improvement. A substantial body of evidence from K-12 does suggest that accountability systems can lead to modest improvements in student outcomes (Dee and Jacob, 2011, Loeb and Figlio, 2011). However, this literature also highlights that the ways in which policymakers define, measure, and incentivize quality have a profound impact on how principals and teachers structure time and leverage limited resources (Grissom et al., 2017, Neal and Schanzenbach, 2010). There is substantial evidence that the design of an accountability system directly shapes both how educators define quality and the steps they take to make changes at their sites and in their classrooms. For example, research on No Child Left Behind finds that in response to the law’s incentives, schools reduced instructional time spent on non-tested academic subjects (Dee et al., 2013, Griffith and Scharmann, 2008, Hannaway and Hamilton, 2008, McMurrer, 2007, Pederson, 2007) and promoted the use of test-preparation pedagogies (Au, 2007, Diamond, 2007).

The notion that educators respond to the incentives laid out in an accountability system is not inherently problematic; rather, it dovetails with the goals of accountability systems. To the extent that an accountability system incentivizes practices that are valuable for children’s learning, this response is the very mechanism by which accountability promotes students’ development. However, such findings also suggest that accountability systems that do not correctly identify practices linked to children’s learning will not only fail to promote child development but may inadvertently undermine it (e.g., Burchinal et al., 2016, Cannon et al., 2017, Markowitz, 2018), highlighting the need for care and caution in defining and measuring quality, particularly in the ECE context.

To date, only one study has examined whether a similar responsiveness to ratings occurs in early childhood programs. Using data from North Carolina, Bassok, Dee, and Latham (2019) found strong evidence that programs do respond to QRIS ratings by making improvements on the quality metrics in the rating system. It is not clear, however, whether these improvements in turn lead to improvements in children’s learning; this link likely depends, at least in part, on whether the QRIS correctly incentivizes changes that are meaningfully related to young children’s development.

The central challenge for early childhood accountability systems is that it is difficult and expensive to measure children’s growth and development reliably and validly, at scale, and in a way that is developmentally appropriate for young children (National Association for the Education of Young Children, 2003, National Early Childhood Accountability Task Force, 2007, Snow, 2011, Waterman et al., 2012). Therefore, measures of quality in ECE settings tend to focus on ECE inputs thought to be meaningfully linked to child development. These inputs tend to fall into two categories: measures of process quality, which capture the emotional warmth and instructional sophistication of the interactions between teachers and children, and measures of structural quality, which include program and teacher characteristics thought to be necessary, though not sufficient, for creating high levels of process quality (Burchinal, 2017).

QRIS ratings typically combine both structural and process quality measures, and states vary widely both in how they define and measure quality and in how these measures are combined into a single rating. Data from the QRIS Compendium, which tracks information about QRIS nationwide, show that nearly all QRIS include measures of staff qualifications (99%), curriculum use (90%), classroom environment or teacher-child interactions (90%), family partnerships (90%), program management and leadership (88%), health and safety (93%), and the provision of developmental assessments (75%) (Build Initiative and Child Trends, 2019). Some states use point systems, which allocate points for each quality indicator and assign ratings based on total points earned; other states use block systems, which require that programs meet certain standards before moving up the rating scale; and some states use a hybrid of the two (the sketch below illustrates the difference between point and block aggregation).
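To make the distinction between these aggregation approaches concrete, the sketch below computes a program’s rating under a hypothetical point system and a hypothetical block system. The indicator names, point values, thresholds, and block definitions are invented for illustration only and do not correspond to any actual state’s QRIS rules.

```python
# Illustrative sketch only: the indicators, point values, thresholds, and
# block definitions below are hypothetical and do not reflect any state's QRIS.

def point_rating(indicator_points):
    """Point system: sum the points earned across indicators, then map the total to a rating level."""
    total = sum(indicator_points.values())
    if total >= 12:
        return 4
    if total >= 8:
        return 3
    if total >= 4:
        return 2
    return 1

def block_rating(indicator_points, blocks):
    """Block system: a program advances one level for each successive block of standards it fully meets."""
    level = 1
    for block in blocks:  # blocks ordered from lowest to highest level
        if all(indicator_points[name] >= minimum for name, minimum in block.items()):
            level += 1
        else:
            break
    return level

program = {"staff_qualifications": 3, "curriculum": 2,
           "observed_interactions": 4, "family_partnerships": 1}
blocks = [
    {"staff_qualifications": 1, "curriculum": 1},
    {"staff_qualifications": 2, "observed_interactions": 3},
    {"observed_interactions": 4, "family_partnerships": 2},
]

print(point_rating(program))          # 10 points -> level 3
print(block_rating(program, blocks))  # meets the first two blocks -> level 3
```

A hybrid system would combine the two approaches, for example requiring a block of baseline standards and then awarding points above that floor.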

Historically, these measures have been developed by state policymakers with the guidance of researchers, experts, and stakeholders—and these policymakers faced myriad tradeoffs as they designed the first generation of QRIS (Connors & Morris, 2015). For instance, classroom observations have been consistently linked to child outcomes and are therefore viewed as an important measure of quality (Burchinal, 2017). However, they are time-consuming and costly to collect, and expensive rating systems leave fewer resources for quality improvement supports, an essential QRIS component in most states (Connors & Morris, 2015). Less costly measures of quality may free up resources for supports and may encourage more programs to participate, but they may be weaker or more distal predictors of child outcomes, rendering the policy ineffective. Further, policymakers designing the first generation of QRIS were constrained by the limited empirical evidence available; though many studies had examined predictors of learning in early childhood settings, a decade ago there was little research specifically on the validity of different approaches to combining quality indicators into program ratings for accountability purposes.

Research on the predictive validity of both individual measures of ECE quality and QRIS ratings has increased substantially since the earliest QRIS were planned and implemented. Existing studies exploring the link between structural measures of quality and child outcomes largely report null findings (Early et al., 2007); in contrast, as noted above, observational measures of classroom processes are consistently, though modestly, linked to children’s development (Burchinal, 2017). Some studies do find associations between child outcomes and teacher and director qualifications, teacher professional development, and program support and leadership (e.g., Barnett, 2011, Ehrlich et al., 2016); however, these associations are not consistent, particularly in analyses using large multisite data and rigorous controls (Burchinal, 2017, Early et al., 2007, Mashburn et al., 2008). For example, Mashburn and colleagues (2008) found no relationship between children’s academic and social skills during preschool and several commonly used quality proxies, including teachers’ education, field of study, class size, and teacher-child ratio.

Measures of process quality tend to be more consistently linked to children’s developmental outcomes, but associations vary based on the specific measure of process quality, and in most cases associations are modest to quite small. For example, Sabol and Pianta (2014) used a large, nationally representative sample to explore the association between children’s learning and their preschool program’s rating on the Early Childhood Environment Rating Scale-Revised (ECERS-R), the most commonly used classroom observation tool in QRIS, and found very little evidence of a relationship. Conversely, a large body of research has linked teacher-child interactions as measured by the Classroom Assessment Scoring System (CLASS; Pianta, La Paro, & Hamre, 2008) to children’s learning gains, both academic and socio-emotional (Araujo et al., 2016, Hamre, 2014, Hamre et al., 2014), though, again, associations are small to modest, often ranging from 0.05 to 0.10 of a standard deviation (Burchinal, 2017). For example, children exposed to warm, responsive, and cognitively stimulating interactions with teachers, as measured by the CLASS, develop stronger social (Johnson, Seidenfeld, Izard, & Kobak, 2013), self-regulatory (Williford, Vick Whittaker, Vitiello, & Downer, 2013), language (Hindman & Wasik, 2015), and early academic skills (Howes et al., 2008). These associations are also found in studies using large samples (e.g., Keys et al., 2013), and are stronger at higher levels of quality (Weiland et al., 2013, Zaslow et al., 2016).

Finally, recent research also suggests that combining individual quality measures does not improve predictive validity. For example, Sabol et al. (2013) show that even when individual measures of quality are predictive of children’s learning, the ways in which state QRIS combine scores from several quality indicators into a composite yield measures that are not systematically predictive of children’s learning. Moreover, QRIS validation studies, which attempt to link QRIS ratings to children’s learning gains (both academic and socio-emotional), have been largely discouraging, showing weak, inconsistent associations between QRIS ratings and developmental outcomes (Cannon et al., 2017, Karoly, 2014). In the 12 states that have been studied—California, Colorado, Delaware, Florida, Indiana, Minnesota, Missouri, Pennsylvania, Rhode Island, Virginia, Washington, and Wisconsin—four reports find no differences in child outcomes by QRIS rating (Sirinides, 2010, Tout et al., 2011, Tout et al., 2010, Zellman et al., 2008), and the rest find small associations that are typically non-linear (that is, not consistently distinguishing among all rating levels, but rather differentiating only between high- and low-quality programs) and significant for just one of the skill domains examined, most commonly subscales of teacher-reported socio-emotional skills (Elicker et al., 2011, Hestenes et al., 2015, Magnuson and Lin, 2016, Maxwell et al., 2016, Thornburg et al., 2009, Tout et al., 2016) or early literacy skills (Sabol and Pianta, 2015, Shen et al., 2009, Sirinides et al., 2015, Soderberg et al., 2016). Two studies have reported associations between QRIS ratings and directly assessed executive function (Karoly et al., 2016, Quick et al., 2016).

The lack of predictive validity among QRIS may be driven, in part, by limitations inherent to the existing research (Cannon et al., 2017, Karoly, 2014). Many QRIS validation studies were conducted as part of QRIS piloting, in young systems that were still working out the kinks of quality measurement. Many validation studies also relied on small samples drawn from voluntary systems, and these self-selected programs often lack variation with respect to quality.

At the same time, disappointing results from the existing body of validation studies suggest that current QRIS may not accurately identify centers that successfully promote child development. A central critique of QRIS is that they have failed to focus sufficiently on the key drivers of quality that are most systematically related to child outcomes (e.g., Burchinal et al., 2016, Cannon et al., 2017, Karoly, 2014), yielding noisy measures that lack predictive validity. For this reason, there have been recent calls to amend existing QRIS, and in particular to create a second generation of QRIS that use fewer quality indicators, each more closely linked to children’s developmental outcomes (Burchinal et al., 2016, Cannon et al., 2017).

Each quality indicator imposes some cost on programs and states, both in terms of managing and evaluating programs’ data and in terms of tradeoffs in professional development: centers attempting to improve on one QRIS indicator are not attempting to improve on another. Yet if individual indicators are not tightly linked to child development, such efforts may be wasted. QRIS focused on one or two core measures of quality that have been consistently linked to child outcomes may save resources in the long run, and in the short run may help narrow quality improvement efforts to empirically validated inputs. While no one measure can capture the complex combination of factors that make some early childhood programs effective, and no single program characteristic is consistently linked with large gains in children’s development in the empirical literature, reducing complex QRIS to more transparent systems focused on just a few key indicators may help programs streamline improvement efforts and ultimately do more to promote high-quality early learning. To date, however, there is no empirical evidence that such a QRIS would outperform current rating systems.

Louisiana’s QRIS provides a unique opportunity to explore the relative efficacy of a second generation QRIS as compared to the typical multi-indicator QRIS currently in use in most states, for several reasons. First, Louisiana’s QRIS is mandated for all publicly funded classroom-based ECE providers, including all subsidized child care centers, Head Start programs, and pre-kindergarten programs, meaning that it applies to a highly variable set of ECE providers and offers an informative case for policymakers in other states. Second, although the state collects information on multiple quality indicators, the actual QRIS rating focuses exclusively on high-quality teacher-child interactions, a decision based on empirical evidence showing that measures of interactions are more consistently linked to child outcomes than most other measures of ECE quality (Burchinal, 2017, Hamre, 2014). Louisiana measures the quality of teacher-child interactions in every publicly funded classroom in the state twice each year using a network of local raters and the Classroom Assessment Scoring System (CLASS). The CLASS tool is widely used across policy applications: 18 states use CLASS as part of their QRIS, and Head Start uses CLASS as part of its professional development and quality monitoring. Louisiana is unique, however, in that it uses only CLASS scores to determine QRIS ratings, and these ratings are tied to incentives, including tax credits for programs and teachers; it is the only single-indicator QRIS in the country. Previous research in Louisiana has shown that these locally collected CLASS scores are related to children’s learning gains in language, literacy, mathematics, and executive function (Vitiello, Bassok, Hamre, Player, & Williford, 2018); however, it is unknown whether the addition of other QRIS indicators would better identify programs that are effectively promoting children’s development.

The present study links CLASS observations collected by the state of Louisiana’s local raters during the 2014–15 academic year with multiple structural quality measures typically included in QRIS (teacher education and experience, class size, parental involvement, health screenings, and developmental assessment use) and a set of direct child assessments. We use data collected from 76 programs serving low-income four-year-olds across five diverse Louisiana parishes to address two questions. First, we examine the predictive validity of a single-indicator QRIS by estimating the relationship between CLASS and children’s learning gains in language, literacy, and mathematics. We replicate previous work linking CLASS to children’s development by estimating the association between CLASS and learning gains first continuously and then categorically, providing preliminary validation of the “second generation,” CLASS-only QRIS in Louisiana. Second, we explore whether the addition of other commonly used QRIS indicators improves predictive validity in two ways. First, we add seven additional structural quality measures to our regressions of learning gains on CLASS and compare coefficients. Second, we conduct policy simulations in which we compare the predictive validity of a categorical measure of CLASS to that of simulated QRIS scores created using the full suite of quality indicators.
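As a rough sketch of the first of these comparisons, the code below fits a regression of learning gains on CLASS alone and then on CLASS plus a set of structural indicators, and compares the CLASS coefficient and model fit across the two specifications. The file name, column names, and bare OLS specification are assumptions made for illustration; the study’s actual models additionally include child-level controls and pretest scores, and its second approach relies on policy simulations and chi-square tests not shown here.

```python
# Illustrative sketch only: file and column names are hypothetical, and this
# simplified OLS omits the child-level controls, pretest scores, and richer
# specifications used in the actual study.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("classroom_data.csv")  # hypothetical: one row per classroom

# Specification 1: standardized learning gains regressed on the CLASS composite alone.
class_only = smf.ols("math_gain ~ class_score", data=df).fit()

# Specification 2: add commonly used structural quality indicators.
structural = ["teacher_ba", "teacher_experience", "class_size",
              "parent_involvement", "health_screenings", "assessment_use"]
full_model = smf.ols("math_gain ~ class_score + " + " + ".join(structural),
                     data=df).fit()

# If the structural indicators add little, the CLASS coefficient and the
# adjusted R-squared should change only marginally across specifications.
print(class_only.params["class_score"], full_model.params["class_score"])
print(class_only.rsquared_adj, full_model.rsquared_adj)
```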

Consistent with previous research, both broadly and in the Louisiana context (e.g., Hamre, 2014, Vitiello et al., 2018), we hypothesize that CLASS will be associated with learning gains both continuously and categorically. Furthermore, we expect limited associations between structural measures of quality and children’s gains. Finally, we expect that additional quality measures will not substantially improve the predictive validity of a CLASS-only QRIS.

Findings from this analysis provide preliminary evidence as to whether one second generation QRIS—in this case one with a single quality indicator chosen for its empirical links to child development—has predictive validity, and whether the measure is improved with the addition of other common quality indicators. Findings will inform states as they seek to amend their QRIS while balancing the costs of observational measurement with the importance of accurate measurement for achieving QRIS goals.

Section snippets

Data and sample

Data were collected during the 2014–15 academic year as part of a larger study of the pilot year of Louisiana’s QRIS. We combine data from multiple sources: director- and teacher-reported classroom quality information, CLASS scores collected by local raters not affiliated with the research team, and direct assessments of children’s learning collected by the research team.

Our sample programs are drawn from five Louisiana parishes selected from 13 parishes that took part in Louisiana’s ECE reform “pilot year.” These

Results

Table 2 displays correlations among student learning gains in language, literacy, and mathematics and all process and structural quality indicators. CLASS, teacher education, parent volunteer opportunities, and the provision of health screenings were all positively correlated with children’s learning gains in these uncontrolled correlations. Note that this table shows only small to modest correlations among independent variables, and some in unexpected directions. For example, though CLASS was

Discussion

Quality Rating and Improvement Systems are a near-ubiquitous policy tool designed to improve the quality of early educational programs system-wide. QRIS are modeled after K-12 accountability systems, but are limited by their inability to use direct assessments of children as indicators of quality, and instead measure classroom inputs. This limitation has led to diverse QRIS with complex multi-indicator systems typically designed to provide programs with many avenues for

Declaration of Competing Interest

The authors have no conflicts of interest to disclose.

Acknowledgements

This research was supported by a grant from the Institute of Education Sciences (R305A140069). Opinions reflect those of the authors and do not necessarily reflect those of the granting agency. We thank the Louisiana Department of Education for their willingness to share data for this project, and the children, teachers, and families who generously agreed to participate in the study.

References (70)

  • Weiland, C., et al. (2013). Associations between classroom quality and children’s vocabulary and executive function skills in an urban public prekindergarten program. Early Childhood Research Quarterly.
  • Aikens, N., Bush, C., Gleason, P., Malone, L., & Tarullo, L. (2016). Tracking quality in Head Start classrooms: FACES...
  • Araujo, M., et al. (2016). Teacher quality and learning outcomes in kindergarten. The Quarterly Journal of Economics.
  • Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher.
  • Barnett, W. S. (2011). Effectiveness of early educational intervention. Science.
  • Bassok, D., et al. (2016). Inequality in preschool quality? Community-level disparities in access to high-quality learning environments. Early Education and Development.
  • Bassok, D., et al. (2019). The effects of accountability incentives in early childhood education. Journal of Policy Analysis and Management.
  • Build Initiative and Child Trends. (2019). QRIS Compendium. Retrieved from...
  • Burchinal, M. (2017). Measuring early care and education quality. Child Development Perspectives.
  • Burchinal, M., Hong, S. L. S., Sabol, T. J., Forestieri, N., Peisner-Feinberg, E., Tarullo, L., & Zaslow, M. (2016)....
  • Campbell, F. A., et al. (2002). Early childhood education: Young adult outcomes from the Abecedarian Project. Applied Developmental Science.
  • Cannon, J., et al. (2017). Quality rating and improvement systems for early care and education programs: Making the second generation better.
  • Dee, T. S., et al. (2011). The impact of No Child Left Behind on student achievement. Journal of Policy Analysis and Management.
  • Dee, T. S., et al. (2013). The effects of NCLB on school resources and practices. Educational Evaluation and Policy Analysis.
  • Deming, D. (2009). Early childhood intervention and life-cycle skill development: Evidence from Head Start. American Economic Journal: Applied Economics.
  • Diamond, J. B. (2007). Where the rubber meets the road: Rethinking the connection between high-stakes testing policy and classroom instruction. Sociology of Education.
  • Dunn, L. M., & Dunn, D. M. (2013). Peabody Picture Vocabulary Test, technical report. Retrieved from...
  • Dunn, L. M., & Dunn, L. M. (1997). Peabody Picture Vocabulary Test-Third Edition: Manual. Circle Pines, MN: American...
  • Early, D. M., et al. (2007). Teachers’ education, classroom quality, and young children’s academic skills: Results from seven studies of preschool programs. Child Development.
  • Ehrlich, S. B., Pacchiano, D. M., Stein, A. G., & Luppescu, S. (2016). Essential organizational supports for early...
  • Elicker, J., Langhill, C., Ruprecht, K., Lewsader, J., & Anderson, T. (2011). Paths to QUALITY, Indiana’s child care...
  • Griffith, G., et al. (2008). Initial impacts of No Child Left Behind on elementary science education. Journal of Elementary Science Education.
  • Grissom, J. A., et al. (2017). Strategic staffing? How performance pressures affect the distribution of teachers within schools and resulting student achievement. American Educational Research Journal.
  • Hamre, B. K. (2014). Teachers’ daily interactions with children: An essential ingredient in effective early childhood programs. Child Development Perspectives.
  • Hamre, B. K., et al. (2014). Evidence for general and domain-specific elements of teacher–child interactions: Associations with preschool children’s development. Child Development.