When teenagers wake on the morning of their 18th birthday, neither their abilities nor their personalities have changed fundamentally. However, in several countries, they become full-grown adults overnight. Societies treat teenagers and adults in categorically different ways despite the dimensional nature of the relevant underlying traits and abilities (e.g., intelligence, decision-making ability, and personality development).

This tendency toward categorization also lies at the core of psychiatric diagnoses, fundamental controversies about categorical versus dimensional approaches to mental disorders notwithstanding (e.g., Coghill & Sonuga-Barke, 2012). The formation of diagnostic categories is important for the scientific investigation of mental health problems and for developing effective treatments. However, some scholars argue that mental disorder labels, such as “depression”, “anxiety disorder”, or “schizophrenia”, can be the causes of stigmatization (e.g., Corrigan, 2007; Link et al., 1989). These scholars worry that the public might treat people with mental health diagnoses negatively because diagnoses are associated with widespread stereotypical beliefs.

Teachers often encounter students with diagnoses. In a meta-analysis, Polanczyk et al. (2015) found a world-wide prevalence of mental disorders in children and adolescents of 13.4%. The reasoning outlined above gives rise to concerns about stigmatization of students with learning and behavioral disorders. People in general tend to associate diagnostic labels with stereotypical attitudes (e.g., Carrizosa-Moog et al., 2019; Cuttler & Ryckman, 2019) and teachers might be no exception to the rule. Take the example of ADHD. Given the prevalence of ADHD of about 5% (Polanczyk et al., 2007), teachers often will encounter at least one student in their class with this label. As a teacher learns that one of their new students has been diagnosed with ADHD, they might automatically expect the student to have serious problems paying attention and sitting still (Batzle et al., 2010). The teacher might even expect the student to disrupt the classroom and to perform poorly in class exams (Ohan et al., 2011). The present study seeks to understand such negative effects of the labels dyslexia, dyscalculia, and ADHD.

Mental Health Stigma

Theories of mental health stigma differentiate between stereotypes (i.e., beliefs about people suffering from mental illness), prejudice (i.e., evaluative reactions), and discrimination (i.e., overt negative behavior; A. B. Fox et al., 2018; Rüsch et al., 2005). Mental health stereotypes and prejudice are assumed to be learned via socialization (Link et al., 1989) and later triggered by diagnostic labels (Corrigan, 2007). However, diagnostic labels are not only presumed to be activating cues; theory suggests that they can also amplify stereotypes and prejudice (Corrigan, 2007). Every clinical label is associated with diagnostic criteria, and these criteria imply that mentally ill people share the same basic characteristics. This can evoke the perception of mentally ill people as a homogeneous out-group. Furthermore, diagnostic criteria can be interpreted by people as evidence that mental disorders are extremely difficult to treat or even unalterable.

Research supports the assumed role of diagnostic labels in the formation of stigmatizing beliefs and discriminatory behavior. Foroni and Rothbart (2011, 2013) reported that female silhouettes were perceived as more homogeneous when categorized with eating disorder labels. Meta-analytic research suggested that the labels psychopathy, psychosis, antisocial personality disorder, and paraphilia led to harsher punishments of the defendant. Labeled defendants were also perceived to be more dangerous and less amenable to treatment (Berryessa & Wohlstetter, 2019). Cuttler and Ryckman (2019) manipulated the presence of the labels delusional disorder, schizophrenia, bipolar disorder, major depressive disorder, and alcohol use disorder and found a negative impact on ratings of perceived aggressiveness, attention-seeking, incompetence, confusion, disorganization, embarrassment, unreliability, unhappiness, and volatility. Finally, there is evidence that labels which identify the person with the disorder (e.g., “He is an epileptic”) cause even worse judgements about the person than labels implying that a person has a certain disorder (e.g., “He has epilepsy”; Carrizosa-Moog et al., 2019; Cuttler & Ryckman, 2019).

Label Effects in the Classroom

Self-Fulfilling Prophecy

One process possibly associated with label effects on teachers’ performance expectations is the self-fulfilling prophecy in the classroom (e.g., Jussim & Harber, 2005; Madon et al., 2011). In studies exploring this phenomenon, teachers are told that some of their students had performed well in a test of the potential for IQ development. When tested later, the students with the alleged potential performed comparatively well in class exams or IQ tests (Raudenbush, 1984). Given that these students were randomly selected, their performance improvement can be attributed to the teachers’ expectation and corresponding behavior. Thus, simply labeling a child can have a positive impact on the child’s intellectual development by changing teachers’ expectations. However, existing research on negative effects of the self-fulfilling prophecy is inconclusive (Jussim & Harber, 2005; Madon et al., 2011).

Experimental Studies

After watching a video about a child labeled emotionally disturbed or learning disabled, teachers rated the child’s personality, behavior, and academic future (Foster et al., 1975; Foster & Ysseldyke, 1976; Jacobs, 1978) and the child’s academic skills, activity level, and personal-social adjustment (Foster et al., 1976; Foster et al., 1980; Ysseldyke & Foster, 1978) more negatively compared to an unlabeled child. Other video studies found comparable results when the child was labeled as behaviorally disordered (Johnson & Blankenship, 1984) and “educable mentally retarded” (Foster & Keech, 1977). Nonetheless, there are other video-based studies in which not every label had an effect (Allday et al., 2011; Shuller & McNamara, 1976), labels affected only some of the dependent variables (Fogel & Nelson, 1983), or a label effect was found only with vignettes but not with videos (Dukes & Saudargas, 1989; Reschly & Lamprecht, 1979; Salvia et al., 1973). Some studies that used only videos did not find a label effect at all (Cornett-Ruiz & Hendricks, 1993; Fernald et al., 1985; Yoshida & Meyers, 1975).

Vignettes with the label learning disability led to more negative academic, social, and behavioral expectations (Harvey & Pellock, 2003; Minner, 1982; Minner & Prater, 1984; Thelen et al., 2004) and to a reduced tendency to refer a talented child to a program for gifted children (Bianco, 2005; Bianco & Leech, 2010; Minner et al., 1987; Minner, 1990). Beyond that, a broad range of labels have been identified to cause a negative label effect such as “mentally retarded” or “educable mentally retarded” (Aloia & MacMillan, 1983; Carroll & Reppucci, 1978; Thelen et al., 2004), emotionally disturbed (Gillung & Rucker, 1977; Parish et al., 1979), behavior disorder (Harvey & Pellock, 2003), and ADHD (Ohan et al., 2011; Stinnett et al., 2001).

However, it is important to note that several vignette studies suggested that there are limitations of negative label effects (Aloia et al., 1981; Bromfield et al., 1988; Combs & Harper, 1967; J. D. Fox & Stinnett, 1996; Levin et al., 1982; Rolison & Medway, 1985). Finally, several vignette studies have yielded null results (Duke & Prater, 1991; Graham & Leone, 1987; Javel & Greenspan, 1983; Kedar-Voivodas & Tannenbaum, 1979; Minner, 1989; O’Donohue & O’Hare, 1997; Pfeiffer, 1980; Tournaki, 2003).

Summary

Despite inconsistent findings and null results, considerable evidence exists for negative effects of diagnostic labels on teachers’ expectations, especially for the learning disability label and—to a much lesser extent—ADHD. However, while many studies have explored effects of the broad term learning disability, research on specific learning disabilities is sparse. One study found negative effects on the evaluation of students described as having a specific learning disability (Harvey & Pellock, 2003). Furthermore, only three studies investigated the impact of the dyslexia label and yielded inconclusive results. Levin et al. (1982) found no effect of that label, whereas Gibbs and Elliott (2015) reported evidence for negative and positive effects, but these results were later only partially replicated (Gibbs et al., 2020). No published study has explored the effects of the dyscalculia label. Our first experiment was conducted to fill this research gap.

Attributional Patterns Triggered by Labels

It is vital to explore the specific cognitive mechanisms that are triggered by diagnostic labels. Several studies have provided some clues to these processes (Clark, 1997; Vlachou et al., 2014; Woodcock, 2014; Woodcock & Hitches, 2017; Woodcock & Moore, 2018; Woodcock & Vialle, 2010, 2011, 2016). In all of these studies, teachers experienced less frustration and more sympathy toward a student with a learning disability, and they gave the student more positive feedback compared to a student without the diagnosis. Furthermore, they expected future failure of a diagnosed child to be more likely than failure of an undiagnosed child.

The attributional pattern explored by these studies can be interpreted as an amplification of teachers’ general tendency for a fundamental attribution error (Wang & Hall, 2018). When a child has a learning disability, the disability itself is perceived as an internal and stable cause of the child’s academic problems. Thereby, teachers neglect situational influences on the child’s behavior and focus on the student’s lack of control over their condition. Consequently, teachers react with less frustration, more sympathy, and more positive feedback, and they expect the student to fail in the future. This interpretation coheres well with the literature on mental health stigma. The diagnosis of a learning disability triggers stereotypes about the stability of a student’s condition. Furthermore, the diagnostic criteria of learning disabilities (e.g., extreme difficulties in acquiring basic academic skills) are interpreted as indicative of students’ inability to improve their academic performance.

Nevertheless, this interpretation is limited by the fact that the learning disability label was not explicitly stated in these studies (Clark, 1997; Vlachou et al., 2014; Woodcock, 2014; Woodcock & Hitches, 2017; Woodcock & Moore, 2018; Woodcock & Vialle, 2010, 2011, 2016). The learning disability diagnosis was only implied for participants in the experimental group by describing the student as having problems with math, reading, and writing, and as receiving support from a resource specialist. Furthermore, participants were not asked about their perception of the student’s problem control and problem stability.

However, three other studies in which a label was mentioned provided evidence for the outlined attributional pattern (Severance & Gasstrom, 1977; Stanley & Comer, 1988; Weisz, 1981). Participants in these studies judged a labeled child’s problems to be more stable and less under the child’s control than the problems of an unlabeled child. These results, however, are limited to the outdated “mentally retarded” label. Consequently, it remains unclear whether other labels, such as ADHD, influence teachers’ attributions. For this reason, we examined label effects and attributional patterns caused by an ADHD diagnosis in Experiment 2.

Counteracting Negative Effects of Diagnostic Labels

Despite the numerous studies that have investigated interventions for reducing mental illness stigma (Corrigan et al., 2012; Rüsch et al., 2005), research is lacking on interventions for changing negative label effects in the classroom, with the exception of familiarizing participants with rating methods (Graham & Dwyer, 1987; Madle et al., 1980), providing contact with diagnosed children (Herr, 1975; Herr et al., 1976), or educating participants about disorders (Kutcher et al., 2016; Ohan et al., 2008; Parish et al., 1977; Toye et al., 2019). However, the evidence provided by these studies is limited due to small sample sizes, partially outdated labels, or the lack of an unlabeled control group.

To explore a different approach, we drew on cognitive dissonance theory (Festinger, 1957). In a typical dissonance intervention, participants are asked to generate arguments that counter their initial attitude. This request is often couched in a cover story that makes this activity seem reasonable in the context of the experiment. Arguing against one’s own attitude creates cognitive dissonance, which can be reduced by changing the initial attitude. An important part of this procedure is that participants engage in the argument generation voluntarily because otherwise they might reduce cognitive dissonance by attributing the activity to external causes (Cooper, 2007). A dissonance-based intervention seems promising because research has shown that people change their attitudes and their behavior through the mechanisms of dissonance reduction (Festinger & Carlsmith, 1959; Kenworthy et al., 2011; McGrath, 2017; Stone & Fernandez, 2008). Based on this research, psychologists have deployed dissonance-based strategies to reduce stereotypes and prejudice (Heitland & Bohner, 2010), promote condom use (Stone et al., 1994), encourage the conservation of water while showering (Dickerson et al., 1992), and diminish aggression and violence in relationships (Schumacher & Slep, 2004). Other researchers have developed effective interventions for promoting ecofriendly (Osbaldiston & Schott, 2012) or health-enhancing behavior (Freijy & Kothe, 2013), and for the prevention of eating disorders (Watson et al., 2016).

The Present Research

To pinpoint the influence of diagnostic labels, we opted for an experimental approach based on the presentation of vignettes similar to the vignette-based approaches in the literature reviewed above. Vignettes are well-suited for keeping the information constant that is presented to participants and for manipulating only the presence of a diagnostic label. Our first aim was to explore negative label effects of specific learning disorders in German teacher students. In Experiment 1, we added the label dyslexia or dyscalculia to otherwise identical descriptions of a student’s performance in reading or mathematics. We constructed the vignettes as descriptions of problems that are typical for dyslexia or dyscalculia. For this purpose, the vignettes described both a discrepancy between the child’s domain-specific performance (i.e., math or reading and spelling) and the child’s general school-related skills and a discrepancy between the child’s domain-specific performance and the performance of children of similar age. We predicted that, compared to descriptions without the label, the dyslexia label would lead to more negative performance expectations in the area of reading (Hypothesis 1a) and that the dyscalculia label would lead to more negative performance expectations in the area of mathematics (Hypothesis 1b).

In Experiment 2, we examined label effects induced by the ADHD label. For this purpose, we used vignettes that were originally constructed, validated, and successfully deployed in research by Ohan et al. (2011). Because these vignettes contained typical descriptions of ADHD-related symptoms and because Ohan et al. (2011) found large negative label effects using them, we deemed the vignettes suitable for our research purpose. We predicted that participants’ academic expectations would be more negative when a label was given compared to the same vignette without the label (Hypothesis 2). We further hypothesized that the ADHD label would lead to a stronger tendency to perceive a student’s academic problems as stable (Hypothesis 3) and as not under the student’s control (Hypothesis 4). We also predicted that a dissonance-based intervention would diminish the label effects (Hypothesis 5). Finally, we were curious about potential gender differences in label effects of ADHD. Existing research on such differences is inconclusive (Batzle et al., 2010; Eisenberg & Schneider, 2007; Lee et al., 2019; Ohan et al., 2011). Thus, this part of our study was exploratory.

The present study adds to existing literature by providing the first investigation of the effects of the dyscalculia label and the first comparison of the dyscalculia and the dyslexia label. Furthermore, the study contributes by exploring the multifaceted effects of the ADHD label and by trying to impede these effects with a dissonance-based approach. Both contributions are important because, to make teaching practice less biased, it is vital to understand the different effects of different labels and to explore ways of counteracting these effects.

Experiment 1

Method

Participants

A total of 163 teacher students (studying to become elementary school teachers, middle/secondary school teachers, teachers at an academic-track high school, or special education teachers) participated. They were recruited from university lectures on school-related psychology topics and participated voluntarily. Eleven participants were excluded because they had answered the questions randomly, and thus 152 students (84% female) remained in the final sample. Most of the participants (82%) were first-semester students.

Material

We constructed two pairs of vignettes (length: 91–103 words) describing a boy in fourth grade with problems in mathematics or in reading and spelling. Two of them used the label dyslexia or dyscalculia; the other two used the terms “reading and spelling difficulties” or “arithmetic difficulties.” Apart from this variation, the vignettes within each pair were identical. The vignettes described the boys as having poor spelling and math abilities in the form of low normed scores in standardized tests prior to entering a specialized treatment, with a percentile of 6 in spelling and a percentile of 5 in arithmetic tests. Additional information was stated concerning the number of correct answers and errors in the standardized tests. For the spelling test, the vignettes reported 30 spelling errors in 50 words prior to the training and stated an average of 12 spelling errors as the norm for typically developing students in fourth grade. For the test of arithmetic skills, the information about the performance prior to the training was a raw score of 8 out of 45 points, with average students achieving 29 points in fourth grade as the frame of reference (see supplementary material for full wording of the vignettes https://osf.io/rks5g/).

In addition, a filler vignette was constructed (length 45 words) describing a girl performing at an average level in physical education who is being trained in ball throwing.

Dependent Variables

The dependent variables assessed participants’ expectations regarding the boy’s future performance in the field of the difficulties described in the vignettes. First, participants were asked to predict the number of mistakes the same boy would make or the score the boy would achieve at the end of the school year after a one-year training in spelling (for the vignettes describing a boy with reading and writing difficulties) or calculation (for the vignettes describing a boy with arithmetic difficulties). Second, participants were asked to estimate the boy’s grade before and after the training (in Germany, grades range from 1 = very good to 6 = insufficient).

Procedure

At the end of a lecture session, students were informed about the study and invited to participate on a voluntary and anonymous basis. The study was characterized as an exploration of how teachers evaluate their students. Students were asked to use their smartphones or laptops to participate. They were presented with a link leading them to an online questionnaire. Participants were randomly assigned to one of four groups. First, participants were asked to indicate their gender, their current semester, and the type of school for which they were studying. Then one of the four vignettes was presented, followed by the filler vignette and a second vignette. The presentation of the vignettes was balanced as follows: Group 1 (spelling difficulties – filler – dyscalculia; n = 41), Group 2 (dyscalculia – filler – spelling difficulties; n = 33), Group 3 (dyslexia – filler – arithmetic difficulties; n = 40), Group 4 (arithmetic difficulties – filler – dyslexia; n = 38).

Design and Data Analysis Strategy

The statistical analysis of Experiment 1 was based on a one-factorial between-group design with the independent variable presence of label (label vs. no label). Labeling effects were analyzed separately for dyslexia and dyscalculia because in the spelling vignettes, errors were estimated, and in the arithmetic vignettes, raw score performances were given. Therefore, the dependent measures were not directly comparable. A repeated measures ANOVA was computed for the grades, which participants predicted prior to and after the intervention. Participants’ predictions of standardized test results were analyzed with a one-way ANOVA. For all hypothesis tests, the Type-I error probability was set at .05. To facilitate potential future meta-analyses, we also report effect sizes for nonsignificant effects.

Results

We hypothesized that performance expectations would be more favorable when participants had received the vignette without the label compared to the vignette with the label (negative label effect, Hypothesis 1a). In line with this prediction, performance expectations (number of errors) for the boy’s spelling performance after the training were more favorable after reading the non-label vignette (M = 18.95; SE = 0.40) compared to the vignette with the dyslexia label (M = 20.22; SE = 0.41), F(1, 150) = 4.89, p = .014, ηp² = .03. For the prediction of grades, the hypothesis of a negative label effect was also supported. We found no difference between the conditions on predicted grades before the training (label: M = 4.99; SE = 0.08; non-label: M = 4.92; SE = 0.08), F(1, 150) = 0.387, p = .535, ηp² < .01. Participants predicted a strong overall effect of the training, F(1, 150) = 919.67, p < .001, ηp² = .86, but an even stronger effect after reading the vignette without the label, as indicated by a significant interaction of measuring time and label, F(1, 150) = 4.04, p = .023, ηp² = .03.

The hypothesis of a negative label effect (Hypothesis 1b) was not supported for participants’ predictions of arithmetic performance. The predictions of the boy’s arithmetic test performance after the training did not differ significantly between the non-label vignette group (M = 19.32; SE = 0.57) and the vignette group with the dyscalculia label (M = 18.27; SE = 0.54), F(1, 150) = 1.77, p = .185, ηp² = .01. Predicted grades before the training did not differ between the conditions (label: M = 4.97; SE = 0.08; non-label: M = 5.06; SE = 0.07), F(1, 150) = 0.76, p = .386, ηp² < .01. As with the reading and spelling vignettes, we found a strong main effect for the predicted improvement of grades from before to after the training, F(1, 150) = 1014.17, p < .001, ηp² = .87. However, the interaction of time with presence of labels failed to reach significance, F(1, 150) = 1.80, p = .091, ηp² = .01.
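
As a reading aid, the partial eta squared values reported with each F test can be recovered from the F statistic and its degrees of freedom using the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error). The following minimal Python sketch applies this identity to three of the tests reported for the spelling vignettes. The function name is ours and purely illustrative; it is not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values taken from the spelling (dyslexia) results, each with df = (1, 150).
for label, f_value in [
    ("label effect on predicted errors", 4.89),
    ("main effect of training on grades", 919.67),
    ("time x label interaction on grades", 4.04),
]:
    print(f"{label}: {partial_eta_squared(f_value, 1, 150):.2f}")
# label effect on predicted errors: 0.03
# main effect of training on grades: 0.86
# time x label interaction on grades: 0.03
```

The same check can be applied to any other F test in the Results sections, which is convenient when extracting effect sizes for a meta-analysis.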

Discussion

Experiment 1 supported the prediction of a negative label effect for dyslexia (“problems in reading and spelling”, Hypothesis 1a), but no clear support was found for the dyscalculia label effect (“problems in arithmetic”, Hypothesis 1b). In both domains, participants demonstrated a general pedagogic optimism for the effects of the training described in the vignettes as indicated by the strong main effect of time on the boy’s grades. In the presence of this strong main effect, it might have been difficult to establish the predicted ordinal interaction effect with the presence of labels (Strube & Bobko, 1989). In sum, however, Experiment 1 suggests that negative label effects may differ between learning disorders.

Experiment 2

Method

Preregistration, Power Analysis, and Data Exclusion

Experiment 2 was fully preregistered at Open Science Framework (https://osf.io/te49j). Prior to data collection, we decided to sample 200 participants. A sensitivity analysis using G*Power for the expected 2 × 2 between-within interaction (α = .05, 1 − β = .80) yielded an effect size of η² = .0381. Thus, an experiment based on 200 participants was sufficiently powered to detect a small to medium effect.

A criterion for data exclusion was defined prior to data collection based on the minimum time participants needed to process the vignettes and questions. Assuming a reading speed of 400 words per minute, reading the shortest vignette and all related questions (265 words) would require at least 40 s. Assuming that answering each of the nine questions at a fast pace takes about 0.5 s, we deemed a processing time of 45 s or more necessary to read the vignette and answer the questions. Therefore, responses for a vignette were not regarded as valid and were discarded from the analysis if they were given in less than 45 s.
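
The arithmetic behind this cutoff can be retraced in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the observation that the 45 s cutoff corresponds to the computed minimum rounded up is our reading of the text, not a procedure stated in the preregistration.

```python
def min_processing_seconds(n_words, words_per_minute=400,
                           n_questions=9, seconds_per_answer=0.5):
    """Lower bound on the time needed to read a vignette page and answer its items."""
    reading_time = n_words / words_per_minute * 60    # 265 words -> 39.75 s
    answering_time = n_questions * seconds_per_answer  # 9 items  -> 4.5 s
    return reading_time + answering_time


def is_valid_response(seconds_on_page, cutoff=45):
    """Responses given in less than the cutoff (45 s in the text) are discarded."""
    return seconds_on_page >= cutoff


print(round(min_processing_seconds(265), 2))  # 44.25
print(is_valid_response(44))  # False
print(is_valid_response(45))  # True
```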

Participants

A total of 201 teacher students (studying to become elementary school teachers, middle/secondary school teachers, teachers at an academic-track high school, or special education teachers) voluntarily participated in Experiment 2. They were recruited from university courses on school-related psychology topics. Based on the criteria for data exclusion, two participants were excluded, leaving 199 students (80% female) in the sample. Most of the participants (93%) were first-semester students.

Material

The first author adapted two critical vignettes from Ohan et al. (2011) and translated them into German. In the vignettes, a seven- and a nine-year-old student are described, both of whom display typical ADHD behavior such as being constantly restless, not following instructions, getting distracted, and intruding on other children’s activities (see supplementary material). Depending on the group assignment, the student’s name was male (Eric/Alexander) or female (Erica/Alexandra). One type of vignette (Eric/Erica, 9 years old) always described the student without a diagnosis, whereas in the other type (Alexander/Alexandra, 7 years old), participants were given the additional information that the child was examined and diagnosed with ADHD. The four different vignettes were comparable in length (145–158 words). To disguise the research question and reduce carry-over effects, participants were presented with a filler vignette very similar to the one deployed in Experiment 1 (see supplementary material).

Moreover, we designed a short task to induce dissonance in the dissonance intervention group. The instruction was as follows (translated from German):

At first, we would like to ask you for your assistance: For the purpose of another study, we need some remarks about disadvantages of clinical diagnoses for class. Sometimes teachers have to deal with children that were diagnosed with certain disorders (e.g., “a specific reading and spelling disorder”). Please develop some arguments why diagnoses can have a negative impact on the teacher-student relationship and why teachers should not give too much weight to such diagnoses. Your arguments will be presented to teachers in another study in an anonymized form. Of course, you are free to follow this request or not. However, it would be very helpful to us if you developed some arguments.

Research suggests that people experience dissonance when they perceive their action as voluntary and when their action has the potential of impacting other people (Cooper, 2007). When people have the impression that they are forced into action by scientists or that their behavior does not really matter, they are unlikely to experience dissonance. For this reason, participants were reminded that they were not required to do the task if they did not want to, and they were told the cover story about other teachers being confronted with their arguments. Participants in the no intervention group were presented with an unrelated filler task. They were asked to list arguments why using computers in class can be beneficial for teaching. To keep the instructions similar to the dissonance task, participants were reminded that performing the task was voluntary and were told that teachers would be presented with their arguments in another study (see supplementary material).

Dependent Variables

Each of the three dependent variables (i.e., academic performance expectations, perceived stability, and perceived control) was measured using three items based on Likert-type ratings. Participants rated how much they agreed with specific statements on a 7-point scale (ranging from 1 = no support to 7 = full support). For example, one statement regarding future academic performance expectations was (translated from German): “[Child’s name] will experience difficulties following lessons in the future.” One statement about the stability of the child’s academic problems was: “[Child’s name]’s problems in school are stable by nature and difficult to change.” One statement regarding the child’s control over his problems was: “[Child’s name] has little influence on the extent of [his/her] problems” (see supplementary material for all statements). One of the three items about future expectations and about perceived stability and two items about perceived controllability were reverse coded. As a result, higher values from all scales indicated more negative evaluations. All nine statements were presented in a random order simultaneously with the related vignette.
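
To make the scoring concrete, the following minimal sketch shows how reverse coding works on a 7-point scale (a rating of 1 becomes 7, 2 becomes 6, and so on) and how a scale score can then be formed. It assumes that scale scores are the mean of the recoded items, which is consistent with the reported means but is not stated explicitly above. The ratings and the position of the reversed item are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 7  # 1 = no support, 7 = full support


def reverse_code(rating):
    """Mirror a rating around the scale midpoint (1 <-> 7, 2 <-> 6, ...)."""
    return SCALE_MIN + SCALE_MAX - rating


def scale_score(ratings, reversed_positions):
    """Mean of the item ratings after recoding the reverse-keyed items.

    Assumption: scale scores are item means; the positions of the
    reverse-keyed items are illustrative, not taken from the study.
    """
    recoded = [
        reverse_code(r) if i in reversed_positions else r
        for i, r in enumerate(ratings)
    ]
    return sum(recoded) / len(recoded)


# Hypothetical participant: three expectation items, the third reverse keyed.
print(reverse_code(2))                              # 6
print(round(scale_score([6, 5, 2], {2}), 2))        # 5.67
```

After recoding, a higher score on every scale indicates a more negative evaluation, as described above.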

Given that all items were designed specifically for this study, the quality of the deployed scales was unknown. We computed a factor analysis and eliminated one item from further analyses because its factor loadings were contrary to theoretical expectations (see supplementary material). Cronbach’s alpha was computed separately for the three items that measured a given variable relating to the labeled case and for the three items that measured a given variable relating to the unlabeled case. The internal consistency of the scales on performance expectations was good (vignette with label: Cronbach’s α = .77; vignette without label: Cronbach’s α = .66), whereas the reliability of the scales about perceived control was just acceptable (vignette with label: Cronbach’s α = .56; vignette without label: Cronbach’s α = .61). The internal consistency of the perceived stability scales was rather poor (vignette with label: Cronbach’s α = .30; vignette without label: Cronbach’s α = .33).
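
For readers unfamiliar with the coefficient, Cronbach’s alpha for a scale of k items is α = k / (k − 1) × (1 − Σ item variances / variance of the sum score). The sketch below implements this textbook formula with Python’s standard library. The ratings are invented for illustration and are not data from the study, and the software actually used for the reported coefficients is not specified here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from a list of items.

    `items` holds one list of ratings per item; all lists cover the same
    participants in the same order. Sample variances (n - 1) are used.
    """
    k = len(items)
    sum_scores = [sum(ratings) for ratings in zip(*items)]  # per participant
    item_variance_total = sum(variance(ratings) for ratings in items)
    return k / (k - 1) * (1 - item_variance_total / variance(sum_scores))


# Hypothetical ratings of three participants on two items.
consistent = [[1, 2, 3], [1, 2, 3]]      # items agree perfectly
less_consistent = [[1, 2, 3], [1, 3, 2]]  # items agree only partly

print(round(cronbach_alpha(consistent), 2))       # 1.0
print(round(cronbach_alpha(less_consistent), 2))  # 0.67
```

The example illustrates why alpha drops when items rank participants differently, which is one plausible reading of the low coefficients for the perceived stability items.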

Procedure

Participants were recruited in a manner similar to Experiment 1. At the beginning of the online survey, participants were randomly assigned to the dissonance intervention or the no intervention group. In the dissonance intervention group, participants started with the dissonance task, whereas no intervention participants started with the filler task. There was no time limit on these tasks. Next, all participants were informed that their task for the rest of the study was to evaluate fictitious stories about students. It was stressed that participants should always answer in accordance with their personal opinion.

Each participant saw one vignette about a labeled child and one vignette without a label. Order of presentation was counterbalanced. Each vignette and all related questions were always presented on one page. In both groups, the students in the vignettes were male for one half and female for the other half of the participants. Between both critical vignettes, participants processed the filler vignette.

After evaluating the vignettes, participants indicated their gender, their current semester, and the type of school for which they were studying. At the end, participants were thanked and provided with a short explanation of the study’s purpose.

Design and Data Analysis Strategy

Experiment 2 was based on a 2 (dissonance intervention: treatment vs. no treatment) × 2 (gender of depicted student: male vs. female) × 2 (presence of label: label vs. no label) mixed design, with the first two independent variables varied between participants and presence of label varied within participants. To examine the effects of intervention and presence of label, three separate ANOVAs based on a two-factorial 2 (dissonance intervention: treatment vs. no treatment) × 2 (label: present vs. absent) design were computed with performance expectations, perceived stability, and perceived control as the dependent variables. To address the additional exploratory research question on effects of the depicted student’s gender, three separate ANOVAs based on a three-factorial 2 (dissonance intervention: treatment vs. no treatment) × 2 (label: present vs. absent) × 2 (gender of depicted student: male vs. female) design were computed with performance expectations, perceived stability, and perceived control as the dependent variables. To facilitate possible future meta-analyses, effect sizes are presented also for nonsignificant effects, and full descriptive statistics are reported in the supplementary material.
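For meta-analytic purposes, partial eta squared can be recovered from a reported F statistic and its degrees of freedom via ηp2 = (F × df_effect) / (F × df_effect + df_error). The following minimal sketch illustrates this conversion using three F values reported in the Results section; it is not the original analysis code, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Convert an F statistic and its degrees of freedom to partial eta squared."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effects of label reported in the Results section of Experiment 2
print(round(partial_eta_squared(40.10, 1, 197), 2))  # 0.17 (performance expectations)
print(round(partial_eta_squared(9.10, 1, 197), 2))   # 0.04 (perceived stability)
print(round(partial_eta_squared(33.57, 1, 197), 2))  # 0.15 (perceived control)
```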

Results

Effects of Label and Dissonance Intervention

Label Effect on Performance Expectations

We stated in Hypothesis 2 that performance expectations would be more negative for a labeled child compared to a child without a label. We found a significant main effect of label, F(1, 197) = 40.10, p < .001, ηp2 = .17. In contrast to the hypothesis, however, performance expectations were more negative regarding the unlabeled child (M = 4.67, SE = 0.06) compared to the labeled child (M = 4.28, SE = 0.07) (Fig. 1).

Fig. 1. Performance expectations. Note: Ratings of performance expectations by presence of label and dissonance intervention vs. no intervention. Error bars show standard errors of the mean.

Label Effect on Perceived Stability

Our prediction in Hypothesis 3 was that participants would judge the stability of a labeled child’s problems to be higher than the problem stability of an unlabeled peer. This hypothesis was supported by a significant main effect of label, F(1, 197) = 9.10, p = .003, ηp2 = .04. Participants perceived the stability of the child’s problems to be higher when the child was labeled (M = 3.03, SE = 0.07) compared to an unlabeled child (M = 2.84, SE = 0.07) (Fig. 2).

Fig. 2. Perceived stability. Note: Ratings of perceived stability by presence of label and dissonance intervention vs. no intervention. Error bars show standard errors of the mean.

Label Effect on Perceived Control

We predicted in Hypothesis 4 that perceived control would be lower for a labeled child than for an unlabeled child. This hypothesis received support from a significant main effect of label, F(1, 197) = 33.57, p < .001, ηp2 = .15. Participants rated the unlabeled child’s amount of control to be higher (M = 4.07, SE = 0.07) than the control of the labeled child (M = 4.48, SE = 0.07) (Fig. 3).

Fig. 3. Perceived control. Note: Ratings of perceived control by presence of label and dissonance intervention vs. no intervention. Error bars show standard errors of the mean.

Effect of Dissonance Intervention

In Hypothesis 5, we expected no label effects to occur in the group that received the dissonance intervention. We found no support for this hypothesis in our confirmatory analyses. For performance expectations, we found neither a significant interaction, F(1, 197) = 1.52, p = .220, ηp2 < .01, nor a significant main effect of the dissonance intervention, F(1, 197) = 0.38, p = .586, ηp2 < .01. Likewise, for perceived stability, neither the interaction, F(1, 197) = 0.02, p = .895, ηp2 < .01, nor the main effect of the dissonance intervention, F(1, 197) = 0.99, p = .322, ηp2 < .01, was significant. Finally, no significant interaction was found for perceived control, F(1, 197) = 0.31, p = .578, ηp2 < .01. However, we found a small significant main effect of the dissonance intervention on perceived control, F(1, 197) = 4.45, p = .036, ηp2 = .02. Participants in the group receiving no intervention judged the children to have less control over their problems (M = 4.40, SE = 0.09) than participants in the group receiving the dissonance intervention (M = 4.15, SE = 0.09).

To understand why there was no evidence for the effectiveness of the dissonance intervention, we looked at the data more closely and found that a substantial number of participants in both groups (no intervention n = 31, intervention n = 44) did not leave any comment in the control or dissonance task. Since it is highly unlikely that participants experienced dissonance without performing the initial task, we reran the ANOVAs with only those participants who followed instructions in the first task (no intervention group n = 68, intervention group n = 56). Again, there was no interaction for performance expectations, F(1, 122) < 0.01, p = .958, ηp2 < .01. However, we found a significant main effect of group, F(1, 122) = 5.51, p = .021, ηp2 = .04. Participants in the no intervention group (M = 4.60, SE = 0.09) had more negative performance expectations than participants in the intervention group (M = 4.27, SE = 0.10). For perceived stability, we found no significant interaction, F(1, 122) = 0.72, p = .399, ηp2 < .01, and no significant main effect of group, F(1, 122) = 1.07, p = .303, ηp2 < .01. Participants’ ratings in the no intervention group (M = 2.96, SE = 0.10) and in the intervention group (M = 2.81, SE = 0.11) were nearly identical, with a slight descriptive tendency towards participants in the no intervention group perceiving the problem stability to be higher. For perceived control, we found no significant interaction, F(1, 122) = 0.36, p = .547, ηp2 < .01. However, there was again a significant main effect of group, F(1, 122) = 6.61, p = .011, ηp2 = .05. Participants’ ratings in the no intervention group (M = 4.46, SE = 0.10) were more negative than the ratings of participants in the intervention group (M = 4.09, SE = 0.11).

These exploratory results suggest that the intervention was indeed effective but that the effect did not depend on whether the ADHD label was mentioned. Participants in the dissonance intervention group evaluated students’ future performance and problem control more positively regardless of label.

Gender Effects

Performance Expectations

The analysis revealed no significant three-way interaction, F(1, 195) = 0.20, p = .648, ηp2 < .01, no significant interaction of label and gender, F(1, 195) = 0.99, p = .319, ηp2 < .01, no main effect of gender, F(1, 195) = 2.18, p = .142, ηp2 = .01, and no significant interaction of the dissonance intervention and gender, F(1, 195) = 0.78, p = .378, ηp2 < .01.

Perceived Stability

The three-way interaction was not significant, F(1, 195) = 1.00, p = .318, ηp2 < .01. Furthermore, the presence of a label did not interact significantly with the child’s gender, F(1, 195) = 1.94, p = .166, ηp2 = .01. The main effect of gender was not significant, F(1, 195) = 0.35, p = .555, ηp2 < .01. However, we found a significant interaction of the dissonance intervention and gender, F(1, 195) = 5.90, p = .016, ηp2 = .03 (boy, no intervention group: M = 2.82, SE = 0.12; girl, no intervention group: M = 3.17, SE = 0.12; boy, dissonance group: M = 3.00, SE = 0.12; girl, dissonance group: M = 2.77, SE = 0.12).

Perceived Control

The three-way interaction was not significant, F(1, 195) = 0.04, p = .847, ηp2 < .01. None of the two-way interactions were significant (interaction of label and gender: F(1, 195) = 0.43, p = .511, ηp2 < .01; interaction of dissonance intervention and gender: F(1, 195) = 2.01, p = .158, ηp2 = .01). Finally, we found no significant main effect of gender, F(1, 195) = 0.32, p = .859, ηp2 < .01.

Discussion

Experiment 2 yielded evidence contrary to the expected negative label effect (Hypothesis 2). The ADHD label did not lead to more negative performance expectations. Instead, participants had more negative expectations for the unlabeled child than for the labeled child. However, our findings provided evidence for Hypotheses 3 and 4: Participants perceived the problems of a child with the ADHD label to be more stable and less under the child’s control than the problems of an unlabeled child. The results of Experiment 2 did not support Hypothesis 5. The effects of the ADHD label were not affected by a dissonance-based intervention. Finally, we found no evidence for an influence of the child’s gender on label effects.

In sum, Experiment 2 simultaneously provides evidence for positive and negative effects of the ADHD label. The finding that participants perceived a labeled student’s problems to be more stable and less under control is in line with the notion of negative attributional patterns triggered by diagnostic labels. However, participants unexpectedly had more positive performance expectations for a child with ADHD than for an unlabeled child, which suggests that the ADHD label can have a positive impact on teacher trainees. This result is in line with studies that found positive label effects (Fernald & Gettys, 1980; Jellison & Duke, 1994; Gibbs & Elliott, 2015; Nah & Tan, 2021; Ohan et al., 2011).

One speculative explanation for the positive label effect is that the ADHD label provides teachers with closure about the causes of their student’s problems and gives them hope for effective treatment. If teachers interpret the presence of the ADHD label as an indication that the child’s problems can be treated, they might expect better future performance by the child. This explanation is consistent with the finding of a negative impact of the label on evaluations of the child’s control. Teachers might conceive the labeled child’s problems as less under its control and simultaneously see the presence of a diagnostic label as a prospect for improvement. From this point of view, children with ADHD have less control over their problems because of the disorder, but the disorder will be treated therapeutically in the future. However, this explanation is difficult to reconcile with our finding that participants evaluated the labeled child’s problems as relatively stable.

The fact that the results supported Hypotheses 3 and 4 and no support was found for Hypothesis 5 in the confirmatory analyses can at first be interpreted as evidence against the effectiveness of dissonance-based interventions in changing negative label effects. These findings suggest that some of the processes triggered by diagnostic labels might not be altered by briefly raising awareness about potential problems of diagnostic labels for the teacher-student interaction. However, additional exploratory analyses of the data provided by those participants who actually followed instructions revealed that the dissonance intervention led to an overall more positive evaluation of students’ future academic performance and their problem control. Therefore, dissonance interventions might be effective but not in a label-dependent manner. Perhaps participants resolve the cognitive dissonance raised by the intervention by paying less attention to students’ challenges, regardless of whether a diagnostic label is present. Finally, the experiment provided no evidence for gender effects on label effects across all three dependent variables.

General Discussion

This study shows that diagnostic labels influence teachers’ evaluation of students in different ways. Experiment 1 yielded some support for Hypothesis 1, which stated that learning disorder labels lead to more negative performance expectations. Significant effects were found only for the dyslexia and not for the dyscalculia label. Perhaps German teacher trainees perceive dyslexia more as a “real” disorder than dyscalculia. This might be due to the fact that children with dyslexia often receive compensatory privileges in Germany (e.g., being allowed to spend more time on class exams), whereas these privileges are generally not given to students with dyscalculia.

Experiment 2 revealed negative as well as positive effects of the ADHD label. It is plausible that the ADHD label provides teachers with closure about their student’s challenges, which in turn can increase teachers’ optimism about effective interventions for the child. In this way, the ADHD label may raise teachers’ academic expectations. Simultaneously, the ADHD label leads to negative attributional patterns in teachers. The negative evaluation of the child’s problem stability and problem control triggered by the label is an adverse aspect of the categorical understanding of ADHD. Moreover, we found no unequivocal evidence for the effectiveness of the short dissonance intervention.

Taking into account that negative effects of the ADHD and the learning disorder label have been found in teachers who have evaluated their students in real world settings (e.g., Eisenberg & Schneider, 2007; Knight, 2021; Schwehr et al., 2014; Shifrer, 2013, 2016; Shifrer, Callahan, & Muller, 2013; Whitley, 2010), our results point to the urgency of developing effective interventions for counteracting negative label effects. For this purpose, more research into the specific cognitive processes triggered by diagnostic labels is needed. Understanding under which conditions a label triggers positive effects and under which conditions a label triggers negative effects is crucial. Identifying such boundary conditions could help in developing an intervention that mitigates negative label effects by emphasizing the positive aspects of a diagnostic label.

Limitations and Directions for Future Research

Our experiments are limited in several ways. First, only teacher trainees participated, and, therefore, results cannot be generalized to teachers in service or to other occupational groups.

Another limitation is the restricted quality of the scales used in Experiment 2. The measure of perceived stability had rather poor internal consistency and consisted of only two items, whereas the other two scales showed acceptable and good internal consistencies. Moreover, the scales have never been validated independently on a large-scale sample. Future research should design and validate a comprehensive questionnaire measuring teachers’ performance expectations, perceived stability, and perceived control of their students.

Using vignettes in studies inevitably creates an artificial situation for participants. Therefore, this study lacks ecological validity. Consequently, researchers should supplement experimental studies with ecologically valid studies in the future.

Furthermore, it is vital to note that the intervention deployed in Experiment 2 was rather short. Moreover, it seems possible that the dissonance intervention itself activated schematic knowledge associated with ADHD when participants encountered the vignette, leading to the unspecific positive effects of the intervention. A more comprehensive dissonance-based intervention that can avoid making the disorder salient might counter negative label effects in a more specific manner.

Finally, to our knowledge, Experiment 2 is the first preregistered investigation of negative label effects on teachers’ evaluations of students. Since publications without preregistration on average report larger effects than preregistered publications (Schäfer & Schwarz, 2019), there is a need for more preregistered studies on label effects.

Conclusion

Our study shows that different labels can cause different label effects. These findings underscore the importance of addressing implicit personality theories, stereotypes, and label effects in teacher education to minimize biased teaching styles in the classroom. Furthermore, the study highlights the importance of further investigating differences in label effects in order to develop effective interventions to counteract negative label effects.