Disentangling the Association Between the Big Five Personality Traits and Student Achievement: Meta-Analytic Evidence on the Role of Domain Specificity and Achievement Measures

Meyer, Jennifer; Jansen, Thorben; Hübner, Nicolas; Lüdtke, Oliver

doi:10.1007/s10648-023-09736-2

META-ANALYSIS
Open access
Published: 31 January 2023

Volume 35, article number 12, (2023)

Educational Psychology Review

Abstract

Students’ academic achievement is a central predictor of a long list of important educational outcomes, such as access to higher education and socioeconomic success. Prior studies have extensively focused on identifying variables that are related to academic achievement and an important variable in this context appears to be students’ personality. Notably, although findings from more recent studies suggested that the association between student achievement and personality varies by the subject domain (language vs. STEM) and the type of achievement measure (grades vs. test scores), systematic meta-analytical evidence is still lacking. To address this gap in the educational research literature, we conducted a meta-analysis based on 78 studies, with 1491 effect sizes representing data from 500,218 students and 110 samples from elementary to high school. We used a random-effects model with robust variance estimation to calculate mean effect sizes and standard deviations. We found moderating effects of measure or domain for all five personality traits, with differences in the direction of the effects. Our results highlight the importance of the domain and measure when examining how personality traits relate to academic achievement in school. The combination of subject domain and achievement was also found to be relevant for some of the traits. These findings emphasize that subject domains and types of achievement measures should be explicitly considered when investigating the personality saturation of student achievement. We discuss implications for future research, highlighting that there is no “best” or “more objective” achievement measure but, instead, that achievement measures should be chosen based on the research question of interest.

Students’ academic achievement is a central predictor of many important life outcomes such as educational attainment, retention, post-school choices, work and life satisfaction, long-run earnings, and work performance (Melin et al., 2003; Mottaz, 1984; Spengler et al., 2018; Wilmot & Ones, 2019). Over the past decades, educational researchers have invested considerable effort in identifying important predictors of students’ academic achievement. As a result of this, studies have repeatedly documented the relevance of students’ personality in predicting academic achievement (Borghans et al., 2016; Hübner et al., 2022; Lechner et al., 2017; Meyer et al., 2019). Two extensive meta-analyses have provided important support for these findings (Mammadov, 2022; Poropat, 2009). Specifically, Poropat (2009) operationalized student achievement in terms of grade point averages (GPA) and Mammadov (2022) considered a blend of all available different achievement measures. However, the moderating role of achievement measures or subject domains was not considered in either prior meta-analysis.

This gap in the literature is somewhat surprising because prior research has emphasized that the correlation of different types of achievement measures (e.g., grades and test scores) is far from perfect (Borghans et al., 2016; Willingham et al., 2002) and that the association between student achievement and personality can substantially vary by subject domain and achievement measure (e.g., Brandt et al., 2020; Hübner et al., 2022; Lechner et al., 2017; Meyer et al., 2019; Tetzner et al., 2020). For example, several studies found larger associations between openness and standardized test scores compared to grades, whereas associations between conscientiousness and grades were larger compared to standardized test scores (e.g., Meyer et al., 2019; Spengler et al., 2013).

Furthermore, recent educational psychological work has developed and successfully tested a comprehensive framework to better explain differential associations between personality traits and different types of achievement measures. Based on this, meta-analytically investigating the potentially moderating influence of subject domains and types of achievement measures on the personality saturation of student achievement seems highly important in order to extend our understanding of how students’ characteristics might differentially shape different measures of their achievement (Hübner et al., 2022).

Personality Traits and Academic Achievement

Personality traits describe “relatively enduring patterns of thoughts, feelings, and behaviors that reflect the tendency to respond in certain ways under certain circumstances” (Roberts, 2009, p. 140). The most frequently applied and studied taxonomies of personality traits are the five-factor model (FFM) and the Big Five, both of which contain neuroticism, extraversion, openness, conscientiousness, and agreeableness (John et al., 2008; McCrae & Costa, 1987; McCrae & John, 1992); these are the personality traits that we consider in this article. Please note that throughout the article, we use the terms FFM and Big Five synonymously.

The personality traits described in the FFM have been found to play a key role in student achievement (Borghans et al., 2016; De Raad & Schouwenburg, 1996; Israel et al., 2022). The personality saturation of achievement measures is typically explained by different types of behavior that are beneficial for student learning and achievement. As one example, students who score high on conscientiousness more actively engage in course work and in doing their homework (Gray & Watson, 2002). In addition, the role of personality traits in achievement situations has been found to accumulate across the life span; researchers have hypothesized several (causal) pathways for these cumulative effects, explaining why achievement and personality traits are related (Caspi et al., 2005; see also De Raad & Schouwenburg, 1996; Roberts et al., 2007). What has been found to be most relevant for the developmental transition during adolescence is the process of active niche picking (Caspi et al., 2005; Roberts et al., 2007). It has been suggested that students choose educational experiences and environments whose qualities match their personalities (Lüdtke et al., 2011). Differential mechanisms can be assumed for each of the five traits to understand how personality traits and academic achievement are related (for detailed descriptions of why each individual trait can be related to achievement, see De Raad & Schouwenburg, 1996; Poropat, 2009). In the current study, we focused on how personality is reflected in different achievement measures and subject domains, as described in the next section.

Taking a Closer Look: The Relevance of Academic Subject Domains and Different Achievement Measures

Recent research has suggested that the role that personality traits play in student achievement largely depends on the academic subject domain and varies for different achievement measures (Brandt et al., 2020; Hübner et al., 2022; Lechner et al., 2017; Meyer et al., 2019).

Subject Domains

Considering subject domains when investigating the association between personality traits and achievement in educational research is important for at least two reasons. First, learning in school takes place in different subject domains, such as students’ first and second languages, mathematics, and sciences, and domain-specific achievement shapes student career decisions even beyond subject-specific expectancy beliefs and interests (e.g., Guo et al., 2017; Nagy et al., 2008) and, thus, can influence the decision to pursue careers in different fields (e.g., STEM; Hübner et al., 2017; Jansen et al., 2015; Perez et al., 2014; Schoon & Eccles, 2014). In addition, a person’s belief that they are either a “math person” or a “language person” can be relevant to student achievement and engagement (Wan et al., 2021). Using an overall GPA when investigating the association between students’ personality and their achievement, as was done in prior meta-analyses on this topic, does not adequately reflect such domain specificities, as such an achievement measure is based on a blend of different subject grades that might count differently toward the GPA (Brookhart et al., 2016; Hübner et al., 2020). Second, taking a closer look at the specific patterns of the personality saturation of achievement measures on the level of subject domains was found to constitute a promising step towards achieving a better understanding of the differential associations and underlying mechanisms. This is reflected in many empirical studies that have hypothesized and tested such domain-specific associations (e.g., Meyer et al., 2019; Spengler et al., 2013; Spinath et al., 2010). Vedel (2016) investigated the subject domain as a moderator of the association between personality traits and achievement in higher education. In her study, Vedel (2016) found that different traits were related more or less strongly to different academic majors in college, thus providing important evidence to support the idea that subject domain might play an important role in understanding how personality relates to student achievement.

Achievement Measures

Currently, a long list of different achievement measures exists, which typically distinguishes between grades and standardized test achievement (Willingham et al., 2002). As outlined in previous research, grades and standardized test scores are only moderately correlated, which means that they, at least in part, might depend on different student characteristics (Borghans et al., 2016; Hübner et al., 2022). A major difference between the two achievement measures lies in the process of assigning grades, which is based on students’ work in class and is less objective/standardized than the process of assigning scores directly from standardized tests. This means that grades more strongly depend on subjective teacher evaluations than standardized test results do.

The personality–achievement saturation hypothesis (PASH; Hübner et al., 2022) provides a comprehensive framework to describe differences between achievement measures. In this framework, it is argued that personality manifests itself in a more pronounced way (“higher personality saturation”) under certain circumstances that are reflected in different achievement measures. PASH distinguishes between five features of achievement measures that should be considered when studying the personality–achievement association in the field of education (i.e., standardization, relevance, curricular validity, instructional sensitivity, and cognitive ability saturation). Grades are typically characterized by a low level of standardization, a moderate to high level of relevance, high curricular validity, high instructional sensitivity, and moderate to high cognitive ability saturation. Standardized tests, in contrast, are described as being highly standardized, lower in relevance, curricular validity, and instructional sensitivity, and high in cognitive ability saturation. In sum, the framework and the empirical findings of Hübner et al. (2022) underline the great importance of making a more fine-grained distinction between different types of achievement measures in order to better understand the differential associations between personality and student achievement.

Empirical Findings: An Overview of Each Personality Trait

Openness to Experience

Openness describes “the breadth, depth, originality, and complexity of an individual’s mental and experiential life” (John et al., 2008, p. 120). Meta-analytical results on GPA or composite scores have shown positive associations between openness and student achievement (r = 0.10, Poropat, 2009; r = 0.13, Mammadov, 2022). However, empirical evidence suggests that there is a domain-specific pattern: openness can be assumed to reflect a more verbal and less of a numerical orientation (e.g., Marsh et al., 2006; Noftle & Robins, 2007) and, thus, might be less beneficial for subject domains requiring the ability to revise default procedures or emphasize rules and regulations (see Gatzka & Hell, 2017) and might be more beneficial for subject domains that include rewards for critical thinking and creative ideas (i.e., language subject domains; Gatzka & Hell, 2017; Lipnevich et al., 2016). Some studies were able to support these assumptions, showing larger associations between openness and achievement in language compared to STEM subject domains (e.g., Hübner et al., 2022; Meyer et al., 2019; Spengler et al., 2013). However, there are some contrasting results, showing negative associations with language grades (e.g., Brandt et al., 2020; Hendriks et al., 2011; Westphal et al., 2020b).

Considering the role of achievement measures, standardized tests might include unfamiliar tasks that are less closely associated with the curriculum and are less instructionally sensitive. It could be assumed that students scoring high on openness might be able to better apply their cognitive prerequisites in such foreign assessment situations as they tend to seek out intellectually stimulating situations (Schwaba et al., 2017). Knowledge acquired during these intellectual free-time activities can enhance achievement in standardized tests (Willingham et al., 2002). Some prior findings support these assumptions, indicating a higher relevance of openness for standardized tests (e.g., Hübner et al., 2022; Meyer et al., 2019; Spengler et al., 2013). However, some studies found evidence that openness had negative effects on STEM achievement that was measured with tests (Meyer et al., 2019) and grades (Hendriks et al., 2011; Spengler et al., 2013).

Conscientiousness

Conscientiousness describes “socially prescribed impulse control that facilitates task- and goal-directed behavior, such as thinking before acting, delaying gratification, following norms and rules, and planning, organizing, and prioritizing tasks” (John et al., 2008, p. 120). Conscientiousness is the personality trait with the most substantial association with academic achievement, with mean correlations of r = 0.19 (Poropat, 2009) and r = 0.20 (Mammadov, 2022). Associations of conscientiousness with grades are consistent for both language and STEM subject domains across studies (e.g., De Fruyt et al., 2008; Rosander et al., 2011). Its associations with academic effort and hard work make conscientiousness a beneficial trait for students in all areas of academic achievement (Noftle & Robins, 2007; Trautwein et al., 2009). Nonetheless, even though generally positive effects were found across studies, some studies observed stronger associations for STEM compared to language subject domains (e.g., Brandt et al., 2020; Meyer et al., 2019). This could be because these particularly challenging subject domains (i.e., STEM) require persistent learning behavior and analytical thinking to understand equations and solve problems (see Duckworth & Seligman, 2005; MacCann et al., 2009). However, some studies also found effects in the opposite direction, with higher effect sizes found for language than for STEM subject domains (e.g., Israel et al., 2019; Spengler et al., 2013).

PASH (Hübner et al., 2022) suggests that for achievement measures where teachers’ personal preferences have less influence on the evaluation process, the role of conscientiousness should decrease as conscientiousness-related behavior that students show in class (e.g., doing homework, actively engaging in class) is not as relevant for highly standardized achievement measures as for achievement measures with low standardization (e.g., grades). Thus, conscientiousness is likely to be more closely related to measures that are low in standardization, as is typical of course grades (e.g., Brandt et al., 2020; Lechner et al., 2017; Noftle & Robins, 2007; Spengler et al., 2013). However, previous findings for standardized test scores are not consistent, as some studies obtained nonsignificant results and also found negative associations in both language and STEM subject domains (Meyer et al., 2019; Spengler et al., 2013; Westphal et al., 2020b).

Extraversion

Extraversion is “an energetic approach toward the social and material world and it includes traits such as sociability, activity, assertiveness, and positive emotionality” (John et al., 2008, p. 120). It can be assumed that associations of extraversion with achievement might depend on the social features of the learning contexts (De Raad & Schouwenburg, 1996; Mammadov, 2022), which can vary across subject domains (Brandt et al., 2021; Meyer et al., 2019). For example, extraverted behavior in the classroom can be beneficial in language subject domains because oral language competencies can be a part of the assessments, but extraverted behavior could be distracting in STEM subject domains. Such effects might also depend on the achievement measures, as extraverted behavior is easily observable and might therefore influence grading but be less relevant for standardized tests. Some empirical studies were able to support these assumptions, showing higher associations of extraversion with grades (Brandt et al., 2021; Meyer et al., 2019) and in language subject domains (Brandt et al., 2020, 2021; Israel et al., 2019).

Agreeableness

Agreeableness is a factor that “contrasts a prosocial and communal orientation toward others with antagonism and includes traits such as altruism, tender-mindedness, trust, and modesty” (John et al., 2008, p. 120). It can be assumed that students’ agreeable behavior in the classroom can be beneficial in influencing teachers’ evaluations and, thus, grading. For example, agreeable students enjoy classroom discussions (Chamorro-Premuzic et al., 2007), and agreeableness relates to a preference for structure, cooperation, and social participation in the classroom (Pawlowska et al., 2014). Such effects may be less relevant for standardized tests. Prior studies have supported this assumption by finding larger effect sizes for grades than for test scores (e.g., Brandt et al., 2021). However, some studies also showed small positive associations with test scores (Brandt et al., 2021), whereas others found negative correlations (Meyer et al., 2019). With regard to a possible moderating effect of the subject domain, some studies have suggested positive associations between agreeableness and achievement in both language (e.g., Brandt et al., 2021; Israel et al., 2019; Steinmayr & Spinath, 2008) and STEM (Dumfart & Neubauer, 2016) domains; others found negative correlations (Meyer et al., 2019; Spengler et al., 2013).

Neuroticism

Neuroticism contrasts “emotional stability and even-temperedness with negative emotionalities, such as feeling anxious, nervous, sad, and tense” (John et al., 2008, p. 120). Differential effects can be hypothesized regarding the role of achievement measures. Students who score higher on neuroticism take their schoolwork seriously to prevent making mistakes and they strive for perfectionism (Smith et al., 2019). However, students with higher scores on neuroticism perceive their environment as more stressful compared to more emotionally stable students; they are more vulnerable to stress (Ebstrup et al., 2011; McCrae, 1990; Murberg & Bru, 2007; Szabó, 2011; Uliaszek et al., 2010) and disengagement in coping (Carver & Connor-Smith, 2010), which is related to test anxiety (Hoferichter et al., 2014). Thus, neuroticism can be observed by teachers and incorporated into grading.

However, the direction of the associations is not fully clear. On the one hand, high neuroticism can lead to careful learning behavior given its association with perfectionism (Smith et al., 2019). On the other hand, nervousness can also lead to less accepted social behavior. For example, high neuroticism, which can be associated with higher anxiety, can lead to less involvement in classroom discussions and this can have detrimental effects on grades. Matching these contrasting hypotheses, empirical studies found that, for STEM grades, effect sizes tended to be negative (e.g., Israel et al., 2019) or nonsignificant (Meyer et al., 2019). However, some findings also indicated positive associations of neuroticism with language grades (e.g., Brandt et al., 2020; Meyer et al., 2019; Rosander et al., 2011). For standardized test scores, negative associations with neuroticism are assumed as anxiety can be triggered by highly stressful (testing) situations (Byrne et al., 2015). Previous studies have shown associations with test scores to be largely negative, with some nonsignificant findings found in language subject domains (e.g., Furnham et al., 2009; Israel et al., 2019; Meyer et al., 2019). Accordingly, domain-specific effects can be hypothesized. For example, anxiety often occurs in math, as reflected by a large amount of research on mathematics anxiety and evidence of negative associations with achievement (Barroso et al., 2021).

Summary of Empirical Findings

Overall, the empirical literature suggests that considering both the achievement measure and the subject domain is important. However, the results of the studies conducted up to now do not make it possible to answer the question of how important these features are for the overall associations of personality traits with achievement. To illustrate the need for a meta-analysis that takes achievement measures and subject domains into account, Table 1 shows how the pattern of results varies depending on the achievement measure and the domain (see also Supplementary Material, Table S3 for more details on the individual studies).

Table 1 Distribution of effect sizes in the studies depending on measure and domain

For example, for openness, previous findings have been positive in language subject domains for both tests and grades, but largely nonsignificant for STEM grades. For conscientiousness, most correlations have been positive for grades across domains but nonsignificant for test scores across domains. For extraversion, more studies have shown positive associations in language subject domains, whereas the results in STEM subject domains have been nonsignificant. For agreeableness, the pattern suggests more positive findings in language subject domains and nonsignificant or negative findings in STEM subject domains. However, some studies have also shown negative associations in language subject domains. For neuroticism, most findings have been nonsignificant for language grades, negative for STEM grades, and negative for test scores in both language and STEM subject domains.

The Present Study

Previous meta-analyses that considered students in elementary and secondary education (e.g., Mammadov, 2022; Poropat, 2009) did not consider potential domain-specific association patterns. This is surprising because a rich set of recent studies has emphasized the importance of a closer consideration of subject domains and achievement measures as potential moderators of the personality–achievement association (e.g., see Table 1). At the same time, a fine-grained distinction between the association patterns of different subject domains (i.e., language vs. STEM) and achievement measures (i.e., grades vs. standardized tests) is required for a better theoretical and practical integration of findings on the personality saturation of achievement measures. To the best of our knowledge, the current study is the first to make this distinction while meta-analytically investigating this research topic.

To do this, we examined three main research questions. In Research Question (RQ) 1, we investigated whether the subject domain moderates the association between personality traits and student achievement. For openness, we expected to find higher effect sizes for language subject domains, on the basis of the current literature (e.g., Hübner et al., 2022; Meyer et al., 2019; Spengler et al., 2013). For conscientiousness, we expected to find positive associations across domains. For extraversion, agreeableness, and neuroticism, we conducted exploratory analyses given the largely inconsistent pattern of results found in empirical studies up to now.

In RQ2, we investigated whether the type of achievement measure moderates the association between personality traits and student achievement. For openness, we expected to find larger associations with standardized tests compared to grades (e.g., Hübner et al., 2022; Meyer et al., 2019; Spengler et al., 2013). For conscientiousness, we expected to find larger associations with grades compared to test scores (e.g., Brandt et al., 2019; Lechner et al., 2017; Noftle & Robins, 2007; Spengler et al., 2013). For extraversion and agreeableness, it seems plausible that the personality saturation would be greater for grades compared to standardized tests. However, we conducted exploratory analyses given the largely inconclusive pattern of empirical results thus far. For neuroticism, we had reasons to assume effects in both directions, as described above, and therefore conducted exploratory analyses.

Finally, in RQ3, we investigated whether there is evidence for a two-way moderation effect of academic subject domain and type of achievement measure (i.e., subject domain × achievement measure) on the effect sizes (i.e., the correlation between personality trait and achievement). In other words, we investigated whether the effects found between subject domains are similar across measures or whether the effects found between subject domains are enhanced or reduced in grades or standardized test scores, respectively. As no prior studies were identified that investigated this interaction effect, we addressed this research question in an exploratory way.

Methods

Literature Search

We conducted a search of the following databases to identify relevant articles for this meta-analysis: PsycINFO, Web of Science, PubMed, ERIC, and ProQuest. The database searches used the following terms and Boolean operators: (academic OR education OR school) AND (grade OR GPA OR performance OR achievement) AND (personality OR temperament). We limited our search to articles and dissertations that were made available before August 10, 2022. We excluded correlations with measures comparable to the FFM traits, such as the Extraversion and Neuroticism Scales of the Eysenck Personality Questionnaire (Eysenck & Eysenck, 1975) or the HEXACO (Ashton & Lee, 2008). Thus, in all the studies that we selected, measures had been used that had been validated as assessing the FFM dimensions. This ensured comparability between the findings. The abstracts were screened by the first author. A subsample of 617 studies was screened by a second rater (student assistant); this resulted in an absolute agreement of 95.7% regarding the reports sought for retrieval and 99.5% regarding the reports ultimately included in the meta-analysis.
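As an illustration of the screening check described above, absolute agreement is the share of records on which both raters reached the same decision. The following Python sketch is not the authors' procedure or code; the function name and the ten example decisions are hypothetical and serve only to show the calculation.

```python
def percent_agreement(rater_a, rater_b):
    """Return the percentage of items on which two raters made the same decision."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must rate the same items.")
    if not rater_a:
        raise ValueError("At least one rated item is required.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical include/exclude decisions for ten abstracts (illustrative data only,
# not taken from the meta-analysis).
first_rater = ["include", "exclude", "exclude", "include", "exclude",
               "exclude", "include", "exclude", "exclude", "exclude"]
second_rater = ["include", "exclude", "exclude", "exclude", "exclude",
                "exclude", "include", "exclude", "exclude", "exclude"]

agreement = percent_agreement(first_rater, second_rater)
print(f"Absolute agreement: {agreement:.1f}%")
```

Absolute agreement does not adjust for agreement expected by chance, which is one reason a chance-corrected index is often reported alongside it.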

In addition to the literature search in the databases, all of the studies reported in the meta-analyses of Poropat (2009) and Mammadov (2022) were considered for inclusion in this meta-analysis (see Supplementary Material, Table S2). We included all of the studies included in prior meta-analyses that reported estimates differentiating between subject domains and/or measures (Mammadov: n = 13; Poropat: n = 39), resulting in 41 studies that were included from previous reviews. This means that our review includes 37 additional studies that were not considered in the previous meta-analyses. The literature search is illustrated in Fig. 1.

Please note that we excluded studies that did not provide individual effect sizes for the respective measures or subject domains, that is, studies that reported only composites averaged across both measures and subject domains (see “Eligibility Criteria” below). Because of this, we did not include as many studies as Mammadov (2022) did. In the Supplementary Material, Table S2, we provide an overview of the studies that were included in the previous meta-analyses by Poropat (2009) and Mammadov (2022) on how personality traits relate to academic achievement. Neither of those prior meta-analyses investigated the role of different types of achievement measures or subject domains. We also list the studies that were included in the current meta-analysis and we provide reasons for why we did not include some of the studies that were included in the prior meta-analyses.

Eligibility Criteria

Inclusion

We included (a) studies reporting at least one bivariate correlation between one of the five-factor personality traits and at least one achievement measure, (b) studies reporting separate effect sizes for achievement measures (e.g., grades, test scores) or subject domains (e.g., STEM, language), (c) studies conducted in populations from elementary and secondary education, (d) studies written in the English language, and (e) studies using self-reported grades as the achievement measure.

Exclusion

We excluded (a) studies reporting self-ratings, academic self-concepts, or other measures of students’ academic self-beliefs as the achievement measure (e.g., Cao & Meng, 2020; Gatzka, 2021; Ghapanchi et al., 2011), and (b) studies that did not provide bivariate correlation coefficients. This ensured comparability between studies. If the studies only reported betas (i.e., standardized coefficients) of a regression (n = 19), we contacted the corresponding author; eight authors (42%) replied and sent us the correlations. We also excluded (c) studies whose effect sizes were based on the same sample as another study included in our meta-analysis: when two studies analyzed the same samples, we included the study that provided more data relevant to our research question (e.g., we included Kappe & van der Flier, 2010, and Westphal et al., 2020b, but excluded Kappe & van der Flier, 2012, and Westphal et al., 2020a).

Similarly, some studies provided longitudinal analyses with measures of personality and achievement at multiple time points. To avoid using the same data more than once, we coded only the first measurement because the first data collection usually contains the most data points, given dropout at later measurements.

Further, we excluded (d) studies that focused on other personality frameworks (e.g., Avram et al., 2019). Although there is some overlap between the dimensions of the FFM and those of the HEXACO (Ashton & Lee, 2008) and Eysenck and Eysenck’s Big Three (Eysenck & Eysenck, 1975), we followed Poropat (2009) and included only studies with measures validated to assess the FFM personality traits. This allowed for a high internal validity of our results. Finally, we excluded (e) studies that reported only composite measures of achievement, such as GPA, or other composites that were averaged across both measures and subject domains. We did not exclude studies that averaged across subject domains but differentiated between test scores and grades (i.e., if individual grades in math and English were averaged, we included this effect size but coded it as mixed for the subject domain).

Coding^{Footnote 1}

Two independent raters coded each study that fulfilled the eligibility criteria. Overall, three raters were involved in the coding procedure: the first author and two trained student research assistants. Discrepancies were resolved by discussion. Interrater reliability, measured with Cohen’s kappa, was 0.95. We coded all studies on the following variables for each reported effect size: personality trait, subject domain, achievement measure, effect size (correlation), publication status, sample size, first variable of the correlation (i.e., FFM traits or intelligence), personality reliability, and achievement reliability. We coded reliability to correct for measurement error (see “Statistical Corrections”). For descriptive purposes, we also coded mean age, percentage of female students, and region of data collection (see Supplementary Material, Table S1). We also coded educational level, but we were unable to perform more specific analyses due to the small number of effect sizes in elementary education (see Supplementary Material, Table S1). We coded the subject domain, classifying science and mathematics as STEM and subject domains focusing on languages as language. We coded subjects that could not be classified according to this distinction as “other domains”. These subjects included religious studies, geography, history, practical (a composite of art, music, home and consumer studies, and crafts), social science/studies, sports, humanities, and music. However, each of these subjects appeared in no more than two studies, which is why a detailed, subject-specific investigation was not possible in this category. Details on the effect sizes that we coded as “other domains” can be found in Table S6 in the Supplementary Material.
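The interrater agreement statistic mentioned above can be illustrated with a minimal sketch of Cohen’s kappa for two raters. The function name and the example codes are hypothetical and are not taken from the study’s coding data; the sketch only shows how observed agreement is adjusted for chance agreement.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items that received identical codes.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Undefined (division by zero) if both raters used one identical category only.
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical subject-domain codes (illustration only, not study data).
codes_a = ["STEM", "language", "STEM", "other", "language", "STEM"]
codes_b = ["STEM", "language", "STEM", "other", "STEM", "STEM"]
kappa = cohens_kappa(codes_a, codes_b)
```

Kappa equals 1 under perfect agreement and 0 when agreement is no better than expected from the raters’ marginal code frequencies.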

One important coding issue related to FFM measures was the use of deviating labels for the same scales. This was most important for the neuroticism dimension, which is sometimes called emotional stability, reflecting the opposite pole of the same measure. When a study reported emotional stability, we reversed the arithmetic sign of each of its correlations with achievement (e.g., minus to plus) so that all coded correlations referred to neuroticism and were comparable. If necessary, we also recoded data on achievement so that higher values indicated higher levels of achievement. We also coded whether studies reported correlations with measures of intelligence to investigate the association of personality traits and achievement while controlling for cognitive abilities.

Statistical Analyses

We summarized 1491 effect sizes, representing data from 500,218 students and 110 samples from elementary to high school. We used the Fisher r-to-z transformed correlation coefficient as the outcome measure. Effect sizes were weighted by the inverse sampling variance. We calculated the sampling variance using Formula 12.27 from Borenstein et al. (2009). We applied a random-effects model to calculate the mean effect size for each personality trait and estimated the amount of heterogeneity using the restricted maximum-likelihood estimator (Viechtbauer, 2005). To calculate standard errors within the subgroup analyses, we used the pooled subgroup variance instead of the variance of each subgroup, following recommendations from Rubio-Aparicio et al. (2020). We estimated τ (i.e., the standard deviation of the true effects across studies) as a measure of the between-study variation.
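The pooling steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors’ implementation: it assumes the usual large-sample variance 1/(n − 3) for Fisher’s z (which may differ from the formula cited in the text), and it substitutes the simpler DerSimonian-Laird estimator of τ² for the restricted maximum-likelihood estimator the authors used. All function names and input values are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transformation of a correlation."""
    return math.atanh(r)


def z_sampling_variance(n):
    """Large-sample variance of Fisher's z (assumed here: 1 / (n - 3))."""
    return 1.0 / (n - 3)


def pool_random_effects(effects, variances):
    """Inverse-variance random-effects mean with a DerSimonian-Laird tau^2.

    DerSimonian-Laird is a stand-in for the REML estimator named in the text.
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    df = len(effects) - 1
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, tau2


# Hypothetical correlations and sample sizes (illustration only, not study data).
rs = [0.20, 0.30, 0.10]
ns = [103, 203, 53]
zs = [fisher_z(r) for r in rs]
vs = [z_sampling_variance(n) for n in ns]
mean_z, se_z, tau2 = pool_random_effects(zs, vs)
mean_r = math.tanh(mean_z)  # back-transform the pooled z to the correlation metric
```

The square root of `tau2` corresponds to τ, the standard deviation of the true effects that the text reports as the measure of between-study variation.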

Dependency of Effect Sizes

Some studies investigated the relation between multiple measures and subject domains and thus reported more than one effect size. Effect sizes from the same meta-analytic sample are dependent because they share the same method and sample. In this meta-analysis, the number of effect sizes within studies ranged from 1 to 94. Averaging effect sizes from the same study without correction would underestimate the amount of between-study heterogeneity (Schmidt & Hunter, 2015). Taking into account the dependency in our analyses, we corrected the sampling error using (cluster) robust variance estimation (Hedges et al., 2010). We carried out the analysis using R (version 4.1.0, R Core Team, 2020) and the metafor package (version 3.0.2, Viechtbauer, 2010).
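To make the idea of cluster-robust variance estimation concrete, the following sketch computes a sandwich-type standard error for a weighted mean when effect sizes are nested in studies. It is a simplified illustration for an intercept-only model without the small-sample adjustments that software packages typically apply, so it should not be read as a reproduction of the authors’ metafor analysis. The function name, weights, and data are hypothetical.

```python
import math
from collections import defaultdict


def cluster_robust_mean(effects, weights, study_ids):
    """Weighted mean with a cluster-robust (sandwich) standard error.

    Residuals are summed within each study before squaring, so dependent
    effect sizes from one study do not count as independent information.
    """
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Weighted residual sum per study (the "meat" of the sandwich estimator).
    cluster_scores = defaultdict(float)
    for y, w, study in zip(effects, weights, study_ids):
        cluster_scores[study] += w * (y - mean)
    robust_var = sum(s ** 2 for s in cluster_scores.values()) / total_w ** 2
    return mean, math.sqrt(robust_var)


# Hypothetical effect sizes from two studies (illustration only, not study data).
effects = [0.1, 0.2, 0.3, 0.4]
weights = [1.0, 1.0, 1.0, 1.0]
study_ids = ["A", "A", "B", "B"]
robust_mean, robust_se = cluster_robust_mean(effects, weights, study_ids)
```

With many effect sizes per study, as in the range of 1 to 94 reported above, this kind of estimator bases the standard error on variation between studies rather than between individual effect sizes.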

Computing Partial Correlations ρ^I

To examine the association between personality and achievement while controlling for intelligence, we computed a partial correlation for each effect size from the sample estimates, using the correlations between intelligence and personality traits that we had coded from the original studies. The average correlations between personality and intelligence, as well as between achievement and intelligence, can be found in Table S5 in the Supplementary Material.
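The text does not spell out the formula, so the following sketch assumes the standard first-order partial correlation, which removes the linear influence of a third variable (here, intelligence) from the personality-achievement correlation. The function name, argument names, and numeric inputs are hypothetical.

```python
import math


def partial_correlation(r_pa, r_pi, r_ai):
    """First-order partial correlation of personality (p) and achievement (a),
    controlling for intelligence (i).

    r_pa: personality-achievement correlation
    r_pi: personality-intelligence correlation
    r_ai: achievement-intelligence correlation
    """
    denominator = math.sqrt((1 - r_pi ** 2) * (1 - r_ai ** 2))
    return (r_pa - r_pi * r_ai) / denominator


# Hypothetical input correlations (illustration only, not study data).
rho_partial = partial_correlation(r_pa=0.25, r_pi=0.20, r_ai=0.50)
```

When personality is uncorrelated with intelligence and achievement is uncorrelated with intelligence, the partial correlation equals the zero-order correlation, which is a quick sanity check for the formula.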

Statistical Corrections

Measurement error can affect the magnitude of the correlations. Following the approach of Poropat (2009), we corrected the correlations on the basis of the reliability of the instruments before we averaged them. For these corrections, we used estimates of Cronbach’s alpha provided by each study wherever possible. If studies did not report an estimate of alpha (which was the case in 7% of the effect sizes included), estimates were obtained from the original validating studies for the relevant personality scale (Poropat, 2009). If no validating study was available, we used an estimate of alpha derived from Viswesvaran and Ones’s (2000) meta-analysis of FFM reliabilities. For the achievement measures, only 19% of the studies reported estimates of alpha for academic achievement. Following a procedure reported by Poropat (2009), we corrected self-reported grades for their unreliability using the estimated reliability of self-report grades of 0.86 as used by Mammadov (2022). When studies did not report reliabilities for standardized tests, we used the mean reliability of all achievement measures as an estimate (0.90). We chose this estimation method because the studies did not provide the information necessary to obtain more accurate reliability estimates for the achievement tests.
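The reliability correction described above is commonly implemented as the classical correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The sketch below assumes that form; the function name and the personality reliability are hypothetical, while 0.86 is the self-reported grade reliability quoted in the text.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Correct an observed correlation for unreliability in both measures."""
    for rel in (reliability_x, reliability_y):
        if not 0 < rel <= 1:
            raise ValueError("Reliabilities must lie in the interval (0, 1].")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical observed correlation and personality alpha; 0.86 is the
# reliability of self-reported grades mentioned in the text.
r_corrected = disattenuate(r_observed=0.20, reliability_x=0.75, reliability_y=0.86)
```

Because both reliabilities are at most 1, the corrected correlation is never smaller in absolute value than the observed one, and the correction grows as the reliabilities fall.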

For the achievement measures, reliability, as measured with Cronbach’s alpha, ranged from 0.68 (Rosander et al., 2011) to 0.95 (Westphal et al., 2020b), with a mean of 0.80 based on all reliabilities that were reported in the original studies. For the personality measures, reliability ranged from 0.32 (Brandt et al., 2020) to 0.85 (Bergold & Steinmayr, 2018), with a mean of 0.69 across all personality traits. The greatest difference between reported and corrected effect size was found in Hübner et al. (2022), with 0.44 (reported) and 0.64 (corrected). Across all studies, the absolute difference between reported and corrected effect size was small (absolute mean difference = 0.03). As a robustness check, we compared the results for corrected and uncorrected effect sizes and found a similar pattern (see Table 3, columns for ρ and ρ^raw). We corrected correlations for scale reliability prior to the combination of the correlations into overall estimates because corrected correlations are more directly comparable than raw correlations (Schmidt & Hunter, 1996).

Analyses of Potential Bias

Publication Bias

First, to assess potential publication bias, we aimed to compare the effect sizes of published and unpublished studies, but we found only one unpublished study. Second, we conducted a funnel plot analysis for each of the five traits (see Fig. 2) to examine whether studies with significant correlations were more likely to be published than studies with nonsignificant results. Egger’s test (Egger et al., 1997) showed potential funnel plot asymmetry for openness (z = 3.46, p < 0.001), extraversion (z = 2.31, p = 0.021), and neuroticism (z = 2.57, p = 0.010). No potential asymmetry was found for conscientiousness (z = –0.17, p = 0.866) or agreeableness (z = –1.14, p = 0.252). Additionally, we conducted trim-and-fill analyses (Duval & Tweedie, 2004) for openness, extraversion, and neuroticism. The trim-and-fill funnel plots (see Figure S1 in the Supplementary Material) showed that the filled studies would have had effect sizes significantly below zero, suggesting that the asymmetry found in the funnel plots might be due to reasons other than publication bias (Peters et al., 2008).
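The p values that accompany the z statistics above follow from the standard normal distribution. As a small illustration, the sketch below converts a z statistic into a two-sided p value; the function name is hypothetical, and small deviations from the reported values are to be expected because the published z statistics are rounded.

```python
import math


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


# z statistics reported for extraversion and neuroticism in the text.
p_extraversion = two_sided_p(2.31)
p_neuroticism = two_sided_p(2.57)
```

The sign of z does not affect the two-sided p value, so the same function applies to the negative statistics reported for conscientiousness and agreeableness.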

Outliers

We conducted an outlier analysis (Viechtbauer & Cheung, 2010) and identified four influential effect sizes within the 1491 effect sizes included in the data set. These cases comprised one correlation between conscientiousness and grades in the language domain (Brandt et al., 2021), two correlations between agreeableness and test scores in the STEM domain (Mammadov et al., 2021), and one correlation between neuroticism and test scores in the language domain (Brandt et al., 2022). We performed robustness checks without the outlier effect sizes, which showed a similar result pattern (see Supplementary Material, Table S6). The influential cases were kept in the data set because they seemed plausible and may reveal patterns in future research. For example, the particularly high effect size found in Brandt et al. (2021) might be an influential case because it refers to elementary school students. As our analyses on the educational level as a moderator showed, samples from elementary schools had higher effect sizes than those from secondary schools or high schools (see moderator analyses regarding educational level as shown in Table S4 in the Supplementary Material). However, for conscientiousness, we found only 50 effect sizes from elementary schools, compared to 303 effect sizes from secondary and 95 from high school samples (see Table S1 in the Supplementary Material). If the effect sizes were balanced across educational levels, the case would most likely not be an influential case.

Results

We provide the characteristics of the individual studies and their effect sizes in the raw data spreadsheet, which is available as an Excel file.^{Footnote 2} Overall, the pattern of results for the mean associations of achievement with the FFM traits (see Table 2) was comparable to the pattern found in the previous meta-analyses of Poropat (2009) and Mammadov (2022). The true outcomes appeared to be heterogeneous (p < 0.001) for every personality trait according to the Q-test (Q_O[241] = 13,888, Q_C[299] = 26,072, Q_E[244] = 13,874, Q_A[239] = 22,158, Q_N[258] = 10,774) and according to the values of τ (see Table 3).

Table 2 Overview of mean effect sizes per trait

Table 3 Results showing mean correlations depending on subject domain, measure, and interaction of domain*measure for the five traits

In the following, we report the specific results, addressing the three research questions for each of the traits, respectively. First, we report results on the moderating effects of the subject domain (composite across achievement measures). Second, we report the moderating effects of the achievement measure (composite across subject domains). Third, we report results on the effect sizes for the subject domain considering measures and vice versa. Here, we only included studies reporting both measure and subject domain for each effect size. All results reported in the following sections of the manuscript are based on coefficients corrected for measurement error. As robustness checks, we also report results for uncorrected effects in Table 3 (r^raw). The pattern of findings was largely replicated in these robustness checks, with slightly smaller effect sizes overall. Further, we report our findings after we controlled for cognitive abilities (ρ^I, Table 3).

Openness

Addressing RQ1, for openness, we found a domain-specific effect, with larger correlations in language than in STEM subject domains (ρ = 0.25 vs. 0.13; z = –7.80; p < 0.001). Investigating the role of achievement measure (RQ2), we did not find a significant difference between the effect sizes for grades and for standardized test scores (ρ = 0.21 vs. 0.22; z = 0.39; p = 0.696). Addressing RQ3, and looking at grades more specifically, we found a significant difference between the correlations in language and in STEM subject domains (ρ = 0.25 vs. 0.12; z = –7.70; p < 0.001); the correlations of openness with language grades were larger than the correlations with STEM grades. We also found a significant difference between test scores in language and STEM subject domains (ρ = 0.27 vs. 0.16; z = –5.25; p < 0.001), with larger correlations for language than for STEM test scores. The overall interaction between domain and measure was not significant (t[43] = –0.84; p = 0.405), indicating that the difference between language and STEM achievement was similar for grades and test scores. For an illustration, see Fig. 3.