A robust finding for psychopathology (viewed either categorically or dimensionally) is the high comorbidity/co-occurrence across the different disorders and syndromes (Angold et al. 1999; Angold and Costello 2009; Krueger 1999; Krueger and Markon 2006). Traditionally, and based on factor analysis studies, support for two broad correlated dimensions of internalizing and externalizing symptoms has emerged consistently for children and adolescents (Achenbach 1966; Achenbach and Edelbrock 1978; Lahey et al. 2015; Patalay et al. 2015; Tackett et al. 2013). The internalizing dimension includes behaviors that have the propensity to express distress inwards, such as mood and anxiety disorders. In children and adolescents, the externalizing dimension includes behaviors that have the propensity to express distress outwards (Krueger 1999; Krueger and Finger 2001), such as the symptoms in attention deficit hyperactivity disorder (ADHD), oppositional defiant disorder (ODD), and conduct disorder (CD). Other researchers have proposed that the broad dimension of internalizing is better considered to comprise separate, but correlated, dimensions for fear-related behaviors/disorders or syndromes and distress-related behaviors/disorders or syndromes (Slade and Watson 2006; Vollebergh et al. 2001). Thus, psychopathology is viewed in this model in terms of three correlated broad dimensions: fear, distress, and externalizing (Martel et al. 2017; Lahey et al. 2012). More recently, other broad dimensions have been suggested, such as psychotic symptoms (Stochl et al. 2015), thought problems (Carragher et al. 2016; Caspi et al. 2014; Laceulle et al. 2015), and autism spectrum-related problems (Noordhof et al. 2015). Therefore, overall, at least two to four correlated broad dimensions have been proposed for psychopathology.

Independent of the different numbers of broad dimensions involved, a robust finding in these studies is that the proposed dimensions are consistently moderately to highly correlated with each other (Achenbach and Rescorla 2001; Angold and Costello 2009; Angold et al. 1999; Krueger and Markon 2006). For example, among children and adolescents, the correlations between internalizing and externalizing factors appear to range between 0.40 and 0.60 (Achenbach and Rescorla 2001). Similar correlations have been found for fear, distress, and externalizing factors (Angold and Costello 2009; Angold et al. 1999) and for the inter-correlation between internalizing, externalizing, and thought disorder problems (Lahey et al. 2004; Wright et al. 2013).

The high inter-correlations between the various broad psychopathology dimensions raise the possibility that an even broader overall psychopathology dimension could exist that potentially explains the co-variances found between them. Since 2012, a growing number of studies have examined this possibility using a type of confirmatory factor analysis (CFA) model called the bi-factor model (Reise 2012). A conventional bi-factor model is an orthogonal first-order factor model with a general factor on which all items (usually) load, along with separate group factors for the different dimensions, after removing the variance accounted for by the general factor. In such a model, the general factor captures the covariance across all the items, and the group factors capture the unique covariance of the items within the relevant dimensions, after accounting for their variance due to the general factor (Reise 2012). Thus, when applied to a set of psychopathology constructs, the general factor or, more specifically, the general psychopathology factor (or P-factor; Caspi et al. 2014) reflects the covariance across all the items (be they classified as problem behaviors, symptoms, disorders, and/or syndromes) forming the broad dimensions (such as internalizing symptoms/disorders), and the group factors reflect the unique covariance of the broad dimensions, after accounting for the variance allocated to the general P-factor.
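As a notational sketch of the model just described (the symbols are assumptions introduced here for illustration and are not taken from the cited sources), indicator i belonging to group k can be written as a function of the general factor G, its own group factor S_k, and a unique term, with all factors mutually uncorrelated and scaled to unit variance:

```latex
x_i = \lambda_{iG}\, G + \lambda_{ik}\, S_k + \varepsilon_i ,
\qquad
\operatorname{Cov}(G, S_k) = 0 , \quad
\operatorname{Cov}(S_k, S_{k'}) = 0 \;\; (k \neq k')

\operatorname{Cov}(x_i, x_j) =
\begin{cases}
\lambda_{iG}\lambda_{jG} + \lambda_{ik}\lambda_{jk} & \text{if } i \neq j \text{ belong to the same group } k \\
\lambda_{iG}\lambda_{jG} & \text{if } i \text{ and } j \text{ belong to different groups}
\end{cases}
```

Under these assumptions, covariance between indicators from different groups is carried only by the general factor, whereas indicators within the same group share additional covariance through their group factor.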

For a bi-factor model or a higher order factor model, the appropriate internal consistency reliability indices for the general factor and the group factors are omega hierarchical (ωh) and omega subscale (ωt), respectively (Brunner et al. 2012; Zinbarg et al. 2005). The values for ωh and ωt range from 0 to 1, with 0 indicating no reliability and 1 reflecting perfect reliability. According to Reise et al. (2013a, b), ωh and ωt values of at least 0.75 are preferred for meaningful interpretation of a scale. A number of other indices have also been proposed that could enable a more sophisticated and accurate interpretation of the dimensionality of the general factor (or P-factor) in the bi-factor model. These include explained common variance (ECV; Reise et al. 2013a, b), percentage of uncontaminated correlations (PUC; Bonifay et al. 2015), and the index of construct reliability (H; Hancock and Mueller 2001).

To date, bi-factor models of psychopathology have been examined for adults (Caspi et al. 2014; Lahey et al. 2012; Stochl et al. 2015), adolescents (Carragher et al. 2016; Castellanos-Ryan et al. 2016; Laceulle et al. 2015; Noordhof et al. 2015; Patalay et al. 2015), children (Lahey et al. 2015; Martel et al. 2017), and adolescents and children together (Tackett et al. 2013). Given the focus of the present study, past studies involving children and adolescents (Carragher et al. 2015; Castellanos-Ryan et al. 2016; Laceulle et al. 2015; Lahey et al. 2015; Martel et al. 2017; Noordhof et al. 2015; Patalay et al. 2015; Tackett et al. 2013) are particularly relevant. A robust finding in past studies of the bi-factor model of psychopathology in children and adolescents is the support for the bi-factor model (Carragher et al. 2015; Castellanos-Ryan et al. 2016; Laceulle et al. 2015; Lahey et al. 2015; Martel et al. 2017; Noordhof et al. 2015; Patalay et al. 2015; Tackett et al. 2013). Independent of whether these studies included two group factors (Castellanos-Ryan et al. 2016; Laceulle et al. 2015; Lahey et al. 2015; Patalay et al. 2015; Tackett et al. 2013), three group factors (Carragher et al. 2015; Martel et al. 2017), or four group factors (Noordhof et al. 2015), they have consistently found that the bi-factor model fitted better than the corresponding two-, three-, or four-factor first-order oblique models (Carragher et al. 2015; Laceulle et al. 2015; Lahey et al. 2015; Martel et al. 2017; Patalay et al. 2015). Additionally, the P-factor in the bi-factor model has shown acceptable external validity. As examples, Martel et al. (2017) reported that the P-factor (but not the group factors) was significantly associated with global executive functioning, and Patalay et al. (2015) reported that the P-factor best predicted future psychopathology and academic attainment. To date, only two studies have examined reliability for the bi-factor model (i.e., Martel et al. 2017; Murray et al. 2016), and the findings have been mixed. Based on ωh and ωt values of at least 0.75 for meaningful interpretation of a scale, Martel et al. (2017) reported acceptable reliability for the P-factor (ωh = 0.898). In contrast, Murray et al. (2016) reported unacceptable reliability for the P-factor (ωh values ranging from 0.53 to 0.64 for eight age groups, ranging from 7 to 15 years).

Although there have been numerous studies of the bi-factor model of psychopathology in children and adolescents, there are limitations and omissions in the existing literature. First, to date, all relevant studies have been on community samples and have used dimensional scores (derived from questionnaires and rating scales) of children’s and adolescents’ problems. No study has examined clinic-referred samples, based on clinical diagnosis, to model their broad dimensions (such as internalizing and externalizing). Thus, it is uncertain whether these findings are directly applicable to samples of children and adolescents who are referred to clinical settings and given clinical diagnoses. Indeed, it is possible that there could exist qualitatively different features in clinical and non-clinical levels for some psychopathologies, thereby raising concerns over measuring and using clinical traits in non-clinical populations to understand the clinical level (Murray et al. 2016; Reise and Waller 2009). Second, there have been only two studies involving children (Lahey et al. 2015; Martel et al. 2017) and a third study combining children with adolescents (Tackett et al. 2013). Therefore, the data in this age group are presently limited. Third, the reliability data for the bi-factor model of psychopathology are also limited: the relevant studies have reported only ωh and ωt scores, and none of the other indices (such as ECV, PUC, and H) that allow for a more sophisticated and accurate interpretation of the dimensionality of the general factor (in the present study’s case, the P-factor) in the bi-factor model. Fourth, although Martel et al. (2017) examined the convergent and discriminant validity of the P-factor and group factors (fear, distress, externalizing) by investigating how these factors correlated with similar factors for maternal psychopathology, none of the other studies involving children and adolescents provided similar information. Thus, there is a need for studies to include such an evaluation to reinforce or question these initial (though innovative) findings.

Given the aforementioned limitations and omissions on the bi-factor model of psychopathology in children and adolescents, the first aim of the current study was to use CFA to simultaneously examine the structure of the major childhood internalizing and externalizing disorders, based on interviews of parents of clinic-referred children in Australia. For the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) (also DSM-IV-TR), the common internalizing disorders for children and adolescents included in the present study were separation anxiety disorder (SAD), social phobia (SOP), specific phobia (SPP), panic disorder (PD), agoraphobia (AG), generalized anxiety disorder (GAD), obsessive–compulsive disorder (OCD), post-traumatic stress disorder (PTSD), dysthymia (DYTH), and major depressive disorder (MDD). The externalizing disorders included were ODD, CD, and ADHD. The present study focused on parent interviews for reasons explained in the “Methods” section below. Five different measurement models were examined. They were a one-factor model (model 1); a two-factor oblique model with primary factors for the internalizing disorders and the externalizing disorders (model 2); a three-factor oblique model with primary factors for distress disorders, fear disorders, and externalizing disorders (model 3); a bi-factor model with orthogonal factors for a P-factor and group factors for the internalizing disorders and externalizing disorders (model 4); and a bi-factor model with orthogonal factors for a P-factor and group factors for distress disorders, fear disorders, and externalizing disorders (model 5). All five models are shown in Figs. 1, 2, 3, 4, and 5. To allow examination of the robustness of the findings, all the models were examined in children and adolescents separately. A second aim of the study was to examine model-based reliabilities of the factor(s) in the optimum bi-factor model for these groups, contingent on such a model being supported. More specifically, the focus was to examine ωh and ωt, as well as ECV, PUC, and H reliability indices. A third aim of the study was to test the convergent and divergent validities of the factors in the optimum bi-factor model, contingent on such a model being supported. Based on the recent findings reported by Martel et al. (2017), support was expected (in terms of fit) for a bi-factor model (either with a P-factor and three group factors [fear, distress, externalizing] or a P-factor and two group factors [internalizing and externalizing]), and relatively good support in terms of reliability and validity for the P-factor in such a model.

Fig. 1

1-factor (Model 1). ADISC-IV Disorders: SAD separation anxiety disorder, SOC social phobia, SPE specific phobia, PD panic disorder, AGOR agoraphobia, GAD generalized anxiety disorder, OCD obsessive–compulsive disorder, PTSD post-traumatic stress disorder, DYSTH dysthymia, MDE major depressive disorder, ADHD attention deficit/hyperactivity disorder, CD conduct disorder, ODD oppositional defiant disorder

Fig. 2

2-factor oblique: primary factors = internalizing & externalizing disorders (Model 2). ADISC-IV Disorders: SAD separation anxiety disorder, SOC social phobia, SPE specific phobia, PD panic disorder, AGOR agoraphobia, GAD generalized anxiety disorder, OCD obsessive–compulsive disorder, PTSD post-traumatic stress disorder, DYSTH dysthymia, MDE major depressive disorder, ADHD attention deficit/hyperactivity disorder, CD conduct disorder, ODD oppositional defiant disorder

Fig. 3

3-factor oblique: primary factors = distress, fear & externalizing disorders (Model 3). ADISC-IV Disorders: SAD separation anxiety disorder, SOC social phobia, SPE specific phobia, PD panic disorder, AGOR agoraphobia, GAD generalized anxiety disorder, OCD obsessive–compulsive disorder, PTSD post-traumatic stress disorder, DYSTH dysthymia, MDE major depressive disorder, ADHD attention deficit/hyperactivity disorder, CD conduct disorder, ODD oppositional defiant disorder

Fig. 4

Bifactor, one general factor and internalizing and externalizing group factors (Model 4). ADISC-IV Disorders: SAD separation anxiety disorder, SOC social phobia, SPE specific phobia, PD panic disorder, AGOR agoraphobia, GAD generalized anxiety disorder, OCD obsessive–compulsive disorder, PTSD post-traumatic stress disorder, DYSTH dysthymia, MDE major depressive disorder, ADHD attention deficit/hyperactivity disorder, CD conduct disorder, ODD oppositional defiant disorder

Fig. 5

Bifactor, one general factor and distress, fear and externalizing group factors (Model 5). ADISC-IV Disorders: SAD separation anxiety disorder, SOC social phobia, SPE specific phobia, PD panic disorder, AGOR agoraphobia, GAD generalized anxiety disorder, OCD obsessive–compulsive disorder, PTSD post-traumatic stress disorder, DYSTH dysthymia, MDE major depressive disorder, ADHD attention deficit/hyperactivity disorder, CD conduct disorder, ODD oppositional defiant disorder

Methods

Participants

The data for all participants were collected from archival files from the Academic Child Psychiatry Unit (ACPU) of the Royal Children’s Hospital, Melbourne, Australia. The ACPU is an outpatient psychiatric unit that provides services for children and adolescents with behavioral, emotional, and/or learning problems. Referrals are generally from other medical services, schools, and social and welfare organizations. The present study used the records of children, aged between 6 and 18 years, referred between 2004 and 2016, who had been interviewed for clinical diagnosis. In total, there were 2099 children, comprising 1504 males (71.8%) and 592 females (28.2%). The overall mean age of participants was 11.22 years (SD = 3.10). The frequencies of children (< 12 years) and adolescents (≥ 12 years) were 1233 and 866, respectively (see Table 1).

Table 1 Demographic characteristics of participants

Table 1 provides the sociodemographic characteristics and clinical diagnoses of participants in the study. As shown, most fathers were employed, and most mothers were either employed or involved in home duties. About two thirds of participants had mothers and fathers who had attended at least secondary school, and most were from families with income less than $50,000 (AUS) per year. These figures correspond closely to the Australian population. In terms of parental relationship, approximately 48.68% were living together, 43.6% were separated or divorced, and the remainder were single for other reasons (e.g., death of partner).

In relation to clinical disorders, externalizing disorders were highly prevalent, with around 75.3% and 66.8% of the participants having ADHD and ODD/CD, respectively (see Table 1). Among the internalizing disorders, GAD, SPP, DYTH, and SOP were more prevalent. Approximately 44.6%, 32.3%, 39.5%, and 32.2% of the participants were diagnosed with GAD, SPP, DYTH, and SOP, respectively. PD, PTSD, and AG were relatively rare. Table 2 shows the frequencies of different levels of comorbidity for the sample used in the present study. As shown, of those with a clinical diagnosis, only 9% had no comorbidity. Approximately 69.3% had one to five other disorders. Although details are not presented, for those with an anxiety disorder, 57% had a depressive disorder, and for those with a depressive disorder, 87.2% had an anxiety disorder. A total of 75% of children were comorbid for at least one externalizing and one internalizing disorder.

Table 2 Frequencies of different levels of comorbidity

Measures

Anxiety Disorders Interview Schedule for Children

Clinical diagnosis was based on the Anxiety Disorders Interview Schedule for Children (ADISC-IV; Silverman and Albano 1996). The ADISC-IV is a semi-structured interview, based on the DSM-IV/DSM-IV-TR diagnostic system (APA 2000). Although the ADISC-IV has been designed primarily to facilitate the diagnosis of the major childhood internalizing disorders, it can also be used for diagnosing other major childhood disorders. ADISC-IV diagnoses do not take into account the hierarchical, exclusionary rules outlined by the DSM-IV for making diagnoses. The ADISC-IV guideline for diagnosis is that the child be given a diagnosis for every disorder for which the diagnostic criteria are met. There are different ADISC-IV versions for parent interview and for child interview, and clinical diagnosis can be based either on parent or child interview or on both interviews considered together. All diagnoses reported in this study were based on parent interviews, as the child interview version does not allow for the diagnoses of CD and ODD, both of which were part of the externalizing disorders modeled in the psychopathology measurement models evaluated in the present study. Additionally, it should be noted that there is evidence of poor diagnostic agreement between the child and parent versions of the ADISC-IV (Grills and Ollendick 2003) and that clinical interviews of children can lead to unreliable diagnosis (Jensen et al. 1999). The parent version of the ADISC-IV has robust psychometric properties (Silverman et al. 2001). Test–retest reliability for the ADISC-IV scores over a 7- to 14-day interval has been good to excellent. Kappa values between interviewers for this version range from 0.65 to 1.00 (Silverman et al. 2001).

Child Behavior Checklist/6–18

The Child Behavior Checklist/6–18 (CBCL) is a measure in the Achenbach System of Empirically Based Assessment (Achenbach and Rescorla 2001). Completed by parents, it has 113 items and is used to rate children between 6 and 18 years of age. Respondents indicate the degree or frequency of each behavior described in the item on a scale of 0 (not true), 1 (somewhat or sometimes true), or 2 (very true or often true). The standard rating period for the CBCL is 6 months. The CBCL has excellent psychometric properties and includes scales for various behavioral and emotional problems (Achenbach and Rescorla 2001). In addition, it provides two broad scores for internalizing behavior problems and externalizing behavior problems. These broad scores were used in the present study to examine the external validities of the factors in the optimum model.

Procedure

The study was approved by the RCH ethics committee as part of ACPU’s comprehensive examination of children and adolescents referred for psychological problems. Each legal guardian and participant provided informed written consent for any data provided by them to be used in future research studies. This is a standard part of the ACPU assessment procedure. All children and their parents participated in separate interviews and testing sessions with breaks over 2 days. Information was also obtained from teachers using various checklists and questionnaires. In all cases, parental consent forms were completed prior to the assessment. The data collected covered a comprehensive demographic, medical, educational, psychological, familial, and social assessment of the child and child’s family. All psychological data were collected by research assistants, who were students in clinical psychology, and under the supervision of registered psychologists. The research assistants were provided with extensive supervised training by their supervisors prior to them collecting data.

This training for the ADISC-IV included observations of it being administered by the psychologists. The research assistants commenced administering the ADISC-IV only after they attained competence in its administration, as assessed by their supervisors. There was adequate inter-rater reliability for the diagnoses made between the research assistants and the psychologists (κ = 0.88). Standard procedures were used for the administration of all measures. However, where necessary, researchers read the items to participants (approximately 5% of the sample). Approximately 95% of the parent ADISC-IV interviews involved mothers only, and the remainder involved fathers only or both fathers and mothers together. Using the categorical data from the parent ADISC-IV, clinical diagnosis was also determined by a consultant child psychiatrist, who independently reviewed these data. The inter-rater reliability between the initial diagnoses and those of the consultant child psychiatrist (for 10% of the parent interviews) was high (κ = 0.90).

Data Analysis

Software

All the CFA models in the study were estimated using Mplus (version 7) software (Muthén and Muthén 2013).

Extraction

As clinical diagnosis for each disorder resulted in binary scores (disorder present, coded 1; disorder not present, coded 0), the mean and variance-adjusted weighted least squares (WLSMV) estimator was used for all the CFA analyses (Rhemtulla et al. 2012). This is a robust estimator, recommended for CFA with ordered-categorical scores, including dichotomous scores. The WLSMV estimator does not assume normally distributed variables. According to measurement experts, relative to other estimators, the WLSMV estimator provides the best option for modeling categorical data, including dichotomous data (Beauducel and Herzberg 2006; Lubke and Muthén 2004; Millsap and Yun-Tein 2004).

Model Fit

For the CFA models, at the statistical level, model fit can be examined using χ2 values (WLSMVχ2 values in the current case). As all types of χ2 values, including WLSMVχ2, are inflated by large sample sizes, the fit of the models is generally interpreted by researchers using approximate fit indices, such as the root mean squared error of approximation (RMSEA), the comparative fit index (CFI), the Tucker–Lewis Index (TLI), and the weighted root mean square residual (WRMR). For models based on maximum likelihood estimation, the guidelines suggested by Hu and Bentler (1998) are that RMSEA values close to 0.06 or below can be taken as good fit, values from about 0.07 to < 0.08 as moderate fit, values from about 0.08 to 0.10 as marginal fit, and values > 0.10 as poor fit. For the CFI and TLI, values of 0.95 or above are taken as indicating good model-data fit, and values from 0.90 to < 0.95 are taken as acceptable fit. The cutoff score for good fit suggested for WRMR is less than 0.90 (Yu and Muthen 2002). For the present study, these approximate fit indices, rather than the χ2 statistic, were used as evidence of model fit. However, it is worth noting that despite the widespread use of these indices and fit values, a simulation study by Nye and Drasgow (2011) concluded that appropriate cutoff values for these indices under WLSMV estimation can vary across conditions.
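The cutoffs quoted above can be expressed as a small lookup, sketched here under stated assumptions: the function names are hypothetical, the "close to" wording of the guidelines is simplified into hard boundaries, and the example values are invented rather than taken from Table 3.

```python
def classify_rmsea(rmsea: float) -> str:
    """Map an RMSEA value onto the bands quoted in the text (boundaries simplified)."""
    if rmsea <= 0.06:
        return "good"
    if rmsea < 0.08:
        return "moderate"
    if rmsea <= 0.10:
        return "marginal"
    return "poor"


def classify_incremental(value: float) -> str:
    """Classify a CFI or TLI value: >= 0.95 good, 0.90 to < 0.95 acceptable."""
    if value >= 0.95:
        return "good"
    if value >= 0.90:
        return "acceptable"
    return "poor"


def wrmr_is_good(wrmr: float) -> bool:
    """WRMR below 0.90 is the suggested cutoff for good fit."""
    return wrmr < 0.90


def summarize_fit(rmsea: float, cfi: float, tli: float, wrmr: float) -> dict:
    """Collect the verdicts for one model in a single dictionary."""
    return {
        "RMSEA": classify_rmsea(rmsea),
        "CFI": classify_incremental(cfi),
        "TLI": classify_incremental(tli),
        "WRMR": "good" if wrmr_is_good(wrmr) else "above cutoff",
    }


# Invented example values, for illustration only.
example = summarize_fit(rmsea=0.03, cfi=0.96, tli=0.93, wrmr=1.10)
```

Because cutoffs for WLSMV estimation can vary across conditions, such a lookup is best read as a summary of conventional guidelines rather than a decision procedure.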

Reliability

In relation to reliability, ωh and ωt, ECV, PUC, and H values were computed using the program developed by Watkins (2013) and cross-checked using the program developed by Dueber (2016). In a bi-factor model, the ECV of the general factor will be high and the ECV of the group factors will be low whenever there is little common variance beyond that of the general factor. High ECV values for the general factor indicate the presence of a strong general factor dimension (unidimensionality) in the bi-factor model (Reise et al. 2013a). In contrast, low ECV values for the general factor do not support the presence of a strong general factor dimension (unidimensionality) and instead support a multidimensional model.
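As an illustration of how ECV partitions common variance, the following minimal sketch computes it from standardized loadings. It is not the Watkins (2013) or Dueber (2016) program; the function name and the loadings are hypothetical.

```python
def explained_common_variance(general, groups):
    """Return the share of common variance attributable to each factor.

    general: general-factor loadings, one per indicator.
    groups:  mapping of group-factor name -> loadings of its indicators.
    """
    general_ss = sum(loading ** 2 for loading in general)
    group_ss = {
        name: sum(loading ** 2 for loading in loadings)
        for name, loadings in groups.items()
    }
    common = general_ss + sum(group_ss.values())
    shares = {"general": general_ss / common}
    shares.update({name: ss / common for name, ss in group_ss.items()})
    return shares


# Hypothetical loadings: four indicators, two per group factor.
shares = explained_common_variance(
    general=[0.6, 0.6, 0.6, 0.6],
    groups={"a": [0.3, 0.3], "b": [0.3, 0.3]},
)
```

By construction the shares sum to 1, so a high value for the general factor necessarily leaves little common variance for the group factors.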

A model-based internal consistency reliability index that is analogous to coefficient alpha and especially useful for bi-factor models is omega hierarchical (ωh; Zinbarg et al. 2006) for the general factor and omega subscale (ωt) for the group factors. The ωh can be interpreted as an estimator of how much variance in summed (standardized) scores can be attributed to the general factor (Brunner et al. 2012). The values for ωh and ωt range from 0 to 1, with 0 indicating zero reliability and 1 reflecting perfect reliability. For a bi-factor model, the percentage of uncontaminated correlations (PUC; Bonifay et al. 2015) indicates the bias that could result from forcing multidimensional data into a unidimensional model (Bonifay et al. 2015, p. 507). The ECV, the PUC, and ωh and ωt values can be examined concurrently to decide if the indicators in a bi-factor model can be interpreted as having sufficient reliability to view them as essentially unidimensional or if they should be considered multidimensional. According to Reise et al. (2013a, b), if PUC > 0.80, then the indicators in the bi-factor model, if supported, can be interpreted as primarily unidimensional. When PUC < 0.80, such an interpretation requires ECV > 0.60 and ωh > 0.70. Failure to meet these criteria would mean that a multidimensional interpretation is warranted for the set of indicators, even if the bi-factor measurement model shows good fit. H is an index of construct reliability or replicability that estimates the reliability of the underlying group and general factors (Hancock and Mueller 2001). According to Rodriguez et al. (2016), H values > 0.80 are indicative of a stable well-defined latent variable, and values < 0.70 are indicative of a factor that is not worth specifying.
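The indices described above can be sketched from standardized loadings as follows. This is an illustrative sketch, not the Watkins (2013) or Dueber (2016) program: the function names and example loadings are hypothetical, uniquenesses are taken as 1 minus the squared loadings, and the subscale coefficient is assumed to be the hierarchical version (group-factor variance over total subscale variance), which may differ from how ωt is computed elsewhere.

```python
from math import comb

# Hypothetical standardized loadings: (general loading, group name, group loading).
ITEMS = [
    (0.6, "a", 0.3),
    (0.6, "a", 0.3),
    (0.6, "b", 0.3),
    (0.6, "b", 0.3),
]


def omega_h(items):
    """Share of total-score variance attributable to the general factor."""
    general_sum = sum(g for g, _, _ in items)
    group_sums = {}
    for _, name, s in items:
        group_sums[name] = group_sums.get(name, 0.0) + s
    group_var = sum(total ** 2 for total in group_sums.values())
    unique_var = sum(1 - g ** 2 - s ** 2 for g, _, s in items)
    return general_sum ** 2 / (general_sum ** 2 + group_var + unique_var)


def omega_subscale(items, name):
    """Share of a subscale's variance due to its group factor alone."""
    subset = [(g, s) for g, n, s in items if n == name]
    general_sum = sum(g for g, _ in subset)
    group_sum = sum(s for _, s in subset)
    unique_var = sum(1 - g ** 2 - s ** 2 for g, s in subset)
    return group_sum ** 2 / (general_sum ** 2 + group_sum ** 2 + unique_var)


def construct_h(loadings):
    """Construct reliability H for one factor from its standardized loadings."""
    ratio = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return ratio / (1 + ratio)


def puc(group_sizes):
    """Proportion of indicator pairs that come from different group factors."""
    total_pairs = comb(sum(group_sizes), 2)
    within_pairs = sum(comb(k, 2) for k in group_sizes)
    return (total_pairs - within_pairs) / total_pairs


# Structure of the two-group bi-factor model (model 4):
# 10 internalizing and 3 externalizing disorders.
puc_model4 = puc([10, 3])
```

Note that PUC depends only on how the indicators are allocated to group factors, not on the size of the loadings, whereas ωh, the subscale coefficient, and H all depend on the loadings themselves.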

Convergent and Divergent Validities

To test the convergent and divergent validities of the factors in the optimum model, the broad CBCL internalizing and externalizing scores were regressed on the factors of the optimum model.

Results

Missing Values and Fit of the Null Model

There were no missing values for the clinical cases used in the present study.

Fit of the Models Tested in the Present Study

Table 3 shows the results of all the CFA models tested for children and adolescents separately. Based on guidelines proposed by Hu and Bentler (1998), all fit indices for the one-factor model (M1s) in both groups showed poor fit. For the two-factor (M2s) and the three-factor (M3s) models, for both groups, the RMSEA indicated good fit, whereas the CFI and TLI for both groups indicated close to acceptable or acceptable fit. The exception was that the CFI value for the three-factor model in the adolescent group indicated good fit. The WRMR values for the one-, two-, and three-factor models for both groups were noticeably above 0.90. The fit values for both the bi-factor models (i.e., with two [M4s] and three [M5s] group factors) showed good fit in terms of the RMSEA, CFI, and TLI values. For both groups, the WRMR values for the bi-factor model with two group factors were lower than those for the bi-factor model with three group factors. Indeed, these values for the bi-factor model with two group factors were either just below or close to 0.90. Overall, there was reasonable support for both the bi-factor models. As the two bi-factor models were not nested, it was not possible to compare these models using the χ2 (or WLSMVχ2) difference test or the approximate fit indices based on χ2 for the models. However, given that the WRMR values in the bi-factor model with two group factors were close to 0.90, and also because the bi-factor model with two group factors is more parsimonious than the bi-factor model with three group factors, the bi-factor model with two group factors was interpreted as the optimum model for both the groups in the present study and was used in subsequent reliability and validity analyses.

Table 3 Fit of the childhood disorders models for psychopathology tested in the study

Factor Loadings for the Factors in the Optimum Bi-factor Model

Table 4 presents the standardized factor loadings of the 13 disorders on their respective latent factors in the optimum bi-factor model. For the adolescent P-factor, all disorders except GAD, ADHD, CD, and ODD had salient loadings, based on Thurstone’s (1947) classical criterion for “salience” as standardized loading ≥ 0.3. For the internalizing group factor for adolescents, all disorders except OCD, PTSD, and DYTH had salient loadings, and for the externalizing group factor, all three externalizing disorders (ADHD, CD, and ODD) had salient loadings. The loadings for SPE, GAD, ADHD, CD, and ODD were much higher (relatively) on their group factors (0.40, 0.50, 0.55, 0.88, and 0.76, respectively) than on the P-factor (0.34, 0.29, 0.17, 0.20, and 0.24, respectively). Thus, much of the variance in the P-factor can be attributed to the internalizing disorders, with the externalizing disorders contributing negligible to low amounts of variance. Indeed, at the disorder level, with the exception of SPE and GAD, the ECV values (which indicate the amount of common variance contributed by a disorder to the P-factor) of the other internalizing disorders were high, ranging from 0.53 to 0.99. These values were very low for the externalizing disorders (ADHD = 0.09, ODD = 0.05, and CD = 0.09). Taken together, these findings can be interpreted as indicating questionable support for a P-factor, since it was saturated mostly with variance from the internalizing disorders and with negligible to low variance from the externalizing disorders. Additionally, because there was a low amount of variance left in the group factor for internalizing disorders, the internalizing factor may be substantively of less use. In contrast, because there was a high amount of variance left in the group factor for externalizing, it can be interpreted that there is support for the externalizing group factor, even after removing the variance allocated to the general factor.
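To illustrate how a disorder-level ECV follows from a pair of loadings, the sketch below applies the usual item-level formula (squared P-factor loading divided by the sum of the squared P-factor and group-factor loadings) to the adolescent loadings quoted above. The function name is hypothetical, the P-factor loadings for ADHD and CD are read as 0.17 and 0.20 (an assumption about the intended decimals), and results computed from two-decimal loadings may differ slightly from tabled values.

```python
def item_ecv(general_loading: float, group_loading: float) -> float:
    """Proportion of an indicator's common variance due to the general factor."""
    general_sq = general_loading ** 2
    return general_sq / (general_sq + group_loading ** 2)


# (P-factor loading, group-factor loading) as quoted for adolescents.
quoted_loadings = {
    "SPE": (0.34, 0.40),
    "GAD": (0.29, 0.50),
    "ADHD": (0.17, 0.55),  # P-factor loading read as 0.17 (assumption)
    "CD": (0.20, 0.88),    # P-factor loading read as 0.20 (assumption)
    "ODD": (0.24, 0.76),
}

item_ecvs = {
    disorder: round(item_ecv(p, group), 2)
    for disorder, (p, group) in quoted_loadings.items()
}
```

With these inputs, the three externalizing disorders all yield values below 0.10, in line with the very low disorder-level ECV values reported for them.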

Table 4 Factor loadings and reliabilities indices for the factors in the two-group (internalizing and externalizing) bi-factor model

For the child P-factor, all disorders except SPE, AGOR, GAD, ADHD, CD, and ODD had salient loadings. For the internalizing group factor in this group, all disorders except DYTH and MDD had salient loadings. For the externalizing group factor, all three externalizing disorders (ADHD, CD, and ODD) had salient loadings. The loadings for ten disorders (seven internalizing disorders [SAD, SOP, SPE, PD, AGOR, GAD, and OCD], and three externalizing disorders [ADHD, CD, and ODD]) were relatively higher on their group factors than on the P-factor. Thus, relatively more of the variance in the P-factor can be attributed to the internalizing disorders, with the externalizing disorders contributing negligible to low amounts of variance. At the disorder level, with the exception of PTSD, DYTH, and MDD, ECV values of the other internalizing disorders were low, ranging from 0.00 to 0.47. These values were especially low for the externalizing disorders (ADHD = 0.00, ODD = 0.10, and CD = 0.12). Taken together, these moderate amounts of variance from the internalizing disorders and negligible to low amounts of variance from the externalizing disorders on the P-factor can be interpreted as indicating support for a weak P-factor. Additionally, as there was a moderate amount of variance left in the group factor for internalizing, the internalizing factor may be substantively meaningful. In contrast, because there was a high amount of variance left in the group factor for externalizing, it can be interpreted that there is support for the externalizing group factor, even after taking out the variance allocated to the general factor.

Reliability for the Factors in the Optimum Bi-factor Model

Table 4 also presents the model-based reliability indices for all the factors in the optimum bi-factor models for children and adolescents separately. As shown, for both groups, the ECV values of the P-factor were higher than those of the internalizing and externalizing group factors, with the value for the P-factor being higher for adolescents than for children (for adolescents, P-factor = 0.52, internalizing group factor = 0.21, externalizing group factor = 0.27; for children, P-factor = 0.42, internalizing group factor = 0.35, externalizing group factor = 0.24). For adolescents and children, the ωh values for the P-factor were 0.65 and 0.48, respectively. The ωt values for the internalizing and externalizing group factors were 0.14 and 0.75, respectively, for adolescents, and 0.41 and 0.73, respectively, for children. Thus, although much of the reliable variance was attributed to the P-factor, high levels of variance were still left for the externalizing group factor in both groups, and moderate levels for the internalizing group factor in children. The PUC values of the P-factor in adolescents and children were 0.53 and 0.39, respectively. As noted earlier, Reise et al. (2013a, b) have proposed that if PUC > 0.80, or if PUC < 0.80, ECV > 0.60, and ωh > 0.70, then the bi-factor model can be interpreted as primarily unidimensional. As the findings failed to meet either of these criteria, it is not appropriate to interpret the findings of the present study as supporting a unidimensional model for the childhood disorders in the optimum bi-factor model in either age group.
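The decision rule attributed to Reise et al. (2013a, b) can be sketched as follows, together with the usual definition of PUC as the share of indicator correlations that lie between, rather than within, group factors. The function names and the 10/3 split of indicators are illustrative assumptions; the reported PUC values may rest on a different allocation of disorders to group factors, so the computed PUC is shown only to illustrate the formula.

```python
from math import comb


def puc(group_sizes):
    """Percentage of uncontaminated correlations for a bi-factor model.

    group_sizes lists the number of indicators on each group factor
    (assumes every indicator loads on exactly one group factor).
    """
    total = comb(sum(group_sizes), 2)
    within_groups = sum(comb(size, 2) for size in group_sizes)
    return (total - within_groups) / total


def primarily_unidimensional(puc_value, ecv, omega_h):
    """Rule quoted in the text: PUC > 0.80, or else ECV > 0.60 and omega_h > 0.70.

    The text does not cover PUC exactly equal to 0.80; this sketch treats
    that case like PUC < 0.80.
    """
    if puc_value > 0.80:
        return True
    return ecv > 0.60 and omega_h > 0.70


# Assumed split: 10 internalizing and 3 externalizing indicators.
puc_two_groups = puc([10, 3])

# Indices for the P-factor as reported in the text.
adolescents_unidimensional = primarily_unidimensional(0.53, 0.52, 0.65)
children_unidimensional = primarily_unidimensional(0.39, 0.42, 0.48)
print(round(puc_two_groups, 3), adolescents_unidimensional, children_unidimensional)
```

With the reported indices, neither age group satisfies either branch of the rule, which matches the interpretation given above.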

The H values for the adolescent P-factor, internalizing group factor, and externalizing group factor were 0.85, 0.62, and 0.84, respectively. The corresponding values for children were 0.87, 0.77, and 0.79. According to Rodriguez et al. (2016), H values > 0.80 are indicative of a stable well-defined latent variable, and values < 0.70 are indicative of a factor that is not worth specifying. Based on this guideline, the P-factor and the externalizing group factor can be considered well-defined stable factors for adolescents. For children, the P-factor, the externalizing group factor and, to a lesser degree, the internalizing group factor can be considered well-defined stable factors.
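For readers unfamiliar with the H index, the following minimal sketch computes construct replicability from standardized loadings using the standard formula, H = 1 / (1 + 1 / Σ[λ² / (1 − λ²)]). The function name is hypothetical, and the input is the set of adolescent externalizing group-factor loadings quoted earlier in the text (0.55, 0.88, and 0.76); it is an illustration under these assumptions rather than the authors' own computation.

```python
def construct_replicability_h(loadings):
    """H index for one factor, computed from its standardized loadings.

    Each loading must lie strictly between -1 and 1, otherwise the
    ratio below is undefined.
    """
    ratio_sum = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return 1 / (1 + 1 / ratio_sum)


# Adolescent externalizing group-factor loadings for ADHD, CD, and ODD.
h_externalizing = construct_replicability_h([0.55, 0.88, 0.76])
print(round(h_externalizing, 2))
```

With these rounded loadings the result rounds to 0.84, the value reported for the adolescent externalizing group factor. The formula also shows why a single very high loading (here, CD) can lift H above the 0.80 guideline even when a factor has only three indicators.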

Convergent and Divergent Validities of the Factors in the Two-Group (Internalizing and Externalizing) Bi-factor Model

Table 5 shows the prediction of the broad CBCL externalizing and internalizing scores by the factors in the two-group (internalizing and externalizing) bi-factor model for children and adolescents. As shown, for children, the P-factor predicted both the CBCL internalizing and externalizing scores positively. The internalizing group factor predicted CBCL internalizing positively, and the externalizing group factor predicted CBCL externalizing positively. For adolescents, the P-factor also predicted both the CBCL internalizing and externalizing scores positively. The externalizing group factor predicted CBCL externalizing positively, and the internalizing group factor did not predict either CBCL externalizing or internalizing. Taken together, these findings can be interpreted as support for the convergent and divergent validities of the factors in the two-group (internalizing and externalizing) bi-factor model for both children and adolescents.

Table 5 Standardized path coefficients for the regression of the CBCL internalizing and externalizing scores on the factors in the two-group (internalizing and externalizing) bi-factor model

Discussion

Based on interviews of parents of clinic-referred (ACPU) children and adolescents, the first aim of the present study was to simultaneously examine the structure of the major DSM-IV/DSM-IV TR childhood internalizing disorders (SAD, SOP, SPP, PD, AG, GAD, OCD, PTSD, DYTH, and MDD) and externalizing disorders (ADHD, CD, and ODD). Five models were compared: (i) a one-factor model, (ii) a two-factor oblique model with primary factors for internalizing and externalizing disorders, (iii) a three-factor oblique model with primary factors for distress, fear, and externalizing disorders, (iv) a bi-factor model with an orthogonal P-factor and internalizing and externalizing group factors, and (v) a bi-factor model with an orthogonal P-factor and group factors for distress, fear, and externalizing. For both adolescents and children, the two- and three-factor models (but not the one-factor model) showed adequate fit. Also, for both groups, both bi-factor models showed good fit and were supported. Of these two models, the bi-factor model with two group factors was more parsimonious and showed slightly better fit in terms of its WRMR values. Thus, the bi-factor model with the internalizing and externalizing group factors was interpreted as the optimum model. For this model, there was support for the convergent and divergent validities of the factors. For children, the P-factor predicted both the broad CBCL internalizing and externalizing scores positively. The internalizing group factor predicted CBCL internalizing positively, and the externalizing group factor predicted CBCL externalizing positively. For adolescents, the P-factor predicted both the CBCL internalizing and externalizing scores positively. The externalizing group factor predicted CBCL externalizing positively, and the internalizing group factor did not predict either CBCL externalizing or internalizing.

The better support for the bi-factor models than for the first-order oblique models found in the present study is consistent with past studies involving children (Lahey et al. 2015; Martel et al. 2017; Tackett et al. 2013) and adolescents (Carragher et al. 2016; Castellanos-Ryan et al. 2016; Laceulle et al. 2015; Noordhof et al. 2015; Patalay et al. 2015; Tackett et al. 2013), as well as adults (Caspi et al. 2014; Lahey et al. 2012; Stochl et al. 2015). The findings nevertheless extend existing data in this area. The present study used clinic-referred children with specific clinical diagnoses, whereas all past studies in this area involving children and adolescents have utilized community samples and have used dimensional scores (derived from questionnaires and rating scales) of children’s and adolescents’ problems to model their broad dimensions, such as internalizing and externalizing.

Although the support for the bi-factor model in this study corresponds to past findings, the P-factor and group factors in the present study differed in important ways from those of past studies. Most past studies have found that both the internalizing and externalizing disorders contributed high variance to the P-factor. In contrast, in the present study, for both age groups, the externalizing disorders contributed relatively low variance to the P-factor, which was saturated mostly with variance from the internalizing disorders (similar to the study by Laceulle et al. 2015). Consequently, in both age groups, and unlike previous studies, there was a high amount of variance in the externalizing group factors. These differences may be explained by the fact that, unlike past studies that used community samples and measured psychopathology dimensionally using rating scales, the present study used a clinic-referred sample and measured psychopathology in terms of categorical clinical diagnoses.

The support for a strong P-factor and externalizing group factor for adolescents was also reinforced in terms of the reliability of these factors. The ECV value of the P-factor was much higher than those of the internalizing and externalizing group factors, and the ωh value for the P-factor was 0.65, while the ωt values for the internalizing and externalizing group factors were 0.14 and 0.75, respectively. Similarly, a moderately strong P-factor, a moderately strong internalizing group factor, and a strong externalizing group factor for children were supported in terms of the reliability of these factors. The ECV value of the P-factor was higher than those of the internalizing and externalizing group factors for this group, the ωh value for the P-factor was only 0.48, and the ωt values for the internalizing and externalizing group factors were 0.41 and 0.73, respectively. Additionally, the PUC values of the P-factor in adolescents and children were 0.53 and 0.39, respectively. Reise et al. (2013a, b) have proposed that if PUC > 0.80, or if PUC < 0.80, ECV > 0.60, and ωh > 0.70, then the bi-factor model can be interpreted as primarily unidimensional. As the findings of the present study failed to meet either of these criteria, it can be argued that they are not supportive of unidimensional P-factors for the childhood internalizing and externalizing disorders in either children or adolescents. Also, the H values for the adolescent P-factor and internalizing and externalizing group factors were 0.85, 0.62, and 0.84, respectively. The corresponding values for children were 0.87, 0.77, and 0.79. According to Rodriguez et al. (2016), H values > 0.80 are indicative of a stable well-defined latent variable, and values < 0.70 are indicative of a factor that is not worth specifying.
Based on this guideline, and consistent with the argument presented, the P-factor and the externalizing group factor for adolescents can be considered well-defined stable factors for this group. Furthermore, the P-factor, the externalizing group factor and, to a lesser degree, the internalizing group factor for children can be considered well-defined stable factors in this group. Based on the combined findings for children and adolescents, the present authors speculate that the P-factor and all group factors in children and adolescents are meaningful, and that they need to be considered when examining substantive issues, such as the validity and external correlates of the factors. In support of this, as shown in the present study, for children, the internalizing group factor still predicted CBCL internalizing positively, and the externalizing group factor still predicted CBCL externalizing positively, even after removing the variance for the P-factor. For adolescents, the externalizing group factor still predicted CBCL externalizing positively, even after removing the variance for the P-factor. It may be worth noting that, as there has been limited evaluation of the reliability of the factors in the bi-factor model (Martel et al. 2017; Murray et al. 2016), the present reliability findings extend existing data. Furthermore, ωh and ωt values were combined here with the ECV, PUC, and H reliability indices (rather than relying on ωh and ωt values alone, as in past studies), providing a more sophisticated and accurate interpretation of the dimensionality of the P-factor in the bi-factor model compared to that of previous studies.

As already illustrated, the findings in the present study also showed differences in the bi-factor model across children and adolescents. While there was a negligible to low amount of variance for the internalizing group factor in adolescents, there was a moderate amount of variance in this factor in children. Thus, for adolescents, there was support for the P-factor and the externalizing group factor. For children, there was support for the P-factor and both the externalizing and internalizing group factors. Also, because the reliability of the P-factor was higher in adolescents than in children, it can be argued that the adolescent P-factor is relatively stronger than the child P-factor.

Given that the P-factor can be interpreted as the strength of comorbidity of the internalizing and externalizing disorders (Murray et al. 2016), the present findings appear to suggest that comorbidity (in particular among the internalizing disorders, as the P-factor was saturated with variance from these disorders) may manifest differently at different developmental stages and that there is stronger comorbidity in adolescents than in children. Indeed, consistent with this view, Castellanos-Ryan et al. (2016) have suggested that the factor loadings on the P-factor could vary developmentally, with internalizing symptoms becoming stronger with increasing age. Caspi et al. (2014) have proposed a dynamic mutualism process hypothesis to account for this. According to this hypothesis, symptoms both across and within domains can reinforce one another through local interactions (bi-directional associations between different symptom typologies) such that, over time, these local interactions can lead to an increase in symptom inter-correlations. However, it should be noted that Murray et al. (2016) found no support for this hypothesis, and therefore further research is recommended.

The findings of the present study have implications for (i) taxonomy in relation to children and adolescents, (ii) understanding the comorbidity of the internalizing disorders and externalizing disorders, (iii) trans-diagnostic assessment, diagnosis, and treatment, and (iv) research on bi-factor models of psychopathology. In relation to taxonomy, support for the P-factor and broad internalizing and externalizing factors is inconsistent with the DSM approach that separates anxiety, depressive, and externalizing disorders into different diagnostic groups. It is also inconsistent with how the relevant disorders are organized in DSM-5, which places the internalizing disorders in four different groups. One group (anxiety disorders) comprises SAD, SOP, SPP, PD, AG, and GAD, whereas OCD, PTSD, and MDD combined with DYTH each fall in a separate group.

The support for the bi-factor model implies that the current taxonomy, which treats the different types of these disorders as discrete diagnostic categories, may need reconsideration to recognize the high degree of comorbidity among them. Instead, the support for the bi-factor model in the present study suggests that, for children and adolescents, all the common childhood psychopathologies could be grouped under an overall group called childhood psychopathology and separated into subgroups of internalizing disorder and externalizing disorder. The list of symptoms in the internalizing disorder group could be the key non-overlapping symptoms for the different anxiety and depressive disorders, and the list of symptoms in the externalizing disorder group could be the key non-overlapping symptoms for ADHD, CD, and ODD. To recognize the major specific symptoms present in an individual, appropriate descriptors could be added to the diagnosis of internalizing disorder or externalizing disorder. For example, when an individual has panic and social phobia, the diagnosis could be internalizing disorder with panic and social phobia. Despite this proposal, it needs to be stressed that further research and replication of the findings in the present study are needed before such changes could be adopted.

In relation to understanding the comorbidity of the internalizing and externalizing disorders, the support found in the present study for the P-factor, with salient loadings for virtually all internalizing disorders, can be taken as an indication of the strength of the associations of these disorders with the underlying latent factor and, by extension, of the comorbidity of the disorders. Consistent with past studies (for a meta-analysis, see Angold et al. 1999; Krueger and Markon 2006), these findings suggest high comorbidity among the internalizing disorders on the one hand, among the externalizing disorders on the other, and between the internalizing and externalizing disorders.

In relation to assessment, diagnosis, and treatment, the close associations between the internalizing and externalizing disorders, via the P-factor, found in the present study highlight the need for a comprehensive evaluation of all the internalizing and externalizing disorders for a better understanding of a child’s or an adolescent’s psychopathology. The findings also imply that treatment may have to focus on general risk factors for psychopathology. In this respect, recently developed trans-diagnostic treatment approaches, such as those for anxiety and depressive disorders in children and adolescents (Ehrenreich-May and Bilek 2012), would be valuable. In brief, trans-diagnostic approaches focus on common factors that produce symptoms in related classes of disorders, thereby addressing multiple concerns or disorders within an individual (McEvoy et al. 2009).

In relation to research on the bi-factor model of psychopathology, the present findings suggest that researchers not only need to demonstrate good fit of such models (as has been the case with most of the studies in this area) but also need to examine the reliability of the P and group factors in the bi-factor model, as carried out in the present study. This enabled the present work to demonstrate that although the bi-factor model with two group factors was supported, all group factors in children and adolescents were also meaningful, and they need to be considered in structural models of psychopathology across these groups.

There are several strengths to the present study. First, it involved a large clinical sample, with clinical diagnoses of the major internalizing and externalizing disorders, derived via structured clinical interviews. Therefore, these findings appear more useful from a clinical viewpoint. Second, to increase the credibility of our findings, structural models were estimated separately for children and adolescents. The close comparability of the findings across these groups attests to the robustness of the findings reported here. Third, recent methodological developments were applied to evaluate the reliability of the factors in a bi-factor model (e.g., ECV, PUC, and H).

Despite these significant strengths, there are limitations in this study that need to be considered when interpreting the findings. First, not all disorders relevant to children and adolescents, such as eating disorders, autism spectrum conditions, psychotic symptoms, and substance abuse disorders, were included in the analyzed sample. It cannot be ruled out that the inclusion of these disorders may have produced different results. Second, the present study examined the factor structure of the common DSM-IV anxiety and depressive disorders at the diagnostic level. Thus, the findings may not reflect the factor structure of anxiety and depressive disorders at the level of symptoms. As noted by Seeley et al. (2011), when analyses are focused at the disorder level, the underlying dimensionality associated with diagnostic criteria for specific disorders is ignored, and consequently, associations among symptoms might show different patterns of associations than those obtained on the basis of diagnostic categories/classifications. Third, approximately 75.3% and 66.8% of the participants had ADHD and ODD/CD respectively. Thus, it is not known if this exerted any influence on the findings. Fourth, as this study examined clinic-referred children, the findings here may not be applicable to internalizing and externalizing disorders in children and adolescents from the general community. Indeed, there is evidence that in clinical samples, comorbidity rates are usually higher than among general population samples (Angold et al. 1999) due to a methodological problem called Berkson’s bias (i.e., people with multiple disorders are more likely to be referred to clinics than are people with single disorders). Fifth, all the participants in this study were from the same clinic. It is possible that this may constitute an additional bias. Sixth, the present study used a predominantly male sample, and this may have added some gender-related bias to the findings. 
Seventh, it is important to keep in mind that although children were the target of analysis, the information about these disorders was derived from interviews of parents and not children themselves. It is therefore possible that this may have also influenced parameter estimates. Finally, the present data (in line with previous P-factor literature) refer to a Westernized population of developing individuals. Therefore, generalization to different cultural groups should be treated with caution (i.e., potential cultural bias of the findings needs to be considered).

In conclusion, the present study found that the bi-factor model with the internalizing and externalizing group factors showed good fit and was the optimum model. However, the fact that there was support for (i) a strong externalizing group factor for adolescents, (ii) a moderately strong internalizing group factor for children, and (iii) a strong externalizing group factor for children weakens the support for a dominant P-factor in both these groups. Thus, the conclusion of the present study differs from the prevailing general consensus for a robust dominant P-factor for psychopathology (Castellanos-Ryan et al. 2016; Laceulle et al. 2015) because it calls for the inclusion of the group factors in structural models of psychopathology. Nevertheless, this view is not without support from previous work (Laceulle et al. 2015). Like the findings in the present study, that work also showed that there was substantial variance in the group factors even after removing the variance for the general factor. Given the discrepancy between the findings of the present study and those of past studies, more studies are needed that take into consideration the aforementioned limitations. Such studies will be valuable as they could have implications for the understanding of the development and the course of childhood psychopathology across different age and cultural groups (prioritizing relatively under-researched populations), which in turn can have implications for more developmentally and culturally responsive diagnosis and treatment of childhood disorders.