Over the last three decades, the analysis of repeated measures data has centered on the linear mixed model (LMM). Laird and Ware (1982) established the basis of the LMM with the incorporation of the within-subjects error correlation. Their work was subsequently extended by Cnaan, Laird, and Slasor (1997) and Verbeke and Molenberghs (2000), who applied the LMM to longitudinal data. In contrast to analyses that are based on variances (ANOVA and MANOVA), the LMM models the structure of the covariance matrix. This enables a more efficient estimation of the fixed effects and, consequently, yields more robust statistical tests. However, when the covariance structures are not properly fitted and the sample sizes are small, the Type I error rate tends to rise (Wright & Wolfinger, 1996).

With respect to the covariance structure, Keselman, Algina, Kowalchuk, and Wolfinger (1999) demonstrated that when the covariance matrix is not spherical, the degrees of freedom associated with the conventional F test are too large. One way of controlling this bias is to apply the procedure developed by Kenward and Roger (1997; henceforth, the KR method) to correct the degrees of freedom. Studies by Kowalchuk, Keselman, Algina, and Wolfinger (2004) and Vallejo and Ato (2006) showed that with an adequate covariance structure, the KR method is able, in many cases, to control Type I error rates with small sample sizes and when the assumption of sphericity is violated. Thus, by applying the KR correction, one can obtain more efficient estimates of the fixed effects associated with repeated measures (Arnau, Bono, & Vallejo, 2009; Kowalchuk et al., 2004; Schaalje, McBride, & Fellingham, 2001).

Generally speaking, studies based on the LMM assume that the data are normal. However, in applied contexts, data distributions tend to depart from normality. For example, in a study of 440 distributions corresponding to a range of achievement and psychometric data, Micceri (1989) found that only 28.4% of the distributions were relatively symmetrical, while 40.7% were moderately asymmetrical, 19.5% were extremely asymmetrical, and 11.4% showed exponential asymmetry. As regards kurtosis, only 15.2% of the distributions approached normality, whereas 49.1% of the distributions could be considered to show extreme kurtosis. According to Micceri, the most common distributions within the psychometric, educational, and psychological contexts are moderately asymmetrical and show kurtosis that deviates considerably from normal. These results are supported by research showing that in the social and health sciences, many variables follow a log-normal distribution (Limpert, Stahel, & Abbt, 2001). Examples in the field of medicine include the latency period of infectious diseases (Kondo, 1977), survival times for certain types of cancer (Claret et al., 2009; Qazi, DuMez, & Uckun, 2007), and the age of onset of Alzheimer’s disease (Horner, 1987). In the social sciences, an example would be the age at which people first get married (Preston, 1981), while in psychology the log-normal distribution provides a good fit to data on reaction times or response latencies (Shang-wen & Ming-hua, 2010; Ulrich & Miller, 1993; Van der Linden, 2006), as well as to data on attentional skills (Brown, Weatherholt, & Burns, 2010). With survival data, a good fit is provided not only by the log-normal distribution, but also by the exponential distribution and its extensions (Weibull, gamma, and Gompertz), as well as by the generalized gamma and log-logistic distributions (Lee & Wang, 2003).

Given that the log-normal distribution is common with applied data, particularly in the behavioral sciences (Micceri, 1989; Wilcox, 1994), some authors have based their simulation studies on this distribution (Keselman et al., 1999; Keselman, Algina, Wilcox, & Kowalchuk, 2000). In this context, Sawilowsky and Blair (1992) obtained nonrobust results for the t test when the distributions showed extreme skewness (e.g., $\gamma_1 = 1.64$). Algina and Oshima (1995) found that tests for the equivalence of means are affected when the distribution is log-normal and the assumption of homogeneity is not fulfilled. As regards the effect of kurtosis, research has shown that the Type I error rate tends to decrease as kurtosis increases (Wilcox, 1993). In light of these findings, the present study considers the log-normal distribution, as well as distributions involving slight skewness combined with different levels of kurtosis.

The aim of this research was to analyze longitudinal data by means of the LMM, comparing normal and nonnormal distributions. Specifically, here we seek to examine the robustness of the LMM when applied to data whose distribution is closer to that found in real life—that is, the log-normal distribution and other nonnormal distributions with slight skewness and extreme kurtosis, according to the values described by Lei and Lomax (2005). According to these authors, when the absolute values of skewness and kurtosis are less than 1.0, the distribution deviates slightly from normality; values between 1.0 and 2.3 correspond to moderate deviation; and values above 2.3 indicate extreme deviation from normality.

The LMM and longitudinal data

Application of the LMM to longitudinal data involves adapting the hierarchical structure to a repeated measures design. Thus, the observations or repeated measures are located on the first level, and the subjects on the second level (Cnaan et al., 1997; Van der Leeden, 1998; Van der Leeden, Vrijburg, & De Leeuw, 1996; Wu, Clopper, & Wooldridge, 1999). Both levels are clearly represented by the LMM. This kind of model, in which repeated measures are nested within individuals, constitutes an extension of the multilevel methodology that is widely used to analyze longitudinal data. An illustration of the LMM as applied to longitudinal data can be found in Arnau, Balluerka, Bono, and Gorostiaga (2010).

This section examines, first, the regression model on two levels and, second, the integration of the two levels within the LMM.

First level: Within-subjects model

With longitudinal data, and in accordance with the model proposed, the direct observations or repeated measurements that make up the first level are nested within subjects, or the second level. Since growth curves are processes over time, they can be modeled as polynomial functions. On a first level (within-subjects model), the equation can be fitted to a polynomial function of order p, expressed by

$$ y_{ti} = \beta_{0i} + \beta_{1i}T_{ti} + \beta_{2i}T_{ti}^{2} + \cdots + \beta_{pi}T_{ti}^{p} + e_{ti}, \quad e_{ti} \sim N\left(0, \sigma_e^{2}\right), $$
(1)

where $y_{ti}$ is the measurement of the dependent variable for subject i on occasion t, the $\beta_{pi}$ are the coefficients of the polynomial function of degree p, which represent the growth trajectory of the subject, and $e_{ti}$ is the random error. It is assumed that $e_{ti}$ is normally and independently distributed with mean zero and constant variance. The $T_{ti}$ are the explanatory variables of subject i on occasion t. Note that the parameters $\beta_{pi}$ and the residual variance $\sigma_e^2$ are specific to subject i. Thus, with the Level-1 model, the repeated measurements of a growth process are described as a polynomial function. In this model, both the number of observations and the interval between observations may vary between subjects.

Applying compact matrix notation, the Level-1 model for longitudinal data can be represented by

$$ y_i = \mathbf{T}_i\beta_i + e_i, \quad e_i \sim N\left(0, \mathbf{R}\right), $$
(2)

where $y_i$ is the $t \times 1$ vector of repeated measures for subject i, $\beta_i$ is a $p \times 1$ vector of individual parameters that specifies the shape of the growth curve for subject i, and $\mathbf{T}_i$ is the $t \times p$ matrix of known variables and their polynomial transformations. Suppose that the response of the subject follows a quadratic function. $\mathbf{T}_i$ is then a $t \times 3$ matrix in which the first column consists of 1s, the second corresponds to the observation occasions, and the third to the occasions squared. The vector $e_i$ is a $t \times 1$ random error vector, and the errors are assumed to be independent with a multivariate normal distribution $N(0, \mathbf{R})$. Thus, $\mathbf{R} = \mathrm{var}(e_i)$ is a positive definite covariance matrix. However, when the observations have a given order or a specific structure, it should be assumed that the correlations between errors are nonzero and vary systematically.
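The construction of $\mathbf{T}_i$ just described can be sketched in a few lines. The code below is an illustrative aside, not part of the original study (which used SAS); the function name and the 0-based coding of the occasions are our assumptions.

```python
# Build the Level-1 design matrix T_i for a polynomial growth model.
# Columns: intercept, occasion, occasion^2, ... up to the chosen degree.

def design_matrix(occasions, degree=2):
    """Return the t x (degree + 1) matrix of polynomial terms."""
    return [[t ** p for p in range(degree + 1)] for t in occasions]

# Quadratic case with four equally spaced occasions (coded 0..3):
# the first column is all 1s, the second the occasions, the third their squares.
T = design_matrix([0, 1, 2, 3])
```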

Second level: Between-subjects model

In the longitudinal context, the between-subjects model takes the individual growth parameters ($\beta_{pi}$) to be random dependent variables that, in the simplest case, are expressed by the following equations:

$$ \beta_{0i} = \gamma_{00} + u_{0i}, $$
(3)
$$ \beta_{1i} = \gamma_{10} + u_{1i}, $$
(4)
$$ \beta_{2i} = \gamma_{20} + u_{2i}, $$
(5)
$$ \vdots $$
$$ \beta_{pi} = \gamma_{p0} + u_{pi}. $$
(6)

In Eqs. 3–6, the between-subjects variation in the parameters that express the individual growth trajectories is modeled as a function of population averages and of each subject's deviations from these averages.

If there is a predictive variable Z, the Level-2 model with p + 1 parameters is expressed by

$$ \beta_{pi} = \gamma_{p0} + \sum\limits_{q=1}^{Q_p} \beta_{pq}Z_{qi} + u_{pi}, \quad u_{pi} \sim N\left(0, \mathbf{G}\right), $$
(7)

where $\gamma_{p0}$ is the average intercept of the subjects, $Z_{qi}$ represents a given Level-2 predictor variable, $\beta_{pq}$ expresses the effect of $Z_{qi}$ on the growth parameter p, and $u_{pi}$ is the random error term. The $p + 1$ random effects associated with subject i are assumed to follow a multivariate normal distribution.

In matrix notation, the Level-2 model for longitudinal data would be

$$ \beta_i = \mathbf{Z}_i\gamma + u_i, \quad u_i \sim N\left(0, \mathbf{G}\right), $$
(8)

where $\mathbf{Z}_i$ is the $p \times q$ between-subjects design matrix with known, fixed elements. When only the random variation in the individual growth parameters is modeled, $\mathbf{Z}_i$ takes the form of an identity matrix. More elaborate models can be formulated if $\mathbf{Z}_i$ contains dummy variables that code subgroups of subjects, or Level-2 explanatory variables that allow the sources of variability in the growth parameters $\beta_i$ to be examined. The covariates may be fixed or may vary over time. In Eq. 8, $\gamma$ is a $q \times 1$ vector of fixed coefficients, and $u_i$ is a $p \times 1$ random error vector with zero mean and covariance matrix $\mathbf{G}$.

Integrating the two levels in the LMM

The LMM corresponds to the integration of the two levels described above. It can be used to analyze a wide variety of data structures that are commonly found in psychological, social, and health research and whose analysis with the classic linear model is problematic.

The complete LMM is obtained by inserting Eq. 8 into Eq. 2, as follows:

$$ \mathbf{Y}_i = \mathbf{T}\mathbf{Z}_i\gamma + \mathbf{T}u_i + e_i. $$
(9)

It is assumed that the elements of $e_i$ are independently and normally distributed with constant variance, $e_i \sim N(0, \mathbf{R})$ with $\mathbf{R} = \sigma_e^2\mathbf{I}$; that the Level-2 random terms are normally distributed, $u_i \sim N(0, \mathbf{G})$; and that the Level-1 error terms ($e_i$) are distributed independently of the Level-2 terms ($u_i$). In the complete model, the term $\mathbf{T}\mathbf{Z}_i\gamma$ is the fixed part, and the term $\mathbf{T}u_i + e_i$ is the random part. The fixed effects define the expected values of the observations, while the random effects define their variances and covariances. The covariance between the elements of $\mathbf{Y}_i$ (the complete longitudinal data) consists of two parts, the between-subjects part and the within-subjects part, such that

$$ \mathrm{Var}\left(\mathbf{Y}_i\right) = \mathrm{Var}\left(\mathbf{T}u_i + e_i\right) = \mathbf{T}\mathbf{G}\mathbf{T}' + \sigma_e^{2}\mathbf{I}. $$
(10)

Note that in Eq. 10, the assumptions relating to the errors, and especially to the term $e_i$, lead to very simple covariance structures at the individual level of the model (constant, uncorrelated errors across time points). However, when there are many time points per subject, the residual components often show a pattern of autocorrelation (Ware, 1985). With multilevel models, a covariance matrix that reflects dependence between observations can be chosen. Thus, modeling the within-subjects covariance structure is particularly relevant, since the accuracy of the regression parameter estimates depends on the right choice (Littell, Pendergast, & Natarajan, 2000; Park & Lee, 2002).
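Equation 10 is easy to verify numerically. The sketch below is an illustration, not part of the study: the values of $\mathbf{G}$ and $\sigma_e^2$ are made up, and the helper functions are ours. It computes $\mathbf{T}\mathbf{G}\mathbf{T}' + \sigma_e^2\mathbf{I}$ for a quadratic model with four occasions and shows that the repeated measures are correlated through the shared random effects even though the Level-1 errors are independent.

```python
# Sketch of Eq. 10: Var(Y_i) = T G T' + sigma_e^2 I for a quadratic model
# with four occasions. G and sigma_e^2 are hypothetical illustrative values.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

T = [[1, t, t * t] for t in range(4)]   # t x p design matrix (quadratic)
G = [[1.0, 0.2, 0.0],                   # assumed 3 x 3 random-effects
     [0.2, 0.5, 0.1],                   # covariance matrix (symmetric,
     [0.0, 0.1, 0.3]]                   # diagonally dominant, hence PSD)
sigma2 = 0.5                            # assumed Level-1 error variance

V = matmul(matmul(T, G), transpose(T))
for i in range(4):
    V[i][i] += sigma2                   # add sigma_e^2 * I on the diagonal

# V is symmetric with nonzero off-diagonal elements: the occasions are
# correlated through the shared random effects.
```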

One of the key advantages of the LMM is that it enables a choice to be made among the various covariance structures that could be used to model the data. This is of considerable importance, because the better the fit of the covariance structure, the greater the accuracy of the regression estimates.

KR method for correcting the degrees of freedom

In this study, the degrees of freedom are corrected using the KR method, which yields more precise and efficient estimates of the fixed effects with small samples (Arnau et al., 2009; Kowalchuk et al., 2004; Schaalje et al., 2001).

If $\mathbf{C}$ is a contrast matrix of rank q, the Wald F statistic for the hypothesis $H_0$: $\mathbf{C}\beta = 0$ is $F = W/q$, where

$$ W = \left(\mathbf{C}\hat{\beta}\right)'\left(\mathbf{C}\left(\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X}\right)^{-1}\mathbf{C}'\right)^{-1}\left(\mathbf{C}\hat{\beta}\right). $$
(11)

If we calculate a scale factor $\delta$ and an approximate value $\nu$ for the denominator degrees of freedom, the F statistic for the KR method is given by

$$ F^{*} = \delta F_{\mathrm{KR}} = \frac{\delta}{q}\left(\mathbf{C}\hat{\beta}\right)'\left(\mathbf{C}\left(\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X}\right)^{-1}\mathbf{C}'\right)^{-1}\left(\mathbf{C}\hat{\beta}\right). $$
(12)

The moments of $F^*$ are generated and matched to the moments of the F distribution so as to solve for $\delta$ and $\nu$. Under the null hypothesis, $F^*$ is assumed to be distributed approximately as F with q numerator degrees of freedom and $\nu$ denominator degrees of freedom. It is therefore necessary to calculate two quantities from the data: the denominator degrees of freedom $\nu$ and the scale factor $\delta$. Hence,

$$ \nu = 4 + \frac{q + 2}{qy - 1}, $$
(13)

where

$$ y = \frac{\mathrm{V}\left[F_{\mathrm{KR}}\right]}{2\,\mathrm{E}\left[F_{\mathrm{KR}}\right]^{2}}, $$
(14)

and

$$ \delta = \frac{\nu}{\mathrm{E}\left[F_{\mathrm{KR}}\right]\left(\nu - 2\right)}. $$
(15)
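As a consistency check on Eqs. 13, 14, and 15 (an illustrative aside, not part of the original study): if $F_{\mathrm{KR}}$ were exactly F-distributed with q and m degrees of freedom, substituting the standard moments of that distribution into the formulas should return $\nu = m$ and $\delta = 1$. The function name and the choices q = 3, m = 10 are ours.

```python
# Moment-matching formulas of the KR method (Eqs. 13-15).

def kr_nu_delta(q, EF, VF):
    y = VF / (2 * EF ** 2)            # Eq. 14
    nu = 4 + (q + 2) / (q * y - 1)    # Eq. 13
    delta = nu / (EF * (nu - 2))      # Eq. 15
    return nu, delta

# Exact moments of an F(q, m) distribution (valid for m > 4):
q, m = 3, 10
EF = m / (m - 2)
VF = 2 * m**2 * (q + m - 2) / (q * (m - 2)**2 * (m - 4))

nu, delta = kr_nu_delta(q, EF, VF)
# recovers nu = m = 10 and delta = 1 up to floating-point error
```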

The inferences derived from simulation studies that use the KR method are usually more accurate, even with complex covariance structures (Arnau et al., 2009; Keselman, Algina, Kowalchuk, & Wolfinger, 1998; Schaalje, McBride, & Fellingham, 2002). It has been shown that under a normal distribution with heterogeneous within-group covariance structures, the KR procedure satisfies the criterion of robustness (Livacic-Rojas, Vallejo, & Fernández, 2006). Furthermore, Arnau et al. (2009) concluded that, as compared with the Satterthwaite (1941) procedure, the KR correction offered better control of Type I error rates under most of the normal distribution conditions they studied, and particularly with unstructured and nonspherical covariance structures. A recent simulation study also showed that the KR method is better than the Satterthwaite approach when applying the multilevel model to data from multiple-baseline designs (Ferron, Farmer, & Owens, 2010).

The present study focuses on analyzing the performance of the KR method with nonnormal as opposed to normal distributions. To this end, the KR procedure was used to estimate the fixed effects associated with time and with the Time × Group interaction in small samples corresponding to normal, exponential, and log-normal distributions. In addition, the Akaike information criterion (AIC; Akaike, 1974) was used to select the covariance structure that showed the best fit among 11 possible structures. These structures were (a) compound symmetry (CS); (b) unstructured (UN); (c) first-order autoregressive (AR); (d) Huynh–Feldt spherical (HF); (e) within-subjects heterogeneous compound symmetry (CSH); (f) within-subjects heterogeneous first-order autoregressive (ARH); (g) random coefficients (RC); (h) between-subjects heterogeneous unstructured (UN$_j$); (i) between-subjects heterogeneous Huynh–Feldt spherical (HF$_j$); (j) within-subjects and between-subjects heterogeneous first-order autoregressive (ARH$_j$); and (k) between-subjects heterogeneous random coefficients (RC$_j$). The subscript j indicates that the covariance matrices are not equal across groups.

A Monte Carlo study

The data for the simulations were generated using a series of macros created ad hoc in SAS 9.2 (SAS Institute, 2008). The first step involved generating the covariance matrices from variances and correlations with sphericity values of ε = .57 and .75. The RANNOR generator was then used to obtain normally distributed pseudorandom observations by applying the Cholesky factor of the covariance matrix R. The nonnormal data distributions were generated via the same procedure but were transformed by the Fleishman coefficients (Fleishman, 1978) corresponding to each of the distributions studied. The Fleishman method calculates the coefficients a, b, c, and d by means of a polynomial transformation based on given values of skewness ($\gamma_1$) and kurtosis ($\gamma_2$). The restriction imposed on these four coefficients was mean zero and variance 1, such that

$$ \gamma_1 = 2c\left(b^{2} + 24bd + 105d^{2} + 2\right) $$
(16)

and

$$ \gamma_2 = 24\left(bd + c^{2}\left[1 + b^{2} + 28bd\right] + d^{2}\left[12 + 48bd + 141c^{2} + 225d^{2}\right]\right), $$
(17)

where the constant a is equal to –c.

Appendix A shows how the nonnormal data matrices were generated by means of the Fleishman coefficients, yielding both exponential distributions, with fixed skewness ($\gamma_1 = 0.8$) and two values of kurtosis ($\gamma_2 = 2.4$ and 5.4), and log-normal distributions ($\gamma_1 = 1.75$ and $\gamma_2 = 5.9$). These values fall well within the range of skewness and kurtosis observed in real data (Lei & Lomax, 2005).
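The Fleishman transformation $Y = a + bX + cX^2 + dX^3$ (with $X \sim N(0,1)$ and $a = -c$) can be sketched as follows. This is an illustrative aside, not the study's SAS macro: the unit-variance companion equation $b^2 + 6bd + 2c^2 + 15d^2 = 1$ is standard in Fleishman's method but is not printed in the text, so it is an assumption here, and the Newton solver with its starting values is ours.

```python
# Solve the Fleishman (1978) system for b, c, d given target skewness g1
# and kurtosis g2 (a = -c ensures mean zero), via Newton's method.

def residuals(x, g1, g2):
    b, c, d = x
    f1 = b*b + 6*b*d + 2*c*c + 15*d*d - 1                    # variance = 1 (assumed)
    f2 = 2*c*(b*b + 24*b*d + 105*d*d + 2) - g1               # Eq. 16
    f3 = 24*(b*d + c*c*(1 + b*b + 28*b*d)
             + d*d*(12 + 48*b*d + 141*c*c + 225*d*d)) - g2   # Eq. 17
    return [f1, f2, f3]

def gauss_solve(A, rhs):
    """Solve the 3 x 3 system A x = rhs by Gaussian elimination with pivoting."""
    A = [row[:] + [r] for row, r in zip(A, rhs)]
    n = 3
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= m * A[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def solve3(x, g1, g2, tol=1e-10, iters=200):
    """Newton iteration with a numerical Jacobian for the Fleishman system."""
    for _ in range(iters):
        f = residuals(x, g1, g2)
        if max(abs(v) for v in f) < tol:
            break
        h = 1e-7
        J = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            xp = list(x)
            xp[j] += h
            fp = residuals(xp, g1, g2)
            for i in range(3):
                J[i][j] = (fp[i] - f[i]) / h
        step = gauss_solve(J, [-v for v in f])
        x = [xi + di for xi, di in zip(x, step)]
    return x

# Coefficients for the milder exponential-like condition in the study:
b, c, d = solve3([1.0, 0.1, 0.0], 0.8, 2.4)
a = -c
```

With these coefficients, transforming standard-normal draws by $a + bX + cX^2 + dX^3$ yields data with approximately the target skewness and kurtosis, which is how the nonnormal conditions above were produced.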

Study variables

The robustness of the KR approximation with normal and nonnormal distributions was examined by conducting a simulation study with split-plot designs. This study used three groups for the between-subjects factor (J = 3) and three levels for the within-subjects factor (K = 4, 6, and 8). The decision to investigate these levels was based on a review of 61 simulation studies of repeated measures designs that were published between 1967 and 2010. Of these 61 studies, 62.3% used J = 3, 72.1% used K = 4, and 26.2% used K = 8. In the present study, we decided to include an intermediate value (K = 6) as well (Arnau et al., 2009; Padilla & Algina, 2007). It should also be noted that the levels of J and K used in the present simulation study are the ones most commonly found in educational and psychological research. For each value of K, combinations of four variables were selected: (a) homogeneous and heterogeneous between-groups covariance structures; (b) total sample size; (c) equal and unequal group sizes; and (d) pairings of the covariance matrices and group sizes. The indices of sphericity used were ε = .57 and .75. The latter value (.75) was taken to be a good approximation to sphericity, while the former (.57) represented nonsphericity.

The data were generated using the unstructured (UN) covariance structure, as this is the most typical with longitudinal data. The UN covariance matrix requires no assumptions with regard to the error terms and allows any pattern of correlation between the observations. With this matrix, the variances and correlations are assumed to be nonstationary; that is, all the variances and covariances are different. Appendix B shows the values of the covariance matrices for the corresponding sphericity indices and levels of repeated measures.

From the set of UN covariance structures obtained, both equal and unequal between-groups covariance matrices were analyzed. With heterogeneous matrices, the inequality between groups was adjusted to the ratio 1:3:5. For the analysis, we considered small total sample sizes of N = 30, 36, and 42, as well as equal and unequal group sizes. With unbalanced groups, the coefficient of variation of the group sizes, $\Delta n_j$, was 0.41, while for balanced groups $\Delta n_j = 0$. With $\Delta n_j = 0.41$, the group sizes were 5, 10, and 15 (hence, N = 30); 6, 12, and 18 (N = 36); and 7, 14, and 21 (N = 42). With $\Delta n_j = 0$, the group sizes were 10, 10, and 10 (N = 30); 12, 12, and 12 (N = 36); and 14, 14, and 14 (N = 42). Finally, the type of pairing between group sizes and covariance matrices was defined as null, positive, or negative. In a positive pairing, the largest group is associated with the covariance matrix whose element values are largest; conversely, a negative pairing associates the largest group with the covariance matrix containing the smallest element values. Pairing is null in the case of balanced groups.
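The reported imbalance index can be checked directly. The sketch below is an aside, not part of the study; the exact formula for $\Delta n_j$ is not given in the text, so the use of the population (N-based) coefficient of variation is an assumption, chosen because it reproduces the reported 0.41 for all three unbalanced designs.

```python
# Coefficient of variation of group sizes (population SD / mean), which
# matches the Delta n_j = 0.41 reported for the 1:2:3-sized groups.

import math

def cv(sizes):
    mean = sum(sizes) / len(sizes)
    var = sum((n - mean) ** 2 for n in sizes) / len(sizes)
    return math.sqrt(var) / mean

unbalanced = [(5, 10, 15), (6, 12, 18), (7, 14, 21)]
# each tuple gives cv(...) of about 0.408, reported as 0.41;
# balanced groups such as (10, 10, 10) give exactly 0.
```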

Table 1 shows the different combinations of variables examined in this study. For each combination, 1,000 replications were performed at a significance level of .05, for both normal and nonnormal distributions.

Table 1 Group sizes for balanced and unbalanced designs with J = 3; K = 4, 6, and 8; the UN population covariance matrix; and ε = .57 and .75

Results

As indicated above, the first stage of this study consisted of generating the UN population covariance matrices for ε = .57 and .75. In the second stage, these matrices were used to obtain normal and nonnormal data distributions. In the third stage, the 11 covariance structures were fitted to each data set by means of SAS Proc Mixed, with the best-fitting structure selected according to the AIC. Finally, Proc Mixed was again used to calculate the Type I error rates for the effects of time and the Time × Group interaction. The next two sections report the fit percentages of the covariance matrices selected by means of the AIC, as well as the Type I error rates, specifying the covariance structure selected by the AIC.

Selecting the covariance structure Given the presence of both within-subjects and between-subjects heterogeneity, 11 covariance structures were fitted according to the AIC generated by Proc Mixed. The present study used the AIC as it is a more effective indicator of goodness of fit than the Bayesian information criterion (BIC; Schwarz, 1978). Keselman et al. (1998) demonstrated that the AIC selects the true population covariance structure on 47% of occasions, whereas the BIC does so only 35% of the time. A similar study by Ferron, Dailey, and Yi (2002) reported correct selection rates of 79% for the AIC and 66% for the BIC. Subsequently, Vallejo and Livacic-Rojas (2005) showed that the test based on the AIC outperformed that based on the BIC when it came to controlling the Type I error rate, especially when used in conjunction with the KR method. These authors found that with the BIC, Proc Mixed offered poor control of the estimated probabilities of Type I error. However, in another study of the KR procedure, Gomez, Schaalje, and Fellingham (2005) concluded that the AIC only performs better with complex covariance structures, such as the UN covariance matrix. This is consistent with the findings of Keselman et al. (1998). Recently Vallejo, Ato, and Valdés (2008) demonstrated that with different group sizes, the AIC provides a better estimate of standard errors. In light of these findings, the present simulation study made use of the AIC. Note, however, that the AIC does not always select the true structure, as other structures may provide more adequate approximations. In this regard, Vaida and Blanchard (2005) proposed the conditional AIC (CAIC) as an effective and useful alternative for selecting mixed-effects models, arguing that it provided a good approximation that was adequate to the amount of data and the amount of information that they contained. 
More recently, Vallejo, Fernández, Livacic-Rojas, and Tuero-Herrero (2011) found that the shape of the distribution did not affect the correct decision rates of the AIC and the BIC with UN covariance patterns, and also showed that the AIC performed better with both normal and nonnormal distributions.

The results of the present study are presented in Tables 2 and 3, which show the percentages of fit for the most common covariance matrices in relation to the UN population matrix and for homogeneous and heterogeneous between-groups covariances, respectively.

Table 2 Percentages of fit of the most common covariance structures (UN, CSH, ARH, and UN$_j$) in relation to the UN population covariance matrix
Table 3 Percentages of fit of the most common covariance structures (UN, UN$_j$, and ARH$_j$) in relation to the UN$_j$ population matrix

It can be seen in Table 2 that with a normal distribution, the structure with the best fit (66.7%) is the UN population matrix. With ε = .57, it shows a good fit in all cases (for K = 4, 6, and 8). However, when ε = .75, it only shows the best fit for K = 4, and as the number of repeated measures increases (K = 6 and 8), the CSH matrix offers the best fit (33.3%). When the distribution is exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 2.4$, the best fit (33.4%) is provided by the UN population matrix when K = 4. As the value of K increases, the best fit is shown by other structures: CSH with ε = .75, and ARH and UN$_j$ with ε = .57. When kurtosis increases to $\gamma_2 = 5.4$, the UN$_j$ matrix provides the best fit (66.6%) when K = 4 and 6. However, when the number of repeated measures increases to K = 8, the best fit is offered by the UN population matrix with ε = .57, and by the CSH matrix with ε = .75. In the case of the log-normal distribution, the population matrix never provides the best fit, and the UN$_j$ matrix has a fit percentage of 100.

Table 3 shows the percentages of fit for heterogeneous between-groups covariance matrices. As regards the type of pairing, with normal distributions the best fit (22.2%) is shown by the UN$_j$ population matrix only in the case of null pairing. This is also the case for exponential distributions (16.7% when $\gamma_2 = 2.4$, and 22.2% when $\gamma_2 = 5.4$). With the log-normal distribution, the UN$_j$ population matrix provides the best fit (33.4%) for all values of K. When the distribution is exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 2.4$, the ARH$_j$ structure also shows a correct fit (16.7%), as it does, to a slightly lesser extent (11.2%), when the distribution is normal or exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 5.4$. With positive and negative pairings, the UN$_j$ population matrix only provides the best fit when K = 4, this being the case for both normal and nonnormal distributions. As the number of repeated measures increases, the ARH$_j$ matrix shows the best fit for all of the distributions, especially when ε = .75. Note, however, that when ε = .57 and K = 6, the UN matrix offers a good fit for all distributions except the exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 2.4$. The reason why the ARH$_j$ matrix provides the best fit with positive and negative pairings is probably that the dependency structure of the data becomes more apparent as the number of repeated measures increases.

Type I error rates The simulated data were then analyzed with Proc Mixed, specifying the covariance structure with the best fit according to the AIC (Tables 2 and 3). The analysis involved estimating the p values associated with the fixed effects, via the KR approximation, and the empirical Type I error rates for each combination of the different study variables (Table 1).

Robustness was determined according to Bradley’s criterion, whereby the effect estimate is robust when the empirical Type I error rate is between .025 and .075 for α = .05. A test is considered to be liberal when the empirical Type I error rate is above the upper limit, and conservative when it is below the lower limit.
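Bradley's criterion as applied here reduces to a simple interval check. The sketch below is illustrative only; the function name and the strict-inequality treatment of the limits are our assumptions.

```python
# Classify an empirical Type I error rate under Bradley's liberal
# criterion for a nominal alpha of .05: robust within [.025, .075].

def bradley(rate, lower=0.025, upper=0.075):
    if rate < lower:
        return "conservative"
    if rate > upper:
        return "liberal"
    return "robust"

# e.g. an empirical rate of .081 exceeds .075 and counts as liberal,
# while .050 falls inside the interval and counts as robust.
```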

Table 4 shows the empirical Type I error rates for the time effect, as well as the percentages of robustness. With K = 4 and ε = .57, the KR method shows 66.7% robustness when the distribution is normal and when it is exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 2.4$. The percentage of robustness falls slightly, to 58.3%, when kurtosis increases to $\gamma_2 = 5.4$, and the test is never robust when the distribution is log-normal ($\gamma_1 = 1.75$ and $\gamma_2 = 5.9$). With K = 4 and ε = .75, the difference between normal and log-normal distributions is not so marked: When the distribution is normal, the robustness percentage is the same as for ε = .57, while with the log-normal distribution, the test achieves 41.7% robustness. In this case, and in contrast to what occurs with the normal distribution, the test is liberal with homogeneous between-groups covariances. The highest robustness percentage for ε = .75 is 75%, corresponding to exponential distributions with $\gamma_1 = 0.8$ and $\gamma_2 = 5.4$, where the test is robust for both null and positive pairings.

Table 4 Empirical rates of Type I error for the time effect (nominal value .05)

With K = 6 and ε = .57, robustness is 41.7% with the normal distribution, but only 8.3% with the log-normal distribution. With ε = .75, robustness is 75% with the normal distribution and 25% with the log-normal distribution, where the test is liberal with null pairings. Note, however, that with positive pairings and ε = .57, the test is conservative for both normal and log-normal distributions. As regards the exponential distributions, robustness ranges between 50% and 83.3%, being greater with ε = .75. When the distribution is exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 5.4$, the test is robust even with negative pairings.

With K = 8 and ε = .57, robustness is 50% with the normal distribution but 0% with the log-normal distribution. An even greater difference can be seen with ε = .75, where robustness is again 0% with the log-normal distribution but 75% in the case of the normal distribution. As regards the exponential distributions, performance is similar to that observed with K = 6, although the test is slightly less robust.

Table 5 shows the empirical Type I error rates and percentages of robustness for the effect of the Time × Group interaction. With K = 4, there are hardly any differences between normal and nonnormal (exponential and log-normal) distributions. When ε = .57, robustness is 58.3% with the normal distribution, 41.7% for the exponential distribution with $\gamma_1 = 0.8$ and $\gamma_2 = 2.4$, 58.3% for the exponential distribution with $\gamma_1 = 0.8$ and $\gamma_2 = 5.4$, and 50% with the log-normal distribution. These percentages are more or less maintained when ε = .75, where robustness is 41.7% for the normal distribution, for the exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 2.4$, and for the log-normal distribution, and 66.7% for the exponential distribution with $\gamma_1 = 0.8$ and $\gamma_2 = 5.4$.

Table 5 Empirical rates of Type I error for the interaction effect (nominal value .05)

Similarly, with K = 6 and ε = .75, there are no differences between the normal distribution, the exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 2.4$, and the log-normal distribution. In these cases, robustness ranges between 41.7% and 50%, although it rises to 75% for the exponential distribution with $\gamma_1 = 0.8$ and $\gamma_2 = 5.4$, the test being robust with negative pairings. With ε = .57, the test is again more robust (41.7%) when the distribution is exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 5.4$. When the distribution is log-normal and ε = .57, robustness reaches 33.3%, compared to 25% for the normal distribution. It can be seen that the normal distribution and the exponential with $\gamma_1 = 0.8$ and $\gamma_2 = 2.4$ have the same percentages of robustness for both sphericity indices.

With K = 8, the test is more robust when the distribution is exponential with γ1 = 0.8 and γ2 = 5.4, followed by the log-normal, and finally the normal distribution. The greatest difference between the normal and log-normal distributions is observed with positive pairings, for which the test is robust with the log-normal distribution but not with normally distributed data.

Overall, Tables 4 and 5 show that Proc Mixed in combination with the KR method is unable to control the Type I error rate when pairing is negative. This finding is corroborated by previous analyses of the UN and UNj population matrices. Vallejo and Ato (2006) concluded that Proc Mixed in conjunction with the KR procedure based on the AIC tends to inflate Type I error rates for the interaction effect when pairing is negative, with both normal and nonnormal distributions. Moreover, this tendency of the test to be liberal increases as the sample size decreases, a pattern that the present results also illustrate. As regards the main time effect, Vallejo and Livacic-Rojas (2005) found that when the condition of sphericity was fulfilled, the test was robust even with distributions that deviated slightly from normality and with small sample sizes and positive pairing. The present study reached the same conclusion with respect to exponential distributions.

Discussion

Moderate nonnormality has a minimal effect on the standard errors of estimation methods. However, standard error bias tends to increase in line with the degree of nonnormality (Lei & Lomax, 2005). Hence, the focus of the present analysis of longitudinal data was to examine the extent to which nonnormality influenced the estimation of fixed effects. Specifically, the aim of the study was to determine the robustness of the LMM in mixed longitudinal designs when the data are not normally distributed. A previous study with normally distributed data showed that the LMM in combination with the KR method is more robust than the between–within and Satterthwaite approximations, particularly when the population covariance matrices are unstructured or heterogeneous first-order autoregressive (Arnau et al., 2009). The main contribution of this study by Arnau et al. (2009) was to demonstrate that the KR procedure corrects the liberal Type I error rates obtained through the between–within and Satterthwaite methods, especially when there are positive pairings between group sizes and covariance matrices. Vallejo and Ato (2006) concluded that in split-plot designs, the KR correction may be a viable alternative for estimating the interaction effect. In their analysis of the estimation of time and interaction effects, Vallejo and Livacic-Rojas (2005) found that the KR correction based on the AIC was able to control the Type I error rates in most of the distributions studied. Hence, in the present study we examined the Type I error rates for time and interaction effects in order to determine the influence of skewness and kurtosis on the estimation of fixed effects when distributions are nonnormal.

An interesting result of this study, in relation to selecting the covariance structure with heterogeneous between-groups covariance matrices, is that the ARHj matrix showed the best fit, regardless of the type of distribution and with both positive and negative pairings. This ARHj structure of dependency in repeated measures data was previously reported by Keselman et al. (1998). In their simulation study, they found that with normal distributions and ε = .75, the ARHj matrix showed the best fit to the UNj population matrix, this being the case for both positive and negative pairings. By contrast, with log-normal distributions and ε = .75, there were no differences between the ARHj and UNj matrices, both matrices showing a correct fit to the UNj population matrix.
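As a concrete illustration of the structure selected here, a heterogeneous first-order autoregressive (ARH(1)) covariance matrix has entries σiσjρ^|i−j|: each measure keeps its own variance while correlations decay with the lag between occasions. The sketch below builds one; the standard deviations and ρ are arbitrary illustrative values, not the simulated conditions of the study.

```python
import numpy as np

def arh1(sigmas, rho):
    """Heterogeneous AR(1) covariance: Sigma[i, j] = s_i * s_j * rho**|i - j|."""
    s = np.asarray(sigmas, dtype=float)
    idx = np.arange(len(s))
    lags = np.abs(np.subtract.outer(idx, idx))  # |i - j| for every pair of occasions
    return np.outer(s, s) * rho ** lags

# Illustrative values only: four repeated measures with unequal variances
Sigma = arh1([1.0, 1.5, 2.0, 2.5], rho=0.6)
print(np.diag(Sigma))  # variances: 1.0, 2.25, 4.0, 6.25
```

Setting all σi equal recovers the homogeneous AR(1) structure, which makes clear why ARH(1) is a natural candidate when between-occasion variances are heterogeneous.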

With respect to Type I error rates, the present results show that for the time effect, the test is more robust with normal than with log-normal distributions, whereas the difference between normal and exponential distributions is less marked. With a log-normal distribution and ε = .57, the test is not robust: the robustness percentages are zero or close to zero. With a normal distribution and ε = .75, the test is robust for all values of K. However, with a log-normal distribution and ε = .75, the test becomes less robust as the number of repeated measures increases, and it ceases to be robust when K = 8. The test is therefore more robust with a normal than with a log-normal distribution, whereas, overall, there are no appreciable differences in performance between normal and exponential distributions. In addition, when the covariance matrix is spherical, the test tends to become more robust with normal and exponential distributions, especially as the number of observations increases; the opposite occurs with the log-normal distribution. Finally, a comparison of the two exponential distributions shows that the test becomes more robust as kurtosis increases, regardless of whether or not the assumption of sphericity is fulfilled.
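The sphericity index ε that organizes these results is Box's ε, which equals 1 under sphericity and has a lower bound of 1/(K − 1). For a given population covariance matrix it can be computed from the eigenvalues of the double-centered matrix; a minimal sketch (the compound-symmetry example is purely illustrative):

```python
import numpy as np

def box_epsilon(Sigma):
    """Box's sphericity index: 1 under sphericity, 1/(K - 1) at worst."""
    k = Sigma.shape[0]
    J = np.eye(k) - np.ones((k, k)) / k   # centering matrix
    A = J @ Sigma @ J                     # double-centered covariance
    lam = np.linalg.eigvalsh(A)           # one eigenvalue is ~0 after centering
    return lam.sum() ** 2 / ((k - 1) * (lam ** 2).sum())

# Compound symmetry (equal variances, equal covariances) is spherical:
cs = 0.5 * np.ones((4, 4)) + 0.5 * np.eye(4)
print(round(box_epsilon(cs), 3))  # → 1.0
```

Nonspherical matrices such as the ε = .57 and ε = .75 conditions of the study yield values strictly below 1, which is what drives the inflation of the uncorrected F test's degrees of freedom.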

As compared to the estimation of the time effect alone, the interaction between time and group leads to a considerable increase in the test’s robustness when the distribution is log-normal. Indeed, with a log-normal distribution and ε = .57, the test has zero robustness in relation to the time effect, but performs much better when estimating the interaction effect. Conversely, robustness decreases when the distribution is normal, particularly as the number of repeated measures increases. In this regard, Padilla and Algina (2007) showed that the Type I error rate tends to be higher as the value of K rises. When ε = .75, no differences are observed between the normal distribution, the exponential with γ1 = 0.8 and γ2 = 2.4, and the log-normal with K = 4 and 6. By contrast, with K = 8 the test is more robust with the log-normal distribution. As occurs when estimating the time effect, the highest percentage of robustness in relation to the interaction effect corresponds to the sphericity condition ε = .75 and the exponential distribution with γ2 = 5.4.

To summarize, the KR approximation is least robust when the distribution is log-normal, in which case robustness is zero when estimating the time effect with nonspherical covariance matrices (ε = .57). In our view, this is largely due to the increase in skewness (γ1 = 1.75), as this lack of robustness is not observed for exponential distributions with γ1 = 0.8. Robustness is high when the exponential distribution has a degree of kurtosis very similar to that of the log-normal distribution, that is, γ2 = 5.4. Therefore, we conclude that the KR procedure is compromised with log-normal distributions that show moderate skewness, especially as regards the estimation of the time effect. By contrast, when distributions are normal or only slightly skewed (γ1 = 0.8), the test is robust even with extreme kurtosis (γ2 = 2.4 and γ2 = 5.4). Indeed, the percentage of robustness is high for the exponential distribution with γ1 = 0.8 and γ2 = 5.4 when estimating both the time and interaction effects.
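The log-normal moments discussed here can be checked in closed form. For a log-normal distribution with shape parameter σ, skewness is (w + 2)√(w − 1) and excess kurtosis is w⁴ + 2w³ + 3w² − 6, where w = e^(σ²). The value σ = 0.5 below is an assumption (the shape parameter is not stated in the text), chosen because it reproduces the reported γ1 = 1.75; note that the implied kurtosis is close to the γ2 = 5.4 of the second exponential condition:

```python
import math

sigma = 0.5                                # assumed log-normal shape parameter
w = math.exp(sigma ** 2)
skew = (w + 2) * math.sqrt(w - 1)          # gamma_1, population skewness
kurt = w**4 + 2 * w**3 + 3 * w**2 - 6      # gamma_2, population excess kurtosis
print(round(skew, 2), round(kurt, 2))      # → 1.75 5.9
```

This is consistent with the interpretation above: the two distributions differ mainly in skewness, not kurtosis, so the loss of robustness under the log-normal is plausibly a skewness effect.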

In conclusion, two effects are revealed by the present results, one due to skewness and the other to kurtosis. The effect of skewness is detected when comparing exponential with log-normal distributions, for which robustness decreases as skewness increases. On the other hand, the effect of kurtosis is revealed when comparing two exponential distributions, in which case robustness increases in line with the degree of kurtosis.

Two final reflections can be made. First, while acknowledging that the results are limited to the conditions examined in this study, we believe that they could be generalized to a wide variety of conditions. Even so, the results obtained here are difficult to compare with those of other simulation studies, since the sample sizes and the values of skewness and kurtosis differ from one study to another. Second, it would be interesting in future research to study the behavior of generalized linear mixed models. These techniques do not require the error terms to be normally distributed, and they are well suited to most of the distributions found in real-life data.