Article

An Exhaustive Power Comparison of Normality Tests

by Jurgita Arnastauskaitė 1,2,*, Tomas Ruzgas 2 and Mindaugas Bražėnas 3
1 Department of Applied Mathematics, Kaunas University of Technology, 51368 Kaunas, Lithuania
2 Department of Computer Sciences, Kaunas University of Technology, 51368 Kaunas, Lithuania
3 Department of Mathematical Modelling, Kaunas University of Technology, 51368 Kaunas, Lithuania
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(7), 788; https://doi.org/10.3390/math9070788
Submission received: 12 February 2021 / Revised: 17 March 2021 / Accepted: 31 March 2021 / Published: 6 April 2021
(This article belongs to the Special Issue Probability, Statistics and Their Applications 2021)

Abstract

A goodness-of-fit test is a frequently used tool of modern statistics. However, it is still unclear what the most reliable approach is for checking assumptions about data set normality. A particular data set (especially one with a small number of observations) only partly describes the underlying process, which leaves many options for the interpretation of its true distribution. As a consequence, many goodness-of-fit statistical tests have been developed, the power of which depends on particular circumstances (i.e., sample size, outliers, etc.). With the aim of developing a more universal goodness-of-fit test, we propose an approach based on an N-metric with our chosen kernel function. To compare the power of 40 normality tests, the goodness-of-fit hypothesis was tested for 15 data distributions with 6 different sample sizes. Based on the exhaustive comparative research results, we recommend the use of our test for samples of size $n \ge 118$.

1. Introduction

A priori information about data distribution is not always known. In those cases, hypothesis testing can help to find a reasonable assumption about the distribution of data. Based on assumed data distribution, one can choose appropriate methods for further research. The information about data distribution can be useful in a number of ways, for example:
  • it can provide insights about the observed process;
  • model parameters can be inferred from the characteristics of the data distribution; and
  • it can help in choosing more specific and computationally efficient methods.
Statistical methods often require data to be normally distributed. If the assumption of normality is not satisfied, the results of these methods can be misleading. Therefore, the assumption of normality must be checked before starting the statistical analysis. Many tests have been developed to check this assumption. However, the tests are defined in various ways and thus react differently to the departures from normality present in a data set. Therefore, the choice of a goodness-of-fit test remains an important problem.
For these reasons, this study examines the issue of testing the goodness-of-fit hypotheses. The goodness-of-fit null and alternative hypotheses are defined as:
$$H_0: \text{the distribution is normal}, \qquad H_A: \text{the distribution is not normal}.$$
A total of 40 tests were applied to analyze the problem of testing the goodness-of-fit hypothesis. The tests used in this study were developed between 1900 and 2016. In 1900, Karl Pearson published an article defining the chi-square test [1]. This test is considered the basis of modern statistics. Pearson was the first to examine the goodness-of-fit assumption that the observations $x_i$ follow a normal distribution, and concluded that, in the limit as $n$ becomes large, $X^2$ follows the chi-square distribution with $k - 1$ degrees of freedom. The statistic for this test is defined in Section 2.1. Another popular test of the goodness-of-fit hypothesis is the Kolmogorov–Smirnov test [2]. Its statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution [3]. The Anderson–Darling test is also often used in practice [4]. This test assesses whether a sample comes from a specified distribution [3]. The end of the 20th century and the beginning of the 21st century was a productive period for the development of goodness-of-fit hypothesis test criteria and their comparison studies [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19].
In 2010, Xavier Romão et al. conducted a comprehensive study comparing the power of goodness-of-fit hypothesis tests [20]. In the study, 33 normality tests were applied to samples of different sizes, taking into account the significance level $\alpha$ and many symmetric, asymmetric, and modified normal distributions. The researchers found that the most powerful of the selected normality tests were the Coin $\beta_3^2$, Chen–Shapiro, Bonett–Seier, and Gel–Miao–Gastwirth tests for the symmetric group of distributions; the Zhang–Wu $Z_C$ and $Z_A$ and Chen–Shapiro tests for the asymmetric group of distributions; and the Chen–Shapiro, Barrio–Cuesta-Albertos–Matrán–Rodríguez-Rodríguez, and Shapiro–Wilk tests for the group of modified normal distributions.
In 2015, Adefisoye et al. compared 18 normality tests for different sample sizes for symmetric and asymmetric distribution groups [3]. The results of the study showed that the Kurtosis test was the most powerful for a group of symmetric data distributions and the Shapiro–Wilk test was the most powerful for a group of asymmetric data distributions.
The main objective of this study is to perform a comparative analysis of the power of the most commonly used tests for testing the goodness-of-fit hypothesis. The procedure described in Section 3 was used to calculate the power of the tests.
Scientific novelty: the comparative analysis of test power was carried out using different goodness-of-fit methods against many different types of alternative distributions. The goodness-of-fit tests were selected as representatives of popular techniques that have been analyzed experimentally by other researchers. We propose a new kernel function and its usage in an N-metric-based test. The uniqueness of the kernel function is that its shape is chosen in such a way that the bias arising when the test statistic is computed at the sample values is eliminated.
The rest of the paper is organized as follows. Section 2 provides descriptions of the 40 goodness-of-fit hypothesis tests. Section 3 describes the procedure for calculating the power of the tests. The samples generated from 15 distributions are described in Section 4. Section 5 presents and discusses the results of the simulation study. Finally, Section 6 concludes the paper.

2. Statistical Methods

In this section, the most popular normality tests are reviewed.

2.1. Chi-Square Test (CHI2)

In 1900, Karl Pearson introduced the chi-square test [1]. The statistic of the test is defined as:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$
where $O_i$ is the observed frequency and $E_i$ is the expected frequency.
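For illustration, the following is a minimal Python sketch of this test applied to checking normality. It is not the authors' implementation: the equiprobable-bin construction, the bin-count rule of thumb, and the degrees-of-freedom correction for the two estimated parameters are our assumptions.

```python
import numpy as np
from scipy import stats

def chi_square_normality(x, k=None):
    """Pearson chi-square test of normality with k equiprobable bins (a sketch)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    k = k or max(4, int(2 * n ** 0.4))      # bin-count rule of thumb (assumption)
    # Equiprobable bin edges under the normal fitted to the sample
    edges = stats.norm.ppf(np.linspace(0, 1, k + 1), loc=x.mean(), scale=x.std(ddof=1))
    observed = np.diff(np.searchsorted(x, edges))   # counts per bin
    expected = np.full(k, n / k)                    # each bin has probability 1/k
    chi2 = np.sum((observed - expected) ** 2 / expected)
    # k - 1 degrees of freedom, minus 2 for the estimated mean and variance
    p_value = stats.chi2.sf(chi2, df=k - 3)
    return chi2, p_value
```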

2.2. Kolmogorov–Smirnov (KS)

In 1933, Kolmogorov and Smirnov proposed the KS test [2]. The statistic of the test is defined as:
$$D^+ = \max_{1 \le i \le n} \left\{\frac{i}{n} - z_i\right\}, \qquad D^- = \max_{1 \le i \le n} \left\{z_i - \frac{i - 1}{n}\right\}, \qquad D = \max(D^+, D^-),$$
where $z_i$ is the cumulative probability of the standard normal distribution at the $i$th order statistic, and $D$ is the maximum difference between the observed and expected values.
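A minimal Python sketch of the $D^+$, $D^-$, $D$ computation follows; estimating the normal parameters from the sample (a Lilliefors-style plug-in) is our assumption, not a detail taken from the paper.

```python
import numpy as np
from scipy import stats

def ks_statistic(x):
    """Kolmogorov-Smirnov distance D for normality (sketch with fitted parameters)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = stats.norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))  # cumulative probabilities
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - z)
    d_minus = np.max(z - (i - 1) / n)
    return max(d_plus, d_minus)
```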

2.3. Anderson–Darling (AD)

In 1952, Anderson and Darling developed a variety of the Kolmogorov and Smirnov tests [4]. This test is more powerful than the Kolmogorov and Smirnov test. The statistic of the test is defined as:
$$A^2 = -n - \sum_{i=1}^{n} \frac{2i - 1}{n} \left[\ln F(x_i) + \ln\left(1 - F(x_{n+1-i})\right)\right],$$
where $F(x_i)$ is the value of the distribution function at point $x_i$ and $n$ is the sample size.

2.4. Cramer–Von Mises (CVM)

The Cramér–von Mises test, originally proposed by Cramér and von Mises, is an alternative to the Kolmogorov–Smirnov test [21]. The statistic of the test is defined as:
$$CM = \frac{1}{12n} + \sum_{i=1}^{n} \left(Z_i - \frac{2i - 1}{2n}\right)^2,$$
where $Z_i$ is the value of the cumulative distribution function of the specified distribution at $(X_{(i)} - \bar{X})/S$, and $\bar{X}$ and $S$ are the sample mean and sample standard deviation.

2.5. Shapiro–Wilk (SW)

In 1965, Shapiro and Wilk formed the original test [22]. The statistic of the test is defined as:
$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$
where $x_{(i)}$ is the $i$th order statistic, $\bar{x}$ is the sample mean, and the constants $a_i$ are obtained as:
$$(a_1, \dots, a_n) = \frac{m^T V^{-1}}{\left(m^T V^{-1} V^{-1} m\right)^{1/2}},$$
where $m = (m_1, \dots, m_n)^T$ are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and $V$ is the covariance matrix of those order statistics.
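In practice the SW statistic is rarely coded by hand; the following sketch uses the implementation available in scipy.stats for illustration (it is not the authors' code).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)
w_stat, p_value = stats.shapiro(x)      # Shapiro-Wilk W and its p-value
print(f"W = {w_stat:.4f}, p = {p_value:.4f}")  # large p: no evidence against normality
```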

2.6. Lilliefors (LF)

In 1967, Lilliefors modified the Kolmogorov and Smirnov test [23]. The statistic of the test is defined as:
$$T = \sup_x |F^*(x) - S(x)|,$$
where $F^*(x)$ is the standard normal distribution function and $S(x)$ is the empirical distribution function of the $z_i$ values.

2.7. D’Agostino (DA)

In 1971, D’Agostino introduced the test for testing the goodness-of-fit hypothesis, which is an extension of the Shapiro–Wilk test [8]. The test proposed by D’Agostino does not need to define a weight vector. The statistic of the test is defined as:
$$D = \frac{\sum_{i=1}^{n} \left(i - (n + 1)/2\right) x_{(i)}}{n^2 \sqrt{m_2}},$$
where $m_2$ is the second central moment, defined as:
$$m_2 = n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
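A direct transcription of the $D$ statistic into Python (a sketch, not the authors' code):

```python
import numpy as np

def dagostino_d(x):
    """D'Agostino's D statistic, computed directly from the definition above."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    m2 = np.mean((x - x.mean()) ** 2)           # second central moment
    return np.sum((i - (n + 1) / 2) * x) / (n ** 2 * np.sqrt(m2))
```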

2.8. Shapiro–Francia (SF)

In 1972, Shapiro and Francia simplified the Shapiro and Wilk test and developed the Shapiro and Francia test, which is computationally more efficient [24]. The statistic of the test is defined as:
$$W_{SF} = \frac{\left(\sum_{i=1}^{n} m_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} m_i^2},$$
where $m_i$ are the expected values of the standard normal order statistics.

2.9. D’Agostino–Pearson (DAP)

In 1973–1974, D'Agostino and Pearson proposed an omnibus test that combines the sample skewness and kurtosis [25]. The statistic of the test is defined as:
$$K^2 = Z^2\left(\sqrt{b_1}\right) + Z^2(b_2),$$
where $Z(\sqrt{b_1})$ and $Z(b_2)$ are normal approximations of the sample skewness $\sqrt{b_1}$ and sample kurtosis $b_2$; under normality, $K^2$ approximately follows a chi-square distribution with two degrees of freedom.

2.10. Filliben (Filli)

In 1975, Filliben defined the probabilistic correlation coefficient r as a test for the goodness-of-fit hypothesis [26]. This test statistic is defined as:
$$r = \frac{\sum_{i=1}^{n} x_{(i)}\, M_{(i)}}{\sqrt{\sum_{i=1}^{n} M_{(i)}^2 \cdot (n - 1)\, \sigma^2}},$$
where $\sigma^2$ is the sample variance and $M_{(i)} = \Phi^{-1}(m_{(i)})$, with $m_{(i)}$ the estimated median values of the order statistics, obtained as:
$$m_{(i)} = \begin{cases} 1 - 0.5^{(1/n)}, & i = 1, \\ (i - 0.3175)/(n + 0.365), & 1 < i < n, \\ 0.5^{(1/n)}, & i = n. \end{cases}$$
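A Python sketch of the Filliben coefficient follows. Using np.corrcoef is our simplification: it agrees with the formula above up to the centering of $M_{(i)}$, which is negligible since the $M_{(i)}$ are symmetric about zero.

```python
import numpy as np
from scipy import stats

def filliben(x):
    """Probability plot correlation coefficient r (sketch)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    m = (np.arange(1, n + 1) - 0.3175) / (n + 0.365)   # median plotting positions
    m[0], m[-1] = 1 - 0.5 ** (1 / n), 0.5 ** (1 / n)
    M = stats.norm.ppf(m)                               # M_(i) = Phi^{-1}(m_(i))
    return np.corrcoef(x, M)[0, 1]                      # correlation of the Q-Q plot
```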

2.11. Martinez–Iglewicz (MI)

In 1981, Martinez and Iglewicz proposed a normality test based on the ratio of two estimators of variance, where one of the estimators is the robust biweight scale estimator S b 2 [27]:
$$S_b^2 = \frac{n \sum_{|\tilde{z}_i| < 1} (x_i - M)^2 (1 - \tilde{z}_i^2)^4}{\left[\sum_{|\tilde{z}_i| < 1} (1 - \tilde{z}_i^2)(1 - 5\tilde{z}_i^2)\right]^2},$$
where $M$ is the sample median and $\tilde{z}_i = (x_i - M)/(9A)$, with $A$ being the median of $|x_i - M|$.
The test statistic is then given by:
$$I_n = \frac{\sum_{i=1}^{n} (x_i - M)^2}{(n - 1)\, S_b^2}.$$

2.12. Epps–Pulley (EP)

In 1983, Epps and Pulley proposed a test statistic based on the following weighted integral [28]:
$$T_{EP} = \int_{-\infty}^{\infty} |\varphi_n(t) - \hat{\varphi}_0(t)|^2\, dG(t),$$
where $\varphi_n(t)$ is the empirical characteristic function and $G(t)$ is an adequate weighting function chosen according to several considerations. By setting $dG(t) = g(t)\, dt$ and selecting:
$$g(t) = \sqrt{m_2 / (2\pi)}\, \exp(-0.5\, m_2 t^2),$$
the following statistic is obtained:
$$T_{EP} = 1 + \frac{n}{\sqrt{3}} + \frac{2}{n} \sum_{k=2}^{n} \sum_{j=1}^{k-1} \exp\left(-\frac{(x_j - x_k)^2}{2 m_2}\right) - \sqrt{2} \sum_{j=1}^{n} \exp\left(-\frac{(x_j - \bar{x})^2}{4 m_2}\right),$$
where $m_2$ is the second central moment.

2.13. Jarque–Bera (JB)

In 1987, Jarque and Bera proposed a test [29] with statistic defined as:
$$JB = \frac{n}{6} \left(s^2 + \frac{(k - 3)^2}{4}\right),$$
where $s = m_3 / m_2^{3/2}$ and $k = m_4 / m_2^2$ are the sample skewness and kurtosis.
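A sketch of the JB statistic from the moment definitions above, with the asymptotic $\chi^2(2)$ p-value (our illustration, not the authors' code):

```python
import numpy as np
from scipy import stats

def jarque_bera(x):
    """Jarque-Bera statistic and asymptotic p-value (sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d ** j) for j in (2, 3, 4))   # central moments
    s = m3 / m2 ** 1.5                  # sample skewness
    k = m4 / m2 ** 2                    # sample kurtosis
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)
    p_value = stats.chi2.sf(jb, df=2)   # asymptotically chi-square with 2 df
    return jb, p_value
```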

2.14. Hosking ($H_1$–$H_3$)

In 1990, Hosking and Wallis proposed the first Hosking test [5]. This test statistic is defined as:
$$H_i = \frac{V_i - \mu_V}{\sigma_V},$$
where $\mu_V$ and $\sigma_V$ are the mean and standard deviation of the simulated values of $V$. The $V_i$ are calculated as:
$$V_1 = \left[\frac{\sum_{i=1}^{N} n_i (t^{(i)} - t^R)^2}{\sum_{i=1}^{N} n_i}\right]^{1/2}, \qquad V_2 = \frac{\sum_{i=1}^{N} n_i \sqrt{(t^{(i)} - t^R)^2 + (t_3^{(i)} - t_3^R)^2}}{\sum_{i=1}^{N} n_i},$$
$$V_3 = \frac{\sum_{i=1}^{N} n_i \sqrt{(t_3^{(i)} - t_3^R)^2 + (t_4^{(i)} - t_4^R)^2}}{\sum_{i=1}^{N} n_i}, \qquad t^R = \frac{\sum_{i=1}^{N} n_i\, t^{(i)}}{\sum_{i=1}^{N} n_i},$$
where $t^{(i)}$ is the L-moment coefficient of variation, $t_3^{(i)}$ is the L-moment coefficient of skewness, and $t_4^{(i)}$ is the L-moment coefficient of kurtosis.

2.15. Cabaña–Cabaña (CC1-CC2)

In 1994, Cabaña and Cabaña proposed the CC1 and CC2 tests [6]. The CC1 ( T S , l ) and CC2 ( T K , l ), respectively, are defined as:
$$T_{S,l} = \max_x |w_{S,l}(x)|, \qquad T_{K,l} = \max_x |w_{K,l}(x)|,$$
where $w_{S,l}(x)$ and $w_{K,l}(x)$ are approximated transformed estimated empirical processes, sensitive to changes in skewness and kurtosis, defined as:
$$w_{S,l}(x) = \Phi(x)\, \bar{H}_3 - \phi(x) \sum_{j=1}^{l-1} \sqrt{j}\, H_{j-1}(x)\, \bar{H}_{j+3},$$
$$w_{K,l}(x) = \phi(x)\, \bar{H}_3 + \left[\Phi(x) - x\, \phi(x)\right] \bar{H}_4 - \phi(x) \sum_{j=2}^{l} \left(\sqrt{\tfrac{j}{j-1}}\, H_{j-2}(x) \cdot H_j(x)\right) \bar{H}_{j+3},$$
where $l$ is a dimensionality parameter, $\Phi(x)$ and $\phi(x)$ are the distribution and density functions of the standard normal distribution, $H_j(\cdot)$ is the $j$th order normalized Hermite polynomial, and $\bar{H}_j$ is its normalized sample mean, defined as $\bar{H}_j = \frac{1}{n} \sum_{i=1}^{n} H_j(x_i)$.

2.16. The Chen–Shapiro Test (ChenS)

In 1995, Chen and Shapiro introduced an alternative test statistic based on normalized spacings and defined as [9]:
$$CS = \frac{1}{(n - 1)\, s} \sum_{i=1}^{n-1} \frac{x_{(i+1)} - x_{(i)}}{M_{i+1} - M_i},$$
where $M_i$ is the $i$th quantile of the standard normal distribution and $s$ is the sample standard deviation.

2.17. Modified Shapiro–Wilk (SWRG)

In 1997, Rahman and Govindarajulu proposed a modification of the Shapiro–Wilk test [8]. This test statistic is simpler to compute and relies on a new definition of the weights using approximations to $m$ and $V$. Each element $a_i$ of the weight vector is given as:
$$a_i = -(n + 1)(n + 2)\, \phi(m_i) \left[m_{i-1} \phi(m_{i-1}) - 2 m_i \phi(m_i) + m_{i+1} \phi(m_{i+1})\right],$$
where it is assumed that $m_0 \phi(m_0) = m_{n+1} \phi(m_{n+1}) = 0$. The modified test statistic therefore assigns larger weights to the extreme order statistics than the original test.

2.18. Doornik–Hansen (DH)

The Doornik–Hansen goodness-of-fit test builds on the statistic introduced by Bowman and Shenton in 1977 [9]. The test statistic is obtained using transformations of the skewness and kurtosis:
$$s = \frac{m_3}{\sqrt{m_2^3}} \qquad \text{and} \qquad k = \frac{m_4}{m_2^2},$$
where $m_j = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^j$, $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, and $n$ is the sample size.
The DH test statistic has a chi-square distribution with two degrees of freedom. It is defined as:
$$DH = z_1^2 + z_2^2 \sim \chi^2(2),$$
where $z_1 = \delta \log\left(y + \sqrt{y^2 + 1}\right)$, $\delta = 1/\sqrt{\log(w^2)}$, $w^2 = -1 + \sqrt{2(\beta - 1)}$,
$$\beta = \frac{3(n^2 + 27n - 70)(n + 1)(n + 3)}{(n - 2)(n + 5)(n + 7)(n + 9)}, \qquad y = s \sqrt{\frac{(w^2 - 1)(n + 1)(n + 3)}{12(n - 2)}},$$
$$z_2 = \sqrt{9\alpha} \left(\frac{1}{9\alpha} - 1 + \sqrt[3]{\frac{\chi}{2\alpha}}\right), \qquad \alpha = a + c\, s^2, \qquad \chi = 2l\left(k - 1 - s^2\right),$$
$$a = \frac{(n - 2)(n + 5)(n + 7)(n^2 + 27n - 70)}{6\delta'}, \qquad c = \frac{(n - 7)(n + 5)(n + 7)(n^2 + 2n - 5)}{6\delta'},$$
$$\delta' = (n - 3)(n + 1)(n^2 + 15n - 4), \qquad l = \frac{(n + 5)(n + 7)(n^3 + 37n^2 + 11n - 313)}{12\delta'},$$
where $\delta'$ denotes a second constant, distinct from the $\delta$ used in $z_1$.

2.19. Zhang $Q$ (ZQ), $Q^*$ (ZQstar), $Q$–$Q^*$ (ZQQstar)

In 1999, Zhang introduced the $Q$ test statistic based on the ratio of two unbiased estimators of the standard deviation, $q_1$ and $q_2$, given by $Q = \ln(q_1 / q_2)$ [10]. The estimators $q_1$ and $q_2$ are calculated as $q_1 = \sum_{i=1}^{n} a_i x_{(i)}$ and $q_2 = \sum_{i=1}^{n} b_i x_{(i)}$, where the $i$th order linear coefficients $a_i$ and $b_i$ are:
$$a_i = \left[(u_i - u_1)(n - 1)\right]^{-1} \ \text{for}\ i \ne 1, \qquad a_1 = -\sum_{i=2}^{n} a_i,$$
$$b_i = \begin{cases} -b_{n-i+1} = \left[(u_i - u_{i+4})(n - 4)\right]^{-1}, & i = 1, \dots, 4, \\ (n - 4)^{-1} \left[(u_i - u_{i+4})^{-1} - (u_{i-4} - u_i)^{-1}\right], & i = 5, \dots, n - 4, \end{cases}$$
where $u_i$ is the $i$th expected value of the order statistics of a standard normal distribution, $u_i = \Phi^{-1}\left[(i - 0.375)/(n + 0.25)\right]$.
Zhang also proposed the alternative statistic $Q^*$, obtained by replacing the $i$th order statistics $x_{(i)}$ in $q_1$ and $q_2$ with $x_{(i)}^* = -x_{(n-i+1)}$.
In addition to those already discussed, Zhang proposed the joint test $Q$–$Q^*$, based on the fact that $Q$ and $Q^*$ are approximately independent.

2.20. Barrio–Cuesta-Albertos–Matran–Rodriguez-Rodriguez (BCMR)

In 1999, Barrio, Cuesta-Albertos, Matrán, and Rodríguez-Rodríguez proposed the BCMR goodness-of-fit test [11]. This test is based on the $L_2$-Wasserstein distance and is defined as:
$$BCMR = \frac{m_2 - \left[\sum_{i=1}^{n} x_{(i)} \int_{(i-1)/n}^{i/n} \Phi^{-1}(t)\, dt\right]^2}{m_2},$$
where the numerator represents the squared $L_2$-Wasserstein distance.

2.21. Glen–Leemis–Barr (GLB)

In 2001, Glen, Leemis, and Barr extended the Kolmogorov–Smirnov and Anderson–Darling test to form the GLB test [12]. This test statistic is defined as:
$$PS = -n - \frac{1}{n} \sum_{i=1}^{n} \left[(2n + 1 - 2i) \ln(p_{(i)}) + (2i - 1) \ln(1 - p_{(i)})\right],$$
where $p_{(i)}$ are the elements of the vector $p$ containing the quantiles of the order statistics sorted in ascending order.

2.22. Bonett–Seier T w (BS)

In 2002, Bonett and Seier introduced the BS test [13]. The statistic for this test is defined as:
$$T_w = \frac{\sqrt{n + 2}\, (\hat{\omega} - 3)}{3.54},$$
where $\hat{\omega} = 13.29 \left[\ln \sqrt{m_2} - \ln\left(n^{-1} \sum_{i=1}^{n} |x_i - \bar{x}|\right)\right]$ and $m_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$.
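A Python sketch of $T_w$ from these definitions follows; comparing $T_w$ with two-sided standard normal critical values is our reading of the usual practice for this statistic, not a detail taken from the paper.

```python
import numpy as np

def bonett_seier(x):
    """Bonett-Seier T_w statistic (sketch from the definitions above)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m2 = np.mean((x - x.mean()) ** 2)
    tau = np.mean(np.abs(x - x.mean()))             # mean absolute deviation
    omega = 13.29 * (np.log(np.sqrt(m2)) - np.log(tau))
    return np.sqrt(n + 2) * (omega - 3) / 3.54
```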

2.23. Bontemps–Meddahi (BM1: $BM_{3\text{-}4}$, BM2: $BM_{3\text{-}6}$)

In 2005, Bontemps and Meddahi proposed a family of normality tests based on moment conditions known as Stein equations and their relation with Hermite polynomials [24]. The statistic of the test is defined as:
$$BM_{3\text{-}p} = \sum_{k=3}^{p} \left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} H_k(z_i)\right)^2,$$
where $z_i = (x_i - \bar{x})/s$ and $H_k(\cdot)$ is the $k$th order normalized Hermite polynomial, given by the general recursion:
$$H_i(u) = \frac{1}{\sqrt{i}} \left[u\, H_{i-1}(u) - \sqrt{i - 1}\, H_{i-2}(u)\right] \ \text{for}\ i > 1, \qquad H_0(u) = 1, \qquad H_1(u) = u.$$
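The recursion lends itself to a compact implementation; the following sketch computes $H_k$ and the $BM_{3\text{-}p}$ statistic (our illustration, not the authors' code):

```python
import numpy as np

def hermite(k, u):
    """Normalized Hermite polynomial H_k(u) via the recursion above."""
    u = np.asarray(u, dtype=float)
    h_prev, h = np.ones_like(u), u          # H_0 = 1, H_1 = u
    if k == 0:
        return h_prev
    for i in range(2, k + 1):
        h_prev, h = h, (u * h - np.sqrt(i - 1) * h_prev) / np.sqrt(i)
    return h

def bontemps_meddahi(x, p=4):
    """BM_{3-p} statistic; p = 4 gives BM1, p = 6 gives BM2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)      # standardized sample
    return sum((hermite(k, z).sum() / np.sqrt(n)) ** 2 for k in range(3, p + 1))
```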

2.24. Zhang–Wu (ZW1: $Z_C$, ZW2: $Z_A$)

In 2005, Zhang and Wu presented the ZW1 and ZW2 goodness-of-fit tests [15]. The Z C and Z A statistics are similar to the Cramér–von Mises and Anderson–Darling tests statistics based on the empirical distribution function. The statistic of the test is defined as:
$$Z_C = \sum_{i=1}^{n} \left[\ln \frac{\Phi(z_{(i)})^{-1} - 1}{(n - 0.5)/(i - 0.75) - 1}\right]^2, \qquad Z_A = -\sum_{i=1}^{n} \left[\frac{\ln \Phi(z_{(i)})}{n - i + 0.5} + \frac{\ln\left(1 - \Phi(z_{(i)})\right)}{i - 0.5}\right],$$
where $\Phi(z_{(i)})$ is the standard normal distribution function evaluated at the standardized order statistic $z_{(i)}$; under $H_0$, its expected value is approximately $(i - 0.5)/n$.

2.25. Gel–Miao–Gastwirth (GMG)

In 2007, Gel, Miao, and Gastwirth proposed the GMG test [16]. The statistic of the test is defined as:
$$R_{sJ} = \frac{s}{J_n},$$
where $s$ is the sample standard deviation and $J_n$ is a robust measure of dispersion defined as:
$$J_n = \frac{\sqrt{\pi/2}}{n} \sum_{i=1}^{n} |x_i - M|,$$
where $M$ is the median of the sample.
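A sketch of the ratio statistic from these definitions (our illustration):

```python
import numpy as np

def gmg(x):
    """Gel-Miao-Gastwirth ratio of sample standard deviation to J_n (sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = x.std(ddof=1)                                   # sample standard deviation
    j_n = np.sqrt(np.pi / 2) / n * np.sum(np.abs(x - np.median(x)))
    return s / j_n                                      # close to 1 under normality
```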

2.26. Robust Jarque–Bera (RJB)

In 2007, Gel and Gastwirth modified the Jarque–Bera test to obtain a more robust version [16]. The RJB test statistic is defined as:
$$RJB = \frac{n}{6} \left(\frac{m_3}{J_n^3}\right)^2 + \frac{n}{64} \left(\frac{m_4}{J_n^4} - 3\right)^2,$$
where $m_3$ and $m_4$ are the third and fourth central moments, respectively, and $J_n$ is the robust measure of dispersion defined in Section 2.25.

2.27. Coin $\beta_3^2$

In 2008, Coin proposed a test based on polynomial regression to determine the group distributions of symmetric distributions [17]. The type of model for this test is:
$$z_{(i)} = \beta_1 \alpha_i + \beta_3 \alpha_i^3,$$
where $\beta_1$ and $\beta_3$ are fitting parameters and $\alpha_i$ are the expected values of the standard normal order statistics.

2.28. Brys–Hubert–Struyf $T_{MC\text{-}LR}$ (BHS)

In 2008, Brys, Hubert, and Struyf introduced the BHS tests [3]. This test is based on skewness and long tails. The statistics for this test T M C L R is defined as:
$$T_{MC\text{-}LR} = n\, (w - \omega)^T V^{-1} (w - \omega),$$
where $w = [MC,\ LMC,\ RMC]^T$; $MC$ is the medcouple, $LMC$ the left medcouple, and $RMC$ the right medcouple; and $\omega$ and $V$ are obtained from the influence functions of the estimators in $w$. In the case of a normal distribution:
$$\omega = [0,\ 0.199,\ 0.199]^T, \qquad V = \begin{bmatrix} 1.25 & 0.323 & -0.323 \\ 0.323 & 2.62 & -0.0123 \\ -0.323 & -0.0123 & 2.62 \end{bmatrix}.$$

2.29. Brys–Hubert–Struyf–Bonett–Seier $T_{MC\text{-}LR}\ \&\ T_w$ (BHSBS)

In 2008, Brys, Hubert, Struyf, Bonett, and Seier introduced the combined BHSBS test [3], which applies the $T_{MC\text{-}LR}$ and $T_w$ statistics jointly:
$$T_{MC\text{-}LR}\ \&\ T_w = n\, (w - \omega)^T V^{-1} (w - \omega)\ \&\ \frac{\sqrt{n + 2}\, (\hat{\omega} - 3)}{3.54},$$
where $\omega$ is the asymptotic mean vector and $V$ the covariance matrix defined in Section 2.28.

2.30. Desgagné–Lafaye de Micheaux–Leblanc $R_n$ (DLDMLRn), $X_{APD}$ (DLDMXAPD), $Z_{EPD}$ (DLDMZEPD)

In 2009, Desgagné, Lafaye de Micheaux, and Leblanc introduced the $R_n$ and $X_{APD}$ tests [18]. The statistic $R_n(\mu, \sigma)$ is defined as:
$$R_n(\mu, \sigma) = \frac{1}{n} \sum_{i=1}^{n} d_\theta(Y_i) = \begin{pmatrix} \sqrt{2}\left[\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} Y_i^2\, \mathrm{sign}(Y_i)\right] \\[6pt] 2^{-1}\left[\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} Y_i^2 \log|Y_i| - (2 - \log 2 - \gamma)/2\right] \end{pmatrix},$$
where $Y_i = \sigma^{-1}(X_i - \mu)$. When $\mu$ and $\sigma$ are unknown, the following maximum-likelihood estimators can be used:
$$\hat{\mu}_n = \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \hat{\sigma}_n = S_n = \left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2\right]^{1/2}.$$
The DLDMXAPD test is based on skewness and kurtosis which are defined as:
$$s = \frac{1}{n} \sum_{i=1}^{n} Z_i^2\, \mathrm{sign}(Z_i), \qquad k = \frac{1}{n} \sum_{i=1}^{n} Z_i^2 \log|Z_i|,$$
where $Z_i = S_n^{-1}(X_i - \bar{X}_n)$, with $\bar{X}_n$ and $S_n$ defined above.
The DLDMXAPD test is suitable for use when the sample size is greater than 10. The statistic $X_{APD}$ for this test is defined as:
$$X_{APD} = \frac{n\, s^2}{3 - 8/\pi} + \frac{n \left(k - (2 - \log 2 - \gamma)/2\right)^2}{(3\pi^2 - 28)/8} = Z^2(s) + Z^2(k - s^2),$$
where $\gamma \approx 0.577215665$ is the Euler–Mascheroni constant and $s$ and $k$ are the skewness and kurtosis defined above.
In 2016, Desgagné, Lafaye de Micheaux, and Leblanc presented the DLDMZEPD test based on the kurtosis component [18]. The statistic $Z_{EPD}$ for this test is defined as:
$$Z_{EPD} = \frac{n^{1/2} \left(k - (2 - \log 2 - \gamma)/2\right)}{\left[(3\pi^2 - 28)/8\right]^{1/2}}.$$

2.31. N-Metric

We improved the Bakshaev [30] goodness-of-fit hypothesis test based on N-metrics. This test is defined in the following way.
Under the null hypothesis, the statistic
$$T_n = n \int_0^1 \int_0^1 K(x, y)\, d(F_n^*(x) - x)\, d(F_n^*(y) - y)$$
has the same asymptotic distribution as the quadratic form:
$$T_n = \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} \frac{a_{kj}}{\pi^2 k j}\, \xi_k \xi_j,$$
where $\xi_k$ are independent standard normal random variables and:
$$a_{kj} = 2 \int_0^1 \int_0^1 K(x, y)\, d\sin(\pi k x)\, d\sin(\pi j y).$$
In this case, Bakshaev applied the kernel function $K(x, y) = |x - y|$, and we propose to apply another kernel function (Figure 1):
$$K(x) = \varphi(\bar{g}(x))\, \bar{g}'(x),$$
where $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
An additional bias is introduced when the kernel function is evaluated at the sample values (i.e., for $x = X_{(t)}$). Therefore, to eliminate this bias, the shape of the kernel function is chosen so that its influence in the neighborhood of the sample values is as small as possible.
Let $X$ be a standard normal random variable, $\Phi$ and $\varphi$ be its distribution and density functions, respectively, and $g: \mathbb{R} \to \mathbb{R}$ be an odd, strictly monotonically increasing function. Then the distribution function $F_Y$ of the random variable $Y = g(X)$ is $\Phi(\bar{g}(x))$, where $\bar{g}$ is the inverse of the function $g$. The distribution density $f_Y$ of the random variable $Y$ is $\varphi(\bar{g}(x))\, \bar{g}'(x)$. Let us consider the parametric class of functions $\bar{g}$ that depends on three parameters:
$$\bar{g}(x) = \frac{x}{(c + |x|^b)^a}, \qquad \bar{g}'(x) = (c + |x|^b)^{-a} - a\, b\, |x|^b\, (c + |x|^b)^{-a-1}, \qquad a, b, c > 0,$$
where $a$, $b$, and $c$ control the variance, the trough, and the peak shape, respectively.
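A Python sketch of the proposed kernel, with the experimentally chosen parameters reported in Section 5 ($a = 0.95$, $b = 0.25$, $c = 1$; see Figure 1), follows. The derivative is computed directly from the expression above.

```python
import numpy as np

def g_bar(x, a=0.95, b=0.25, c=1.0):
    """Inverse transformation g_bar(x) = x / (c + |x|^b)^a."""
    return x / (c + np.abs(x) ** b) ** a

def g_bar_prime(x, a=0.95, b=0.25, c=1.0):
    """Derivative of g_bar, as given in the parametric class above."""
    t = c + np.abs(x) ** b
    return t ** (-a) - a * b * np.abs(x) ** b * t ** (-a - 1)

def kernel(x):
    """Proposed kernel K(x) = phi(g_bar(x)) * g_bar'(x)."""
    phi = np.exp(-g_bar(x) ** 2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
    return phi * g_bar_prime(x)
```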

3. The Power of a Test

The power of a test is defined as the probability of rejecting a false $H_0$ hypothesis. Decreasing the probability of a type I error $\alpha$ increases the probability of a type II error $\beta$ and decreases the power of the test. The smaller the type II error is, the more powerful the test is. In practice, tests are designed to minimize the type II error for a fixed type I error. The most commonly chosen value for $\alpha$ is 0.05. The type II error probability $\beta$ is the probability of failing to reject hypothesis $H_0$ when it is false, so the power of the test is $1 - \beta$ (see Figure 2). The power makes it possible to compare two tests at the same significance level and sample size: a more powerful test has a higher value of $1 - \beta$. Increasing the sample size usually increases the power of the test [31,32].
When the exact null distribution of a goodness-of-fit test statistic is a step function, created by the summation of the exact probabilities for each possible value of the test statistic, it is possible to obtain the same critical value for a number of different adjacent significance levels $\alpha$. To overcome this problem, many authors prefer linear interpolation of the power of the test statistic using the powers for significance levels less than (denoted $\alpha_1$) and greater than (denoted $\alpha_2$) the desired significance level (denoted $\alpha$); see Figure 3 and, for example, [33]. Linear interpolation weights the power according to how close $\alpha_1$ and $\alpha_2$ are to $\alpha$. In this case, the power of the test is calculated according to the formula [19]:
$$\mathrm{Power} = \frac{(\alpha - \alpha_1)\, P\left(T \ge \gamma_2(\alpha) \mid H_1\right) + (\alpha_2 - \alpha)\, P\left(T \ge \gamma_1(\alpha) \mid H_1\right)}{\alpha_2 - \alpha_1},$$
where $\gamma_1(\alpha)$ and $\gamma_2(\alpha)$ are the critical values immediately below and above the significance level $\alpha$, and $\alpha_1 = P(T \ge \gamma_1(\alpha) \mid H_0)$ and $\alpha_2 = P(T \ge \gamma_2(\alpha) \mid H_0)$ are the significance levels for $\gamma_1(\alpha)$ and $\gamma_2(\alpha)$, respectively.
The power of a test statistic is determined by the following steps [19] (a minimal sketch of this procedure is given after the list):
1. A sample $x_1, x_2, \dots, x_n$ is generated from the analyzed distribution.
2. The statistic of the goodness-of-fit test is calculated. If the obtained value of the statistic is greater than the corresponding critical value ($\alpha = 0.05$ is used), then hypothesis $H_0$ is rejected.
3. Steps 1 and 2 are repeated $k$ times (in our experiments, $k = 1{,}000{,}000$).
4. The power of the test is calculated as $count/k$, where $count$ is the number of rejections of the false null hypothesis.
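The sketch below illustrates the procedure in Python. The helper names (`test`, `sampler`, `estimate_power`) are hypothetical, and $k$ is reduced from the paper's 1,000,000 repetitions for speed.

```python
import numpy as np
from scipy import stats

def estimate_power(test, sampler, n=128, k=10_000, seed=0):
    """Monte Carlo power estimate: fraction of samples for which H0 is rejected."""
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(k):                  # steps 1-3: generate, test, repeat
        x = sampler(rng, n)
        if test(x):
            count += 1
    return count / k                    # step 4: rejection rate = empirical power

# Example: power of the Shapiro-Wilk test against a Laplace(0, 1) alternative
power = estimate_power(
    test=lambda x: stats.shapiro(x)[1] < 0.05,   # reject when p-value < alpha
    sampler=lambda rng, n: rng.laplace(size=n),
)
print(power)
```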

4. Statistical Distributions

The simulation study considers fifteen statistical distributions, for which the performance of the presented normality tests is assessed. The distributions are grouped into three groups: symmetric, asymmetric, and modified normal. A description of these distribution groups is presented in the following subsections.

4.1. Symmetric Distributions

Symmetric distributions considered in this research are [20]:
  • three cases of the Beta(a; b) distribution: Beta(0.5; 0.5), Beta(1; 1), and Beta(2; 2), where a and b are the shape parameters;
  • three cases of the Cauchy(t; s) distribution: Cauchy(0; 0.5), Cauchy(0; 1), and Cauchy(0; 2), where t and s are the location and scale parameters;
  • one case of the Laplace(t; s) distribution: Laplace(0; 1), where t and s are the location and scale parameters;
  • one case of the Logistic(t; s) distribution: Logistic(2; 2), where t and s are the location and scale parameters;
  • four cases of the t-Student(ν) distribution: t(1), t(2), t(4), and t(10), where ν is the number of degrees of freedom;
  • five cases of the Tukey(λ) distribution: Tukey(0.14), Tukey(0.5), Tukey(2), Tukey(5), and Tukey(10), where λ is the shape parameter; and
  • one case of the standard normal N(0; 1) distribution.

4.2. Asymmetric Distributions

Asymmetric distributions considered in this research are [20]:
  • four cases of the Beta(a; b) distribution: Beta(2; 1), Beta(2; 5), Beta(4; 0.5), and Beta(5; 1);
  • four cases of the Chi-squared(ν) distribution: $\chi^2(1)$, $\chi^2(2)$, $\chi^2(4)$, and $\chi^2(10)$, where ν is the number of degrees of freedom;
  • six cases of the Gamma(a; b) distribution: Gamma(2; 2), Gamma(3; 2), Gamma(5; 1), Gamma(9; 1), Gamma(15; 1), and Gamma(100; 1), where a and b are the shape and scale parameters;
  • one case of the Gumbel(t; s) distribution: Gumbel(1; 2), where t and s are the location and scale parameters;
  • one case of the Lognormal(t; s) distribution: LN(0; 1), where t and s are the location and scale parameters; and
  • four cases of the Weibull(a; b) distribution: Weibull(0.5; 1), Weibull(1; 2), Weibull(2; 3.4), and Weibull(3; 4), where a and b are the shape and scale parameters.

4.3. Modified Normal Distributions

Modified normal distributions considered in this research are [20]:
  • six cases of the standard normal distribution truncated at a and b, Trunc(a; b): Trunc(−1; 1), Trunc(−2; 2), Trunc(−3; 3), Trunc(−2; 1), Trunc(−3; 1), and Trunc(−3; 2), which are referred to as NORMAL1;
  • nine cases of a location-contaminated standard normal distribution, hereon termed LoConN(p; a): LoConN(0.3; 1), LoConN(0.4; 1), LoConN(0.5; 1), LoConN(0.3; 3), LoConN(0.4; 3), LoConN(0.5; 3), LoConN(0.3; 5), LoConN(0.4; 5), and LoConN(0.5; 5), which are referred to as NORMAL2;
  • nine cases of a scale-contaminated standard normal distribution, hereon termed ScConN(p; b): ScConN(0.05; 0.25), ScConN(0.10; 0.25), ScConN(0.20; 0.25), ScConN(0.05; 2), ScConN(0.10; 2), ScConN(0.20; 2), ScConN(0.05; 4), ScConN(0.10; 4), and ScConN(0.20; 4), which are referred to as NORMAL3; and
  • twelve cases of a mixture of normal distributions, hereon termed MixN(p; a; b): MixN(0.3; 1; 0.25), MixN(0.4; 1; 0.25), MixN(0.5; 1; 0.25), MixN(0.3; 3; 0.25), MixN(0.4; 3; 0.25), MixN(0.5; 3; 0.25), MixN(0.3; 1; 4), MixN(0.4; 1; 4), MixN(0.5; 1; 4), MixN(0.3; 3; 4), MixN(0.4; 3; 4), and MixN(0.5; 3; 4), which are referred to as NORMAL4.

5. Simulation Study and Discussion

This section provides a comprehensive simulation study designed to evaluate the power of the selected normality tests. The study takes into account the effects of sample size, the chosen level of significance ($\alpha = 0.05$), and the type of alternative distribution (Beta, Cauchy, Laplace, Logistic, Student, Chi-square, Gamma, Gumbel, Lognormal, Weibull, and modified standard normal). The study was performed by applying 40 normality tests (including our proposed normality test) to 1,000,000 generated standardized samples of sizes 32, 64, 128, 256, 512, and 1024.
The best set of parameters $(a, b, c)$ was selected experimentally: the value of $a$ was examined from 0.001 to 0.99 in steps of 0.01, the value of $b$ from 0.01 to 10 in steps of 0.01, and the value of $c$ from 0.5 to 50 in steps of 0.25. The N-metric test gave the most powerful results with the parameters $a = 0.95$, $b = 0.25$, $c = 1$. In cases where a test has several modifications, we present results only for the best variant. Table 1, Table 2 and Table 3 present the average power obtained for the symmetric, asymmetric, and modified normal distribution sets, for sample sizes of 32, 64, 128, 256, 512, and 1024. Comparing Table 1, Table 2 and Table 3, it can be seen that the most powerful test for small samples was Hosking1 (H1), while the most powerful test for large sample sizes was our presented test (N-metric). According to Table 1, Table 2 and Table 3, for large sample sizes the power of most tests approaches 1, except for the D'Agostino (DA) test, whose power is significantly lower.
An additional study was conducted to determine the exact minimal sample size at which the N-metric test (statistic (34) with kernel function (35)) is the most powerful for the groups of symmetric, asymmetric, and modified normal distributions. The Hosking1 and N-metric tests were applied to data sets of sizes 80, 90, 100, 105, 110, and 115. The obtained results showed that the N-metric test was the most powerful for sample sizes $n \ge 112$ for the symmetric distributions, $n \ge 118$ for the asymmetric distributions, and $n \ge 88$ for the group of modified normal distributions (see Table 4). The N-metric test is the most powerful for the Gamma distribution for all examined sample sizes ($n \ge 32$). It has been observed that in the case of the Cauchy and Lognormal distributions, the N-metric test is the most powerful when the sample size is $n \ge 255$, which can be attributed to the long tails of these distributions.
To complement the results given in Table 1, Table 2 and Table 3, Figure 4 (and Figure A1, Figure A2 and Figure A3 in Appendix A) presents the average power results of the most powerful goodness-of-fit tests. Figure 4 presents two distributions from each group of symmetric (Standard normal and Student), asymmetric (Gamma and Gumbel), and modified normal (standard normal distribution truncated at a and b and location-contaminated standard normal distribution) distributions. Figures of all other distributions are given in Appendix A. In Figure 4, it can be seen that for the standard normal distribution, our proposed test (N-metric) is the most powerful when the sample size is 64 or larger. Figure 4 shows that our proposed test (N-metric) is the most powerful in the case of Gamma data distribution for all sample sizes examined. In general, it can be summarized that the power of the Chen–Shapiro (ChenS), Gel–Miao–Gastwirth (GMG), Hosking1 (H1), and Modified Shapiro–Wilk (SWRG) tests increases gradually with increasing sample size. The power of our proposed test (N-metric) increases abruptly when the sample size is 128 and its power value remains close to 1 for larger sample sizes.

6. Conclusions and Future Work

In this study, a comprehensive comparison of the power of popular normality tests was performed. Given the importance of this topic and the extensive development of normality tests, the proposed new normality test, the detailed test descriptions provided, and the power comparisons are relevant. Only univariate data were examined in this study of the power of normality tests (a study with multivariate data is planned for the future).
The study addresses the performance of 40 normality tests, for various sample sizes $n$, for a number of symmetric, asymmetric, and modified normal distributions. A new goodness-of-fit test has been proposed, and its results are compared with those of the other tests.
Based on the obtained modeling results, it was determined that the most powerful tests for the groups of symmetric, asymmetric, and modified normal distributions were Hosking1 (for smaller sample sizes) and our proposed N-metric test (for larger sample sizes). The power of the Hosking1 test (for smaller sample sizes) is 1.5 to 7.99 percent higher than that of the second most powerful test for the groups of symmetric, asymmetric, and modified normal distributions. The power of the N-metric test (for larger sample sizes) is 6.2 to 16.26 percent higher than that of the second most powerful test for these groups.
The N-metric test is recommended for symmetric data sets of size $n \ge 112$, for asymmetric data sets of size $n \ge 118$, and for bell-shaped distributed data sets of size $n \ge 88$.

Author Contributions

Data curation, J.A. and T.R.; formal analysis, J.A. and T.R.; investigation, J.A. and T.R.; methodology, J.A. and T.R.; software, J.A. and T.R.; supervision, T.R.; writing—original draft, J.A. and M.B.; writing—review and editing, J.A. and M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Generated data sets were used in the study (see in Section 4).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1. Average empirical power results, for the examined sample sizes, for the group of symmetric distributions, of five powerful goodness-of-fit tests.
Figure A2. Average empirical power results, for the examined sample sizes, for the group of asymmetric distributions, of five powerful goodness-of-fit tests.
Figure A3. Average empirical power results, for the examined sample sizes, for the group of modified normal distributions, of five powerful goodness-of-fit tests.

References

  1. Barnard, G.A. Introduction to Pearson (1900) on the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such That it Can be Reasonably Supposed to Have Arisen from Random Sampling. In Breakthroughs in Statistics; Springer Series in Statistics; Springer: Cham, Switzerland, 1992; pp. 1–10.
  2. Kolmogorov, A. Sulla determinazione empirica di una legge di distribuzione. Inst. Ital. Attuari Giorn. 1933, 4, 83–91.
  3. Adefisoye, J.; Golam Kibria, B.; George, F. Performances of several univariate tests of normality: An empirical study. J. Biom. Biostat. 2016, 7, 1–8.
  4. Anderson, T.W.; Darling, D.A. Asymptotic theory of certain "goodness-of-fit" criteria based on stochastic processes. Ann. Math. Stat. 1952, 23, 193–212.
  5. Hosking, J.R.M.; Wallis, J.R. Some statistics useful in regional frequency analysis. Water Resour. Res. 1993, 29, 271–281.
  6. Cabaña, A.; Cabaña, E.M. Goodness-of-fit and comparison tests of the Kolmogorov–Smirnov type for bivariate populations. Ann. Stat. 1994, 22, 1447–1459.
  7. Chen, L.; Shapiro, S.S. An alternative test for normality based on normalized spacings. J. Stat. Comput. Simul. 1995, 53, 269–288.
  8. Rahman, M.M.; Govindarajulu, Z. A modification of the test of Shapiro and Wilk for normality. J. Appl. Stat. 1997, 24, 219–236.
  9. Ray, W.D.; Shenton, L.R.; Bowman, K.O. Maximum likelihood estimation in small samples. J. R. Stat. Soc. Ser. A 1978, 141, 268.
  10. Zhang, P. Omnibus test of normality using the Q statistic. J. Appl. Stat. 1999, 26, 519–528.
  11. Barrio, E.; Cuesta-Albertos, J.A.; Matrán, C.; Rodríguez-Rodríguez, J.M. Tests of goodness of fit based on the L2-Wasserstein distance. Ann. Stat. 1999, 27, 1230–1239.
  12. Glen, A.G.; Leemis, L.M.; Barr, D.R. Order statistics in goodness-of-fit testing. IEEE Trans. Reliab. 2001, 50, 209–213.
  13. Bonett, D.G.; Seier, E. A test of normality with high uniform power. Comput. Stat. Data Anal. 2002, 40, 435–445.
  14. Psaradakis, Z.; Vávra, M. Normality tests for dependent data: Large-sample and bootstrap approaches. Commun. Stat. Simul. Comput. 2018, 49, 283–304.
  15. Zhang, J.; Wu, Y. Likelihood-ratio tests for normality. Comput. Stat. Data Anal. 2005, 49, 709–721.
  16. Gel, Y.R.; Miao, W.; Gastwirth, J.L. Robust directed tests of normality against heavy-tailed alternatives. Comput. Stat. Data Anal. 2007, 51, 2734–2746.
  17. Coin, D. A goodness-of-fit test for normality based on polynomial regression. Comput. Stat. Data Anal. 2008, 52, 2185–2198.
  18. Desgagné, A.; Lafaye de Micheaux, P. A powerful and interpretable alternative to the Jarque–Bera test of normality based on 2nd-power skewness and kurtosis, using the Rao's score test on the APD family. J. Appl. Stat. 2017, 45, 2307–2327.
  19. Steele, C.M. The Power of Categorical Goodness-of-Fit Statistics. Ph.D. Thesis, Australian School of Environmental Studies, Warrandyte, Victoria, Australia, 2003.
  20. Romão, X.; Delgado, R.; Costa, A. An empirical power comparison of univariate goodness-of-fit tests for normality. J. Stat. Comput. Simul. 2010, 80, 545–591.
  21. Choulakian, V.; Lockhart, R.; Stephens, M. Cramér–von Mises statistics for discrete distributions. Can. J. Stat. 1994, 22, 125–137.
  22. Shapiro, S.S.; Wilk, M.B. An analysis of variance test for normality (complete samples). Biometrika 1965, 52, 591–611.
  23. Lilliefors, H.W. On the Kolmogorov–Smirnov test for normality with mean and variance unknown. J. Am. Stat. Assoc. 1967, 62, 399–402.
  24. Ahmad, F.; Khan, R.A. A power comparison of various normality tests. Pak. J. Stat. Oper. Res. 2015, 11, 331.
  25. D'Agostino, R.B.; Pearson, E.S. Testing for departures from normality. I. Fuller empirical results for the distribution of b2 and √b1. Biometrika 1973, 60, 613–622.
  26. Filliben, J.J. The probability plot correlation coefficient test for normality. Technometrics 1975, 17, 111–117.
  27. Martinez, J.; Iglewicz, B. A test for departure from normality based on a biweight estimator of scale. Biometrika 1981, 68, 331–333.
  28. Epps, T.W.; Pulley, L.B. A test for normality based on the empirical characteristic function. Biometrika 1983, 70, 723–726.
  29. Jarque, C.; Bera, A. Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Econ. Lett. 1980, 6, 255–259.
  30. Bakshaev, A. Goodness of fit and homogeneity tests on the basis of N-distances. J. Stat. Plan. Inference 2009, 139, 3750–3758.
  31. Hill, T.; Lewicki, P. Statistics Methods and Applications; StatSoft: Tulsa, OK, USA, 2007.
  32. Kasiulevičius, V.; Denapienė, G. Statistikos taikymas mokslinių tyrimų analizėje [The application of statistics in the analysis of scientific research]. Gerontologija 2008, 9, 176–180. (In Lithuanian)
  33. Damianou, C.; Kemp, A.W. New goodness of statistics for discrete and continuous data. Am. J. Math. Manag. Sci. 1990, 10, 275–307.
Figure 1. Plot of our kernel function $K(x)$ with the experimentally chosen optimal parameters $a = 0.95$, $b = 0.25$, and $c = 1$.
Figure 2. Illustration of the power.
Figure 3. Significance levels of the statistic step function.
Figure 4. Average empirical power results, for the examined sample sizes, for the groups of symmetric, asymmetric, and modified normal distributions, of five powerful goodness-of-fit tests.
Table 1. Average empirical power obtained for a group of symmetric distributions.

Test        32     64     128    256    512    1024
AD          0.714  0.799  0.863  0.909  0.939  0.955
BCMR        0.718  0.809  0.875  0.920  0.947  0.947
BHS         0.431  0.551  0.663  0.752  0.818  0.868
BHSBS       0.680  0.778  0.783  0.903  0.938  0.959
BM2         0.726  0.835  0.905  0.945  0.965  0.974
BS          0.717  0.810  0.877  0.920  0.947  0.961
CC2         0.712  0.805  0.873  0.920  0.949  0.936
CHI2        0.663  0.778  0.842  0.884  0.941  0.945
CVM         0.591  0.733  0.805  0.855  0.919  0.949
ChenS       0.729  0.806  0.871  0.915  0.943  0.960
Coin        0.735  0.830  0.891  0.930  0.952  0.963
DA          0.266  0.295  0.314  0.319  0.315  0.311
DAP         0.723  0.820  0.883  0.924  0.948  0.962
DH          0.709  0.805  0.877  0.925  0.950  0.963
DLDMZEPD    0.730  0.826  0.889  0.929  0.952  0.963
EP          0.706  0.828  0.974  0.910  0.946  0.959
Filli       0.712  0.805  0.875  0.922  0.949  0.962
GG          0.658  0.760  0.850  0.915  0.949  0.962
GLB         0.712  0.798  0.863  0.909  0.943  0.918
GMG         0.787  0.862  0.914  0.946  0.965  0.975
H1          0.799  0.862  0.852  0.999  0.999  0.999
JB          0.643  0.762  0.856  0.918  0.949  0.963
KS          0.585  0.723  0.789  0.836  0.905  0.939
Lillie      0.669  0.758  0.828  0.883  0.921  0.947
MI          0.632  0.676  0.705  0.724  0.736  0.745
N-metric    0.245  0.585  0.971  0.999  0.999  0.999
SF          0.715  0.807  0.876  0.923  0.949  0.962
SW          0.718  0.808  0.874  0.919  0.946  0.962
SWRG        0.694  0.775  0.834  0.882  0.916  0.946
ZQstar      0.513  0.576  0.630  0.669  0.697  0.718
ZW2         0.715  0.806  0.869  0.912  0.939  0.957
Table 2. Average empirical power obtained for a group of asymmetric distributions.

Test        32     64     128    256    512    1024
AD          0.729  0.835  0.908  0.949  0.969  0.984
BCMR        0.749  0.856  0.924  0.971  0.995  0.991
BHS         0.529  0.664  0.769  0.855  0.915  0.950
BHSBS       0.538  0.652  0.747  0.914  0.902  0.944
BM2         0.737  0.859  0.931  0.965  0.981  0.993
BS          0.506  0.588  0.665  0.738  0.805  0.859
CC2         0.579  0.682  0.777  0.853  0.938  0.956
CHI2        0.645  0.799  0.881  0.934  0.965  0.980
CVM         0.594  0.755  0.836  0.887  0.935  0.957
ChenS       0.756  0.862  0.928  0.961  0.978  0.991
Coin        0.480  0.556  0.630  0.700  0.769  0.916
DA          0.237  0.223  0.209  0.198  0.191  0.192
DAP         0.705  0.826  0.910  0.955  0.977  0.990
DH          0.724  0.845  0.921  0.957  0.977  0.991
DLDMXAPD    0.726  0.843  0.918  0.955  0.975  0.989
EP          0.753  0.846  0.913  0.967  0.975  0.993
Filli       0.732  0.842  0.915  0.953  0.974  0.991
GG          0.672  0.805  0.898  0.949  0.973  0.988
GLB         0.725  0.831  0.905  0.987  0.970  0.984
GMG         0.683  0.751  0.809  0.859  0.901  0.932
H1          0.816  0.896  0.896  0.999  0.999  0.999
JB          0.662  0.808  0.904  0.953  0.975  0.989
KS          0.582  0.736  0.810  0.863  0.921  0.945
Lillie      0.671  0.786  0.872  0.929  0.959  0.976
MI          0.644  0.731  0.798  0.843  0.872  0.913
N-metric    0.464  0.761  0.990  0.999  0.999  0.999
SF          0.736  0.846  0.918  0.955  0.975  0.989
SW          0.753  0.859  0.925  0.959  0.977  0.991
SWRG        0.758  0.861  0.927  0.960  0.977  0.999
ZQstar      0.570  0.639  0.693  0.732  0.761  0.748
ZW2         0.764  0.870  0.932  0.962  0.980  0.997
Table 3. Average empirical power obtained for a group of modified normal distributions.

Test        32     64     128    256    512    1024
AD          0.662  0.756  0.825  0.872  0.905  0.931
BCMR        0.652  0.756  0.831  0.880  0.913  0.935
BHS         0.463  0.585  0.676  0.744  0.796  0.834
BHSBS       0.568  0.701  0.787  0.847  0.890  0.918
BM2         0.641  0.770  0.854  0.904  0.934  0.953
BS          0.587  0.688  0.770  0.833  0.881  0.916
CC2         0.576  0.675  0.763  0.833  0.887  0.923
CHI2        0.566  0.728  0.808  0.866  0.914  0.939
CVM         0.557  0.708  0.779  0.833  0.897  0.930
ChenS       0.656  0.759  0.833  0.882  0.915  0.937
Coin        0.579  0.691  0.781  0.846  0.889  0.918
DA          0.314  0.342  0.367  0.388  0.405  0.418
DAP         0.617  0.733  0.818  0.872  0.906  0.930
DH          0.617  0.727  0.815  0.872  0.907  0.930
DLDMXAPD    0.651  0.754  0.831  0.879  0.912  0.935
EP          0.640  0.748  0.819  0.865  0.906  0.931
Filli       0.637  0.743  0.823  0.877  0.911  0.933
GG          0.529  0.657  0.775  0.860  0.906  0.932
GLB         0.659  0.755  0.823  0.870  0.903  0.930
GMG         0.688  0.771  0.836  0.883  0.917  0.942
H1          0.743  0.816  0.799  0.999  0.999  0.999
JB          0.515  0.662  0.783  0.861  0.904  0.930
KS          0.564  0.710  0.772  0.825  0.893  0.924
Lillie      0.626  0.724  0.796  0.850  0.889  0.917
MI          0.494  0.536  0.563  0.578  0.585  0.590
N-metric    0.243  0.582  0.972  0.999  0.999  0.999
SF          0.642  0.747  0.826  0.879  0.912  0.934
SW          0.654  0.758  0.832  0.882  0.915  0.937
SWRG        0.643  0.746  0.818  0.864  0.901  0.931
ZQstar      0.394  0.423  0.450  0.472  0.487  0.498
ZW2         0.640  0.749  0.826  0.876  0.907  0.931
Table 4. The minimal sample size at which the N-metric test is most powerful.

Nr.   Distribution      Group of Distributions   Minimal Sample Size (n)
1.    Standard normal   Symmetric                46
2.    Beta              Symmetric                88
3.    Cauchy            Symmetric                257
4.    Laplace           Symmetric                117
5.    Logistic          Symmetric                71
6.    Student           Symmetric                96
7.    Beta              Asymmetric               108
8.    Chi-square        Asymmetric               123
9.    Gamma             Asymmetric               <32
10.   Gumbel            Asymmetric               125
11.   Lognormal         Asymmetric               255
12.   Weibull           Asymmetric               65
13.   Normal1           Modified normal          70
14.   Normal2           Modified normal          93
15.   Normal3           Modified normal          72
16.   Normal4           Modified normal          117
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

