Abstract
This paper presents a derivation of the distribution of the Kolmogorov–Smirnov, Cramer–von Mises, and Anderson–Darling test statistics in the case of exponential sampling when the parameters are unknown and estimated from sample data for small sample sizes via maximum likelihood.
Originally published in Communications in Statistics - Simulation and Computation, Volume 37, Number 7, in 2008, this paper derives the probability distribution of several goodness-of-fit statistics when parameters are estimated from the data. In practice, working with these distributions is feasible only within the APPL environment. Piecewise distributions like those in Figures 13.5 and 13.13 are one of the strengths of APPL analysis. The procedures UniformRV and Transform are also used in calculating the distributions of the \(W_2^2\) and \(A_2^2\) statistics.
References
Cho, S. K., & Spiegelberg-Planer, R. (2002). Country nuclear power profiles. http://www-pub.iaea.org/MTCD/publications/PDF/cnpp2003/CNPP_Webpage/PDF/2002/index.htm. Accessed 6 Dec 2007.
D’Agostino, R. B., & Stephens, M. A. (1986). Goodness-of-fit techniques. New York: Marcel Dekker.
Drew, J. H., Glen, A. G., & Leemis, L. M. (2000). Computing the cumulative distribution function of the Kolmogorov–Smirnov statistic. Computational Statistics and Data Analysis, 34, 1–15.
Durbin, J. (1975). Kolmogorov–Smirnov tests when parameters are estimated with applications to tests of exponentiality and tests on spacings. Biometrika, 62, 5–22.
Hogg, R. V., McKean, J. W., & Craig, A. T. (2005). Introduction to mathematical statistics (6th ed.). Upper Saddle River, NJ: Prentice–Hall.
Law, A. M. (2007). Simulation modeling and analysis (4th ed.). New York: McGraw–Hill.
Lawless, J. F. (2003). Statistical models and methods for lifetime data (2nd ed.). New York: Wiley.
Lehmann, E. L. (1959). Testing statistical hypotheses. New York: Wiley.
Lilliefors, H. W. (1969). On the Kolmogorov–Smirnov test for the exponential distribution with mean unknown. Journal of the American Statistical Association, 64, 387–389.
Marsaglia, G., Tsang, W. W., & Wang, J. (2003). Evaluating Kolmogorov’s distribution. Journal of Statistical Software, 8(18). http://www.jstatsoft.org/v08/i18/
Rigdon, S., & Basu, A. P. (2000). Statistical methods for the reliability of repairable systems. New York: Wiley.
Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730–737.
Acknowledgements
The first author acknowledges summer support from Rose–Hulman Institute of Technology. The second and third authors acknowledge FRA support from the College of William & Mary. The authors also acknowledge the assistance of Bill Griffith, Thom Huber, and David Kelton in selecting data sets for the case studies in Section 13.3.
Appendix: Distribution of \(D_3\) for Exponential Sampling
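The pattern that emerged in the piecewise representation of the PDF of \(D_2\) led us to derive the PDF of \(D_3\) to see if any similar patterns arose. This appendix contains a derivation of the distribution of the K–S test statistic when n = 3 observations \(x_1\), \(x_2\), and \(x_3\) are drawn from an exponential population with fixed, positive, unknown mean θ. The maximum likelihood estimator is \(\hat{\theta} = (x_1 + x_2 + x_3) / 3\), which results in the fitted CDF
\[\hat{F}(x) = 1 - e^{-x/\hat{\theta}} = 1 - e^{-3x/(x_1 + x_2 + x_3)}, \qquad x > 0.\]
Analogous to the n = 2 case, define
\[y = \frac{x_{(1)}}{x_{(1)} + x_{(2)} + x_{(3)}}\]
and
\[z = \frac{x_{(2)}}{x_{(1)} + x_{(2)} + x_{(3)}},\]
so that
\[1 - y - z = \frac{x_{(3)}}{x_{(1)} + x_{(2)} + x_{(3)}}.\]
The domain of definition of y and z is
\[\mathcal{D} = \{(y,z)\,\vert\,0 < y < z < (1 - y)/2\}.\]
The values of the fitted CDF at the three order statistics are
\[\hat{F}(x_{(1)}) = 1 - e^{-3y}, \qquad \hat{F}(x_{(2)}) = 1 - e^{-3z},\]
and
\[\hat{F}(x_{(3)}) = 1 - e^{-3(1 - y - z)}.\]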
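The vertical distances A, B, C, D, E, and F (as functions of y and z) are defined in a similar fashion to the n = 2 case (see Figure 13.2):
\[\begin{aligned}
A &= \bigl|0 - \hat{F}(x_{(1)})\bigr| = 1 - e^{-3y}, \\
B &= \bigl|\tfrac{1}{3} - \hat{F}(x_{(1)})\bigr| = \bigl|e^{-3y} - \tfrac{2}{3}\bigr|, \\
C &= \bigl|\tfrac{1}{3} - \hat{F}(x_{(2)})\bigr| = \bigl|e^{-3z} - \tfrac{2}{3}\bigr|, \\
D &= \bigl|\tfrac{2}{3} - \hat{F}(x_{(2)})\bigr| = \bigl|e^{-3z} - \tfrac{1}{3}\bigr|, \\
E &= \bigl|\tfrac{2}{3} - \hat{F}(x_{(3)})\bigr| = \bigl|e^{-3(1 - y - z)} - \tfrac{1}{3}\bigr|, \\
F &= \bigl|1 - \hat{F}(x_{(3)})\bigr| = e^{-3(1 - y - z)},
\end{aligned}\]
for \((y,z) \in \mathcal{D}\).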
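The construction above can be sketched numerically. The following is a minimal illustration, assuming the six distances are the usual gaps between the fitted CDF and the empirical CDF just before and just after each order statistic; the function name `d3_from_sample` is hypothetical and is not part of APPL.

```python
import math

def d3_from_sample(x):
    """Return (D_3, label of the largest distance) for three observations.

    Assumes an exponential fit with the mean estimated by maximum
    likelihood, so the fitted CDF depends on the data only through
    y = x_(1)/sum and z = x_(2)/sum.
    """
    x1, x2, x3 = sorted(x)
    total = x1 + x2 + x3
    y, z = x1 / total, x2 / total

    # Fitted CDF evaluated at the three order statistics.
    f1 = 1.0 - math.exp(-3.0 * y)
    f2 = 1.0 - math.exp(-3.0 * z)
    f3 = 1.0 - math.exp(-3.0 * (1.0 - y - z))

    # Gaps to the empirical CDF steps 0, 1/3, 2/3, 1.
    distances = {
        "A": abs(0.0 - f1),
        "B": abs(1.0 / 3.0 - f1),
        "C": abs(1.0 / 3.0 - f2),
        "D": abs(2.0 / 3.0 - f2),
        "E": abs(2.0 / 3.0 - f3),
        "F": abs(1.0 - f3),
    }
    label = max(distances, key=distances.get)
    return distances[label], label

if __name__ == "__main__":
    print(d3_from_sample([1.0, 2.0, 3.0]))
```

Because only the ratios y and z enter, rescaling the data leaves the statistic unchanged, which is why its distribution does not depend on θ.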
Figure 13.11 shows the regions associated with the maximum of A, B, C, D, E, F for \((y,z) \in \mathcal{D}\). In three dimensions, with \(D_3 = \max\{A, B, C, D, E, F\}\) as the third axis, this figure appears to be a container with the region E at the bottom of the container and with each of the other four sides rising as they move away from their intersection with E. The absolute value signs that appear in the final formulas for B, C, D, and E above can be easily removed since, over the region \(\mathcal{D}\) associated with \(D_3\), the expressions within the absolute value signs are always positive for B and D, but always negative for C and E. The distance F is never the largest of the six distances for any \((y,z) \in \mathcal{D}\), so it can be excluded from consideration. Table 13.2 gives the functional forms of the two-way intersections between the five regions shown in Figure 13.11. Note that the BC and AD curves, and the AC and BD curves, are identical.
In order to determine the breakpoints in the support for \(D_3\), it is necessary to find the (y, z) coordinates of the three-way intersections of the five regions in Figure 13.11 and the two-way intersections of the regions on the boundary of \(\mathcal{D}\). Table 13.3 gives the values of y and z for these breakpoints on the boundary of \(\mathcal{D}\), along with the value of \(D_3 = \max\{A, B, C, D, E, F\}\) at these values, beginning at (y, z) = (0, 1∕2) and proceeding in a counterclockwise direction. One point has been excluded from Table 13.3 because of the intractability of the values (y, z). The three-way intersection between regions A, C, and the line z = (1 − y)∕2 can only be expressed in terms of the solution to a cubic equation. After some algebra, the point of intersection is approximately \((y,z)\cong (0.1608,0.4196)\) and the associated value of \(D_3\) is 2/3 minus the only real solution to the cubic equation
\[u^{3} + \tfrac{1}{3}u^{2} - e^{-3} = 0,\]
which yields
\[d_{AC} = \frac{7}{9} - \frac{(2916e^{-3} - 8 + c)^{1/3}}{18} - \frac{2}{9\,(2916e^{-3} - 8 + c)^{1/3}} \cong 0.3827,\]
where \(c = 108\sqrt{729e^{-6 } - 4e^{-3}}\).
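The boundary point can be checked numerically. The sketch below assumes that A = 1 − e^{−3y} and C = 2/3 − e^{−3z} on the relevant region; setting A = C = d along z = (1 − y)/2 then reduces to u²(u + 1/3) = e^{−3} with u = 2/3 − d. That reduction, the closed form built from c, and the helper `bisect_root` are reconstructions for illustration rather than expressions taken from the published text.

```python
import math

def bisect_root(f, lo, hi, tol=1e-14):
    """Plain bisection for a continuous f with a sign change on [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Real root of u^3 + u^2/3 - e^{-3} = 0; the left side is increasing on [0, 1].
u = bisect_root(lambda t: t * t * (t + 1.0 / 3.0) - math.exp(-3.0), 0.0, 1.0)
d_ac = 2.0 / 3.0 - u

# Recover the (y, z) coordinates from A = d and the boundary line.
y_ac = -math.log(1.0 - d_ac) / 3.0
z_ac = (1.0 - y_ac) / 2.0

# Cardano-style closed form using the constant c quoted in the text.
c = 108.0 * math.sqrt(729.0 * math.exp(-6.0) - 4.0 * math.exp(-3.0))
k = (2916.0 * math.exp(-3.0) - 8.0 + c) ** (1.0 / 3.0)
d_ac_closed = 7.0 / 9.0 - k / 18.0 - 2.0 / (9.0 * k)

if __name__ == "__main__":
    print(y_ac, z_ac, d_ac, d_ac_closed)
```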
The three-way intersection points in the interior of \(\mathcal{D}\) are more difficult to determine than those on the boundary. The value of \(D_3\) associated with each of these four points is the single real root of a cubic equation on the support of \(D_3\). These equations and approximate solution values, in ascending order, are given in Table 13.4. For example, consider the value of the maximum at the intersection of regions A, C, and E in Figure 13.11. The value of \(D_3\) must satisfy the cubic equation
\[(1 - d)\left(\tfrac{2}{3} - d\right)\left(\tfrac{1}{3} - d\right) = e^{-3},\]
which yields
\[d_{ACE} = \frac{2}{3} - \frac{1}{e}\left[\left(\frac{243 + c}{486}\right)^{1/3} + \left(\frac{243 - c}{486}\right)^{1/3}\right],\]
or approximately \(d_{ACE}\cong 0.19998\), in which \(c = \sqrt{59049 - 12e^{6}}\).
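As a numerical cross-check of the ACE breakpoint, the sketch below assumes A = 1 − e^{−3y}, C = 2/3 − e^{−3z}, and E = 1/3 − e^{−3(1−y−z)} where they meet. Setting all three equal to d and multiplying the exponentials gives (1 − d)(2/3 − d)(1/3 − d) = e^{−3}. This reduction and the closed form in terms of c are reconstructions for illustration, and `bisect_root` is a hypothetical helper.

```python
import math

def bisect_root(f, lo, hi, tol=1e-14):
    """Plain bisection for a continuous f with a sign change on [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ace_cubic(d):
    return (1.0 - d) * (2.0 / 3.0 - d) * (1.0 / 3.0 - d) - math.exp(-3.0)

# The product is decreasing on (0, 1/3), so there is one root there.
d_ace = bisect_root(ace_cubic, 0.0, 1.0 / 3.0)

# Coordinates of the intersection, from A = d and C = d.
y_ace = -math.log(1.0 - d_ace) / 3.0
z_ace = -math.log(2.0 / 3.0 - d_ace) / 3.0

# Closed form via Cardano, with c as quoted in the text.
c = math.sqrt(59049.0 - 12.0 * math.exp(6.0))
d_ace_closed = 2.0 / 3.0 - math.exp(-1.0) * (
    ((243.0 + c) / 486.0) ** (1.0 / 3.0) + ((243.0 - c) / 486.0) ** (1.0 / 3.0)
)

if __name__ == "__main__":
    print(d_ace, d_ace_closed, (y_ace, z_ace))
```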
The largest value of \(D_3 = \max\{A, B, C, D, E\}\) on \(\mathcal{D}\) occurs at the origin (y = 0 and z = 0) and has value 2/3, which is the upper limit of the support of \(D_3\). The smallest value of \(D_3\) on \(\mathcal{D}\) occurs at the intersection ACE and is \(d_{ACE}\cong 0.19998\), which is the lower limit of the support of \(D_3\).
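Determining the Joint Distribution of Y and Z. The next step is to determine the distribution of \(Y = X_{(1)}/(X_{(1)} + X_{(2)} + X_{(3)})\) and \(Z = X_{(2)}/(X_{(1)} + X_{(2)} + X_{(3)})\). Using an order statistic result from Hogg et al. [67, page 193], the joint PDF of \(X_{(1)}\), \(X_{(2)}\), and \(X_{(3)}\) is
\[f_{X_{(1)},X_{(2)},X_{(3)}}(x_{(1)},x_{(2)},x_{(3)}) = \frac{3!}{\theta^{3}}\,e^{-(x_{(1)} + x_{(2)} + x_{(3)})/\theta}, \qquad 0 < x_{(1)} \leq x_{(2)} \leq x_{(3)}.\]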
In order to determine the joint PDF of \(Y = X_{(1)}/(X_{(1)} + X_{(2)} + X_{(3)})\) and \(Z = X_{(2)}/(X_{(1)} + X_{(2)} + X_{(3)})\), define the dummy transformation \(W = X_{(3)}\). The random variables Y, Z, and W define a one-to-one transformation from \(\mathcal{A} =\{ (x_{(1)},x_{(2)},x_{(3)})\,\vert \,0 <x_{(1)} \leq x_{(2)} \leq x_{(3)}\}\) to \(\mathcal{B} =\{ (y,z,w)\,\vert \,0 <y <z <(1 - y)/2,w> 0\}\). Since \(x_{(1)} = yw/(1 - y - z)\), \(x_{(2)} = zw/(1 - y - z)\), and \(x_{(3)} = w\), and the Jacobian of the inverse transformation is \(w^{2}/(1 - y - z)^{3}\), the joint PDF of Y, Z, and W on \(\mathcal{B}\) is
\[f_{Y,Z,W}(y,z,w) = \frac{6}{\theta^{3}}\,\frac{w^{2}}{(1 - y - z)^{3}}\,e^{-w/(\theta(1 - y - z))}, \qquad (y,z,w) \in \mathcal{B}.\]
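Integrating by parts, the joint PDF of Y and Z on \(\mathcal{D}\) is
\[f_{Y,Z}(y,z) = \int_{0}^{\infty} \frac{6}{\theta^{3}}\,\frac{w^{2}}{(1 - y - z)^{3}}\,e^{-w/(\theta(1 - y - z))}\,dw = 12, \qquad (y,z) \in \mathcal{D},\]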
i.e., Y and Z are uniformly distributed on \(\mathcal{D}\).
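The uniformity of (Y, Z) can be spot-checked by simulation. The sketch below is one possible check, not the authors' method: if the joint density is constant on \(\mathcal{D}\), whose area is 1/12, then the constant is 12 and the marginal CDF of Y is \(12\int_0^t ((1 - y)/2 - y)\,dy = 6t - 9t^2\) for 0 ≤ t ≤ 1/3. The function names, seed, and sample size are arbitrary choices.

```python
import random

def sample_yz(rng, theta=1.0):
    """Draw three exponential variates with mean theta and return (Y, Z)."""
    x = sorted(rng.expovariate(1.0 / theta) for _ in range(3))
    total = sum(x)
    return x[0] / total, x[1] / total

def estimate_prob_y_below(t, n_draws=200_000, seed=12345, theta=1.0):
    """Monte Carlo estimate of Pr(Y <= t)."""
    rng = random.Random(seed)
    hits = sum(sample_yz(rng, theta)[0] <= t for _ in range(n_draws))
    return hits / n_draws

def uniform_prob_y_below(t):
    """Pr(Y <= t) implied by a density of 12 on D, for 0 <= t <= 1/3."""
    return 6.0 * t - 9.0 * t * t

if __name__ == "__main__":
    for t in (0.05, 1.0 / 6.0, 0.3):
        print(t, estimate_prob_y_below(t), uniform_prob_y_below(t))
```

Changing `theta` should leave the estimates essentially unchanged, since Y and Z are scale-free.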
Determining the Distribution of \(D_3\). The CDF of \(D_3\) will be defined in a piecewise manner, with breakpoints at the following ordered quantities: \(d_{ACE}\), \(d_{BCE}\), \(d_{ADE}\), \(d_{BDE}\), 1∕3, \(d_{AC}\), \(\frac{2} {3} - e^{-3/2}\), \(1 -\frac{1} {e}\), and 2∕3. The CDF \(F_{D_{3}}(d) =\Pr (D_{3} \leq d)\) is found by integrating the joint PDF of Y and Z over the appropriate limits, yielding
which is plotted in Figure 13.12. Dots have been plotted at the breakpoints, with each of the lower four tightly-clustered breakpoints from Table 13.4 corresponding to a horizontal plane intersecting one of the four corners of region E in Figure 13.11. Percentiles of this distribution match the tabled values from Durbin [48]. We were not able to establish a pattern between the CDF of \(D_2\) and the CDF of \(D_3\) that might lead to a general expression for any n.
APPL was again used to calculate moments of \(D_3\). The decimal approximations for the mean, variance, skewness, and kurtosis are, respectively, \(E(D_{3})\cong 0.3727\), \(V (D_{3})\cong 0.008804\), \(\gamma _{3}\cong 0.4541\), and \(\gamma _{4}\cong 2.6538\). Although the functional form of the eight-segment PDF of \(D_3\) is too lengthy to display here, it is plotted in Figure 13.13, with the only non-obvious breakpoint being on the initial nearly-vertical segment at \({\bigl (d_{BCE},f_{D_{3}}(d_{BCE})\bigr )}\cong (0.2091,1.5624)\).
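The quoted moments come from exact APPL calculations. As a rough, independent check, the mean and variance can be approximated by simulation. The sketch below assumes the standard definition of the K–S statistic (largest gap between the fitted CDF and the empirical CDF on either side of each jump); the function names, seed, and sample size are illustrative, and Monte Carlo estimates should agree with the exact values only to about two or three decimal places.

```python
import math
import random

def ks_d3(x):
    """K-S statistic for three observations against the MLE-fitted exponential."""
    xs = sorted(x)
    total = sum(xs)
    fitted = [1.0 - math.exp(-3.0 * xi / total) for xi in xs]
    # Compare with the EDF just before (i/3) and just after ((i+1)/3) each jump.
    return max(
        max(abs(i / 3.0 - f), abs((i + 1) / 3.0 - f))
        for i, f in enumerate(fitted)
    )

def simulate_d3(n_draws=200_000, seed=2008, theta=1.0):
    """Simulate D_3 for exponential samples of size three with mean theta."""
    rng = random.Random(seed)
    return [
        ks_d3([rng.expovariate(1.0 / theta) for _ in range(3)])
        for _ in range(n_draws)
    ]

draws = simulate_d3()
mean_d3 = sum(draws) / len(draws)
var_d3 = sum((d - mean_d3) ** 2 for d in draws) / len(draws)

if __name__ == "__main__":
    print("mean:", mean_d3, "variance:", var_d3)
    print("range:", min(draws), max(draws))
```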
© 2017 Springer International Publishing Switzerland
Cite this chapter
Evans, D.L., Drew, J.H., Leemis, L.M. (2017). The Distribution of the Kolmogorov–Smirnov, Cramer–von Mises, and Anderson–Darling Test Statistics for Exponential Populations with Estimated Parameters. In: Glen, A., Leemis, L. (eds) Computational Probability Applications. International Series in Operations Research & Management Science, vol 247. Springer, Cham. https://doi.org/10.1007/978-3-319-43317-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43315-8
Online ISBN: 978-3-319-43317-2