The Distribution of the Kolmogorov–Smirnov, Cramer–von Mises, and Anderson–Darling Test Statistics for Exponential Populations with Estimated Parameters

Evans, Diane L.; Drew, John H.; Leemis, Lawrence M.

doi:10.1007/978-3-319-43317-2_13


Part of the book series: International Series in Operations Research & Management Science (ISOR, volume 247)


Abstract

This paper presents a derivation of the distribution of the Kolmogorov–Smirnov, Cramer–von Mises, and Anderson–Darling test statistics in the case of exponential sampling when the parameters are unknown and estimated from sample data for small sample sizes via maximum likelihood.

Originally published in Communications in Statistics - Simulation and Computation, Volume 37, Number 7, in 2008, this paper derives the probability distributions of several goodness-of-fit statistics when parameters are estimated from the data. In practice, working with these unusual distributions is feasible only within the APPL environment. Handling piecewise distributions like those in Figures 13.5 and 13.13 is one of the strengths of APPL. The procedures UniformRV and Transform are also used in calculating the distributions of the $W_{2}^{2}$ and $A_{2}^{2}$ statistics.


References

Cho, S. K., & Spiegelberg-Planer, R. (2002). Country nuclear power profiles. http://www-pub.iaea.org/MTCD/publications/PDF/cnpp2003/CNPP_Webpage/PDF/2002/index.htm. Accessed 6 Dec 2007.
D’Agostino, R. B., & Stephens, M. A. (1986). Goodness-of-fit techniques. New York: Marcel Dekker.
Drew, J. H., Glen, A. G., & Leemis, L. M. (2000). Computing the cumulative distribution function of the Kolmogorov–Smirnov statistic. Computational Statistics and Data Analysis, 34, 1–15.
Durbin, J. (1975). Kolmogorov–Smirnov tests when parameters are estimated with applications to tests of exponentiality and tests on spacings. Biometrika, 62, 5–22.
Hogg, R. V., McKean, J. W., & Craig, A. T. (2005). Introduction to mathematical statistics (6th ed.). Upper Saddle River, NJ: Prentice–Hall.
Law, A. M. (2007). Simulation modeling and analysis (4th ed.). New York: McGraw–Hill.
Lawless, J. F. (2003). Statistical models and methods for lifetime data (2nd ed.). New York: Wiley.
Lehmann, E. L. (1959). Testing statistical hypotheses. New York: Wiley.
Lilliefors, H. W. (1969). On the Kolmogorov–Smirnov test for the exponential distribution with mean unknown. Journal of the American Statistical Association, 64, 387–389.
Marsaglia, G., Tsang, W. W., & Wang, J. (2003). Evaluating Kolmogorov’s distribution. Journal of Statistical Software, 8(18). http://www.jstatsoft.org/v08/i18/
Rigdon, S., & Basu, A. P. (2000). Statistical methods for the reliability of repairable systems. New York: Wiley.
Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730–737.

Acknowledgements

The first author acknowledges summer support from Rose–Hulman Institute of Technology. The second and third authors acknowledge FRA support from the College of William & Mary. The authors also acknowledge the assistance of Bill Griffith, Thom Huber, and David Kelton in selecting data sets for the case studies in Section 13.3.

Author information

Authors and Affiliations

Rose Hulman, Terre Haute, IN, USA
Diane L. Evans
William and Mary, Williamsburg, VA, USA
John H. Drew
The College of William and Mary, Williamsburg, VA, USA
Lawrence M. Leemis


Corresponding author

Correspondence to Lawrence M. Leemis.

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, The Colorado College, Colorado Springs, Colorado, USA
Andrew G. Glen
Department of Mathematics, The College of William and Mary, Williamsburg, Virginia, USA
Lawrence M. Leemis

Appendix: Distribution of D₃ for Exponential Sampling

The pattern that emerged in the piecewise representation of the PDF of D₂ led us to derive the PDF of D₃ to see if any similar patterns arose. This appendix contains a derivation of the distribution of the K–S test statistic when n = 3 observations x₁, x₂, and x₃ are drawn from an exponential population with fixed, positive, unknown mean θ. The maximum likelihood estimator is $\hat{\theta} = (x_1 + x_2 + x_3) / 3$, which results in the fitted CDF

$$\displaystyle{\hat{F}(x) = 1 - e^{-x/\hat{\theta }}\qquad \qquad x> 0.}$$

Analogous to the n = 2 case, define

$$\displaystyle{y = \frac{x_{(1)}} {x_{(1)} + x_{(2)} + x_{(3)}}}$$

and

$$\displaystyle{z = \frac{x_{(2)}} {x_{(1)} + x_{(2)} + x_{(3)}}}$$

so that

$$\displaystyle{1 - y - z = \frac{x_{(3)}} {x_{(1)} + x_{(2)} + x_{(3)}}.}$$

The domain of definition of y and z is

$$\displaystyle{\mathcal{D} =\{ (y,z)\,\vert \,0 <y <z <(1 - y)/2\}.}$$

The values of the fitted CDF at the three order statistics are

$$\displaystyle{\hat{F}(x_{(1)}) = 1 - e^{-x_{(1)}/\hat{\theta }} = 1 - e^{-3y},}$$

$$\displaystyle{\hat{F}(x_{(2)}) = 1 - e^{-x_{(2)}/\hat{\theta }} = 1 - e^{-3z},}$$

and

$$\displaystyle{\hat{F}(x_{(3)}) = 1 - e^{-x_{(3)}/\hat{\theta }} = 1 - e^{-3(1-y-z)}.}$$

The vertical distances A, B, C, D, E, and F (as functions of y and z) are defined in a similar fashion to the n = 2 case (see Figure 13.2):

$$\displaystyle{\begin{array}{lllll} A & =&1 - e^{-3y} \\ B & =&\left \vert \frac{1} {3} -\left (1 - e^{-3y}\right )\right \vert & =&\left \vert e^{-3y} -\frac{2} {3}\right \vert \\ C & =&\left \vert \left (1 - e^{-3z}\right ) -\frac{1} {3}\right \vert & =&\left \vert e^{-3z} -\frac{2} {3}\right \vert \\ D& =&\left \vert \frac{2} {3} -\left (1 - e^{-3z}\right )\right \vert & =&\left \vert e^{-3z} -\frac{1} {3}\right \vert \\ E & =&\left \vert \left (1 - e^{-3(1-y-z)}\right ) -\frac{2} {3}\right \vert & =&\left \vert e^{-3(1-y-z)} -\frac{1} {3}\right \vert \\ F & =&1 -\left (1 - e^{-3(1-y-z)}\right ) & =&e^{-3(1-y-z)} \end{array} }$$

for $(y,z) \in \mathcal{D}$.
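The definitions above translate directly into a few lines of code. The following is a minimal Python sketch (not the authors' APPL code) that computes y, z, and the six distances for a sample of size three, and cross-checks the largest distance against a generic K–S computation. The function names `ks_distances`, `d3`, and `d3_direct` are illustrative assumptions, not names from the original.

```python
import math


def ks_distances(sample):
    """Return (y, z, distances) for three positive observations.

    The six distances follow the definitions of A through F in the text.
    """
    x1, x2, x3 = sorted(sample)
    total = x1 + x2 + x3
    y, z = x1 / total, x2 / total
    # Fitted CDF at the order statistics: x_(i) / theta_hat = 3 * x_(i) / total.
    f1 = 1.0 - math.exp(-3.0 * y)
    f2 = 1.0 - math.exp(-3.0 * z)
    f3 = 1.0 - math.exp(-3.0 * (1.0 - y - z))
    distances = {
        "A": f1,
        "B": abs(1.0 / 3.0 - f1),
        "C": abs(f2 - 1.0 / 3.0),
        "D": abs(2.0 / 3.0 - f2),
        "E": abs(f3 - 2.0 / 3.0),
        "F": 1.0 - f3,
    }
    return y, z, distances


def d3(sample):
    """K-S statistic for n = 3 as the largest of the six distances."""
    return max(ks_distances(sample)[2].values())


def d3_direct(sample):
    """Generic K-S statistic against the fitted exponential CDF (cross-check)."""
    xs = sorted(sample)
    n = len(xs)
    theta_hat = sum(xs) / n
    best = 0.0
    for i, x in enumerate(xs, start=1):
        fitted = 1.0 - math.exp(-x / theta_hat)
        best = max(best, i / n - fitted, fitted - (i - 1) / n)
    return best


if __name__ == "__main__":
    y, z, dist = ks_distances([1.0, 2.0, 3.0])
    print("y =", y, "z =", z)
    print("largest distance:", max(dist, key=dist.get), d3([1.0, 2.0, 3.0]))
```

Because the statistic depends on the data only through y and z, rescaling the sample (for example, doubling every observation) leaves it unchanged.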

Figure 13.11 shows the regions associated with the maximum of A, B, C, D, E, F for $(y,z) \in \mathcal{D}$. In three dimensions, with D₃ = max{A, B, C, D, E, F} as the third axis, this figure appears to be a container with the region E at the bottom of the container and with each of the other four sides rising as they move away from their intersection with E. The absolute value signs that appear in the final formulas for B, C, D, and E above can be easily removed since, over the region $\mathcal{D}$ associated with D₃, the expressions within the absolute value signs are always positive for B and D, but always negative for C and E. The distance F is never the largest of the six distances for any $(y,z) \in \mathcal{D}$, so it can be excluded from consideration. Table 13.2 gives the functional forms of the two-way intersections between the five regions shown in Figure 13.11. Note that the BC and AD curves, and the AC and BD curves, are identical.

Table 13.2 Intersections of regions A, B, C, D, and E in $\mathcal{D}$


In order to determine the breakpoints in the support for D₃, it is necessary to find the (y, z) coordinates of the three-way intersections of the five regions in Figure 13.11 and the two-way intersections of the regions on the boundary of $\mathcal{D}$. Table 13.3 gives the values of y and z for these breakpoints on the boundary of $\mathcal{D}$, along with the value of D₃ = max{A, B, C, D, E, F} at these values, beginning at (y, z) = (0, 1∕2) and proceeding in a counterclockwise direction. One point has been excluded from Table 13.3 because of the intractability of the values (y, z). The three-way intersection between regions A, C, and the line z = (1 − y)∕2 can only be expressed in terms of the solution to a cubic equation. After some algebra, the point of intersection is approximately $(y,z)\cong (0.1608,0.4196)$, and the associated value of D₃ is 2/3 minus the only real solution to the cubic equation

$$\displaystyle{3d^{3} + d^{2} - 3e^{-3} = 0,}$$

which yields

$$\displaystyle{d_{AC} = \frac{7} {9} - \frac{1} {18}\left (2916e^{-3} - 8 + c\right )^{1/3} -\frac{2} {9}\left (2916e^{-3} - 8 + c\right )^{-1/3}\cong 0.3827,}$$

where $c = 108\sqrt{729e^{-6 } - 4e^{-3}}$.
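The closed form for $d_{AC}$ can be checked numerically. The sketch below, written in Python purely as an illustration, solves the cubic by bisection (a swapped-in numerical method, not the algebra used in the text), subtracts the root from 2/3, and compares the result with the closed-form expression. The helper name `bisect_root` and the bracketing interval [0, 1] are assumptions made here; the cubic is negative at 0 and positive at 1, so the interval contains the root.

```python
import math


def cubic(u):
    """Left-hand side of 3u^3 + u^2 - 3e^{-3} = 0."""
    return 3.0 * u**3 + u**2 - 3.0 * math.exp(-3.0)


def bisect_root(f, lo, hi, iterations=100):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0.0) == (f_mid < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Numerical route: D_3 at the intersection is 2/3 minus the real root.
root = bisect_root(cubic, 0.0, 1.0)
d_ac_numeric = 2.0 / 3.0 - root

# Closed-form route, transcribed from the expression for d_AC in the text.
c = 108.0 * math.sqrt(729.0 * math.exp(-6.0) - 4.0 * math.exp(-3.0))
t = 2916.0 * math.exp(-3.0) - 8.0 + c
d_ac_closed = 7.0 / 9.0 - t ** (1.0 / 3.0) / 18.0 - (2.0 / 9.0) * t ** (-1.0 / 3.0)

# Recover (y, z) from A = d and the boundary line z = (1 - y) / 2.
y_ac = -math.log(1.0 - d_ac_numeric) / 3.0
z_ac = (1.0 - y_ac) / 2.0

if __name__ == "__main__":
    print("d_AC (bisection):  ", d_ac_numeric)
    print("d_AC (closed form):", d_ac_closed)
    print("(y, z):", (y_ac, z_ac))
```

The two routes should agree to within floating-point error, and the recovered (y, z) should round to the values quoted in the text.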

Table 13.3 Intersection points along the boundary of $\mathcal{D}$


The three-way intersection points in the interior of $\mathcal{D}$ are more difficult to determine than those on the boundary. The value of D₃ associated with each of these four points is the single real root of a cubic equation on the support of D₃. These equations and approximate solution values, in ascending order, are given in Table 13.4. For example, consider the value of the maximum at the intersection of regions A, C, and E in Figure 13.11. The value of D₃ must satisfy the cubic equation

$$\displaystyle{e^{3}\left (1 - d\right )\left (\frac{2} {3} - d\right )\left (\frac{1} {3} - d\right ) = 1,}$$

which yields

$$\displaystyle{d_{ACE} = \frac{\left (243 + c\right )^{2/3}12^{2/3}c - 243\left (243 + c\right )^{2/3}12^{2/3} + 144e^{5} - 12^{4/3}e^{4}(243 + c)^{1/3}} {216e^{5}},}$$

or approximately $d_{ACE}\cong 0.19998$, in which $c = \sqrt{59049 - 12e^{6}}$.

Table 13.4 Three-way interior intersection points of regions A, B, C, D, and E in $\mathcal{D}$

The largest value of D₃ = max{A, B, C, D, E} on $\mathcal{D}$ occurs at the origin (y = 0 and z = 0) and has value 2/3, which is the upper limit of the support of D₃. The smallest value of D₃ on $\mathcal{D}$ occurs at the intersection ACE and is $d_{ACE}\cong 0.19998$, which is the lower limit of the support of D₃.
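Both limits of the support can be checked numerically. The Python sketch below is an illustration only: it finds the root of the ACE cubic on (0, 1/3) by bisection (a swapped-in numerical method), compares it with the closed form for $d_{ACE}$, and evaluates the largest distance at the ACE point and at the origin. The function names `ace_cubic` and `max_distance` are assumptions introduced here.

```python
import math


def ace_cubic(d):
    """e^3 (1 - d)(2/3 - d)(1/3 - d) - 1, which vanishes at d = d_ACE."""
    return math.exp(3.0) * (1.0 - d) * (2.0 / 3.0 - d) * (1.0 / 3.0 - d) - 1.0


def max_distance(y, z):
    """Largest of the six distances A through F at the point (y, z)."""
    ey, ez = math.exp(-3.0 * y), math.exp(-3.0 * z)
    ew = math.exp(-3.0 * (1.0 - y - z))
    return max(1.0 - ey, abs(ey - 2.0 / 3.0), abs(ez - 2.0 / 3.0),
               abs(ez - 1.0 / 3.0), abs(ew - 1.0 / 3.0), ew)


# Bisection on (0, 1/3): the cubic is positive at 0 and equals -1 at 1/3.
lo, hi = 0.0, 1.0 / 3.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if ace_cubic(mid) > 0.0:
        lo = mid
    else:
        hi = mid
d_ace_numeric = 0.5 * (lo + hi)

# Closed form transcribed from the expression for d_ACE in the text.
c = math.sqrt(59049.0 - 12.0 * math.exp(6.0))
s = 243.0 + c
d_ace_closed = (
    s ** (2.0 / 3.0) * 12.0 ** (2.0 / 3.0) * c
    - 243.0 * s ** (2.0 / 3.0) * 12.0 ** (2.0 / 3.0)
    + 144.0 * math.exp(5.0)
    - 12.0 ** (4.0 / 3.0) * math.exp(4.0) * s ** (1.0 / 3.0)
) / (216.0 * math.exp(5.0))

# Coordinates of the ACE point, from A = d and C = d.
y_ace = -math.log(1.0 - d_ace_numeric) / 3.0
z_ace = -math.log(2.0 / 3.0 - d_ace_numeric) / 3.0

if __name__ == "__main__":
    print("d_ACE (bisection):  ", d_ace_numeric)
    print("d_ACE (closed form):", d_ace_closed)
    print("max distance at ACE point:", max_distance(y_ace, z_ace))
    print("max distance at origin:   ", max_distance(0.0, 0.0))
```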

Determining the Joint Distribution of Y and Z. The next step is to determine the distribution of Y = X₍₁₎∕(X₍₁₎ + X₍₂₎ + X₍₃₎) and Z = X₍₂₎∕(X₍₁₎ + X₍₂₎ + X₍₃₎). Using an order statistic result from Hogg et al. [67, page 193], the joint PDF of X₍₁₎, X₍₂₎, and X₍₃₎ is

$$\displaystyle{g(x_{(1)},x_{(2)},x_{(3)}) = \frac{3!} {\theta ^{3}} \mathrm{exp}{\bigl ( - (x_{(1)} + x_{(2)} + x_{(3)})/\theta \bigr )}\qquad \ 0 <x_{(1)} \leq x_{(2)} \leq x_{(3)}.}$$

In order to determine the joint PDF of Y = X₍₁₎∕(X₍₁₎ + X₍₂₎ + X₍₃₎) and Z = X₍₂₎∕(X₍₁₎ + X₍₂₎ + X₍₃₎), define the dummy transformation W = X₍₃₎. The random variables Y, Z, and W define a one-to-one transformation from $\mathcal{A} =\{ (x_{(1)},x_{(2)},x_{(3)})\,\vert \,0 <x_{(1)} \leq x_{(2)} \leq x_{(3)}\}$ to $\mathcal{B} =\{ (y,z,w)\,\vert \,0 <y <z <(1 - y)/2,w> 0\}$. Since x₍₁₎ = yw∕(1 − y − z), x₍₂₎ = zw∕(1 − y − z), and x₍₃₎ = w, and the Jacobian of the inverse transformation is w²∕(1 − y − z)³, the joint PDF of Y, Z, and W on $\mathcal{B}$ is

$$\displaystyle\begin{array}{rcl} h(y,z,w)& =& \frac{6} {\theta ^{3}} \mathrm{exp}\left (-\left ( \frac{yw + zw} {1 - y - z} + w\right )/\theta \right )\left \vert \frac{w^{2}} {(1 - y - z)^{3}}\right \vert {}\\ & =& \frac{6w^{2}} {\theta ^{3}(1 - y - z)^{3}}\mathrm{exp}\left (- \frac{w} {(1 - y - z)\theta }\right )\qquad (y,z,w) \in \mathcal{B}. {}\\ \end{array}$$

Integrating by parts, the joint PDF of Y and Z on $\mathcal{D}$ is

$$\displaystyle{f_{Y,Z}(y,z) = \frac{6} {\theta ^{3}(1 - y - z)^{3}}\int _{0}^{\infty }w^{2}\,\mathrm{exp}\left (- \frac{w} {(1 - y - z)\theta }\right )dw = 12\qquad (y,z) \in \mathcal{D},}$$

i.e., Y and Z are uniformly distributed on $\mathcal{D}$.
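The integration step can be written out explicitly. The following is a sketch of the omitted calculus; the shorthand $s = 1 - y - z$ and the substitution variable $u = w/(s\theta)$ are introduced here for brevity and are not part of the original notation. The substitution gives

$$\displaystyle{\int _{0}^{\infty }w^{2}\,\mathrm{exp}\left (- \frac{w} {s\theta }\right )dw = (s\theta )^{3}\int _{0}^{\infty }u^{2}e^{-u}\,du = 2s^{3}\theta ^{3},}$$

so that the factor $6/(\theta ^{3}s^{3})$ in front of the integral leaves the constant 12, free of both θ and (y, z). As a consistency check, the area of $\mathcal{D}$ is

$$\displaystyle{\int _{0}^{1/3}\left (\frac{1 - y} {2} - y\right )dy = \frac{1} {12},}$$

so the constant density 12 integrates to one over $\mathcal{D}$.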

Determining the Distribution of D₃. The CDF of D₃ will be defined in a piecewise manner, with breakpoints at the following ordered quantities: $d_{ACE}$, $d_{BCE}$, $d_{ADE}$, $d_{BDE}$, 1∕3, $d_{AC}$, $\frac{2} {3} - e^{-3/2}$, $1 -\frac{1} {e}$, and 2∕3. The CDF $F_{D_{3}}(d) =\Pr (D_{3} \leq d)$ is found by integrating the joint PDF of Y and Z over the appropriate limits, yielding

$$\displaystyle{F_{D_{3}}(d) = \left \{\begin{array}{ll} 0 &d <d_{ACE} \\ \frac{2} {3}\left [\ln \left (e^{3}[1 - d]\left [\frac{2} {3} - d\right ]\left [\frac{1} {3} - d\right ]\right )\right ]^{2} & d_{ACE} \leq d <d_{BCE} \\ \frac{2} {3}\ln \left [e^{6}(1 - d)\left (\frac{2} {3} - d\right )^{2}\left (\frac{2} {3} + d\right )\left (\frac{1} {3} - d\right )^{2}\right ]& \\ \qquad \times \ln \left ( \frac{1-d} {2/3+d}\right ) &d_{BCE} \leq d <d_{ADE} \\ \frac{4} {3}\ln \left (\frac{d+1/3} {2/3-d}\right )\ln \left (\frac{d+2/3} {1-d} \right ) & \\ \qquad -\frac{2} {3}\left [\ln \left (e^{3}\left [d + \frac{2} {3}\right ]\left [d + \frac{1} {3}\right ]\left [\frac{1} {3} - d\right ]\right )\right ]^{2} & d_{ADE} \leq d <d_{BDE} \\ \frac{4} {3}\ln \left (\frac{d+1/3} {2/3-d}\right )\ln \left (\frac{d+2/3} {1-d} \right ) &d_{BDE} \leq d <\frac{1} {3} \\ \frac{4} {3}\ln \left (\frac{2/3-d} {d+1/3}\right )\ln (1 - d) -\frac{2} {3}\left [\ln \left (\frac{d+1/3} {1-d} \right )\right ]^{2} & \frac{1} {3} \leq d <d_{AC} \\ 1 -\frac{2} {3}\left [\ln \left (d + \frac{1} {3}\right )\right ]^{2} -\left [1 +\ln \left (1 - d\right )\right ]^{2} & \\ \qquad - 3\left [1 + \frac{2} {3}\ln \left (\frac{2} {3} - d\right )\right ]^{2} & d_{AC} \leq d <\frac{2} {3} - e^{-3/2} \\ 1 -\frac{2} {3}\left [\ln \left (d + \frac{1} {3}\right )\right ]^{2} -\left [1 +\ln \left (1 - d\right )\right ]^{2} & \frac{2} {3} - e^{-3/2} \leq d <1 - e^{-1} \\ 1 -\frac{2} {3}\left [\ln \left (d + \frac{1} {3}\right )\right ]^{2} & 1 - e^{-1} \leq d <\frac{2} {3} \\ 1 &d \geq \frac{2} {3}, \end{array} \right.}$$

which is plotted in Figure 13.12. Dots have been plotted at the breakpoints, with each of the lower four tightly-clustered breakpoints from Table 13.4 corresponding to a horizontal plane intersecting one of the four corners of region E in Figure 13.11. Percentiles of this distribution match the tabled values from Durbin [48]. We were not able to establish a pattern between the CDF of D₂ and the CDF of D₃ that might lead to a general expression for any n.
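One way to sanity-check the piecewise CDF is to evaluate it on either side of each breakpoint and confirm that adjacent pieces agree. The Python sketch below does this for the upper part of the support only (d ≥ 1/3), where every breakpoint is available in closed form in the text; the lower pieces are omitted because they need the Table 13.4 values. The function names `d_ac` and `cdf_upper` are assumptions introduced for illustration.

```python
import math


def d_ac():
    """Closed form for d_AC, transcribed from the text."""
    c = 108.0 * math.sqrt(729.0 * math.exp(-6.0) - 4.0 * math.exp(-3.0))
    t = 2916.0 * math.exp(-3.0) - 8.0 + c
    return 7.0 / 9.0 - t ** (1.0 / 3.0) / 18.0 - (2.0 / 9.0) * t ** (-1.0 / 3.0)


D_AC = d_ac()
B_C = 2.0 / 3.0 - math.exp(-1.5)   # breakpoint 2/3 - e^{-3/2}
B_A = 1.0 - math.exp(-1.0)         # breakpoint 1 - e^{-1}


def cdf_upper(d):
    """CDF of D_3 for d >= 1/3, using the last five pieces of the formula."""
    if d < 1.0 / 3.0:
        raise ValueError("this sketch only covers d >= 1/3")
    if d >= 2.0 / 3.0:
        return 1.0
    tail_d = (2.0 / 3.0) * math.log(d + 1.0 / 3.0) ** 2
    if d >= B_A:
        return 1.0 - tail_d
    tail_a = (1.0 + math.log(1.0 - d)) ** 2
    if d >= B_C:
        return 1.0 - tail_d - tail_a
    if d >= D_AC:
        tail_c = 3.0 * (1.0 + (2.0 / 3.0) * math.log(2.0 / 3.0 - d)) ** 2
        return 1.0 - tail_d - tail_a - tail_c
    return ((4.0 / 3.0) * math.log((2.0 / 3.0 - d) / (d + 1.0 / 3.0)) * math.log(1.0 - d)
            - (2.0 / 3.0) * math.log((d + 1.0 / 3.0) / (1.0 - d)) ** 2)


if __name__ == "__main__":
    eps = 1e-9
    for name, b in [("d_AC", D_AC), ("2/3 - e^{-3/2}", B_C),
                    ("1 - e^{-1}", B_A), ("2/3", 2.0 / 3.0)]:
        # Values just below and at each breakpoint should nearly coincide.
        print(name, cdf_upper(b - eps), cdf_upper(b))
```

Agreement across the breakpoints is a necessary condition for the transcription to be right, though not a proof of it.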

APPL was again used to calculate moments of D₃. The decimal approximations for the mean, variance, skewness, and kurtosis are, respectively, $E(D_{3})\cong 0.3727$, $V (D_{3})\cong 0.008804$, $\gamma _{3}\cong 0.4541$, and $\gamma _{4}\cong 2.6538$. Although the functional form of the eight-segment PDF of D₃ is too lengthy to display here, it is plotted in Figure 13.13, with the only non-obvious breakpoint being on the initial nearly-vertical segment at ${\bigl (d_{BCE},f_{D_{3}}(d_{BCE})\bigr )}\cong (0.2091,1.5624)$.
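The reported moments can be compared with a Monte Carlo estimate. The Python sketch below is not the APPL computation; it simulates samples of size three from an exponential population, computes the K–S statistic with the mean estimated by maximum likelihood, and reports sample moments. The function names, the seed, and the number of replications are arbitrary assumptions, and the estimates carry simulation error.

```python
import math
import random


def d3_statistic(sample):
    """K-S statistic for an exponential fit with the mean estimated by MLE."""
    xs = sorted(sample)
    n = len(xs)
    theta_hat = sum(xs) / n
    best = 0.0
    for i, x in enumerate(xs, start=1):
        fitted = 1.0 - math.exp(-x / theta_hat)
        best = max(best, i / n - fitted, fitted - (i - 1) / n)
    return best


def simulate_d3(num_samples, seed=12345, theta=1.0):
    """Draw num_samples replications of D_3; theta does not affect the law."""
    rng = random.Random(seed)
    return [
        d3_statistic([rng.expovariate(1.0 / theta) for _ in range(3)])
        for _ in range(num_samples)
    ]


def sample_moments(values):
    """Return (mean, variance, skewness, kurtosis) using divisor n throughout."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


if __name__ == "__main__":
    draws = simulate_d3(100_000)
    mean, var, skew, kurt = sample_moments(draws)
    # Values reported in the text: 0.3727, 0.008804, 0.4541, 2.6538.
    print("mean     ", mean)
    print("variance ", var)
    print("skewness ", skew)
    print("kurtosis ", kurt)
    print("support  ", min(draws), max(draws))
```

The smallest and largest simulated values should also fall inside the support, between roughly 0.19998 and 2/3.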


About this chapter

Cite this chapter

Evans, D.L., Drew, J.H., Leemis, L.M. (2017). The Distribution of the Kolmogorov–Smirnov, Cramer–von Mises, and Anderson–Darling Test Statistics for Exponential Populations with Estimated Parameters. In: Glen, A., Leemis, L. (eds) Computational Probability Applications. International Series in Operations Research & Management Science, vol 247. Springer, Cham. https://doi.org/10.1007/978-3-319-43317-2_13


DOI: https://doi.org/10.1007/978-3-319-43317-2_13
Published: 02 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43315-8
Online ISBN: 978-3-319-43317-2
eBook Packages: Business and Management (R0)
