The Distribution of the Kolmogorov–Smirnov, Cramer–von Mises, and Anderson–Darling Test Statistics for Exponential Populations with Estimated Parameters

Computational Probability Applications

Abstract

This paper presents a derivation of the distribution of the Kolmogorov–Smirnov, Cramer–von Mises, and Anderson–Darling test statistics in the case of exponential sampling when the parameters are unknown and estimated from sample data for small sample sizes via maximum likelihood.

Originally published in Communications in Statistics - Simulation and Computation, Volume 37, Number 7, in 2008, this paper derives the probability distributions of several goodness-of-fit statistics when parameters are estimated from the data. In practice, working with these unusual distributions is feasible only within the APPL environment. Handling piecewise distributions such as those in Figures 13.5 and 13.13 is one of the strengths of APPL. In addition, the procedures UniformRV and Transform are used to calculate the distributions of the \(W_2^2\) and \(A_2^2\) statistics.

References

  1. Cho, S. K., & Spiegelberg-Planer, R. (2002). Country nuclear power profiles. http://www-pub.iaea.org/MTCD/publications/PDF/cnpp2003/CNPP_Webpage/PDF/2002/index.htm. Accessed 6 Dec 2007.

  2. D’Agostino, R. B., & Stephens, M. A. (1986). Goodness-of-fit techniques. New York: Marcel Dekker.

  3. Drew, J. H., Glen, A. G., & Leemis, L. M. (2000). Computing the cumulative distribution function of the Kolmogorov–Smirnov statistic. Computational Statistics and Data Analysis, 34, 1–15.

  4. Durbin, J. (1975). Kolmogorov–Smirnov tests when parameters are estimated with applications to tests of exponentiality and tests on spacings. Biometrika, 62, 5–22.

  5. Hogg, R. V., McKean, J. W., & Craig, A. T. (2005). Introduction to mathematical statistics (6th ed.). Upper Saddle River, NJ: Prentice–Hall.

  6. Law, A. M. (2007). Simulation modeling and analysis (4th ed.). New York: McGraw–Hill.

  7. Lawless, J. F. (2003). Statistical models and methods for lifetime data (2nd ed.). New York: Wiley.

  8. Lehmann, E. L. (1959). Testing statistical hypotheses. New York: Wiley.

  9. Lilliefors, H. W. (1969). On the Kolmogorov–Smirnov test for the exponential distribution with mean unknown. Journal of the American Statistical Association, 64, 387–389.

  10. Marsaglia, G., Tsang, W. W., & Wang, J. (2003). Evaluating Kolmogorov’s distribution. Journal of Statistical Software, 8(18). http://www.jstatsoft.org/v08/i18/

  11. Rigdon, S., & Basu, A. P. (2000). Statistical methods for the reliability of repairable systems. New York: Wiley.

  12. Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730–737.

Acknowledgements

The first author acknowledges summer support from Rose–Hulman Institute of Technology. The second and third authors acknowledge FRA support from the College of William & Mary. The authors also acknowledge the assistance of Bill Griffith, Thom Huber, and David Kelton in selecting data sets for the case studies in Section 13.3.

Corresponding author

Correspondence to Lawrence M. Leemis.

Appendix: Distribution of \(D_3\) for Exponential Sampling

The pattern that emerged in the piecewise representation of the PDF of \(D_2\) led us to derive the PDF of \(D_3\) to see if any similar patterns arose. This appendix contains a derivation of the distribution of the K–S test statistic when \(n = 3\) observations \(x_1\), \(x_2\), and \(x_3\) are drawn from an exponential population with fixed, positive, unknown mean \(\theta\). The maximum likelihood estimator is \(\hat{\theta} = (x_1 + x_2 + x_3) / 3\), which results in the fitted CDF

$$\displaystyle{\hat{F}(x) = 1 - e^{-x/\hat{\theta }}\qquad \qquad x> 0.}$$

Analogous to the n = 2 case, define

$$\displaystyle{y = \frac{x_{(1)}} {x_{(1)} + x_{(2)} + x_{(3)}}}$$

and

$$\displaystyle{z = \frac{x_{(2)}} {x_{(1)} + x_{(2)} + x_{(3)}}}$$

so that

$$\displaystyle{1 - y - z = \frac{x_{(3)}} {x_{(1)} + x_{(2)} + x_{(3)}}.}$$

The domain of definition of y and z is

$$\displaystyle{\mathcal{D} =\{ (y,z)\,\vert \,0 <y <z <(1 - y)/2\}.}$$

The values of the fitted CDF at the three order statistics are

$$\displaystyle{\hat{F}(x_{(1)}) = 1 - e^{-x_{(1)}/\hat{\theta }} = 1 - e^{-3y},}$$
$$\displaystyle{\hat{F}(x_{(2)}) = 1 - e^{-x_{(2)}/\hat{\theta }} = 1 - e^{-3z},}$$

and

$$\displaystyle{\hat{F}(x_{(3)}) = 1 - e^{-x_{(3)}/\hat{\theta }} = 1 - e^{-3(1-y-z)}.}$$

The vertical distances A, B, C, D, E, and F (as functions of y and z) are defined in a similar fashion to the n = 2 case (see Figure 13.2):

$$\displaystyle{\begin{array}{lllll} A & =&1 - e^{-3y} \\ B & =&\left \vert \frac{1} {3} -\left (1 - e^{-3y}\right )\right \vert & =&\left \vert e^{-3y} -\frac{2} {3}\right \vert \\ C & =&\left \vert \left (1 - e^{-3z}\right ) -\frac{1} {3}\right \vert & =&\left \vert e^{-3z} -\frac{2} {3}\right \vert \\ D& =&\left \vert \frac{2} {3} -\left (1 - e^{-3z}\right )\right \vert & =&\left \vert e^{-3z} -\frac{1} {3}\right \vert \\ E & =&\left \vert \left (1 - e^{-3(1-y-z)}\right ) -\frac{2} {3}\right \vert & =&\left \vert e^{-3(1-y-z)} -\frac{1} {3}\right \vert \\ F & =&1 -\left (1 - e^{-3(1-y-z)}\right ) & =&e^{-3(1-y-z)} \end{array} }$$

for \((y,z) \in \mathcal{D}\).
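The six distances can be checked numerically against a direct computation of the K–S statistic. The following is a minimal Python sketch, not the authors' APPL code; the function names, the seed, and the population mean of 5 are illustrative assumptions.

```python
import math
import random

def d3_from_yz(y, z):
    """Largest of the six vertical distances A..F defined above (n = 3)."""
    s = 1.0 - y - z                       # share of the largest observation
    A = 1.0 - math.exp(-3.0 * y)
    B = abs(math.exp(-3.0 * y) - 2.0 / 3.0)
    C = abs(math.exp(-3.0 * z) - 2.0 / 3.0)
    D = abs(math.exp(-3.0 * z) - 1.0 / 3.0)
    E = abs(math.exp(-3.0 * s) - 1.0 / 3.0)
    F = math.exp(-3.0 * s)
    return max(A, B, C, D, E, F)

def d3_direct(sample):
    """K-S distance between the empirical CDF and the fitted exponential CDF."""
    xs = sorted(sample)
    n = len(xs)
    theta_hat = sum(xs) / n               # maximum likelihood estimate of the mean
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fitted = 1.0 - math.exp(-x / theta_hat)
        d = max(d, abs(fitted - (i - 1) / n), abs(i / n - fitted))
    return d

def yz_from_sample(sample):
    """Map a sample of size three to the pair (y, z) used in the appendix."""
    xs = sorted(sample)
    total = sum(xs)
    return xs[0] / total, xs[1] / total

rng = random.Random(2008)                 # arbitrary seed
sample = [rng.expovariate(1.0 / 5.0) for _ in range(3)]   # mean 5 is arbitrary
y, z = yz_from_sample(sample)
print(d3_from_yz(y, z), d3_direct(sample))
```

Both routes should agree to rounding error, and rescaling the sample leaves the statistic unchanged, which is why only the ratios \(y\) and \(z\) matter.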

Figure 13.11 shows the regions associated with the maximum of A, B, C, D, E, F for \((y,z) \in \mathcal{D}\). In three dimensions, with \(D_3 = \max\{A, B, C, D, E, F\}\) as the third axis, this figure appears to be a container with the region E at the bottom of the container and with each of the other four sides rising as they move away from their intersection with E. The absolute value signs that appear in the final formulas for B, C, D, and E above can be easily removed since, over the region \(\mathcal{D}\) associated with \(D_3\), the expressions within the absolute value signs are always positive for B and D, but always negative for C and E. The distance F is never the largest of the six distances for any \((y,z) \in \mathcal{D}\), so it can be excluded from consideration. Table 13.2 gives the functional forms of the two-way intersections between the five regions shown in Figure 13.11. Note that the BC and AD curves, and the AC and BD curves, are identical.

Table 13.2 Intersections of regions A, B, C, D, and E in \(\mathcal{D}\)
Fig. 13.11 Regions associated with max{A, B, C, D, E, F} over \((y,z) \in \mathcal{D}\)

In order to determine the breakpoints in the support of \(D_3\), it is necessary to find the \((y, z)\) coordinates of the three-way intersections of the five regions in Figure 13.11 and the two-way intersections of the regions on the boundary of \(\mathcal{D}\). Table 13.3 gives the values of \(y\) and \(z\) for these breakpoints on the boundary of \(\mathcal{D}\), along with the value of \(D_3 = \max\{A, B, C, D, E, F\}\) at these values, beginning at \((y, z) = (0, 1/2)\) and proceeding in a counterclockwise direction. One point has been excluded from Table 13.3 because of the intractability of the values \((y, z)\). The three-way intersection between regions A, C, and the line \(z = (1 - y)/2\) can only be expressed in terms of the solution to a cubic equation. After some algebra, the point of intersection is approximately \((y,z)\cong (0.1608,0.4196)\), and the associated value of \(D_3\) is 2/3 minus the only real solution to the cubic equation

$$\displaystyle{3d^{3} + d^{2} - 3e^{-3} = 0,}$$

which yields

$$\displaystyle{d_{AC} = \frac{7} {9} - \frac{1} {18}\left (2916e^{-3} - 8 + c\right )^{1/3} -\frac{2} {9}\left (2916e^{-3} - 8 + c\right )^{-1/3}\cong 0.3827,}$$

where \(c = 108\sqrt{729e^{-6 } - 4e^{-3}}\).
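The closed form for \(d_{AC}\) can be cross-checked numerically. The sketch below is an illustration rather than the authors' code: it solves the cubic by bisection, using my own reading of the algebra (an assumption, not stated in the text) that the root equals \(e^{-3z}\) on the line \(z = (1 - y)/2\). The helper name `bisect_root` and the variable names are hypothetical.

```python
import math

def bisect_root(f, lo, hi, iterations=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0.0) == (f_mid < 0.0):
            lo, f_lo = mid, f_mid         # same sign as the lower end: move up
        else:
            hi = mid                      # sign change is in the lower half
    return 0.5 * (lo + hi)

E_NEG3 = math.exp(-3.0)

# Real root of 3u^3 + u^2 - 3e^{-3} = 0 (the text writes the unknown as d).
u = bisect_root(lambda t: 3.0 * t**3 + t**2 - 3.0 * E_NEG3, 0.0, 1.0)
d_ac_numeric = 2.0 / 3.0 - u

# Closed form quoted in the text.
c = 108.0 * math.sqrt(729.0 * math.exp(-6.0) - 4.0 * E_NEG3)
t = 2916.0 * E_NEG3 - 8.0 + c
d_ac_closed = 7.0 / 9.0 - t ** (1.0 / 3.0) / 18.0 - (2.0 / 9.0) * t ** (-1.0 / 3.0)

# Assumed back-substitution: u = e^{-3z} and z = (1 - y)/2.
z = -math.log(u) / 3.0
y = 1.0 - 2.0 * z
print(d_ac_numeric, d_ac_closed, (y, z))
```

At the recovered point the distance A, \(1 - e^{-3y}\), should coincide with \(d_{AC}\), which gives a second consistency check.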

Table 13.3 Intersection points along the boundary of \(\mathcal{D}\)

The three-way intersection points in the interior of \(\mathcal{D}\) are more difficult to determine than those on the boundary. The value of \(D_3\) associated with each of these four points is the single real root of a cubic equation on the support of \(D_3\). These equations and approximate solution values, in ascending order, are given in Table 13.4. For example, consider the value of the maximum at the intersection of regions A, C, and E in Figure 13.11. The value of \(D_3\) must satisfy the cubic equation

$$\displaystyle{e^{3}\left (1 - d\right )\left (\frac{2} {3} - d\right )\left (\frac{1} {3} - d\right ) = 1,}$$

which yields

$$\displaystyle{d_{ACE} = \frac{\left (243 + c\right )^{2/3}12^{2/3}c - 243\left (243 + c\right )^{2/3}12^{2/3} + 144e^{5} - 12^{4/3}e^{4}(243 + c)^{1/3}} {216e^{5}},}$$

or approximately \(d_{ACE}\cong 0.19998\), in which \(c = \sqrt{59049 - 12e^{6}}\).

Table 13.4 Three-way interior intersection points of regions A, B, C, D, and E in \(\mathcal{D}\)

The largest value of \(D_3 = \max\{A, B, C, D, E\}\) on \(\mathcal{D}\) occurs at the origin (\(y = 0\) and \(z = 0\)) and has value 2/3, which is the upper limit of the support of \(D_3\). The smallest value of \(D_3\) on \(\mathcal{D}\) occurs at the intersection ACE and is \(d_{ACE}\cong 0.19998\), which is the lower limit of the support of \(D_3\).
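The interior breakpoints can be approximated by root-finding. In the sketch below only the ACE cubic is taken from the text. The BCE, ADE, and BDE cubics are assumptions, obtained by the same argument: set the three active distances equal to \(d\), solve each for its exponential, and multiply, using \(e^{-3y}e^{-3z}e^{-3(1-y-z)} = e^{-3}\). The names `bisect_root`, `CUBICS`, and `roots` are hypothetical.

```python
import math

E3 = math.exp(3.0)
T = 1.0 / 3.0

def bisect_root(f, lo, hi, iterations=200):
    """Plain bisection; assumes a single sign change of f on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0.0) == (f_mid < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Factors are e^{-3y}, e^{-3z}, e^{-3(1-y-z)} expressed through d:
#   A = d -> 1 - d      B = d -> 2/3 + d
#   C = d -> 2/3 - d    D = d -> 1/3 + d
#   E = d -> 1/3 - d
CUBICS = {
    "ACE": lambda d: E3 * (1 - d) * (2 * T - d) * (T - d) - 1,      # from the text
    "BCE": lambda d: E3 * (2 * T + d) * (2 * T - d) * (T - d) - 1,  # assumed
    "ADE": lambda d: E3 * (1 - d) * (T + d) * (T - d) - 1,          # assumed
    "BDE": lambda d: E3 * (2 * T + d) * (T + d) * (T - d) - 1,      # assumed
}

# Each cubic is positive at d = 0 and equals -1 at d = 1/3.
roots = {name: bisect_root(f, 0.0, T) for name, f in CUBICS.items()}

# Closed form for d_ACE quoted in the text.
c = math.sqrt(59049.0 - 12.0 * math.exp(6.0))
s = 243.0 + c
numerator = (s ** (2 / 3) * 12 ** (2 / 3) * c - 243.0 * s ** (2 / 3) * 12 ** (2 / 3)
             + 144.0 * math.exp(5.0) - 12 ** (4 / 3) * math.exp(4.0) * s ** (1 / 3))
d_ace_closed = numerator / (216.0 * math.exp(5.0))

for name, value in roots.items():
    print(name, round(value, 5))
print("closed-form d_ACE:", d_ace_closed)
```

Under these assumptions the four roots come out in the order ACE, BCE, ADE, BDE, all below 1/3, which agrees with the ordering of the breakpoints used for the CDF.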

Determining the Joint Distribution of Y and Z. The next step is to determine the distribution of \(Y = X_{(1)}/(X_{(1)} + X_{(2)} + X_{(3)})\) and \(Z = X_{(2)}/(X_{(1)} + X_{(2)} + X_{(3)})\). Using an order statistic result from Hogg et al. [67, page 193], the joint PDF of \(X_{(1)}\), \(X_{(2)}\), and \(X_{(3)}\) is

$$\displaystyle{g(x_{(1)},x_{(2)},x_{(3)}) = \frac{3!} {\theta ^{3}} \mathrm{exp}{\bigl ( - (x_{(1)} + x_{(2)} + x_{(3)})/\theta \bigr )}\qquad \ 0 <x_{(1)} \leq x_{(2)} \leq x_{(3)}.}$$

In order to determine the joint PDF of \(Y = X_{(1)}/(X_{(1)} + X_{(2)} + X_{(3)})\) and \(Z = X_{(2)}/(X_{(1)} + X_{(2)} + X_{(3)})\), define the dummy transformation \(W = X_{(3)}\). The random variables \(Y\), \(Z\), and \(W\) define a one-to-one transformation from \(\mathcal{A} =\{ (x_{(1)},x_{(2)},x_{(3)})\,\vert \,0 <x_{(1)} \leq x_{(2)} \leq x_{(3)}\}\) to \(\mathcal{B} =\{ (y,z,w)\,\vert \,0 <y <z <(1 - y)/2,w> 0\}\). Since \(x_{(1)} = yw/(1 - y - z)\), \(x_{(2)} = zw/(1 - y - z)\), and \(x_{(3)} = w\), and the Jacobian of the inverse transformation is \(w^{2}/(1 - y - z)^{3}\), the joint PDF of \(Y\), \(Z\), and \(W\) on \(\mathcal{B}\) is

$$\displaystyle\begin{array}{rcl} h(y,z,w)& =& \frac{6} {\theta ^{3}} \mathrm{exp}\left (-\left ( \frac{yw + zw} {1 - y - z} + w\right )/\theta \right )\left \vert \frac{w^{2}} {(1 - y - z)^{3}}\right \vert {}\\ & =& \frac{6w^{2}} {\theta ^{3}(1 - y - z)^{3}}\mathrm{exp}\left (- \frac{w} {(1 - y - z)\theta }\right )\qquad (y,z,w) \in \mathcal{B}. {}\\ \end{array}$$

Integrating by parts, the joint PDF of Y and Z on \(\mathcal{D}\) is

$$\displaystyle{f_{Y,Z}(y,z) = \frac{6} {\theta ^{3}(1 - y - z)^{3}}\int _{0}^{\infty }w^{2}\,\mathrm{exp}\left (- \frac{w} {(1 - y - z)\theta }\right )dw = 12\qquad (y,z) \in \mathcal{D},}$$

i.e., Y and Z are uniformly distributed on \(\mathcal{D}\).
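As a worked check of the constant, the integral over \(w\) can also be recognized as a gamma integral. The symbol \(\beta\) is introduced here only for brevity and does not appear in the original derivation:

$$\displaystyle{\int _{0}^{\infty }w^{2}e^{-w/\beta }\,dw = \Gamma (3)\,\beta ^{3} = 2\beta ^{3},\qquad \beta = (1 - y - z)\theta ,}$$

so that

$$\displaystyle{f_{Y,Z}(y,z) = \frac{6} {\theta ^{3}(1 - y - z)^{3}} \cdot 2(1 - y - z)^{3}\theta ^{3} = 12.}$$

The value 12 is consistent with a proper density: \(\mathcal{D}\) is the triangle with vertices \((0,0)\), \((0,1/2)\), and \((1/3,1/3)\), whose area is \(1/12\).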

Determining the Distribution of \(D_3\). The CDF of \(D_3\) will be defined in a piecewise manner, with breakpoints at the following ordered quantities: \(d_{ACE}\), \(d_{BCE}\), \(d_{ADE}\), \(d_{BDE}\), \(1/3\), \(d_{AC}\), \(\frac{2} {3} - e^{-3/2}\), \(1 -\frac{1} {e}\), and \(2/3\). The CDF \(F_{D_{3}}(d) =\Pr (D_{3} \leq d)\) is found by integrating the joint PDF of \(Y\) and \(Z\) over the appropriate limits, yielding

$$\displaystyle{F_{D_{3}}(d) = \left \{\begin{array}{ll} 0 &d <d_{ACE} \\ \frac{2} {3}\left [\ln \left (e^{3}[1 - d]\left [\frac{2} {3} - d\right ]\left [\frac{1} {3} - d\right ]\right )\right ]^{2} & d_{ACE} \leq d <d_{BCE} \\ \frac{2} {3}\ln \left [e^{6}(1 - d)\left (\frac{2} {3} - d\right )^{2}\left (\frac{2} {3} + d\right )\left (\frac{1} {3} - d\right )^{2}\right ]& \\ \qquad \times \ln \left ( \frac{1-d} {2/3+d}\right ) &d_{BCE} \leq d <d_{ADE} \\ \frac{4} {3}\ln \left (\frac{d+1/3} {2/3-d}\right )\ln \left (\frac{d+2/3} {1-d} \right ) & \\ \qquad -\frac{2} {3}\left [\ln \left (e^{3}\left [d + \frac{2} {3}\right ]\left [d + \frac{1} {3}\right ]\left [\frac{1} {3} - d\right ]\right )\right ]^{2} & d_{ADE} \leq d <d_{BDE} \\ \frac{4} {3}\ln \left (\frac{d+1/3} {2/3-d}\right )\ln \left (\frac{d+2/3} {1-d} \right ) &d_{BDE} \leq d <\frac{1} {3} \\ \frac{4} {3}\ln \left (\frac{2/3-d} {d+1/3}\right )\ln (1 - d) -\frac{2} {3}\left [\ln \left (\frac{d+1/3} {1-d} \right )\right ]^{2} & \frac{1} {3} \leq d <d_{AC} \\ 1 -\frac{2} {3}\left [\ln \left (d + \frac{1} {3}\right )\right ]^{2} -\left [1 +\ln \left (1 - d\right )\right ]^{2} & \\ \qquad - 3\left [1 + \frac{2} {3}\ln \left (\frac{2} {3} - d\right )\right ]^{2} & d_{AC} \leq d <\frac{2} {3} - e^{-3/2} \\ 1 -\frac{2} {3}\left [\ln \left (d + \frac{1} {3}\right )\right ]^{2} -\left [1 +\ln \left (1 - d\right )\right ]^{2} & \frac{2} {3} - e^{-3/2} \leq d <1 - e^{-1} \\ 1 -\frac{2} {3}\left [\ln \left (d + \frac{1} {3}\right )\right ]^{2} & 1 - e^{-1} \leq d <\frac{2} {3} \\ 1 &d \geq \frac{2} {3}, \end{array} \right.}$$

which is plotted in Figure 13.12. Dots have been plotted at the breakpoints, with each of the lower four tightly clustered breakpoints from Table 13.4 corresponding to a horizontal plane intersecting one of the four corners of region E in Figure 13.11. Percentiles of this distribution match the tabled values from Durbin [48]. We were not able to establish a pattern between the CDF of \(D_2\) and the CDF of \(D_3\) that might lead to a general expression for any \(n\).
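The piecewise CDF can be transcribed into code and checked for continuity at its breakpoints. The following is a minimal Python sketch, not the authors' APPL implementation. The breakpoints are found by bisection, and the cubics for \(d_{BCE}\), \(d_{ADE}\), and \(d_{BDE}\) are assumed to follow the same product pattern as the \(d_{ACE}\) cubic, since Table 13.4 is not reproduced here. All function and constant names are hypothetical.

```python
import math

T = 1.0 / 3.0
E3 = math.exp(3.0)

def bisect_root(f, lo, hi, iterations=200):
    """Plain bisection; assumes a single sign change of f on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0.0) == (f_mid < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Interior breakpoints: the ACE cubic is from the text, the other three are assumed.
D_ACE = bisect_root(lambda d: E3 * (1 - d) * (2 * T - d) * (T - d) - 1, 0.0, T)
D_BCE = bisect_root(lambda d: E3 * (2 * T + d) * (2 * T - d) * (T - d) - 1, 0.0, T)
D_ADE = bisect_root(lambda d: E3 * (1 - d) * (T + d) * (T - d) - 1, 0.0, T)
D_BDE = bisect_root(lambda d: E3 * (2 * T + d) * (T + d) * (T - d) - 1, 0.0, T)
# Boundary breakpoint: 2/3 minus the real root of 3u^3 + u^2 - 3e^{-3} = 0.
D_AC = 2 * T - bisect_root(lambda u: 3 * u**3 + u**2 - 3 / E3, 0.0, 1.0)

BREAKPOINTS = [D_ACE, D_BCE, D_ADE, D_BDE, T, D_AC,
               2 * T - math.exp(-1.5), 1 - math.exp(-1.0), 2 * T]

def cdf_d3(d):
    """Piecewise CDF of D_3, transcribed segment by segment from the text."""
    ln = math.log
    if d < D_ACE:
        return 0.0
    if d < D_BCE:
        return (2 / 3) * ln(E3 * (1 - d) * (2 * T - d) * (T - d)) ** 2
    if d < D_ADE:
        return ((2 / 3)
                * ln(E3**2 * (1 - d) * (2 * T - d) ** 2 * (2 * T + d) * (T - d) ** 2)
                * ln((1 - d) / (2 * T + d)))
    if d < D_BDE:
        return ((4 / 3) * ln((d + T) / (2 * T - d)) * ln((d + 2 * T) / (1 - d))
                - (2 / 3) * ln(E3 * (d + 2 * T) * (d + T) * (T - d)) ** 2)
    if d < T:
        return (4 / 3) * ln((d + T) / (2 * T - d)) * ln((d + 2 * T) / (1 - d))
    if d < D_AC:
        return ((4 / 3) * ln((2 * T - d) / (d + T)) * ln(1 - d)
                - (2 / 3) * ln((d + T) / (1 - d)) ** 2)
    if d < BREAKPOINTS[6]:
        return (1 - (2 / 3) * ln(d + T) ** 2 - (1 + ln(1 - d)) ** 2
                - 3 * (1 + (2 / 3) * ln(2 * T - d)) ** 2)
    if d < BREAKPOINTS[7]:
        return 1 - (2 / 3) * ln(d + T) ** 2 - (1 + ln(1 - d)) ** 2
    if d < 2 * T:
        return 1 - (2 / 3) * ln(d + T) ** 2
    return 1.0

# Size of the jump across each breakpoint; these should all be near zero.
for b in BREAKPOINTS:
    print(round(b, 5), abs(cdf_d3(b + 1e-9) - cdf_d3(b - 1e-9)))
```

A call such as `cdf_d3(0.4)` then returns \(\Pr (D_3 \leq 0.4)\), and integrating \(1 - F_{D_3}(d)\) over \([0, 2/3]\) gives a numerical estimate of the mean.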

Fig. 13.12 The CDF of \(D_3\)

APPL was again used to calculate moments of \(D_3\). The decimal approximations for the mean, variance, skewness, and kurtosis are, respectively, \(E(D_{3})\cong 0.3727\), \(V (D_{3})\cong 0.008804\), \(\gamma _{3}\cong 0.4541\), and \(\gamma _{4}\cong 2.6538\). Although the functional form of the eight-segment PDF of \(D_3\) is too lengthy to display here, it is plotted in Figure 13.13, with the only non-obvious breakpoint being on the initial nearly vertical segment at \({\bigl (d_{BCE},f_{D_{3}}(d_{BCE})\bigr )}\cong (0.2091,1.5624)\).
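The reported moments can be compared with a simple Monte Carlo estimate. This sketch replaces the exact APPL computation with simulation, so its output is only approximate. The seed, the number of replications, and the unit population mean are arbitrary assumptions; the mean does not affect \(D_3\).

```python
import math
import random

def d3_statistic(x1, x2, x3):
    """K-S statistic for three observations against the fitted exponential CDF."""
    xs = sorted((x1, x2, x3))
    theta_hat = sum(xs) / 3.0             # maximum likelihood estimate of the mean
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fitted = 1.0 - math.exp(-x / theta_hat)
        d = max(d, abs(fitted - (i - 1) / 3.0), abs(i / 3.0 - fitted))
    return d

def moments(values):
    """Sample mean, variance, skewness, and (non-excess) kurtosis."""
    n = len(values)
    mean = sum(values) / n
    central = [sum((v - mean) ** k for v in values) / n for k in (2, 3, 4)]
    variance = central[0]
    skewness = central[1] / variance ** 1.5
    kurtosis = central[2] / variance ** 2
    return mean, variance, skewness, kurtosis

rng = random.Random(13)                   # arbitrary seed
N = 100_000                               # arbitrary number of replications
draws = [d3_statistic(rng.expovariate(1.0), rng.expovariate(1.0), rng.expovariate(1.0))
         for _ in range(N)]
mean, variance, skewness, kurtosis = moments(draws)
print(mean, variance, skewness, kurtosis)
print(min(draws), max(draws))             # should stay inside [d_ACE, 2/3]
```

The smallest and largest simulated values also give a rough check on the support limits \(d_{ACE}\cong 0.19998\) and 2/3.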

Fig. 13.13 The PDF of \(D_3\)

Copyright information

© 2017 Springer International Publishing Switzerland

Cite this chapter

Evans, D.L., Drew, J.H., Leemis, L.M. (2017). The Distribution of the Kolmogorov–Smirnov, Cramer–von Mises, and Anderson–Darling Test Statistics for Exponential Populations with Estimated Parameters. In: Glen, A., Leemis, L. (eds) Computational Probability Applications. International Series in Operations Research & Management Science, vol 247. Springer, Cham. https://doi.org/10.1007/978-3-319-43317-2_13
