Abstract
It was the Fall of 1978. I had just finished my masters in statistics and started out as a PhD student in the stat-math division at the ISI in Calcutta. Teachers of the calibre of B.V. Rao and Ashok Maitra had taught me an enormous amount of mathematics and probability theory. But deep inside me I was curious to learn much more of statistical theory. Unfortunately, Basu had already left and moved to the US, and C.R. Rao was rarely seen in the Calcutta center. I considered following Basu to Tallahassee, but my friend Rao Chaganty warned me that the weather was so outlandishly good that I would probably never graduate. My other favorite teacher T. Krishnan was primarily interested in applied statistics, and J.K. Ghosh had only just returned from his visit to Pittsburgh. I remember being given a problem on admissibility; but, alas, that too turned out to be a modest extension of Karlin [30].
ISI allowed its students an unlimited amount of laziness and vagrancy, and I exploited this executive nonchalance gratuitously. I was not doing anything that I wanted to admit. Stat-Math was then located in an unpretentious, dark old building across from the central pond in the main campus. One day I was intrigued to see a new face; a visitor from Australia, someone whispered. In a week or so, the office sent out an announcement of a course on sufficiency by our visitor; the name was Terence P. Speed. That is how I first met Terry 34 years ago, and became one of his early students. Much later, I came to know that he was professionally and personally close to Basu, who had an enduring influence on my life. Together, Terry and Basu prepared a comprehensive bibliography of sufficiency [8]. They had intended to write a book, but communication at great distances was not such a breeze 40 years ago, and the book never came into being. Most recently, Terry and I worked together on summarizing Basu’s work for the Selected Works series of Springer. I am deeply honored and touched to be asked to write this commentary on Terry’s contributions to statistics, and particularly to sufficiency. Terry has worked on such an incredible variety of areas and problems that I will limit myself to just a few of his contributions that have directly influenced my own work and education. Sufficiency is certainly one of them. My perspective and emphasis will be rather different from other survey articles on it, such as Yamada and Morimoto [51].
For someone who does not believe in a probability model, sufficiency is of no use. It is also of only limited use in the robustness doctrine. I think, however, that the importance of sufficiency in inference must be evaluated in the context of the time. The idea of data summarization in the form of a low dimensional statistic without losing information must have been intrinsically attractive and also immensely useful when Fisher first formulated it [23]. In addition, we now know the various critical links of sufficiency to both the foundations of statistics, and to the elegant and structured theory of optimal procedures in inference.
For example, the links to the (weak and the strong) likelihood principle and conditionality principle are variously summarized in the engaging presentations in Barnard [3], Basu [6], Berger and Wolpert [10], Birnbaum [14], Fraser [26], and Savage [42]. And we are also all aware of such pillars of the mathematical theory of optimality as the Rao-Blackwell and Lehmann-Scheffé theorems [12, 35], which are inseparably connected to sufficient statistics. At the least, sufficiency has acted as a nucleus around which an enormous amount of later development of ideas, techniques, and results has occurred. Some immediate examples are the theory of ancillarity, monotone likelihood ratio, exponential families, invariance, and asymptotic equivalence [5, 17, 18, 22, 33, 36, 38]. Interesting work relating sparse order statistics (e.g., a small fraction of the largest ones) to approximate sufficiency is done in Reiss [40], and approximate sufficiency and approximate ancillarity are given a direct definition, with consequences, in DasGupta [20]. We also have the coincidence that exact and nonasymptotic distributional and optimality calculations can be done precisely in those cases where a nontrivial sufficient statistic exists. The fundamental nature of the idea of sufficiency thus cannot be minimized; not yet.
Collectively, Kolmogorov, Neyman, Bahadur, Dynkin, Halmos, and Savage, among many other key architects, put sufficiency on a rigorous mathematical pedestal. If \(\{P, P \in \mathcal{P}\}\) is a family of probability measures on a measurable space \((\Omega, \mathcal{A})\), a sub σ-field \(\mathcal{B}\) of \(\mathcal{A}\) is sufficient if for each measurable set \(A \in \mathcal{A}\), there is a (single) \(\mathcal{B}\)-measurable function \(g_A\) such that \(g_{A} = E_{P}(I_{A}\,\vert\, \mathcal{B}),\ \mbox{a.e.}\,(P)\ \forall\, P \in \mathcal{P}\). This is rephrased in terms of a sufficient statistic by saying that if \(T : (\Omega, \mathcal{A}) \rightarrow (\Omega^{\prime}, \mathcal{A}^{\prime})\) is a mapping from the original (measurable) space to another space, then T is a sufficient statistic if \(\mathcal{B} = \mathcal{B}_{T} = T^{-1}(\mathcal{A}^{\prime})\) is a sufficient sub σ-field of \(\mathcal{A}\). In a classroom situation, the family \(\mathcal{P}\) is often parametrized by a finite dimensional parameter θ, and we describe sufficiency by saying that the conditional distribution of any other statistic given the sufficient statistic is independent of the underlying parameter θ. Existence of a fixed dimensional sufficient statistic for all sample sizes is a rare phenomenon for regular families of distributions, and is limited to the multiparameter exponential family (Barankin and Maitra [2], Brown [16]; it is also mentioned in Lehmann [34]). Existence of a fixed dimensional sufficient statistic in location-scale families has some charming (and perhaps unexpected) connections to the Cauchy-Deny functional equation [29, 32, 39].
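The classroom description can even be checked by direct computation. The following sketch (a toy illustration of my own, not taken from the chapter) verifies for an iid Bernoulli(θ) sample that the conditional distribution of the data given \(T = \sum X_i\) is free of θ:

```python
import math

# Toy illustration: for X1,...,Xn iid Bernoulli(theta), T = sum(Xi) is
# sufficient, because P(X = x | T = t) = 1 / C(n, t) does not involve theta.

def conditional_prob(x, theta):
    """P(X = x | T = sum(x)) under iid Bernoulli(theta) sampling."""
    n, t = len(x), sum(x)
    p_x = theta**t * (1 - theta)**(n - t)                    # P(X = x)
    p_t = math.comb(n, t) * theta**t * (1 - theta)**(n - t)  # P(T = t)
    return p_x / p_t

x = (1, 0, 1, 0)
probs = {th: conditional_prob(x, th) for th in (0.2, 0.5, 0.9)}
# Every value equals 1/C(4, 2) = 1/6, whatever theta is.
print(probs)
```

The same computation with any other binary sample of the same sum gives the identical answer, which is exactly the data-summarization property.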
Sufficiency corresponds to summarization without loss of information, and so the maximum possible such summarization is of obvious interest. A specific sub σ-field \(\mathcal{B}^{*}\) is a minimal sufficient sub σ-field if for any other sufficient sub σ-field \(\mathcal{B}\), we have the inclusion \(\mathcal{B}^{*} \vee \mathcal{N}_{\mathcal{P}} \subseteq \mathcal{B} \vee \mathcal{N}_{\mathcal{P}}\), where \(\mathcal{N}_{\mathcal{P}}\) is the family of all \(\mathcal{P}\)-null members of \(\mathcal{A}\). In terms of statistics, a specific sufficient statistic \(T^{*}\) is minimal sufficient if given any other sufficient statistic T, we can write \(T^{*} = h \circ T\ \mbox{a.e.}\,\mathcal{P}\), i.e., a minimal sufficient statistic is a function of every sufficient statistic. A sufficient statistic that is also boundedly complete is minimal sufficient.
This fact does place completeness as a natural player on the scene rather than as a mere analytical necessity; of course, another well known case is Basu’s theorem [4]. The converse is not necessarily true; that is, a minimal sufficient statistic need not be boundedly complete. The location-parameter t densities provide a counterexample, where the vector of order statistics is minimal sufficient, but clearly not boundedly complete. It is true, however, that in somewhat larger families of densities, the vector of order statistics is complete, and hence boundedly complete [9]. If we think of a statistic as a partition of the sample space, then the partition corresponding to a minimal sufficient statistic \(T^{*}\) can be constructed by the rule that \(T^{*}(x) = T^{*}(y)\) if and only if the likelihood ratio \(\frac{f_{\theta}(x)}{f_{\theta}(y)}\) is independent of θ. Note that this rule applies only to the dominated case, with \(f_{\theta}(x)\) being the density (Radon-Nikodym derivative) of \(P_{\theta}\) with respect to the relevant dominating measure.
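The likelihood-ratio partition rule can be tried out numerically. In the sketch below (my own illustration, assuming an iid Poisson(θ) model), two samples x and y land in the same partition cell exactly when \(f_{\theta}(x)/f_{\theta}(y)\) is constant in θ, which for the Poisson happens precisely when \(\sum x_i = \sum y_i\):

```python
import math

def poisson_loglik(x, theta):
    """Log of the joint Poisson(theta) density of the sample x."""
    return sum(-theta + k * math.log(theta) - math.log(math.factorial(k))
               for k in x)

def same_partition_cell(x, y, thetas=(0.5, 1.0, 2.0, 5.0)):
    """Check whether log(f_theta(x)/f_theta(y)) is (numerically)
    constant over a grid of theta values."""
    log_ratios = [poisson_loglik(x, t) - poisson_loglik(y, t) for t in thetas]
    return max(log_ratios) - min(log_ratios) < 1e-9

# Equal sums -> ratio free of theta -> same minimal-sufficient cell.
print(same_partition_cell((1, 2, 3), (3, 2, 1)))  # True: both sums are 6
print(same_partition_cell((1, 2, 3), (0, 0, 1)))  # False: sums 6 vs 1
```

The grid of θ values is of course only a numerical stand-in for "independent of θ", but it reproduces the partition \(T^{*}(x) = \sum x_i\) on any examples one tries.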
Halmos and Savage [28] gave the factorization theorem for characterizing a sufficient sub σ-field, which says that if each \(P \in \mathcal{P}\) is assumed to be absolutely continuous with respect to some \(P_0\) (which we may pick to be in the convex hull of \(\mathcal{P}\)), then a given sub σ-field \(\mathcal{B}\) is sufficient if and only if for each \(P \in \mathcal{P}\), we can find a \(\mathcal{B}\)-measurable function \(g_P\) such that the identity \(dP = g_P\,dP_0\) holds. Note that we insist on \(g_P\) being \(\mathcal{B}\)-measurable, rather than simply \(\mathcal{A}\)-measurable (which would be no restriction, and would not serve the purpose of data summarization). Once again, in a classroom situation, we often describe this as T being sufficient if and only if we can write the joint density \(f_{\theta}(x)\) as \(f_{\theta}(x) = g_{\theta}(T(x))\,p_{0}(x)\) for some \(g_{\theta}\) and \(p_{0}\). The factorization theorem took the guessing game out of the picture in the dominated case, and is justifiably regarded as a landmark advance. I will shortly come to Terry Speed’s contribution on the factorization theorem.
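The classroom form of the factorization can likewise be verified numerically. In the sketch below (my own example, for an iid \(N(\theta, 1)\) sample), the joint density splits as \(g_{\theta}(T(x))\,p_{0}(x)\) with \(T(x) = \sum x_i\):

```python
import math

def joint_density(x, theta):
    """Joint N(theta, 1) density of the sample x."""
    return math.prod(math.exp(-(xi - theta)**2 / 2) / math.sqrt(2 * math.pi)
                     for xi in x)

def g(t, theta, n):
    """theta-dependent factor: a function of T(x) = sum(x) only."""
    return math.exp(theta * t - n * theta**2 / 2)

def p0(x):
    """theta-free factor."""
    return math.prod(math.exp(-xi**2 / 2) / math.sqrt(2 * math.pi) for xi in x)

x = [0.3, -1.2, 2.1]
for theta in (-1.0, 0.0, 1.5):
    lhs = joint_density(x, theta)
    rhs = g(sum(x), theta, len(x)) * p0(x)
    assert abs(lhs - rhs) < 1e-12 * max(lhs, 1e-300)
```

The split comes from completing the square: \(\exp(-(x_i-\theta)^2/2) = \exp(\theta x_i - \theta^2/2)\exp(-x_i^2/2)\), so the θ-dependence enters only through \(\sum x_i\).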
Sufficiency comes in many colors, which turn out to be equivalent under special sets of conditions (e.g., Roy and Ramamoorthi [41]). I will loosely describe a few of these notions. We have Blackwell sufficiency [15], which corresponds to sufficiency of an experiment as defined via comparison of experiments [48, 50]; Bayes sufficiency, which corresponds to the posterior measure under any given prior depending on the data x only through T(x); and prediction sufficiency (also sometimes called adequacy), which legislates that to predict an unobserved Y defined on some space \((\Omega^{\prime\prime}, \mathcal{A}^{\prime\prime})\) on the basis of an observed X defined on \((\Omega, \mathcal{A})\), it should be enough to consider only predictors based on T(X). See, for example, Takeuchi and Akahira [49], and also the earlier articles Bahadur [1] and Skibinsky [44]. I would warn the reader that the exact meaning of prediction sufficiency is linked to the exact assumptions on the prediction loss function. Likewise, Bayes sufficiency need not be equivalent to ordinary sufficiency unless \((\Omega, \mathcal{A})\) is a standard Borel space, i.e., unless \(\mathcal{A}\) coincides with the Borel σ-field corresponding to some compact metrizable topology on Ω.
Consider now the enlarged class of probability distributions defined as \({P}_{C}(A) = P(X \in A\,\vert Y \in C),P \in \mathcal{P},C \in \mathcal{A}^{{\prime}{\prime}}\). Bahadur leads us to the conclusion that prediction sufficiency is equivalent to sufficiency in this enlarged family of probability measures. A major result due to Terry Speed is the derivation of a factorization theorem for characterizing a prediction sufficient statistic in the dominated case [45]. A simply stated but illuminating example in Section 6 of Speed’s article shows why the particular version of the factorization theorem he gives can be important in applications. As far as I know, a theory of partial adequacy, akin to partial sufficiency [7, 25, 27], has never been worked out. However, I am not sure how welcome it will now be, considering the diminishing importance of probability and models in prevalent applied statistics.
Two other deep and delightful papers of Terry that I am familiar with are his splendidly original paper on spike train deconvolution [37], and his paper on Gaussian distributions over finite simple graphs [47]. These two papers are precursors to what we nowadays call independent component analysis and graphical models. Particularly, the spike train deconvolution paper leads us to good problems in need of solution. However, I will refrain from making additional comments on it in order to spend some time on a most recent writing of Terry that directly influenced me.
In his editorial column in the IMS Bulletin [46], Terry describes the troublesome scenario of irreconcilable quantitative values obtained in bioassays conducted under different physical conditions at different laboratories (actually, he describes, specifically, the example of reporting the expression level of the HER2 protein in breast cancer patients). He cites an earlier classic paper of Youden [52], which I was not previously familiar with. Youden informally showed the tendency of a point estimate derived from one experiment to fall outside of the error bounds reported by another experiment. In Youden’s cases, this was usually caused by an unmodelled latent bias, and once the bias was taken care of, the conundrum mostly disappeared.
Inspired by Terry’s column, I did some work on reconcilability of confidence intervals found from different experiments, even if there are no unmodelled biases. What I found rather surprised me. Theoretical calculations led to the conclusion that in as few as 10 experiments, it could be quite likely that the confidence intervals would be nonoverlapping. In meta-analytic studies, particularly in clinical trial contexts, the number of experiments combined is frequently 20, 25, or more. This leads to the apparently important question: how does one combine independent confidence intervals when they are incompatible? We have had some of our best minds think about related problems; for example, Fisher [24], Birnbaum [13], Koziol and Perlman [31], Berk and Cohen [11], Cohen et al. [19], and Singh et al. [43]. Holger Dette and I recently collaborated on this problem and derived some exact results and some asymptotic theory involving extremes [21]. It was an exciting question for us, caused by a direct influence of Terry.
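To convey the flavor of the phenomenon, here is a small Monte Carlo sketch of my own (a deliberately simplified setup, not the exact calculation in [21]): k experiments each report an equal-width 95% interval centered at a normal estimate of a common mean, and we estimate the chance that at least one pair of the intervals is disjoint:

```python
import random

def prob_disjoint_pair(k, n_sims=20000, z=1.96, seed=7):
    """Monte Carlo estimate of the chance that at least two of k
    independent 95% confidence intervals for a common normal mean
    fail to overlap. Simplified setup (my assumption): each experiment
    reports xbar +/- z, with xbar ~ N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        centers = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # With equal-width intervals, some pair is disjoint iff the two
        # extreme centers are more than 2z apart.
        if max(centers) - min(centers) > 2 * z:
            hits += 1
    return hits / n_sims

for k in (5, 10, 25):
    print(k, prob_disjoint_pair(k))
```

Even this crude simulation shows the probability of a disjoint pair climbing steeply with the number of experiments, which is the behavior the exact and asymptotic results make precise.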
Human life is a grand collage of countless events and emotions, triumphs and defeats, love and hurt, joy and sadness, the extraordinary and the mundane. I have seen life from both sides now, tears and fears and feeling proud, dreams and schemes and circus crowds. But it is still my life’s illusion of those wonderful years in the seventies that I recall fondly in my life’s journey. Terry symbolizes that fantasy and uncomplicated part of my life. I am grateful to have had this opportunity to write a few lines about Terry; take care, Terry, my teacher and my friend.
References
R. R. Bahadur. Sufficiency and statistical decision functions. Ann. Math. Statist., 25:423–462, 1954.
E. W. Barankin and A. P. Maitra. Generalization of the Fisher-Darmois-Koopman-Pitman theorem on sufficient statistics. Sankhyā Ser. A, 25:217–244, 1963.
G. A. Barnard. Comments on Stein’s “A remark on the likelihood principle”. J. Roy. Stat. Soc. A, 125:569–573, 1962.
D. Basu. On statistics independent of a complete sufficient statistic. Sankhyā, 15:377–380, 1955.
D. Basu. The family of ancillary statistics. Sankhyā, 21:247–256, 1959.
D. Basu. Statistical information and likelihood. Sankhyā Ser. A, 37(1): 1–71, 1975. Discussion and correspondence between Barnard and Basu.
D. Basu. On partial sufficiency: A review. J. Statist. Plann. Inference, 2(1):1–13, 1978.
D. Basu and T. P. Speed. Bibliography of sufficiency. Mimeographed Technical report, Manchester, 1975.
C. B. Bell, D. Blackwell, and L. Breiman. On the completeness of order statistics. Ann. Math. Statist., 31:794–797, 1960.
J. O. Berger and R. L. Wolpert. The Likelihood Principle, volume 6 of Lecture Notes – Monograph Series. Institute of Mathematical Statistics, Hayward, CA, 2nd edition, 1988.
R. H. Berk and A. Cohen. Asymptotically optimal methods of combining tests. J. Am. Stat. Assoc., 74(368):812–814, 1979.
P. J. Bickel and K. A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, volume I of Holden-Day Series in Probability and Statistics. Holden-Day, Inc., San Francisco, CA, 1977.
A. Birnbaum. Combining independent tests of significance. J. Am. Stat. Assoc., 49:559–574, 1954.
A. Birnbaum. On the foundations of statistical inference. J. Am. Stat. Assoc., 57:269–326, 1962.
D. Blackwell. Comparison of experiments. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, pages 93–102, Berkeley and Los Angeles, 1951. University of California Press.
L. Brown. Sufficient statistics in the case of independent random variables. Ann. Math. Statist., 35:1456–1474, 1964.
L. D. Brown. Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory, volume 9 of Lecture Notes – Monograph Series. Institute of Mathematical Statistics, Hayward, CA, 1986.
L. D. Brown and M. G. Low. Asymptotic equivalence of nonparametric regression and white noise. Ann. Stat., 24(6):2384–2398, 1996.
A. Cohen, J. I. Marden, and K. Singh. Second order asymptotic and nonasymptotic optimality properties of combined tests. J. Statist. Plann. Inference, 6(3):253–276, 1982.
A. DasGupta. Extensions to Basu’s theorem, factorizations, and infinite divisibility. J. Statist. Plann. Inference, 137:945–952, 2007.
A. DasGupta and H. Dette. On the reconcilability and combination of independent confidence intervals. Preprint, 2011.
P. Dawid. Basu on ancillarity. In Selected Works of Debabrata Basu, Selected Works in Probability and Statistics, pages 5–8. Springer, 2011.
R. A. Fisher. On the mathematical foundations of theoretical statistics. Phil. Trans. R. Soc. Lond. A, 222:309–368, 1922.
R. A. Fisher. Statistical Methods for Research Workers. Oliver & Boyd, 14th, revised edition, 1970.
D. A. S. Fraser. Sufficient statistics with nuisance parameters. Ann. Math. Statist., 27:838–842, 1956.
D. A. S. Fraser. The Structure of Inference. John Wiley & Sons Inc., New York, 1968.
J. Hájek. On basic concepts of statistics. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics, pages 139–162, Berkeley, CA, 1967. University of California Press.
P. R. Halmos and L. J. Savage. Application of the Radon-Nikodym theorem to the theory of sufficient statistics. Ann. Math. Statist., 20: 225–241, 1949.
V. S. Huzurbazar. Sufficient Statistics: Selected Contributions. Marcel Dekker, New York, NY, 1976.
S. Karlin. Admissibility for estimation with quadratic loss. Ann. Math. Statist., 29:406–436, 1958.
J. A. Koziol and M. D. Perlman. Combining independent chi-squared tests. J. Am. Stat. Assoc., 73(364):753–763, 1978.
K.-S. Lau and C. R. Rao. Integrated Cauchy functional equation and characterizations of the exponential law. Sankhyā Ser. A, 44(1):72–90, 1982.
L. Le Cam. Sufficiency and approximate sufficiency. Ann. Math. Statist., 35:1419–1455, 1964.
E. L. Lehmann. Testing Statistical Hypotheses. John Wiley & Sons Inc., New York, 1959.
E. L. Lehmann and G. Casella. Theory of Point Estimation. Springer Texts in Statistics. Springer-Verlag, New York, second edition, 1998.
E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer Texts in Statistics. Springer, New York, third edition, 2005.
L. Li and T. P. Speed. Parametric deconvolution of positive spike trains. Ann. Stat., 28(5):1279–1301, 2000.
M. Nussbaum. Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Stat., 24(6):2399–2430, 1996.
C. R. Rao and D. N. Shanbhag. Recent results on characterization of probability distributions: A unified approach through extensions of Deny’s theorem. Adv. Appl. Prob., 18(3):660–678, 1986.
R. D. Reiss. A new proof of the approximate sufficiency of sparse order statistics. Stat. Prob. Lett., 4:233–235, 1986.
K. K. Roy and R. V. Ramamoorthi. Relationship between Bayes, classical and decision theoretic sufficiency. Sankhyā Ser. A, 41(1-2): 48–58, 1979.
L. J. Savage. The foundations of statistics reconsidered. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Vol. I, pages 575–586, Berkeley, CA, 1961. University of California Press.
K. Singh, M. Xie, and W. E. Strawderman. Combining information from independent sources through confidence distributions. Ann. Stat., 33(1):159–183, 2005.
M. Skibinsky. Adequate subfields and sufficiency. Ann. Math. Statist., 38:155–161, 1967.
T. P. Speed. A factorisation theorem for adequate statistics. Aust. J. Stat., 20(3):240–249, 1978.
T. P. Speed. Enduring values. IMS Bulletin, 39(10):10, 2010.
T. P. Speed and H. T. Kiiveri. Gaussian Markov distributions over finite graphs. Ann. Stat., 14(1):138–150, 1986.
C. Stein. Notes on the comparison of experiments. Technical report, University of Chicago, 1951.
K. Takeuchi and M. Akahira. Characterizations of prediction sufficiency (adequacy) in terms of risk functions. Ann. Stat., 3(4):1018–1024, 1975.
E. Torgersen. Comparison of Statistical Experiments, volume 36 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1991.
S. Yamada and H. Morimoto. Sufficiency. In Current Issues in Statistical Inference: Essays in Honor of D. Basu, volume 17 of Lecture Notes – Monograph Series, pages 86–98. Institute of Mathematical Statistics, Hayward, CA, 1992.
W. J. Youden. Enduring values. Technometrics, 14(1):1–11, 1972.
© 2012 Springer Science+Business Media, LLC
DasGupta, A. (2012). Sufficiency. In: Dudoit, S. (eds) Selected Works of Terry Speed. Selected Works in Probability and Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1347-9_3