Abstract
Dealing with unequal prior probabilities in both linear discriminant analysis based on the Gaussian distribution (GDA) and Fisher’s linear discriminant analysis (FDA) is frequently required in practice, but is hardly described in any textbook or paper. This is one of the first papers to show that GDA and FDA yield the same classification results for any number of classes and features. We discuss how unequal priors have to enter these two methods, both in theory and in algorithms. This may be of particular interest if prior knowledge is available and should be included in the discriminant rule. Various estimators that use prior probabilities in different places (e.g. prior-based weighting of the covariance matrix) are compared both in theory and by means of simulations.
Appendix: Comparison for two groups and two features
Assume G = 2 groups and p = 2 features. Hence, in FDA, the number of discriminant components is r ≤ min{p, G − 1} = min{2, 1} = 1 (see Section 2.2), so we obtain at most one discriminant component \(\alpha _{1} = (\alpha _{11}, \alpha _{12})^{\prime } \in \mathbb {R}^{2}\). It is derived in the following.
First, we take a closer look at the weighted between-group covariance matrix \(B_{\mu _{w}}\). In the case of two classes and two features, it can be rewritten as

$$B_{\mu_{w}} = \pi_{1} \pi_{2} (\mu_{2} - \mu_{1}) (\mu_{2} - \mu_{1})^{\prime}.$$
As previously mentioned (see Section 2.2), the optimisation problem in Eq. 11 is solved by the eigenvector v1 corresponding to the largest eigenvalue λ1 of the matrix \({\Sigma }^{-\frac {1}{2}}B_{\mu _{w}} {\Sigma }^{-\frac {1}{2}} = \pi _{1} \pi _{2} {\Sigma }^{-\frac {1}{2}} (\mu _{2} - \mu _{1}) (\mu _{2} - \mu _{1})^{\prime } {\Sigma }^{-\frac {1}{2}}\). We therefore determine the largest eigenvalue λ1 and the corresponding normalised eigenvector v1.
The number of nonzero eigenvalues of a matrix equals the rank of that matrix. So, we have

$$\text{rk}\left({\Sigma}^{-\frac{1}{2}} B_{\mu_{w}} {\Sigma}^{-\frac{1}{2}}\right) = \text{rk}\left(\pi_{1} \pi_{2} (\mu_{2} - \mu_{1}) (\mu_{2} - \mu_{1})^{\prime}\right) \le \min\left\{\text{rk}\left(\pi_{1} \pi_{2} (\mu_{2} - \mu_{1})\right),\ \text{rk}\left((\mu_{2} - \mu_{1})^{\prime}\right)\right\} = 1,$$

because rk(π1π2(μ2 − μ1)) = 1 as well as rk((μ2 − μ1)′) = 1, and multiplication by the full-rank matrix \({\Sigma}^{-\frac{1}{2}}\) does not change the rank. Hence, the rank of \({\Sigma }^{-\frac {1}{2}}B_{\mu _{w}} {\Sigma }^{-\frac {1}{2}} \) is 1 or 0. Since the zero matrix is the only matrix of rank 0, \(\text {rk}\left ({\Sigma }^{-\frac {1}{2}} B_{\mu _{w}} {\Sigma }^{-\frac {1}{2}} \right ) = 0\) holds if and only if \((\mu _{2} - \mu _{1}) (\mu _{2} - \mu _{1})^{\prime } = 0 \in \mathbb {R}^{2 \times 2}\), and thus \(\mu _{2} - \mu _{1} = 0 \in \mathbb {R}^{2}\). This contradicts the assumption of different expected values of the groups (see Section 2).
The trace of a matrix equals the sum of its eigenvalues. So, the only nonzero eigenvalue is

$$\lambda_{1} = \text{tr}\left({\Sigma}^{-\frac{1}{2}} B_{\mu_{w}} {\Sigma}^{-\frac{1}{2}}\right) = \pi_{1} \pi_{2}\, \text{tr}\left((\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1})\right) = \pi_{1} \pi_{2}\, (\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1}).$$
Since the priors are positive and the covariance matrix Σ is positive definite, λ1 > 0 is the largest eigenvalue of \({\Sigma }^{-\frac {1}{2}} B_{\mu _{w}} {\Sigma }^{-\frac {1}{2}}\). The corresponding eigenvector is \(v_{1} = \frac {{\Sigma }^{-\frac {1}{2}} (\mu _{2} - \mu _{1})}{((\mu _{2} - \mu _{1})^{\prime } {\Sigma }^{-1} (\mu _{2} - \mu _{1}))^{\frac {1}{2}}}\). It satisfies the two conditions

$$v_{1}^{\prime} v_{1} = \frac{(\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1})}{(\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1})} = 1$$

and

$${\Sigma}^{-\frac{1}{2}} B_{\mu_{w}} {\Sigma}^{-\frac{1}{2}} v_{1} = \pi_{1} \pi_{2}\, {\Sigma}^{-\frac{1}{2}} (\mu_{2} - \mu_{1}) (\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-\frac{1}{2}} v_{1} = \lambda_{1} v_{1}.$$
Hence, the discriminant component, which is the eigenvector of the matrix \({\Sigma }^{-1}B_{\mu _{w}}\), is given by

$$\alpha_{1} = {\Sigma}^{-\frac{1}{2}} v_{1} = \frac{{\Sigma}^{-1} (\mu_{2} - \mu_{1})}{\left((\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1})\right)^{\frac{1}{2}}}.$$
The discriminant component α1 depends only on the expected values of the groups μ1, μ2 and the covariance matrix Σ, or rather its inverse. In particular, α1 does not depend on the priors contained in the between-group covariance matrix \(B_{\mu _{w}}\), since the factor π1π2 cancels in the normalisation.
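As a numerical sanity check (not part of the original derivation), the closed-form eigenpair and the discriminant component α1 can be verified with NumPy for hypothetical example values of π1, π2, μ1, μ2 and Σ:

```python
import numpy as np

# Hypothetical example values: unequal priors, two groups, two features.
pi1, pi2 = 0.3, 0.7
mu1 = np.array([0.0, 0.0])
mu2 = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

d = mu2 - mu1
Sinv = np.linalg.inv(Sigma)

# Weighted between-group covariance matrix for G = 2:
# B = pi1 * pi2 * (mu2 - mu1)(mu2 - mu1)'
B = pi1 * pi2 * np.outer(d, d)

# Symmetric square root of Sigma^{-1} via the eigen-decomposition of Sigma.
w, V = np.linalg.eigh(Sigma)
Sinv_half = V @ np.diag(w ** -0.5) @ V.T

# Numerical eigen-decomposition of Sigma^{-1/2} B Sigma^{-1/2}.
M = Sinv_half @ B @ Sinv_half
lam, vecs = np.linalg.eigh(M)
lam1 = lam[-1]                 # largest eigenvalue
v1 = vecs[:, -1]               # corresponding unit eigenvector (sign arbitrary)

# Closed-form expressions derived above.
lam1_closed = pi1 * pi2 * d @ Sinv @ d
v1_closed = (Sinv_half @ d) / np.sqrt(d @ Sinv @ d)

# Discriminant component alpha_1 = Sigma^{-1/2} v_1, an eigenvector
# of Sigma^{-1} B with the same eigenvalue lam1.
alpha1 = Sinv @ d / np.sqrt(d @ Sinv @ d)

assert np.isclose(lam1, lam1_closed)
assert np.allclose(np.abs(v1), np.abs(v1_closed))     # eigenvectors up to sign
assert np.allclose(Sinv @ B @ alpha1, lam1_closed * alpha1)
```

The last assertion confirms that α1 is indeed an eigenvector of \({\Sigma }^{-1}B_{\mu _{w}}\) even though its closed form no longer involves the priors.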
In order to determine the hyperplane of Fisher’s discriminant rule, the scores of the two groups D1 and D2 (see Eq. 17, Section 2.2) are equated: D1(x) = D2(x).
Before we equate the canonical discriminant scores of groups 1 and 2, they are rearranged.
The term \(-\frac {1}{2}x^{\prime } {\Sigma }^{-1} x\) is the same for all groups g = 1,...,G and can be neglected. Thus, we obtain

$$d_{g}(x) = \mu_{g}^{\prime} {\Sigma}^{-1} x - \frac{1}{2} \mu_{g}^{\prime} {\Sigma}^{-1} \mu_{g} + \log(\pi_{g}) = b_{g} x + c_{g}$$

with bg := (Σ− 1μg)′ and \(c_{g} := - \frac {1}{2} \mu _{g}^{\prime } {\Sigma }^{-1}\mu _{g} + \log (\pi _{g})\). The hyperplane of the discriminant rule of GDA results from d1(x) = d2(x):

$$(b_{1} - b_{2}) x + c_{1} - c_{2} = 0.$$
In the case of two groups and two features, GDA and FDA have the same hyperplane:

$$(\mu_{1} - \mu_{2})^{\prime} {\Sigma}^{-1} x - \frac{1}{2} \left(\mu_{1}^{\prime} {\Sigma}^{-1} \mu_{1} - \mu_{2}^{\prime} {\Sigma}^{-1} \mu_{2}\right) + \log\left(\frac{\pi_{1}}{\pi_{2}}\right) = 0.$$
Thus, GDA and FDA yield the same results even for unequal priors π1 and π2.
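The equality of the two rules can also be illustrated numerically. The sketch below, using hypothetical parameter values, classifies random points once with the GDA scores bg x + cg from above and once with a prior-adjusted FDA score of the standard form \(D_{g}(x) = (\alpha_{1}^{\prime} x)(\alpha_{1}^{\prime} \mu_{g}) - \frac{1}{2}(\alpha_{1}^{\prime} \mu_{g})^{2} + \log(\pi_{g})\), which is assumed here since Eq. 17 is not reproduced in this excerpt:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters; the priors are deliberately unequal.
pi = np.array([0.2, 0.8])
mu = np.array([[0.0, 0.0],
               [2.0, 1.0]])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sinv = np.linalg.inv(Sigma)

X = rng.normal(size=(1000, 2)) * 2          # arbitrary query points

# GDA: linear scores d_g(x) = b_g x + c_g with
# b_g = (Sigma^{-1} mu_g)' and c_g = -1/2 mu_g' Sigma^{-1} mu_g + log(pi_g).
b = mu @ Sinv                                # row g holds b_g
c = -0.5 * np.einsum('gi,ij,gj->g', mu, Sinv, mu) + np.log(pi)
gda = np.argmax(X @ b.T + c, axis=1)

# FDA: project onto alpha_1 = Sigma^{-1}(mu_2 - mu_1), normalised as derived
# above, and assign by the assumed prior-adjusted scores D_g.
d = mu[1] - mu[0]
alpha = Sinv @ d / np.sqrt(d @ Sinv @ d)
z = X @ alpha                                # projected data
m = mu @ alpha                               # projected group means
fda = np.argmax(z[:, None] * m[None, :] - 0.5 * m**2 + np.log(pi), axis=1)

# Both rules classify every point identically despite unequal priors.
assert np.array_equal(gda, fda)
```

Expanding the difference D2(x) − D1(x) with the derived α1 reproduces the GDA decision boundary term by term, which is why the agreement in this sketch is exact rather than approximate.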
Cite this article
van Meegen, C., Schnackenberg, S. & Ligges, U. Unequal Priors in Linear Discriminant Analysis. J Classif 37, 598–615 (2020). https://doi.org/10.1007/s00357-019-09336-2