
Unequal Priors in Linear Discriminant Analysis


Abstract

Dealing with unequal priors in both linear discriminant analysis (LDA) based on the Gaussian distribution (GDA) and Fisher’s linear discriminant analysis (FDA) is common in practice, yet is hardly described in any textbook or paper. This is one of the first papers to show that GDA and FDA yield the same classification results for any number of classes and features. We discuss how unequal priors have to enter these two methods, both in theory and in algorithms. This may be of particular interest if prior knowledge is available and should be included in the discriminant rule. Various estimators that use prior probabilities in different places (e.g. prior-based weighting of the covariance matrix) are compared both in theory and by means of simulations.




Appendix: Comparison for two groups and two features


Assume G = 2 groups and p = 2 features. Hence, in FDA, the number of discriminant components is r ≤ min{p,G − 1} = min{2, 1} = 1 (see Section 2.2), so we obtain at most one discriminant component \(\alpha _{1} = (\alpha _{11}, \alpha _{12})^{\prime } \in \mathbb {R}^{2}\). This component is derived in the following.

First, we take a closer look at the weighted between-groups covariance matrix \(B_{\mu _{w}}\). In the case of two classes and two features, it can be rewritten as:

$$ \begin{array}{@{}rcl@{}} B_{\mu_{w}} &=& \sum\limits_{g = 1}^{2} \pi_{g} (\mu_{g} - \mu_{w}) (\mu_{g} - \mu_{w})^{\prime}\\ & = & \pi_{1} (\mu_{1} -\mu_{w}) (\mu_{1} -\mu_{w})^{\prime} + \pi_{2} (\mu_{2} - \mu_{w}) (\mu_{2} - \mu_{w})^{\prime} \\ & =& \pi_{1}(\mu_{1}\mu_{1}^{\prime} - \mu_{1}\mu_{w}^{\prime} - \mu_{w} \mu_{1}^{\prime} + \mu_{w}\mu_{w}^{\prime}) + \pi_{2} (\mu_{2} \mu_{2}^{\prime} - \mu_{2}\mu_{w}^{\prime} -\mu_{w}\mu_{2}^{\prime} +\mu_{w}\mu_{w}^{\prime})\\ & =& (\pi_{1} + \pi_{2}) \mu_{w}\mu_{w}^{\prime} - (\pi_{1}\mu_{1} +\pi_{2} \mu_{2})\mu_{w}^{\prime} - \mu_{w}(\pi_{1} \mu_{1} + \pi_{2} \mu_{2})^{\prime} + \pi_{1}\mu_{1}\mu_{1}^{\prime} + \pi_{2} \mu_{2} \mu_{2}^{\prime} \\ & =& \mu_{w}\mu_{w}^{\prime} - \mu_{w}\mu_{w}^{\prime} - (\pi_{1} \mu_{1} + \pi_{2}\mu_{2})(\pi_{1} \mu_{1} + \pi_{2} \mu_{2})^{\prime} + \pi_{1} \mu_{1}\mu_{1}^{\prime} + \pi_{2} \mu_{2} \mu_{2}^{\prime} \\ & =& \pi_{1} \mu_{1}\mu_{1}^{\prime} + \pi_{2} \mu_{2} \mu_{2}^{\prime} - {\pi_{1}^{2}} \mu_{1} \mu_{1}^{\prime} - \pi_{1} \pi_{2} \mu_{1} \mu_{2}^{\prime} - \pi_{1} \pi_{2} \mu_{2} \mu_{1}^{\prime} - {\pi_{2}^{2}} \mu_{2} \mu_{2}^{\prime}\\ & =& (1 - \pi_{1}) \pi_{1} \mu_{1}\mu_{1}^{\prime} + (1 - \pi_{2}) \pi_{2} \mu_{2} \mu_{2}^{\prime} - \pi_{1} \pi_{2} \mu_{1} \mu_{2}^{\prime} - \pi_{1} \pi_{2} \mu_{2} \mu_{1}^{\prime} \\ & =& \pi_{1} \pi_{2} \mu_{1}\mu_{1}^{\prime}- \pi_{1} \pi_{2} \mu_{1} \mu_{2}^{\prime} - \pi_{1} \pi_{2} \mu_{2} \mu_{1}^{\prime} + \pi_{1} \pi_{2} \mu_{2} \mu_{2}^{\prime} = \pi_{1} \pi_{2} (\mu_{1} - \mu_{2}) (\mu_{1} - \mu_{2})^{\prime} \\ & =& \pi_{1} \pi_{2} (\mu_{2} - \mu_{1}) (\mu_{2} - \mu_{1})^{\prime}. \end{array} $$
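For illustration only, this identity can be verified numerically. The following minimal sketch uses NumPy with arbitrarily assumed priors and group means; none of these values stem from the paper.

```python
import numpy as np

# Assumed example values (not from the paper): two groups, two features.
pi1, pi2 = 0.3, 0.7                      # prior probabilities pi_1, pi_2
mu1 = np.array([0.0, 0.0])               # group mean mu_1
mu2 = np.array([2.0, 1.0])               # group mean mu_2
mu_w = pi1 * mu1 + pi2 * mu2             # weighted overall mean mu_w

# Weighted between-groups covariance matrix B_{mu_w} by its definition ...
B = pi1 * np.outer(mu1 - mu_w, mu1 - mu_w) + pi2 * np.outer(mu2 - mu_w, mu2 - mu_w)

# ... and the closed form derived above: pi_1 pi_2 (mu_2 - mu_1)(mu_2 - mu_1)'.
B_closed = pi1 * pi2 * np.outer(mu2 - mu1, mu2 - mu1)

assert np.allclose(B, B_closed)
```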

As previously mentioned (see Section 2.2), the optimisation problem in Eq. 11 is solved by the eigenvector v1 corresponding to the largest eigenvalue λ1 of the matrix \({\Sigma }^{-\frac {1}{2}}B_{\mu _{w}} {\Sigma }^{-\frac {1}{2}} = \pi _{1} \pi _{2} {\Sigma }^{-\frac {1}{2}} (\mu _{2} - \mu _{1}) (\mu _{2} - \mu _{1})^{\prime } {\Sigma }^{-\frac {1}{2}}\). In the following, this largest eigenvalue λ1 and the corresponding normalised eigenvector v1 are determined.

The number of nonzero eigenvalues of a symmetric matrix is equal to its rank. So, we have:

$$ \begin{array}{@{}rcl@{}} \text{rk}\left( {\Sigma}^{-\frac{1}{2}} B_{\mu_{w}} {\Sigma}^{-\frac{1}{2}} \right) & = &\text{rk}\left( \pi_{1} \pi_{2} {\Sigma}^{-\frac{1}{2}} (\mu_{2} - \mu_{1}) (\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-\frac{1}{2}}\right) \\ & \leq & \min \left\{ \text{rk}({\Sigma}^{-\frac{1}{2}}), \text{rk}(\pi_{1} \pi_{2} (\mu_{2} - \mu_{1})), \text{rk}((\mu_{2} - \mu_{1})^{\prime}), \text{rk}({\Sigma}^{-\frac{1}{2}}) \right\} = 1 \end{array} $$

because \(\text {rk}(\pi _{1} \pi _{2} (\mu _{2} - \mu _{1})) = 1\) as well as \(\text {rk}((\mu _{2} - \mu _{1})^{\prime }) = 1\). Hence, the rank of \({\Sigma }^{-\frac {1}{2}}B_{\mu _{w}} {\Sigma }^{-\frac {1}{2}} \) is either 1 or 0. Since the zero matrix is the only matrix of rank 0, \(\text {rk}\left ({\Sigma }^{-\frac {1}{2}} B_{\mu _{w}} {\Sigma }^{-\frac {1}{2}} \right ) = 0\) holds if and only if \((\mu _{2} - \mu _{1}) (\mu _{2} - \mu _{1})^{\prime } = 0 \in \mathbb {R}^{2 \times 2}\), i.e. \(\mu _{2} - \mu _{1} = 0 \in \mathbb {R}^{2}\). This contradicts the assumption of different expected values of the groups (see Section 2), so the rank is 1.

The trace of a matrix is equal to the sum of its eigenvalues. Since \({\Sigma }^{-\frac {1}{2}} B_{\mu _{w}} {\Sigma }^{-\frac {1}{2}}\) has rank 1, its single nonzero eigenvalue equals its trace:

$$ \begin{array}{@{}rcl@{}} \lambda_{1} &=& \text{tr}\left( {\Sigma}^{-\frac{1}{2}} B_{\mu_{w}} {\Sigma}^{-\frac{1}{2}} \right) = \text{tr}\left( {\Sigma}^{-\frac{1}{2}} \pi_{1} \pi_{2}(\mu_{2} - \mu_{1}) (\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-\frac{1}{2}} \right) \\ &=& \text{tr}\left( \pi_{1} \pi_{2}(\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1})\right) = \pi_{1} \pi_{2}(\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1}). \end{array} $$
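Both the rank argument and this trace formula can be illustrated numerically. The sketch below again uses assumed values (Σ is an arbitrary positive definite matrix) and forms Σ raised to the power −1/2 via NumPy's spectral decomposition:

```python
import numpy as np

# Assumed example values, as in the previous sketch.
pi1, pi2 = 0.3, 0.7
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])        # assumed positive definite covariance

d = mu2 - mu1
B = pi1 * pi2 * np.outer(d, d)                    # B_{mu_w} in its closed form

# Sigma^{-1/2} via the spectral decomposition of Sigma.
w, V = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
M = Sigma_inv_sqrt @ B @ Sigma_inv_sqrt

print(np.linalg.matrix_rank(M))                   # 1: exactly one nonzero eigenvalue

# The nonzero eigenvalue equals the trace and the closed form pi_1 pi_2 d' Sigma^{-1} d.
lam1 = pi1 * pi2 * d @ np.linalg.inv(Sigma) @ d
assert np.isclose(lam1, np.trace(M))
assert np.isclose(lam1, max(np.linalg.eigvalsh(M)))
```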

Since the priors are positive, the covariance matrix Σ is positive definite and μ1 ≠ μ2, λ1 > 0 is indeed the largest eigenvalue of \({\Sigma }^{-\frac {1}{2}} B_{\mu _{w}} {\Sigma }^{-\frac {1}{2}}\). The corresponding normalised eigenvector is \(v_{1} = \frac {{\Sigma }^{-\frac {1}{2}} (\mu _{2} - \mu _{1})}{((\mu _{2} - \mu _{1})^{\prime } {\Sigma }^{-1} (\mu _{2} - \mu _{1}))^{\frac {1}{2}}}\). It satisfies the two conditions

$$ \begin{array}{@{}rcl@{}} {\Sigma}^{-\frac{1}{2}} B_{\mu_{w}} {\Sigma}^{-\frac{1}{2}}v_{1} &=& {\Sigma}^{-\frac{1}{2}} \pi_{1}\pi_{2}(\mu_{2} - \mu_{1})(\mu_{2}-\mu_{1})^{\prime} {\Sigma}^{-\frac{1}{2}} \frac{{\Sigma}^{-\frac{1}{2}} (\mu_{2} - \mu_{1})}{((\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1}))^{\frac{1}{2}}} \\ &=& \frac{{\Sigma}^{-\frac{1}{2}}(\mu_{2} - \mu_{1}) \pi_{1}\pi_{2} (\mu_{2}-\mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1})}{((\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1}))^{\frac{1}{2}}} \\ &=& \frac{{\Sigma}^{-\frac{1}{2}}(\mu_{2} - \mu_{1}) \lambda_{1}}{((\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1}))^{\frac{1}{2}}} = \lambda_{1} v_{1} \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} v_{1}^{\prime}v_{1} &=& \left( \frac{{\Sigma}^{-\frac{1}{2}} (\mu_{2} - \mu_{1})}{((\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1}))^{\frac{1}{2}}} \right)^{\prime} \frac{{\Sigma}^{-\frac{1}{2}} (\mu_{2} - \mu_{1})}{((\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1}))^{\frac{1}{2}}} \\ &=& \frac{(\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1}(\mu_{2} - \mu_{1})}{(\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1})} =1. \end{array} $$

Hence, the discriminant component, which is the eigenvector of the matrix \({\Sigma }^{-1}B_{\mu _{w}}\), is given by:

$$ \begin{array}{@{}rcl@{}} \alpha_{1} = {\Sigma}^{-\frac{1}{2}} v_{1} = {\Sigma}^{-\frac{1}{2}}\frac{{\Sigma}^{-\frac{1}{2}} (\mu_{2} - \mu_{1})}{((\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1}))^{\frac{1}{2}} } = \frac{{\Sigma}^{-1} (\mu_{2} - \mu_{1})}{((\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} (\mu_{2} - \mu_{1}))^{\frac{1}{2}} }. \end{array} $$
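As a further sketch with the same assumed values, this closed form of α1 can be compared with a numerically computed eigenvector of \({\Sigma}^{-1}B_{\mu_{w}}\); since eigenvectors are only determined up to scaling, only collinearity is checked:

```python
import numpy as np

# Assumed example values, as before.
pi1, pi2 = 0.3, 0.7
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

d = mu2 - mu1
Sigma_inv = np.linalg.inv(Sigma)

# Closed form derived above.
alpha1 = Sigma_inv @ d / np.sqrt(d @ Sigma_inv @ d)

# Eigenvector of Sigma^{-1} B_{mu_w} belonging to its nonzero eigenvalue.
B = pi1 * pi2 * np.outer(d, d)
vals, vecs = np.linalg.eig(Sigma_inv @ B)
v = np.real(vecs[:, np.argmax(np.real(vals))])

# alpha_1 and v are collinear (Cauchy-Schwarz holds with equality).
assert np.isclose(abs(alpha1 @ v), np.linalg.norm(alpha1) * np.linalg.norm(v))
```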

The discriminant component α1 depends only on the expected values μ1, μ2 of the groups and on the covariance matrix Σ, or rather its inverse. Note that α1 is independent of the between-groups covariance matrix \(B_{\mu _{w}}\) and, in particular, of the priors π1 and π2.

In order to determine the hyperplane of Fisher’s discriminant rule, the scores D1 and D2 of the two groups (see Eq. 17, Section 2.2) are equated; here \(\mu := \frac{\mu_{1} + \mu_{2}}{2}\) denotes the midpoint of the two group means:

$$ \begin{array}{@{}rcl@{}} &\qquad& D_{1}(x) \stackrel{!}{=} D_{2} (x) \\ &\Leftrightarrow& (\alpha_{1}^{\prime}(x - \mu_{1}))^{2} - 2\log(\pi_{1}) \stackrel{!}{=} (\alpha_{1}^{\prime}(x - \mu_{2}))^{2} - 2\log(\pi_{2}) \\ &\Leftrightarrow& -2(\log(\pi_{1})-\log(\pi_{2})) \stackrel{!}{=} (\alpha_{1}^{\prime}(x - \mu_{2}))^{2} - (\alpha_{1}^{\prime}(x - \mu_{1}))^{2} \\ &\Leftrightarrow& -2(\log(\pi_{1})-\log(\pi_{2})) \stackrel{!}{=} (\alpha_{1}^{\prime}(x - \mu_{2}) + \alpha_{1}^{\prime}(x - \mu_{1}))(\alpha_{1}^{\prime}(x - \mu_{2}) -\alpha_{1}^{\prime}(x - \mu_{1})) \\ &\Leftrightarrow& -2(\log(\pi_{1})-\log(\pi_{2})) \stackrel{!}{=} \alpha_{1}^{\prime}(x - \mu_{2} + x - \mu_{1})\alpha_{1}^{\prime}(x - \mu_{2} - x + \mu_{1}) \\ &\Leftrightarrow& -2(\log(\pi_{1})-\log(\pi_{2})) \stackrel{!}{=} \alpha_{1}^{\prime}(2x - \mu_{1} - \mu_{2})\alpha_{1}^{\prime}(\mu_{1} -\mu_{2}) \\ &\Leftrightarrow& -2(\log(\pi_{1})-\log(\pi_{2})) \stackrel{!}{=} -2\alpha_{1}^{\prime}\left( \frac{\mu_{1} + \mu_{2}}{2} - x\right)\alpha_{1}^{\prime}(\mu_{1} - \mu_{2}) \\ &\Leftrightarrow& \log(\pi_{1})-\log(\pi_{2}) \stackrel{!}{=} \alpha_{1}^{\prime}(\mu - x)\alpha_{1}^{\prime}(\mu_{1} - \mu_{2}) \\ &\Leftrightarrow& \frac{\log(\pi_{1})-\log(\pi_{2})}{\alpha_{1}^{\prime}(\mu_{1} - \mu_{2})} \stackrel{!}{=} \alpha_{1}^{\prime}\mu - \alpha_{1}^{\prime}x \\ &\Leftrightarrow& -\alpha_{1}^{\prime}\mu+ \frac{\log(\pi_{1})-\log(\pi_{2})}{\alpha_{1}^{\prime}(\mu_{1} - \mu_{2})} \stackrel{!}{=} -\alpha_{1}^{\prime}x = -\alpha_{11}x_{1} - \alpha_{12}x_{2} \\ &\Leftrightarrow& \alpha_{11}x_{1} + \alpha_{12}x_{2} \stackrel{!}{=} \alpha_{1}^{\prime}\mu -\frac{\log(\pi_{1})-\log(\pi_{2})}{\alpha_{1}^{\prime}(\mu_{1} - \mu_{2})} \\ &\Leftrightarrow& x_{2} \stackrel{!}{=} - \frac{\alpha_{11}}{\alpha_{12}} x_{1} + \frac{\alpha_{1}^{\prime}\mu}{\alpha_{12}} + \frac{\log(\pi_{1}) - \log(\pi_{2})}{\alpha_{12} (\alpha_{1}^{\prime}(\mu_{2} - \mu_{1}))} \\ &\Leftrightarrow& x_{2} \stackrel{!}{=} -\frac{\alpha_{11}}{\alpha_{12}}x_{1} + \frac{\alpha_{1}^{\prime}\mu}{\alpha_{12}} +\frac{\log(\pi_{1}) - \log(\pi_{2})}{\alpha_{12} ((\mu_{2} - \mu_{1})^{\prime}{\Sigma}^{-1}(\mu_{2} - \mu_{1}))^{\frac{1}{2}}}. \end{array} $$
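Points on this line can be plugged into the scores D1 and D2 to confirm equality. In the sketch below (assumed values as before), Dg(x) = (α1′(x − μg))² − 2 log(πg) is taken from the first line of the derivation above:

```python
import numpy as np

# Assumed example values, as before.
pi1, pi2 = 0.3, 0.7
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

d = mu2 - mu1
Sigma_inv = np.linalg.inv(Sigma)
alpha1 = Sigma_inv @ d / np.sqrt(d @ Sigma_inv @ d)
mu_mid = (mu1 + mu2) / 2                          # midpoint mu of the group means

def h(x1):
    """x2-coordinate of the FDA decision boundary derived above."""
    return (-alpha1[0] / alpha1[1] * x1
            + alpha1 @ mu_mid / alpha1[1]
            + (np.log(pi1) - np.log(pi2))
              / (alpha1[1] * np.sqrt(d @ Sigma_inv @ d)))

def D(x, mu_g, pi_g):
    """FDA score of a group, as used at the start of the derivation."""
    return (alpha1 @ (x - mu_g)) ** 2 - 2 * np.log(pi_g)

for x1 in (-1.0, 0.0, 2.5):
    x = np.array([x1, h(x1)])
    assert np.isclose(D(x, mu1, pi1), D(x, mu2, pi2))
```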

Before we equate the canonical discriminant scores of groups 1 and 2, we rearrange them:

$$ \begin{array}{@{}rcl@{}} L_{g}(x) &=& -\frac{1}{2} (x - \mu_{g})^{\prime}{\Sigma}^{-1}(x - \mu_{g}) + \log(\pi_{g}) \\ &=& -\frac{1}{2}((x - \mu_{g})^{\prime}{\Sigma}^{-1} x - (x - \mu_{g})^{\prime}{\Sigma}^{-1}\mu_{g}) + \log(\pi_{g}) \\ &=&-\frac{1}{2}(x^{\prime} {\Sigma}^{-1} x - \mu_{g}^{\prime} {\Sigma}^{-1} x - x^{\prime}{\Sigma}^{-1}\mu_{g} + \mu_{g}^{\prime} {\Sigma}^{-1}\mu_{g}) + \log(\pi_{g}) \\ &=& -\frac{1}{2}(x^{\prime} {\Sigma}^{-1} x - 2 \mu_{g}^{\prime}{\Sigma}^{-1} x + \mu_{g}^{\prime} {\Sigma}^{-1}\mu_{g}) + \log(\pi_{g}) \\ &=& -\frac{1}{2}x^{\prime} {\Sigma}^{-1} x + \mu_{g}^{\prime}{\Sigma}^{-1} x - \frac{1}{2} \mu_{g}^{\prime} {\Sigma}^{-1}\mu_{g} + \log(\pi_{g}) \end{array} $$

The term \(-\frac {1}{2}x^{\prime } {\Sigma }^{-1} x\) is the same for all groups g = 1,...,G and can be neglected. Thus, we obtain

$$ \begin{array}{@{}rcl@{}} \tilde{L}_{g}(x) &=& \mu_{g}^{\prime}{\Sigma}^{-1} x - \frac{1}{2} \mu_{g}^{\prime} {\Sigma}^{-1}\mu_{g} + \log(\pi_{g})\\ & = & ({\Sigma}^{-1}\mu_{g})^{\prime}x - \frac{1}{2} \mu_{g}^{\prime} {\Sigma}^{-1}\mu_{g} + \log(\pi_{g}) =: b_{g}^{\prime}x + c_{g} \end{array} $$

with \(b_{g} := {\Sigma }^{-1}\mu _{g}\) and \(c_{g} := - \frac {1}{2} \mu _{g}^{\prime } {\Sigma }^{-1}\mu _{g} + \log (\pi _{g})\). The hyperplane of the GDA discriminant rule then results from:

$$ \begin{array}{@{}rcl@{}} &\quad& \tilde{L}_{1}(x) = b_{1}^{\prime}x + c_{1} \stackrel{!}{=} b_{2}^{\prime}x + c_{2} = \tilde{L}_{2}(x) \\ &\Leftrightarrow~& (b_{2} - b_{1})^{\prime}x \stackrel{!}{=} c_{1} - c_{2} \\ &\Leftrightarrow& ({\Sigma}^{-1}\mu_{2} - {\Sigma}^{-1}\mu_{1})^{\prime}x \stackrel{!}{=} -\frac{1}{2}\mu_{1}^{\prime}{\Sigma}^{-1}\mu_{1} + \log(\pi_{1}) + \frac{1}{2}\mu_{2}^{\prime}{\Sigma}^{-1}\mu_{2} - \log(\pi_{2}) \\ &\Leftrightarrow& (\mu_{2} - \mu_{1})^{\prime}{\Sigma}^{-1}x \stackrel{!}{=} -\frac{1}{2}(\mu_{1}^{\prime}{\Sigma}^{-1}\mu_{1} - \mu_{2}^{\prime}{\Sigma}^{-1}\mu_{2}) + \log(\pi_{1}) - \log(\pi_{2}) \\ &\Leftrightarrow& (\mu_{2} - \mu_{1})^{\prime}{\Sigma}^{-1}x \stackrel{!}{=} -\frac{1}{2}(\mu_{1}^{\prime}{\Sigma}^{-1}\mu_{1} +\mu_{1}^{\prime}{\Sigma}^{-1}\mu_{2} -\mu_{2}^{\prime}{\Sigma}^{-1}\mu_{1} - \mu_{2}^{\prime}{\Sigma}^{-1}\mu_{2})\\ &&+ \log \left( \frac{\pi_{1}}{\pi_{2}} \right) \\ &\Leftrightarrow& (\mu_{2} - \mu_{1})^{\prime}{\Sigma}^{-1}x \stackrel{!}{=} -\frac{1}{2}(\mu_{1}^{\prime}{\Sigma}^{-1}(\mu_{1} + \mu_{2}) -\mu_{2}^{\prime}{\Sigma}^{-1}(\mu_{1} + \mu_{2}))\\ &&+ \log(\pi_{1}) - \log(\pi_{2}) \\ &\Leftrightarrow& (\mu_{2} - \mu_{1})^{\prime}{\Sigma}^{-1}x \stackrel{!}{=} -\frac{1}{2}(\mu_{1}^{\prime} -\mu_{2}^{\prime}){\Sigma}^{-1}(\mu_{1} + \mu_{2}) + \log(\pi_{1}) - \log(\pi_{2}) \\ &\Leftrightarrow& (\mu_{2} - \mu_{1})^{\prime}{\Sigma}^{-1}x \stackrel{!}{=} (\mu_{2} - \mu_{1})^{\prime} {\Sigma}^{-1} \left( \frac{\mu_{1} + \mu_{2}}{2} \right)+ \log(\pi_{1}) - \log(\pi_{2}) \\ &\Leftrightarrow& \alpha_{1}^{\prime}x \stackrel{!}{=} \alpha_{1}^{\prime}\mu+\frac{\log(\pi_{1}) - \log(\pi_{2})}{((\mu_{2} - \mu_{1})^{\prime}{\Sigma}^{-1}(\mu_{2} - \mu_{1}))^{\frac{1}{2}}} \\ &\Leftrightarrow& x_{2} \stackrel{!}{=} -\frac{\alpha_{11}}{\alpha_{12}}x_{1} + \frac{\alpha_{1}^{\prime}\mu}{\alpha_{12}} +\frac{\log(\pi_{1}) - \log(\pi_{2})}{\alpha_{12} ((\mu_{2} - \mu_{1})^{\prime}{\Sigma}^{-1}(\mu_{2} - \mu_{1}))^{\frac{1}{2}}}. \end{array} $$

In the case of two groups and two features, GDA and FDA thus have the same hyperplane:

$$ h(x_{1}) := -\frac{\alpha_{11}}{\alpha_{12}}x_{1} + \frac{\alpha_{1}^{\prime}\mu}{\alpha_{12}} +\frac{\log(\pi_{1}) - \log(\pi_{2})}{\alpha_{12} ((\mu_{2} - \mu_{1})^{\prime}{\Sigma}^{-1}(\mu_{2} - \mu_{1}))^{\frac{1}{2}}}. $$
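The coincidence of the two hyperplanes can also be checked numerically: points on h satisfy the GDA equality \(\tilde{L}_{1}(x) = \tilde{L}_{2}(x)\) as well. Again, all numerical values in this sketch are assumed for illustration only:

```python
import numpy as np

# Assumed example values, as before.
pi1, pi2 = 0.3, 0.7
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

d = mu2 - mu1
Sigma_inv = np.linalg.inv(Sigma)
alpha1 = Sigma_inv @ d / np.sqrt(d @ Sigma_inv @ d)
mu_mid = (mu1 + mu2) / 2

def h(x1):
    """Common decision boundary h(x1) given above."""
    return (-alpha1[0] / alpha1[1] * x1
            + alpha1 @ mu_mid / alpha1[1]
            + (np.log(pi1) - np.log(pi2))
              / (alpha1[1] * np.sqrt(d @ Sigma_inv @ d)))

def L_tilde(x, mu_g, pi_g):
    """Reduced GDA score: b_g' x + c_g."""
    b_g = Sigma_inv @ mu_g
    c_g = -0.5 * mu_g @ Sigma_inv @ mu_g + np.log(pi_g)
    return b_g @ x + c_g

# Points on the FDA boundary also satisfy the GDA equality L~_1(x) = L~_2(x).
for x1 in (-1.0, 0.0, 2.5):
    x = np.array([x1, h(x1)])
    assert np.isclose(L_tilde(x, mu1, pi1), L_tilde(x, mu2, pi2))
```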

Thus, GDA and FDA yield the same classification results even for unequal priors π1 and π2.

Cite this article

van Meegen, C., Schnackenberg, S. & Ligges, U. Unequal Priors in Linear Discriminant Analysis. J Classif 37, 598–615 (2020). https://doi.org/10.1007/s00357-019-09336-2
