Face recognition based on texture information and geodesic distance approximations between multivariate normal distributions

John Soldera; C T J Dodson; Jacob Scharcanski

doi:10.1088/1361-6501/aade18

1. Introduction

The instrumentation and measurement fields are concerned with measuring, detecting, recording and monitoring a given phenomenon (i.e. a measurand) in applications that usually involve uncertainty and/or probability distributions. Some measurands are invisible, like the electromagnetic field, and others are visible, like the light reflected by a surface.

In order to measure signals in the visible light spectrum, imaging sensors (i.e. cameras) are commonly used to record images or videos, whose resolutions tend to increase with technological advances. In such recorded data, colors are usually represented as combinations of basic color intensities (i.e. red, green and blue), leading to an inherently high-dimensional multivariate feature representation.

Moreover, the image processing and computer vision fields may be used to help to extract reliable features for several instrumentation-related applications which use texture information, such as face recognition [1–4], brain image recognition [5, 6], texture recognition of material images [7], food image recognition [8, 9], character recognition [10, 11], yawning detection [12], etc. In this work, we are mainly interested in face recognition by using efficient texture dissimilarity metrics based on geodesic distance approximations between probability distributions.

Face recognition is an instrumentation-related application which uses computer vision and pattern recognition techniques to identify individuals. Moreover, there are several emerging applications based on face recognition in augmented reality, gaming, security, and so on [3, 4, 13, 14]. Face recognition is also studied by neuroscientists and psychologists to provide useful insights into how the human brain works [15]. In such applications, features extracted from images or videos present high dimensionality and the samples available for machine learning are scarce, potentially leading to the well-known curse of dimensionality [16].

In order to obtain compact face features while preserving the global data structure, the Eigenfaces method [17] uses principal component analysis (PCA) [18] to create a linear orthogonal projection into a lower dimensional space, where new face samples are recognized. The Laplacianfaces method [19] provides a more efficient approach that tries to preserve the local data structure by creating a locality graph, from which a linear lower dimensional projection is obtained. Furthermore, the orthogonal locality preserving projections method (OLPP) [20] extends the Laplacianfaces method by ensuring that the final linear projection is orthogonal. Similarly, the orthogonal neighborhood preserving projections method (ONPP) [21] tries to preserve the local and the global face data geometry by learning a neighborhood graph, which leads to the determination of the final orthogonal linear transformation.

However, preserving the face data structure in the lower dimensional space does not always lead to a good face class separation. On the other hand, if the class labels of the training samples are known in advance, the class structure can be better preserved by using supervised dimensionality reduction approaches like linear discriminant analysis (LDA) [22], which determines a linear projection (Fisherfaces) that moves samples from different classes apart while bringing samples of the same class closer together in the lower dimensional space.

There are methods based on LDA, like the multi-view discriminant analysis (MvDA) [23] method, that create projections of input face features using different perspectives, and combine them to obtain the final linear transformation. However, the typical LDA-based supervised approach may present inaccuracies for non-linear separation problems. Therefore, kernel functions were proposed to map feature data to a higher dimensional space in which the classes are separable [24–27]. For instance, the spectral regression kernel discriminant analysis method (SRKDA) provides an efficient computation of the kernel LDA on large datasets [26], obtaining as a result a linear projection that tends to preserve the original non-linear class structure.

Other methods try to learn non-linear transformations in order to better preserve the non-linear structure of high dimensional data. That is the case of the Isomap method [28], which creates a neighborhood graph in which the data manifold is approximated by estimating geodesic distances as the shortest paths between samples. The final low dimensional representation of the original samples is obtained by the typical multidimensional scaling algorithm (MDS) [29].

The non-linear dimensionality reduction method called locally linear embedding (LLE) [30] tries to model data samples as linear combinations of their neighboring samples and uses this information to determine the lower dimensional representation of the original samples, preserving the local data geometry existing in the original high dimensional space.

Although several techniques are available in the literature to efficiently reduce the face data dimensionality and preserve the underlying face class structure, common issues affect face representation, such as variations of illumination, changes in the head pose, changes of appearance, and others. These issues demand a large number of distinct training samples in order to properly represent the face variability for machine learning, as in the appearance-based methods [17, 19–21]. Since these methods concatenate all image pixels to create representative feature vectors, they need to downsample grayscale face images to reduce the computational complexity. The obtained face feature representation still presents high dimensionality and suffers from the aforementioned issues.

On the other hand, it is possible to extract sparse face features directly from high-resolution color face images by using face representations based on landmarks associated with key points on face images at important and discriminative locations, leading to an enhanced face representation [3, 31]. Landmarks can be automatically determined by using approaches like active shape models (ASMs) [32], and several methods have been proposed to extract sparse features from face images using trained ASMs [33, 34].

Moreover, it is possible to rank landmarks in high-resolution color face images according to their discrimination capability by using mutual information, as in the enhanced ASM method [31]. In this method, face features are represented by Gaussian mixtures and new face samples are recognized by maximizing the class likelihood. However, this classification scheme can be adversely affected by outliers and noisy data.

Another method that extracts features from vicinities of landmark locations is the customized OLPP (COLPP) [3], in which landmark topologies are used to mark important and discriminative information on face images. The pixels in the landmark vicinities are concatenated to form high dimensional feature vectors which are mapped into a lower dimensional space where the class structure of the original features is preserved. In this discriminative linear space, classification occurs by employing a linear soft margin support vector machine (SVM) [35].

Most of the aforementioned methods extract high-dimensional feature vectors from whole face images [19–21] or from landmark vicinities [3, 31], and may be subject to undersampling because usually few training samples (i.e. face images) are available. On the other hand, face features represented as probability distributions demand fewer face samples for training and can be learnt from the texture in the landmark vicinities in high-resolution color face images, leading to more accurate and lower dimensional feature representations [31]. In this case, relevant features are extracted from key points on the face images (e.g. the eyes, eyebrows and nose). As a consequence, texture dissimilarities can be obtained as geodesic distances between probability distributions [4, 13].

In information geometry [36], the geodesic distance is defined as the length of the shortest path between probability distributions lying on a Riemannian manifold induced by the Fisher information metric applied to a parametric family of probability distributions [36, 37]. As a result, the geodesic distance is a natural dissimilarity metric between probability distributions, and is used to discriminate texture in several image-based applications, e.g. face recognition [4]. The normal distribution is widely used in such applications; however, there is no known closed-form solution for the geodesic distance between general multivariate normal distributions. Therefore, we propose two efficient approximations to be used as texture dissimilarity measures in the context of face recognition.

Moreover, we propose a novel approach for face recognition which uses information geometry techniques [36, 37] to discriminate face textures, in which sparse facial features are extracted from high-resolution color face images by using predefined landmark topologies, unlike the appearance-based approach, in which low-resolution grayscale face images are used to reduce the computational complexity [17, 19–21]. By adopting a common landmark topology, the dissimilarity between distinct face images can be scored in terms of the dissimilarities (obtained using the proposed texture dissimilarity measures) between their corresponding landmarks represented by multivariate normal distributions, which express the color distribution in the vicinities of each landmark location.

The classification of new face samples is based on the nearest neighbor rule. Therefore, a new sample is classified by determining the face image sample in the training set which minimizes the dissimilarity score against the new sample. Our new face recognition method was compared to methods representative of the state-of-the-art using color or grayscale face images, and provided higher recognition rates, reinforcing the belief that color information is relevant for face recognition [3, 31]. Moreover, according to an additional set of experiments in texture recognition, our new texture dissimilarity metrics are also efficient in general texture discrimination (e.g. texture recognition of material images), where they also outperform state-of-the-art methods.

This paper is organized as follows. Section 2 proposes geodesic distance approximations between multivariate normal distributions to be used as a texture dissimilarity metric in face recognition. Section 3 presents the proposed face recognition method, where section 3.1 discusses how sparse features and probability distributions are obtained from face images, and section 3.2 presents how dissimilarities between distinct face images are scored in terms of the dissimilarities between textures in their corresponding landmark vicinities by using the proposed geodesic distance approximations. The experimental results are presented and discussed in section 4, and the final conclusions and ideas for future work are presented in section 5.

2. Geodesic distance approximations between multivariate normal distributions

In many face recognition methods, face features are represented as vectors [3, 17, 19–21]. However, those feature representations are highly affected by natural image issues such as variations in illumination, pose and scale. Moreover, there are usually not enough samples (face images) to properly sample such high-dimensional feature spaces.

On the other hand, multivariate probability distributions of color image pixels tend to preserve the original image characteristics in a lower-dimensional representation, which is useful for texture discrimination. Moreover, such feature representations are robust to scale and pose variations. Therefore, we choose to represent image features as multivariate normal distributions, which are defined as follows:

$$F(x | \mu, \Sigma) = \frac{{\rm e}^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}}{\sqrt{(2 \pi)^C |\Sigma|}}, \tag{1}$$

where x is a C-dimensional vector, μ is the C-dimensional mean and Σ is the $C \times C$ covariance matrix, for images with C color channels.

Since geodesic distances are the natural distance measure for families of probability distributions [36], and assuming that the texture in the landmark vicinities is normally distributed, we use geodesic distances between normal distributions in order to measure dissimilarities between the textures of corresponding landmarks of distinct face images.

Considering the case of two univariate normal distributions $F_1(x | {\mu}_1, {\sigma}_1)$ and $F_2(x | {\mu}_2, {\sigma}_2)$, the geodesic distance $G_{e}(F_1, F_2)$ between both distributions is given in closed form [37] by:

$$G_{e}(F_1, F_2) = \sqrt{2}\ln\frac{1+\delta}{1-\delta} = 2\sqrt{2}\tanh^{-1}\delta, \tag{2}$$

where

$$\delta \equiv \left[\frac{(\mu_1-\mu_2)^2 + 2(\sigma_1-\sigma_2)^2}{(\mu_1-\mu_2)^2 + 2(\sigma_1+\sigma_2)^2}\right]^{1/2}. \tag{3}$$
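Equations (2) and (3) can be evaluated directly. The following is a minimal Python sketch, not the authors' implementation; the function name `geodesic_univariate` and its argument order are chosen here for illustration only.

```python
import math


def geodesic_univariate(mu1, sigma1, mu2, sigma2):
    """Geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Direct transcription of equations (2) and (3); sigma1 and sigma2 are
    standard deviations and must be strictly positive.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    dmu2 = (mu1 - mu2) ** 2
    num = dmu2 + 2.0 * (sigma1 - sigma2) ** 2
    den = dmu2 + 2.0 * (sigma1 + sigma2) ** 2
    delta = math.sqrt(num / den)  # equation (3); always in [0, 1)
    return 2.0 * math.sqrt(2.0) * math.atanh(delta)  # equation (2)
```

For equal means, equation (3) gives $\delta = |\sigma_1-\sigma_2|/(\sigma_1+\sigma_2)$, so the distance reduces to $\sqrt{2}\,|\ln(\sigma_2/\sigma_1)|$, which is a convenient sanity check for the sketch.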

However, for the proposed method, a univariate normal distribution is not suitable since it supports only monochromatic images (i.e. grayscale images). Instead, we use color-based feature representations since color features tend to improve image class discrimination [3, 31]. Therefore, multivariate normal distributions are more adequate to represent face image features.

One special case of multivariate normal distribution is when the covariance matrix ${\Sigma}={\rm diag}{({\sigma}_1^2, {\sigma}_2^2, ..., {\sigma}_C^2)}$ is a diagonal matrix (i.e. the color channels are independent features). Therefore, the geodesic distance $G_{f}(F_1, F_2)$ between multivariate normal distributions $F_1(x | \mu_1, \Sigma_1)$ and $F_2(x | \mu_2, \Sigma_2)$ given by [38] for diagonal covariance matrices can be used as a dissimilarity metric:

$$G_{f}(F_1, F_2) = \sqrt{\sum_{c=1}^{C} G_{e}(F_1^c, F_2^c)^2}, \tag{4}$$

where $F_1^c = F_1(x(c) | {\mu}_1(c), {\Sigma}_1(c, c))$ represents the cth independent univariate normal distribution with mean ${\mu}_1(c)$ and variance ${\Sigma}_1(c, c)$ , belonging to the multivariate distribution $F_1(x | \mu_1, {\Sigma}_1)$ .
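As an illustration of equation (4), the sketch below combines the per-channel univariate distances of equations (2) and (3) using NumPy. It is a hedged example under the diagonal-covariance assumption; the name `geodesic_diagonal` and the choice of passing per-channel variances (the diagonal of $\Sigma$) are assumptions made here.

```python
import numpy as np


def geodesic_diagonal(mu1, var1, mu2, var2):
    """Equation (4): distance between normals with diagonal covariances.

    mu1, mu2: per-channel means (length C).
    var1, var2: per-channel variances, i.e. the diagonals of Sigma_1, Sigma_2.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    var1 = np.asarray(var1, dtype=float)
    var2 = np.asarray(var2, dtype=float)
    if np.any(var1 <= 0) or np.any(var2 <= 0):
        raise ValueError("variances must be positive")
    s1, s2 = np.sqrt(var1), np.sqrt(var2)
    dmu2 = (mu1 - mu2) ** 2
    # Equation (3), evaluated independently for every channel c.
    delta = np.sqrt((dmu2 + 2.0 * (s1 - s2) ** 2) / (dmu2 + 2.0 * (s1 + s2) ** 2))
    # Equation (2), per channel.
    g_e = 2.0 * np.sqrt(2.0) * np.arctanh(delta)
    # Equation (4): root of the sum of squared per-channel distances.
    return float(np.sqrt(np.sum(g_e ** 2)))
```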

However, image color channels are usually not statistically independent, and using multivariate normal distributions with diagonal covariance matrices may discard relevant and discriminative texture information which should be accounted for in the geodesic distances. Moreover, such distributions are not generally adequate for texture discrimination since they ignore the natural covariances between color channels inherent in color images.

Therefore, in order to obtain more accurate geodesic distances, we can consider using geodesics for general multivariate normal distributions, where the covariances between color channels are also accounted for. Unfortunately, there is no known closed-form solution for this case, but closed-form solutions for two specific multivariate normal distribution subcases are known [37, 38]:

(i) $\mu_1\neq\mu_2, \Sigma_1=\Sigma_2$:

$$G_\mu(F_1, F_2) = \sqrt{(\mu_1-\mu_2)^T \Sigma_1^{-1} (\mu_1 - \mu_2)}, \tag{5}$$

(ii) $\mu_1=\mu_2, \Sigma_1\neq\Sigma_2$:

$$G_\Sigma(F_1, F_2) = \sqrt{\frac{1}{2} \sum_{j=1}^C \log^2(\lambda_j)}, \tag{6}$$

$${\rm with} \ \{\lambda_j\}= {\rm Eig}(\Sigma_1^{-1/2} \Sigma_2 \Sigma_1^{-1/2}), \tag{7}$$

where Eig is a function that returns the eigenvalues of a given matrix and $\lambda_j$ indicates the jth eigenvalue.
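The two closed-form subcases in equations (5) to (7) can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the eigenvalues in equation (7) are obtained here from $L^{-1}\Sigma_2 L^{-T}$, where $\Sigma_1 = LL^T$ is a Cholesky factorization. That matrix has the same eigenvalues as $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}$ (both are similar to $\Sigma_1^{-1}\Sigma_2$), which avoids computing a matrix square root. Both covariance matrices are assumed to be symmetric positive definite.

```python
import numpy as np


def geodesic_equal_cov(mu1, mu2, sigma):
    """Equation (5): common covariance matrix, different means."""
    d = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Solve sigma @ y = d instead of forming the explicit inverse.
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))


def geodesic_equal_mean(sigma1, sigma2):
    """Equations (6) and (7): common mean, different covariance matrices."""
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    chol = np.linalg.cholesky(sigma1)  # sigma1 = chol @ chol.T
    # m = chol^{-1} @ sigma2 @ chol^{-T}, same spectrum as in equation (7).
    m = np.linalg.solve(chol, np.linalg.solve(chol, sigma2).T).T
    lam = np.linalg.eigvalsh(0.5 * (m + m.T))  # symmetrize against round-off
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))
```

Because swapping $\Sigma_1$ and $\Sigma_2$ inverts the eigenvalues and equation (6) depends only on $\log^2(\lambda_j)$, the second function is symmetric in its arguments.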

We intend to approximate the geodesic distance for the case of general multivariate normal distributions based on equations (5) and (6). However, some adaptations are necessary because distinct images often present different means and covariance matrices. As equation (6) does not involve the means ($\mu_1$ and $\mu_2$), we can use it without changes. However, equation (5) requires a common covariance matrix $\Sigma_1$, but we have two different covariance matrices $\Sigma_1$ and $\Sigma_2$. Therefore, we propose the following two alternatives for computing $G_\mu$ for general multivariate normal distributions:

$$\begin{aligned} G_{\mu}^g(F_1, F_2) &= 0.5\sqrt{(\mu_1-\mu_2)^T \Sigma_1^{-1} (\mu_1-\mu_2)} \\ &\quad + 0.5\sqrt{(\mu_1-\mu_2)^T \Sigma_2^{-1} (\mu_1-\mu_2)}, \end{aligned} \tag{8}$$

and,

$$G_{\mu}^h(F_1, F_2) = \sqrt{(\mu_1-\mu_2)^T \left(\frac{\Sigma_1+\Sigma_2}{2}\right)^{-1} (\mu_1-\mu_2)}, \tag{9}$$

leading to two distinct ways to approximate the geodesic distance for general multivariate normal distributions:

$$G_g(F_1, F_2) = \frac{G_\mu^g(F_1, F_2) + G_\Sigma(F_1, F_2)}{2}, \ \ {\rm or} \tag{10}$$

$$G_h(F_1, F_2) = \frac{G_\mu^h(F_1, F_2) + G_\Sigma(F_1, F_2)}{2}. \tag{11}$$
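The two approximations in equations (8) to (11) can be sketched as follows. This is a minimal Python illustration under the assumption that both covariance matrices are symmetric positive definite; the function names `g_sigma`, `g_g` and `g_h` are hypothetical and simply mirror the symbols used above.

```python
import numpy as np


def _mahalanobis(d, sigma):
    """sqrt(d^T sigma^{-1} d), the form shared by equations (5), (8) and (9)."""
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))


def g_sigma(sigma1, sigma2):
    """Equations (6) and (7), using a Cholesky factor of sigma1."""
    chol = np.linalg.cholesky(sigma1)
    m = np.linalg.solve(chol, np.linalg.solve(chol, sigma2).T).T
    lam = np.linalg.eigvalsh(0.5 * (m + m.T))
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))


def g_g(mu1, sigma1, mu2, sigma2):
    """Equation (10), with the mean term of equation (8)."""
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    d = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    g_mu = 0.5 * _mahalanobis(d, sigma1) + 0.5 * _mahalanobis(d, sigma2)
    return 0.5 * (g_mu + g_sigma(sigma1, sigma2))


def g_h(mu1, sigma1, mu2, sigma2):
    """Equation (11), with the mean term of equation (9)."""
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    d = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    g_mu = _mahalanobis(d, 0.5 * (sigma1 + sigma2))
    return 0.5 * (g_mu + g_sigma(sigma1, sigma2))
```

When $\Sigma_1 = \Sigma_2$, the term $G_\Sigma$ vanishes and both $G_\mu^g$ and $G_\mu^h$ reduce to equation (5), so $G_g$ and $G_h$ coincide and equal half of the distance in equation (5).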

If the color channels of face images are assumed to be statistically independent, we fall into the multivariate case with diagonal covariance matrix ($G_f$). Otherwise, the general multivariate case provides a more accurate geodesic distance approximation between multivariate normal distributions ($G_g$ or $G_h$).

Moreover, based on information geometry concepts [36, 37], the proposed geodesic distance approximations for multivariate normal distributions ($G_f$, $G_g$ and $G_h$) can be considered as Riemannian metrics on the parameter space of the multivariate normal distributions given by the C-dimensional mean vectors (μ) and the $(C \times C)$-dimensional positive semi-definite symmetric matrices, i.e. covariance matrices (Σ). As a consequence, our proposed geodesic distance approximations can be used as efficient dissimilarity metrics for the statistical discrimination of texture representations.

Next, we present the proposed approach for face recognition, which is based on the proposed geodesic distance approximations as a texture dissimilarity metric.

3. Face representation and recognition

Next, we present our proposed approach for face representation and classification.

3.1. Sparse face feature extraction

Typical appearance-based methods [17, 19–21] exploit the face data variability for machine learning. However, in order to reduce the computational complexity, these methods use low-resolution grayscale face images which are converted to the form of high-dimensional feature vectors. On the other hand, more discriminative features tend to be obtained from high-resolution color face images by extracting information from the texture in the vicinities of key points on the face images (i.e. landmarks) [3]. Therefore, we propose a feature extraction method based on the sparse approach, since this feature representation can be approached as a multivariate classification problem [3, 31].

Assuming a point distribution model to represent color face images, a predefined topology with Q landmarks can be used to represent the facial features at Q face image locations. These Q landmarks may be manually annotated or automatically identified in the face images. However, there is uncertainty about the correct location of manually annotated or automatically identified landmarks due to image artifacts (e.g. head pose, noise, illumination change, etc.). Therefore, given a landmark topology, we can introduce interpolated landmarks between each pair of consecutive landmarks on a face image, improving the reliability of the biometric information. The final landmark topology contains a set of L identified and interpolated landmarks, with L > Q. Moreover, a known landmark topology was used in our work [3, 4], where the landmarks are positioned at key facial points (e.g. chin, mouth, nose, eyes, eyebrows and forehead), helping in the extraction of relevant features for face recognition.
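The landmark interpolation step can be illustrated with a short sketch. The text does not state how many landmarks are interpolated per pair or which interpolation scheme is used, so the following Python example assumes linear interpolation with a hypothetical parameter `k` (points inserted between each pair of consecutive landmarks) and treats the Q landmarks as an ordered open chain.

```python
import numpy as np


def interpolate_landmarks(landmarks, k=1):
    """Insert k linearly interpolated points between consecutive landmarks.

    landmarks: array-like of shape (Q, 2) with (row, col) coordinates, in the
    order defined by the topology (an assumption of this sketch).
    Returns an array of shape (Q + k * (Q - 1), 2).
    """
    pts = np.asarray(landmarks, dtype=float)
    if pts.ndim != 2 or pts.shape[0] < 2:
        raise ValueError("at least two landmarks are required")
    out = []
    for p, q in zip(pts[:-1], pts[1:]):
        out.append(p)
        for i in range(1, k + 1):
            t = i / (k + 1)  # fraction of the way from p to q
            out.append((1.0 - t) * p + t * q)
    out.append(pts[-1])
    return np.vstack(out)
```

With this scheme the final topology has L = Q + k(Q - 1) landmarks, which satisfies L > Q for any k ≥ 1.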

Therefore, given a landmark topology with L landmarks, the texture in the square vicinity of size $w \times w$ centered at each landmark l is extracted from each face image (i.e. head pose) b of face class a, considering that the face images have C color channels. For a landmark l, $I_{a,b,l}(m, n)$ is the C-dimensional color vector representing the pixel $(m, n)$ in the vicinity of l, with $m, n = 1, 2, ..., w$.

Next, statistical feature descriptors of each landmark are obtained. First, the C-dimensional color means ${\mu}_{a, b, l}$ in the landmark vicinities are obtained as follows:

$${\mu}_{a,b,l} = \mathbb{E}[I_{a,b,l}] = \frac{1}{w^2} \sum_{m,n=1}^w I_{a,b,l}(m,n), \tag{12}$$

that are associated with their respective $(C \times C)$ -dimensional covariance matrices ${\Sigma}_{a, b, l}$ , which are obtained as follows:

$$\begin{aligned} {\Sigma}_{a,b,l} &= \mathbb{E}[(I_{a,b,l}-\mu_{a,b,l})(I_{a,b,l}-\mu_{a,b,l})^T] \\ &= \frac{1}{w^2} \sum_{m,n=1}^w (I_{a,b,l}(m,n) -\mu_{a,b,l})(I_{a,b,l}(m,n) -\mu_{a,b,l})^T. \end{aligned} \tag{13}$$

For instance, considering color (RGB) face images, ${\mu}_{a, b, l}$ will be a 3-dimensional vector, but for grayscale face images ${\mu}_{a, b, l}$ will be a 1-dimensional vector. As a result, each landmark l in the face image b of class a is represented by the mean ${\mu}_{a, b, l}$ and the covariance matrix ${\Sigma}_{a, b, l}$ computed from the vicinity of the same landmark.
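The descriptors of equations (12) and (13) can be computed from a pixel neighborhood as in the sketch below. It is an illustrative Python example, not the authors' code: the function name `landmark_descriptor`, the (row, col) landmark convention, and the requirements that w be odd and that the window lie fully inside the image are assumptions of this sketch.

```python
import numpy as np


def landmark_descriptor(image, row, col, w=11):
    """Mean vector and covariance matrix of a w x w landmark vicinity.

    image: array of shape (H, W, C), or (H, W) for grayscale.
    row, col: landmark location (center of the window).
    Returns (mu, sigma) with shapes (C,) and (C, C).
    """
    image = np.asarray(image, dtype=float)
    if image.ndim == 2:  # grayscale: treat as a single channel
        image = image[:, :, None]
    if w % 2 == 0:
        raise ValueError("w must be odd so the window is centered")
    half = w // 2
    if (row - half < 0 or col - half < 0
            or row + half >= image.shape[0] or col + half >= image.shape[1]):
        raise ValueError("window falls outside the image")
    patch = image[row - half:row + half + 1, col - half:col + half + 1, :]
    pixels = patch.reshape(-1, image.shape[2])  # w*w samples of dimension C
    mu = pixels.mean(axis=0)  # equation (12)
    centered = pixels - mu
    sigma = centered.T @ centered / pixels.shape[0]  # equation (13), 1/w^2
    return mu, sigma
```

Note that equation (13) normalizes by $w^2$ rather than $w^2 - 1$, which is why the sketch divides by the number of pixels. If a vicinity is nearly uniform in color, the resulting covariance matrix can be close to singular, which matters for the inverses used in equations (5) to (9).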

Since the landmarks represent discriminative information on the face images, we propose to calculate dissimilarities between distinct face images in terms of the dissimilarities between the texture in their corresponding landmark vicinities by adopting a common landmark topology (i.e. the landmark topology discussed above), as will be described in section 3.2.

3.2. Face classification

Since geodesic distances are a natural dissimilarity metric for statistical distributions, we propose to calculate dissimilarities between distinct face images by summing the dissimilarities between the textures of their corresponding landmarks, which are given as geodesic distance approximations between multivariate normal distributions as presented in section 2. Considering L, the total number of landmarks in a landmark topology, a geodesic distance approximation between multivariate normal distributions can be adopted, i.e. $G_f$ (equation (4)), $G_g$ (equation (10)) or $G_h$ (equation (11)).

Considering color (RGB) face images, the texture in the vicinity of each landmark can be considered as a multivariate normal distribution, since each pixel can be treated as a 3-dimensional sample within the landmark vicinity. If the color channels are assumed to be independent for each landmark, the geodesic distance approximation $G_f$ for multivariate normal distributions with diagonal covariance matrices provides a suitable geodesic distance metric. In this case, the dissimilarity ${S_f}_{a, b}^{a', b'}$ between the face image (i.e. head pose) b of face class a and the face image $b'$ of face class $a'$ can be scored by using $G_f$ as follows:

$${S_f}_{a,b}^{a',b'} = \sum_{l=1}^{L} G_{f}(F_{a,b,l}, F_{a',b',l}), \tag{14}$$

where $F_{a,b,l}$ represents a multivariate normal distribution with null covariances for the landmark l in the face image b of face class a, with the C-dimensional mean ${\mu}_{a, b, l}$ and the $(C \times C)$-dimensional covariance matrix ${\Sigma}_{a, b, l}$. On the other hand, if the multivariate face data present relevant covariances between color channels, one of the proposed geodesic distance approximations for general multivariate normal distributions ($G_g$ or $G_h$) should be more adequate for the score calculation:

$${S_g}_{a,b}^{a',b'} = \sum_{l=1}^{L} G_{g}(F_{a,b,l}, F_{a',b',l}), \tag{15}$$

or

$${S_h}_{a,b}^{a',b'} = \sum_{l=1}^{L} G_{h}(F_{a,b,l}, F_{a',b',l}). \tag{16}$$

As a result, small scores indicate similar face images (which is the case of a sum of small dissimilarities between landmarks), and, similarly, larger scores indicate dissimilar face images. Based on the nearest neighbor classification rule, a new face sample image $I_{a', b'}$ is classified by determining the face image $I_{a,b}$ in the training set which is least dissimilar to $I_{a', b'}$, i.e. by minimizing one of the three proposed score functions: ${S_f}_{a, b}^{a', b'}$, ${S_g}_{a, b}^{a', b'}$ or ${S_h}_{a, b}^{a', b'}$.
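The score of equation (16) and the nearest neighbor rule can be sketched as follows. This is a hedged Python illustration rather than the authors' implementation: each face is assumed to be stored as a list of L pairs `(mu, sigma)`, one per landmark and in the same landmark order for every face, and the names `g_h`, `score` and `classify` are hypothetical. The covariance matrices are assumed to be positive definite.

```python
import numpy as np


def g_h(f1, f2):
    """Equation (11) for two landmark descriptors f = (mu, sigma)."""
    (mu1, s1), (mu2, s2) = f1, f2
    d = mu1 - mu2
    g_mu = np.sqrt(d @ np.linalg.solve(0.5 * (s1 + s2), d))  # equation (9)
    chol = np.linalg.cholesky(s1)
    m = np.linalg.solve(chol, np.linalg.solve(chol, s2).T).T
    lam = np.linalg.eigvalsh(0.5 * (m + m.T))  # eigenvalues of equation (7)
    g_sig = np.sqrt(0.5 * np.sum(np.log(lam) ** 2))  # equation (6)
    return float(0.5 * (g_mu + g_sig))


def score(face_a, face_b, distance=g_h):
    """Equation (16): sum of landmark-wise dissimilarities."""
    if len(face_a) != len(face_b):
        raise ValueError("faces must share the same landmark topology")
    return sum(distance(fa, fb) for fa, fb in zip(face_a, face_b))


def classify(query, train_faces, train_labels, distance=g_h):
    """Nearest neighbor rule: label of the training face with minimum score."""
    scores = [score(query, face, distance) for face in train_faces]
    return train_labels[int(np.argmin(scores))]
```

Passing a different `distance` callable (for instance an implementation of $G_f$ or $G_g$) yields the scores of equations (14) or (15) with the same classification rule.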

Moreover, in practice, some landmarks may be affected by issues that commonly degrade the face representation, such as variations of illumination, changes in the head pose and changes of appearance, as well as landmark positioning inaccuracies or texture non-Gaussianity. Such landmarks are expected to have some effect on the proposed dissimilarity scores. However, their impact is attenuated by the interpolated landmarks discussed in section 3.1, which increase the overall feature quality of the face image representation and dilute the contribution of any inaccurately positioned landmark to the proposed scores.

Next, we compare experimentally the proposed method with methods that are representative of the state-of-the-art under adverse image conditions found in practice.

4. Experimental results

Experiments were conducted to compare the proposed face recognition method presented in section 3 (which uses the geodesic distance approximations presented in section 2 to discriminate texture in the vicinities of the landmarks) to methods representative of the state-of-the-art using a face database commonly used in face recognition (i.e. the FERET face database [39]). This face database was created with the objective of providing credible data for the development of new techniques, technology, and algorithms for the automatic recognition of human faces. The database is used to develop, test, and evaluate face recognition algorithms. It presents color face images in high resolution ($512 \times 768$ pixels), organized in several subsets with specific head pose, expression, age, and illumination conditions. Experiments were performed with the color face images of the first 200 face classes of the subsets fa, fb, hl, hr, rb and rc, including all 6 head poses, totaling 1200 images (6 images for each class), as detailed in [4].

In all experiments with the proposed method, the feature extraction and representation method proposed in section 3.1 was applied to the face images in the database to extract statistical feature descriptors (i.e. mean vectors and covariance matrices) from the landmark vicinities of size $w \times w$ $(w=11)$ centered at each landmark location, using equations (12) and (13). In order to select consistent features from the landmark vicinities, only faces with no landmark occlusions were used. A known landmark topology was used in all experiments in table 1, as discussed in section 3.1, which allows the extraction of important and discriminative features from distinct face image locations (e.g. chin, mouth, nose, eyes, eyebrows and forehead) [3, 4].

Table 1. Face recognition rates obtained for the FERET face database.

Methods	RGB (%)	Grayscale (%)
Proposed method with score S_h	95.3	82.3
Proposed method with score S_g	95.4	83.2
Proposed method with score S_f	83.1	78.6
COLPP (d = 54, k = 6, t = 500, r = 0.78)	93.8	79.5
Enhanced ASM	72.5	53.8
SVM	85.2	76.4
SRKDA ( $\sigma=20\, 000$ )	64.9	59.7
MvDA (d = 100)	76.1	71.4
Eigenfaces (d = 51)	69.6	64.1
Fisherfaces (r = 0.8)	67.1	65.7
LPP (d = 50, k = 1, t = 500, r = 0.34)	67.1	65.5
OLPP (d = 54, k = 1, t = 500, r = 0.34)	69.0	66.2
LLE	54.2	46.2
Isomap	69.1	64.8

The methods used for comparison in table 1 are the Customized OLPP method (COLPP) [3], Enhanced ASM method [31], support vector machines (SVM) [40], spectral regression kernel discriminant analysis (SRKDA) [26], multi-view discriminant analysis (MvDA) [23], Eigenfaces [17], Fisherfaces [22], Laplacianfaces [19], orthogonal locality preserving projection (OLPP) [20], locally linear embedding (LLE) [30] and Isomap [28]. The proposed method is compared using three distinct score functions ($S_f$, $S_g$ and $S_h$) defined in section 3.2.

A set of experiments involving the proposed method and the aforementioned methods was conducted on the FERET face database, and 6 runs were executed on the entire test subset. In each run, a leave-one-out test strategy was adopted: 5 head poses per class were randomly selected for training and 1 head pose per class for testing. Table 1 shows the average face recognition rates for the proposed method and methods representative of the state-of-the-art. All methods in table 1 use the same selection of face images, in color (RGB) or in grayscale (color images were converted to grayscale).
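One possible reading of this protocol is sketched below in Python. It is only an assumption about how the splits are drawn (one randomly chosen head pose per class held out in each run, the other five used for training); the function name `random_pose_split` and the `(class, pose)` index pairs are hypothetical.

```python
import numpy as np


def random_pose_split(n_classes, n_poses, rng):
    """Hold out one randomly chosen pose per class; train on the others.

    Returns two lists of (class_index, pose_index) pairs: train and test.
    """
    test_pose = rng.integers(0, n_poses, size=n_classes)
    train, test = [], []
    for a in range(n_classes):
        for b in range(n_poses):
            if b == test_pose[a]:
                test.append((a, b))
            else:
                train.append((a, b))
    return train, test


# Six runs over 200 classes with 6 head poses each, as in the text.
rng = np.random.default_rng(0)
splits = [random_pose_split(200, 6, rng) for _ in range(6)]
```

Under this reading, each run uses 1000 training images and 200 test images, and the rates in table 1 would be the average over the six runs.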

For each method listed in table 1, the parameters yielding the best experimental results were chosen by testing each method with several parameter configurations until the maximum recognition rate was reached. The parameter d used in Eigenfaces, Laplacianfaces and other methods is the dimensionality of the subspace, assuming k neighbors, and r is the PCA ratio [17, 19, 20], which is also used by the Fisherfaces and the MvDA methods. The adopted SVM implementation was LIBSVM [41]. In SRKDA, the Gaussian kernel with standard deviation σ was used. In the iterative Boosting LDA method [42], 10 iterations were performed in each experiment, using half of the training samples for training and the other half for validation, and the Euclidean distance was used as the distance measure. In table 1, the method MvDA was trained to use head poses as views. In the Enhanced ASM method [31], the parameter α was set to 1, giving more importance to measurements in the local vicinity of the landmarks.

As shown in table 1, experiments with color images presented higher recognition rates than the experiments with the same images but converted to grayscale, confirming a trend that color face features tend to improve face class discrimination [3, 31]. Moreover, the proposed face recognition method with the score functions S_h or S_g presented higher recognition rates than with the score function S_f, pointing out that the covariance information between color channels is important to accommodate effects of lighting variation and to approximate better the geodesic distances between multivariate normal distributions. Finally, the proposed face recognition method potentially can present higher recognition rates than comparable methods in the state-of-the-art.

Execution time is relevant for practical face recognition applications, which commonly require real-time processing. Table 2 reports the average execution time per color image, for training and for testing, of each method in the experiments reported in table 1. The short execution times of the proposed method are due to its low computational complexity, since only small mean vectors and covariance matrices are extracted from the face images. As a result, the proposed method has potential for real-time applications. The experiments reported in table 2 were performed on a computer with a third-generation Intel i5 processor and 8 GB of RAM.

Table 2. Elapsed execution times per image for training and testing in the face recognition experiments with the FERET face database.

Methods	Training (s)	Test (s)
Proposed method with score S_h	0.04	0.04
Proposed method with score S_g	0.04	0.04
Proposed method with score S_f	0.04	0.04
COLPP	0.06	0.05
Enhanced ASM	0.47	0.05
SVM	0.14	0.06
SRKDA	0.06	0.05
MvDA	0.07	0.04
Eigenfaces	0.06	0.04
Fisherfaces	0.06	0.05
LPP	0.08	0.06
OLPP	0.06	0.05
LLE	0.81	1.19
Isomap	0.12	0.44

The results in table 1 show that the proposed texture discrimination metrics (G_g in (10) and G_h in (11)) are efficient in face recognition. Moreover, we provide an additional set of experiments in general texture discrimination (e.g. texture recognition of material images) in order to evaluate the potential of the texture discrimination metrics proposed in section 2 in typical texture discrimination problems, represented here by the KTH-TIPS texture database [43] and the KTH-TIPS-2b texture database [44]. In these experiments, statistical feature descriptors (i.e. mean vectors and covariance matrices) were extracted using (12) and (13) from the entire texture images, assuming that the texture features in each texture image are normally distributed. The descriptors are taken from the whole image rather than from landmark vicinities because, whereas the human face presents a recognizable structure which helps face recognition [3, 31], texture images often contain stochastic variations and may vary with pose and scale, so textures are described statistically.
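
As an illustration of the whole-image descriptors mentioned above, the following minimal sketch computes a mean vector and a covariance matrix from all pixels of a color image. It assumes that (12) and (13) correspond to the usual sample mean and sample covariance; the normalization by n - 1, the function name `mean_and_covariance` and the toy pixel values are assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Sequence[float]  # one (R, G, B) triple


def mean_and_covariance(pixels: List[Pixel]) -> Tuple[List[float], List[List[float]]]:
    """Return the sample mean vector and sample covariance matrix of the pixels."""
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two pixels to estimate a covariance")
    d = len(pixels[0])
    mean = [sum(p[c] for p in pixels) / n for c in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for p in pixels:
        diff = [p[c] - mean[c] for c in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    # Unbiased estimate (divide by n - 1); this normalization is an assumption.
    cov = [[v / (n - 1) for v in row] for row in cov]
    return mean, cov


# Toy 2 x 2 "texture image" flattened to a list of RGB pixels (made-up values).
image = [
    (10.0, 20.0, 30.0),
    (12.0, 22.0, 30.0),
    (14.0, 24.0, 30.0),
    (16.0, 26.0, 30.0),
]
mu, sigma = mean_and_covariance(image)
```

For an RGB image the descriptor is a 3-element mean vector and a 3 x 3 symmetric covariance matrix regardless of image size, which is what keeps the representation compact.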

These additional tests were performed on the KTH-TIPS texture database [43] and on the KTH-TIPS-2b database [44]. The KTH-TIPS texture database [43] provides color images of textured materials of size $200 \times 200$ organized in 10 texture classes; each class consists of 81 samples captured under nine scales, three different poses and three distinct illumination directions. Experiments were run on 50 random partitions of the database into training and testing sets, in which half of the samples per class were randomly selected for training and the remaining half for testing [7]. Table 3 shows the average texture recognition rates for the proposed method and for methods representative of the state-of-the-art.
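
The evaluation protocol described above (repeated random partitions with half of the samples per class used for training, and recognition rates averaged over the partitions) can be sketched as follows. This is a minimal sketch under stated assumptions: the nearest-neighbor decision rule, the function names, the fixed random seed and the toy one-dimensional data are introduced for illustration only, and the `dissimilarity` argument is a placeholder for whichever score function is being evaluated.

```python
import random
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, ...]
Labeled = Tuple[Sample, str]


def split_half_per_class(data: Dict[str, List[Sample]],
                         rng: random.Random) -> Tuple[List[Labeled], List[Labeled]]:
    """Randomly send half of each class to training and the rest to testing."""
    train: List[Labeled] = []
    test: List[Labeled] = []
    for label, samples in data.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train += [(s, label) for s in shuffled[:half]]
        test += [(s, label) for s in shuffled[half:]]
    return train, test


def nearest_label(query: Sample, train: List[Labeled],
                  dissimilarity: Callable[[Sample, Sample], float]) -> str:
    """Label of the training sample with the smallest dissimilarity (assumed rule)."""
    return min(train, key=lambda item: dissimilarity(query, item[0]))[1]


def average_rate(data: Dict[str, List[Sample]],
                 dissimilarity: Callable[[Sample, Sample], float],
                 partitions: int = 50, seed: int = 0) -> float:
    """Average recognition rate (%) over repeated random half/half partitions."""
    rng = random.Random(seed)
    rates = []
    for _ in range(partitions):
        train, test = split_half_per_class(data, rng)
        hits = sum(nearest_label(s, train, dissimilarity) == y for s, y in test)
        rates.append(100.0 * hits / len(test))
    return sum(rates) / len(rates)


# Toy data: two well-separated classes of 1-D "descriptors" (made-up values).
toy_data = {
    "a": [(0.1 * i,) for i in range(8)],
    "b": [(10.0 + 0.1 * i,) for i in range(8)],
}
rate = average_rate(toy_data, lambda x, y: abs(x[0] - y[0]))
```

Averaging over many random partitions reduces the dependence of the reported rate on any single choice of training samples.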

Table 3. Texture recognition rates obtained for the KTH-TIPS texture database.

Methods	Recognition rates (%)
Proposed method with score S_h	99.76
Proposed method with score S_g	99.76
Proposed method with score S_f	66.80
SRP [7]	99.29
PLS [45]	98.50
SSLBP [46]	99.39
LETRIST [47]	99.00
DMD [48]	97.96

The methods used for comparison on the KTH-TIPS database in table 3 are the sorted random projections (SRP) [7], pattern lacunarity spectrum (PLS) [45], scale-selective local binary pattern (SSLBP) [46], locally encoded transform feature histogram (LETRIST) [47] and dense microblock difference (DMD) [48]. The proposed method was evaluated with the three distinct score functions (S_f, S_g and S_h) defined in section 3.2.

Another challenging database used in texture recognition is the KTH-TIPS-2b database [44], which provides color images of materials of size $200 \times 200$ organized in 11 texture classes; each class consists of 432 samples captured under nine scales, three different poses and four distinct illuminants, as exemplified in [44]. Fifty experiments were run, partitioning the database samples in a ten-fold test strategy [49], in which 11 samples per class were randomly selected for testing and the remaining samples were used for training in each experiment. Table 4 shows the average texture recognition rates for the proposed method and for methods representative of the state-of-the-art.

Table 4. Texture recognition rates obtained for the KTH-TIPS-2b texture database.

Methods	Recognition rates (%)
Proposed method with score S_h	99.24
Proposed method with score S_g	99.23
Proposed method with score S_f	93.70
LBP [50]	92.53
ILBP [51]	95.88
SLBP [52]	95.54
LTP [53]	96.61
αLBP [49]	96.04
IαLBP [49]	97.25

The methods used for comparison on the KTH-TIPS-2b database in table 4 are the local binary pattern (LBP) [50], improved LBP (ILBP) [51], shift LBP (SLBP) [52], local ternary pattern (LTP) [53], α-local binary pattern (αLBP) [49], and improved αLBP (IαLBP) [49]. The proposed method was evaluated with the three distinct score functions (S_f, S_g and S_h) defined in section 3.2.

In the experimental results presented in tables 3 and 4, the proposed texture dissimilarity metric (i.e. geodesic distance approximations) applied to texture recognition with the score functions S_h or S_g presented higher recognition rates than with the score function S_f, again indicating that the covariance information between color channels is important to better approximate the geodesic distance between multivariate normal distributions. Finally, our new method applied to texture recognition presented higher recognition rates than the comparable state-of-the-art methods listed in tables 3 and 4, indicating that this discrimination metric is efficient not only for face textures, but also for typical textures, such as those occurring in material images.

5. Conclusions

In this work, geodesic distance approximations for multivariate normal distributions were proposed as texture dissimilarity measures applied to face recognition. Also, a novel face recognition method based on information geometry techniques [36] was proposed. In the proposed approach, the textural dissimilarities in the vicinities of corresponding landmarks in distinct high-resolution color face images are scored using the proposed geodesic distance approximations between the multivariate normal distributions that represent the color distributions in the vicinity of each landmark location. In addition, a specific landmark topology is utilized to extract and compare the face landmarks.

Our proposed face recognition method tends to better handle common issues in face recognition, such as variations in illumination, changes in head pose and changes of appearance, since the pixel distributions sampled in the vicinities of the face landmarks tend to be similar across different expressions and head poses. Moreover, the new method takes advantage of the natural redundancy that exists in high-resolution color face images, so it evaluates more accurately the dissimilarities between textures in the vicinities of corresponding landmarks.

Our method was compared to methods that are representative of the state-of-the-art using color and also grayscale face images, and it tends to obtain higher recognition rates. Moreover, the experimental results also support a trend in which color information is relevant to face recognition [3, 31].

Additionally, our texture dissimilarity measures applied in face recognition can potentially be efficient in general texture discrimination (e.g. texture recognition of material images); an additional set of experiments in texture recognition showed that our method improved on the state-of-the-art methods used for comparison. Furthermore, using different covariance matrices was found to be relevant for texture discrimination.

Future work will deal with issues such as the identification of the best landmark topology for face recognition. Also, we intend to investigate texture feature representations for binary patterns applied to face recognition. Further study will address alternative techniques to obtain other geodesic distance approximations for multivariate normal distributions, including Gaussian mixture models.

Acknowledgments

The authors thank Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil, for financial support.
