Article

Feature Selection for Facial Emotion Recognition Using Cosine Similarity-Based Harmony Search Algorithm

1 Department of Computer Science and Engineering, Future Institute of Engineering and Management, Kolkata 700150, India
2 Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
3 Department of Information Technology, Jadavpur University, Kolkata 700106, India
4 Department of Energy IT, Gachon University, Seongnam 13120, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(8), 2816; https://doi.org/10.3390/app10082816
Submission received: 10 March 2020 / Revised: 10 April 2020 / Accepted: 14 April 2020 / Published: 19 April 2020

Abstract: Nowadays, researchers aim to enhance man-to-machine interactions by making advancements in several domains. Facial emotion recognition (FER) is one such domain in which researchers have made significant progress. Features for FER can be extracted using several popular methods. However, there may be some redundant/irrelevant features in the feature sets. In order to remove those redundant/irrelevant features that do not have any significant impact on the classification process, we propose a feature selection (FS) technique called the supervised filter harmony search algorithm (SFHSA) based on cosine similarity and minimal-redundancy maximal-relevance (mRMR). Cosine similarity aims to remove similar features from the feature vectors, whereas mRMR is used to determine the feasibility of the optimal feature subsets using Pearson's correlation coefficient (PCC), which favors features that have lower correlation values with other features as well as higher correlation values with the facial expression classes. The algorithm was evaluated on two benchmark FER datasets, namely the Radboud faces database (RaFD) and the Japanese female facial expression (JAFFE) dataset. Five different state-of-the-art feature descriptors, including uniform local binary pattern (uLBP), horizontal–vertical neighborhood local binary pattern (hvnLBP), Gabor filters, histogram of oriented gradients (HOG) and pyramidal HOG (PHOG), were considered for FS. The obtained results signify that our technique effectively optimized the feature vectors and made notable improvements in overall classification accuracy.

1. Introduction

In recent times, due to the rapid growth of technology, human–computer interaction has begun to gain attention in the research domain. Expression of emotions through facial expressions is an important aspect of human communication that serves social interaction. In the work described by Shan et al. in [1], researchers validated seven universal facial expressions, namely disgust, anger, happiness, sadness, neutral, fear and surprise. According to the facial action coding system [2], various positions of the muscles on the face are responsible for the facial expression, as explained by Ekman and Rosenberg [3]. The total number of action units in the facial action coding system is 46. These action units are associated with a defined set of muscle movements [4]. The main use of facial emotion recognition (FER) [5] lies in determining the emotional state of a person, which further helps in identifying mental states and mental disorders. Besides, FER also contributes to video indexing and data-driven animation [6]. An FER system is capable of making human–robot communication much more meaningful and effective. There are various challenges involved in FER. Predominantly, the camera angle is one of the key issues in the data collection stage. A wrong camera angle can hamper the recognition accuracy, as it can obscure small muscle movements. Besides, there are many types of noise that also affect the minute muscle movements in the eyes, nose or other parts of the face.
For a robust FER system, the primary need is to design a competent feature vector. As the features are extracted from facial images showing different human emotions, in many cases, irrelevant or redundant features are generated. This ultimately stretches the dimension of the feature set and brings down the overall prediction accuracy. Feature selection (FS), an initiative to optimize feature dimensions undertaken by various researchers, aims at removing redundant/irrelevant features that do not make any significant contribution to the overall prediction process [7,8]. FS is a useful way to substantially reduce the size of the original feature vectors used to predict target facial emotions expressed by humans. This not only reduces the computation time required to make the prediction, but also improves the final accuracy by removing redundant and irrelevant features. Redundant and noisy features lead to misclassification and higher memory requirements, which, in turn, significantly increase model building time [9]. FS requires choosing the best feature subset from $2^N$ possible subset combinations, where $N$ is the size of the original feature set [10]. Some FS techniques may prove to be very costly. As a result, in many cases, random search and heuristic search techniques are employed. In [11], an algorithm for unsupervised FS using a feature similarity measure, called the maximum information compression index, was elucidated. Another FS technique was proposed in [12], which uses sequential search methods characterized by dynamically changing the number of features, i.e., "floating" methods. The works described in [13,14,15,16] are some other established applications of FS in solving problems like handwriting recognition, handwritten script classification [17], handwritten numeral classification, etc. Another instance of FS, called the histogram-based fuzzy ensemble technique, was applied to UCI datasets for evaluation in [18].
FS algorithms are segregated into three categories [19]. The filter method applies a statistical or probabilistic approach to assign a score to each feature, based on which features are either kept or removed from the original feature set. The wrapper method is often used in conjunction with a machine learning algorithm (which works as a classifier), where the learning algorithm performs the feature validation and selects an optimal feature subset that enhances the classification accuracy. The hybrid method tends to perform computationally better than the wrapper method. Both wrapper and hybrid methods make selections based on a classifier, which may or may not work well with any other classifier. That is because the optimal feature subset is built when the classifier is built, and the selection depends on the hypotheses which the classifier makes. In our work, we have focused mainly on the supervised filter-based FS approach [20], where the feasibility check for a feature subset is based on a popular statistical measure called Pearson's correlation coefficient (PCC) [21] and the minimal-redundancy maximal-relevance (mRMR) criterion [22], which takes into account the correlation of features with other features as well as with the classes.
Our adopted technique applies a supervised filter harmony search algorithm (SFHSA) in order to perform FS. Keeping the fundamentals of the harmony search algorithm (HSA) intact, we have made modifications to its pitch adjustment procedure, where we have incorporated the concept of cosine similarity for adjusting the values of the variables. The details of the implementation are elucidated in Section 4. Some applications of HSA to FS were discussed in [23,24,25,26,27,28,29]. The proposed HSA-based FS was applied on five different state-of-the-art feature descriptors, namely uniform local binary pattern (uLBP), horizontal–vertical neighborhood local binary pattern (hvnLBP), Gabor filters, histogram of oriented gradients (HOG) and pyramidal HOG (PHOG). The method was tested on two popular FER datasets, namely the Japanese female facial expression (JAFFE) dataset [30] and the Radboud faces database (RaFD) [31].
The rest of the paper is arranged as follows: Section 2 describes the motivation behind the present work along with some state-of-the-art methods related to FER. Section 3 presents a detailed description of the datasets used, and briefly explains the five feature descriptors. The proposed optimal feature subset selection method, called SFHSA, is detailed in Section 4. The performance evaluation of SFHSA is reported in Section 5. Section 6 finally concludes the paper. The supporting code is available at the GitHub link: https://github.com/Soumyajit-Saha/Cosine-Similarity-based-Harmony-Search-Algorithm.

2. Motivation and Related Work

High-dimensional datasets often impose computational overheads, especially in cases where achieving a near-perfect classification accuracy score is the primary concern. Hence, researchers focus mainly on reducing the dimensions of feature sets by filtering out irrelevant features that do not have a significant impact on the classification process. The overheads imposed by high-dimensional feature sets mainly include time and space overheads [9,10,23]. Nowadays, especially in the fields of bioinformatics, medicine and genetics [24], where features or genes are denoted in the form of microarray data of humongous dimension, FS is a necessity for countering the underlying challenges of space and execution time. Many efficient algorithms have been established for this purpose. With an objective to serve the same purpose in the FER domain, we have designed an FS algorithm to increase the efficiency of the machine learning algorithm while reducing the dimension of the feature set. We developed a novel HSA-based FS technique, SFHSA, applied it on the FER datasets and demonstrated a clear comparison with other established FS techniques.
In FER, the features are obtained from datasets having a large dimension. However, there are many redundant features present within the feature sets that bring down the classification accuracy. To eradicate those redundant and irrelevant features as well as make the predictions more accurate, we have taken the initiative to apply our proposed FS technique to FER datasets. The features extracted from facial expression images include uLBP, hvnLBP, Gabor filters, HOG and PHOG. A few FS works on FER can be found in the literature. In [25], a wrapper-based approach to FS using a genetic algorithm (GA) was applied to features for FER obtained using log-Gabor filters. A linear programming technique was used for FS in [26], where the features for FER were extracted using Gabor filters. An FS method based on random forest classifiers was incorporated into an FER system in [27], where visual-appearance-based features were extracted from a Gabor filter bank. For the FS, the mRMR method based on the mutual information (MI) quotient was used. Li et al. [28] used a combination of fixed filters and trainable non-linear 2D filters based on the biological mechanism of shunting inhibition. Finally, FS was performed using MI and class separation scores. The work described in [29] proposes automatic FER where features were generated using methods like Gabor filters, log-Gabor filters, the LBP operator, higher-order local autocorrelation and higher-order local autocorrelation-like features. A self-learning attribute reduction algorithm based on rough sets and domain-oriented data-driven data mining was proposed for FS in [32]. The authors in [33] describe an efficient FS technique, the late-hill-climbing-based memetic algorithm (LHCMA), applied to feature sets for FER, which indeed outperforms many previously established FS algorithms. A facial expression recognition system with hvnLBP-based feature extraction and a micro-GA embedded particle swarm optimization (PSO)-based feature optimization technique was proposed in [34] by Mistry et al. An FER method based on a wrapper-based FS technique called the multi-objective differential evolution algorithm was proposed in [35]. The evaluation was done using support vector machine (SVM) classifiers. We have seen that HSA has provided satisfactory results in the case of FS for holistic Bangla word recognition [36], digit classification [37], email classification [38], epileptic seizure detection [39] and protein sequence classification [40]. However, to the best of our knowledge, to date, an HSA-based FS technique has not been applied to FER systems. This has served as a motivation for us to apply SFHSA to FER systems.

3. Dataset and Feature Description

This section is divided into two subsections: dataset description and feature extraction. The first subsection deals with the dataset and preprocessing information, whereas the next subsection puts forward a brief description of the feature descriptors used here.

3.1. Dataset Description

Two popular benchmark FER datasets were included in the present work, namely JAFFE and RaFD. Details of these databases and their preprocessing steps are described in the following sections.

3.1.1. JAFFE

The JAFFE dataset [30] includes facial expressions of 10 different Japanese females. It consists of 7 basic facial expression classes: surprise, anger, happiness, disgust, fear, sadness and neutral. In total, there were 213 images, resulting in an unequal number of samples per class. Hence, data augmentation [41] was performed to handle this issue. The augmentation was achieved by introducing Gaussian white noise of constant variance and mean to 11 sample images. In the end, the dataset contained a total of 224 (= 32 × 7) images, i.e., 32 images per class. Sample images from the JAFFE dataset are shown in Figure 1.
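A minimal sketch of this augmentation step is given below, assuming 8-bit grayscale images stored as NumPy arrays; the noise standard deviation is an illustrative value, since the exact variance used is not reported in the paper.

```python
import numpy as np

def add_gaussian_noise(image, mean=0.0, sigma=10.0, seed=None):
    """Add Gaussian white noise of constant mean/variance to a grayscale
    image with values in [0, 255]; `sigma` here is an assumed value."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float64) + rng.normal(mean, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```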

3.1.2. RaFD

The RaFD facial emotion dataset [31] was constructed from 67 models (consisting of Caucasian males and females, Moroccan Dutch males, and Caucasian boys and girls). It consists of 8 different expression classes: disgust, fear, happiness, neutral, contempt, surprise, sadness and anger. While capturing each facial expression, five different camera angles and three different gaze directions (frontal, left and right) were used. In this dataset, each class contains 201 images. Figure 2 shows sample images taken from the RaFD dataset.

3.1.3. Preprocessing

Facial emotion images generally suffer from various extraneous noises, as they are captured under different environmental conditions. These affect the feature extraction process. A suitable preprocessing technique was required to overcome this problem. Hence, the Viola-Jones algorithm [42] was applied to focus attention on the important region within the whole image. The important region, or region of interest, mainly covers the areas containing facial expressions, such as the lips, eyebrows, nose, eyes, etc. The Viola-Jones algorithm returns the coordinates of the bounding box that contains the region of interest. To make the method more robust and comparable with real-world scenarios, the technique was assessed with various image dimensions. In the present work, facial images of three different dimensions, namely 32 × 32, 48 × 48 and 64 × 64, were considered for evaluation purposes. After preprocessing, the facial images were resized to their corresponding resolutions. Then, the images were used for feature extraction. Finally, the extracted features were fed to the proposed feature selection algorithm. The use of three image sizes takes into account the variation in image quality in practical emotion recognition applications.
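As a rough illustration of this preprocessing step, the sketch below uses OpenCV's Haar-cascade implementation of the Viola-Jones detector; the cascade file and detector parameters are our assumptions, not taken from the paper.

```python
import cv2

# OpenCV ships a Haar-cascade implementation of the Viola-Jones detector.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(gray, size):
    """Return the detected face region resized to size x size pixels,
    or None if no face is found."""
    boxes = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None
    x, y, w, h = boxes[0]          # bounding box of the region of interest
    return cv2.resize(gray[y:y + h, x:x + w], (size, size))

# The three resolutions evaluated in the paper:
# faces = {s: crop_face(img, s) for s in (32, 48, 64)}
```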

3.2. Feature Description

In this section, all the feature extraction methods are explained briefly. In the present work, we considered HOG, PHOG, uLBP, hvnLBP and Gabor filter-based feature extraction methodologies.

3.2.1. Histogram of Oriented Gradients

HOG [43] is a texture-based feature descriptor that adopts the histogram of gradients as a statistical measure. The primary idea behind the concept is that any local shape and object can be demonstrated using the gradient intensity distribution or edge direction. As HOG is invariant to geometric transformations, it is widely used in the pattern recognition domain. In the computation, first the entire facial image was divided into cells. After that, the gradients were calculated according to Equation (1). In this work, one-dimensional gradients were taken. A matrix $M = [-1\ 0\ 1]$ was used to calculate the gradient in the X direction, denoted by $\text{Grad}_X$. Subsequently, $M^T$ was adopted for the gradient in the Y direction, denoted by $\text{Grad}_Y$. The matrix $M$ was used as a mask that was passed over the entire image, the variable here being the pixel intensity. The gradient was computed by taking the intensity difference of the neighboring pixels in a particular direction. In the case of $\text{Grad}_X$, the intensity difference along horizontally neighboring pixels was considered. Similarly, for $\text{Grad}_Y$, the intensity difference along vertically neighboring pixels was considered. The final gradient direction $\text{Grad}_{Dir}$ was calculated as follows:

$$\text{Grad}_{Dir} = \tan^{-1}\left(\frac{\text{Grad}_Y}{\text{Grad}_X}\right) \quad (1)$$
Next, the entire gradient direction domain (lying between 0° and 360°) was divided into 8 histogram bins. For every cell, the bin count of each histogram bin was obtained: the bin count is incremented when the value of the gradient direction falls in the range of that particular bin. In this way, we obtain the histogram of gradients for each cell. Finally, the histograms obtained from all cells were concatenated to get the final feature vector. For the extraction of the HOG feature, three image dimensions were considered. The HOG feature descriptor was applied on each facial emotion image and the final feature vectors were obtained. The feature size of HOG depends on the image size and the cell dimension. The feature dimensions for 32 × 32, 48 × 48 and 64 × 64 images are 324, 900 and 1764, respectively.
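A simplified sketch of the cell-wise direction binning described above follows; the cell size is an assumed value, and the sketch omits the magnitude weighting and block normalization that standard HOG implementations apply (the reported 324/900/1764 dimensions suggest a block-normalized variant was used).

```python
import numpy as np

def hog_sketch(image, cell=8, bins=8):
    """Cell-wise histogram of gradient directions: Grad_X from the mask
    M = [-1 0 1], Grad_Y from its transpose, directions binned over
    [0, 360) degrees per cell and concatenated."""
    img = image.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # intensity difference along X
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # intensity difference along Y
    direction = np.degrees(np.arctan2(gy, gx)) % 360.0
    hists = []
    for r in range(0, img.shape[0], cell):
        for c in range(0, img.shape[1], cell):
            block = direction[r:r + cell, c:c + cell]
            hists.append(np.histogram(block, bins=bins, range=(0, 360))[0])
    return np.concatenate(hists)
```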

3.2.2. Pyramidal HOG

Researchers have mainly relied on the PHOG feature descriptor [44,45] for object recognition. The spatial pyramid representation of HOG is computed here. It mainly captures the local shape and retains the spatial information by segregating the image into various levels. The spatial information was retained by using the HOG descriptor at every level. In this work, the Canny edge detector [46] was applied to obtain the contour of each region. The edge detector was mainly used to capture the local shapes. An arrangement of finer spatial grids was shaped by doubling the number of divisions along each axis direction for every level. At resolution level $l$, the grid consists of $2^l$ cells along each dimension. For example, at $l = 2$, the number of grid cells along the X axis will be 4. As a result, the total number of grid cells becomes $4 \times 4 = 16$. Further, a Sobel mask [47] of window size 3 × 3 was applied for extracting the orientation of the gradients along the edge contours. At this stage, the procedure of gradient binning was performed similarly to the HOG descriptor. The gradients corresponding to the same cell are quantized and combined into $N$ histogram bins: the bin count is incremented when the gradient direction value lies in the range of that particular bin. Mainly, the orientation binning was performed using either the [0°, 360°] range or the [0°, 180°] range, where the contrast sign is neglected [36]. These bins were sorted and concatenated into a single sequence corresponding to the same level. For each level, a histogram is obtained. Finally, all the histograms corresponding to each level were merged to get the final feature vector. In the present work, we have used the PHOG descriptor to capture facial features from the images (having three different dimensions of 32 × 32, 48 × 48 and 64 × 64) of the JAFFE and RaFD datasets. The number of pyramid levels was kept up to 3 ($L = 3$) and the number of histogram bins was 8 ($N = 8$). The orientation of the gradient ranges between 0° and 360°. The number of features obtained from PHOG can be formulated as $N \sum_{l=0}^{L} 4^l$. Putting $L = 3$ and $N = 8$, we obtain a final feature vector of size 680.
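The feature count can be checked directly under the stated parameters:

```python
# PHOG dimension: N histogram bins in each of the 4**l grid cells at
# levels l = 0..L, concatenated over all levels.
N, L = 8, 3
dim = N * sum(4 ** l for l in range(L + 1))   # 8 * (1 + 4 + 16 + 64)
assert dim == 680
```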

3.2.3. Gabor Filter

In this work, we have also used the Gabor filter, a well-known frequency-based feature descriptor [48,49]. It is a linear filter that is mainly applied for texture analysis. The Gabor filter analyzes the presence of specific frequency content in a specific direction within a localized region around the point or region of interest in the image. It is invariant to rotation, translation and scale. Besides, it is also robust towards photometric distortions, which mainly occur as illumination changes and image noise [37]. In the spatial domain, the two-dimensional Gabor filter consists of a Gaussian kernel function modulated by a sinusoidal plane wave [50]. The equations of the kernel function for calculating Gabor filter-based features in the spatial domain are given in Equations (2)–(4).
$$Gabor(x, y) = \frac{f^2}{\pi \gamma \eta} \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\,(2\pi f x' + \omega)\right) \quad (2)$$

$$x' = x\cos\theta + y\sin\theta \quad (3)$$

$$y' = -x\sin\theta + y\cos\theta \quad (4)$$
Here, the standard deviation of the Gaussian envelope is expressed as $\sigma$. $\gamma$ is the spatial aspect ratio, which specifies the ellipticity of the support of the Gabor function. $i$ denotes the imaginary unit. The phase offset is specified as $\omega$. $f$ stands for the sinusoid frequency, and $\theta$ symbolizes the orientation of the normal to the parallel stripes of the Gabor function. Eight different orientations and five distinct scales were taken in the Gabor model, resulting in 40 diverse Gabor filters.
Multiple spatial resolutions and orientations were covered by the set of 2D Gabor filters in the bank. These filters were then used for the convolution of each facial image sample. Let us consider a sample facial image $img(x, y)$ whose corresponding Gabor filter kernel is $\Delta_{u,v}(x, y)$. The characterization of the output image, $Out_{u,v}(x, y)$, is given in Equation (5) [39] as follows:

$$Out_{u,v}(x, y) = img(x, y) * \Delta_{u,v}(x, y) \quad (5)$$
Finally, the obtained Gabor features were down-sampled by a factor of 8. The size of the feature vector varies with the image dimension. In the present work, the image dimensions taken into account were 32 × 32, 48 × 48 and 64 × 64. Table 1 describes the feature dimensions corresponding to these image sizes.
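A sketch of such a filter bank using OpenCV is shown below; the kernel size, Gaussian width and wavelength schedule are illustrative assumptions, since the paper does not report them.

```python
import cv2
import numpy as np

def gabor_bank_features(gray, n_orient=8, n_scale=5, ksize=31, down=8):
    """Responses of a 5-scale x 8-orientation Gabor filter bank (40 filters),
    each response down-sampled by a factor of 8 and concatenated."""
    img = gray.astype(np.float64)
    feats = []
    for s in range(n_scale):
        lambd = 4.0 * 2 ** (s / 2.0)            # assumed wavelength per scale
        for o in range(n_orient):
            theta = o * np.pi / n_orient        # filter orientation
            kern = cv2.getGaborKernel((ksize, ksize), sigma=0.56 * lambd,
                                      theta=theta, lambd=lambd,
                                      gamma=0.5, psi=0.0)
            resp = cv2.filter2D(img, cv2.CV_64F, kern)
            feats.append(np.abs(resp)[::down, ::down].ravel())
    return np.concatenate(feats)
```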

3.2.4. Uniform Local Binary Pattern

LBP, which was first introduced by Ojala et al. [51], is a useful texture-based feature. It mainly captures edge properties by taking the intensity differences of the center pixel with its surrounding pixels. In this work, we have considered a window of size 3 × 3 around a center pixel. As a result, a total of eight neighboring pixels were considered. The difference between the center pixel and each of the surrounding pixels was calculated. If the difference was greater than zero, we assigned 1; else 0. In this way, each center pixel was represented by an 8-bit LBP code. The calculation of the LBP code is shown in Figure 3. In this figure, the top-left corner is taken as the 7th bit and the bits are considered in a clockwise fashion until the 0th bit is reached. The resultant 8-bit binary number is then converted to its equivalent decimal number. This process is formulated in Equation (6), where $(x_{cen}, y_{cen})$ is the center pixel, $i_{cen}$ is its intensity, $i_k$ is the intensity of one of the surrounding pixels, and $new(z)$ equals 1 if $z > 0$ and 0 otherwise:

$$LBP(x_{cen}, y_{cen}) = \sum_{k=0}^{7} new(i_{cen} - i_k)\, 2^k \quad (6)$$
A transition in a binary string is defined as a change from 0 to 1 or from 1 to 0. Strings that comprise at most two transitions are known as uniform strings, and the others are called non-uniform strings. The main purpose of using uniform patterns was to eliminate redundant features and capture the information properly. In this work, the uniformity property was applied to the binary strings obtained by LBP. Therefore, most of the redundant binary strings were eliminated, as they were non-uniform. The obtained uniform strings were converted to their respective decimal values. The histogram of those values was taken as the feature vector, as formulated in Equation (7), where $L$ denotes the number of bins and $I(\cdot)$ is the indicator function:

$$Hist_i = \sum_{x, y} I\left(Image_{label}(x, y) = i\right), \quad i = 0, \ldots, L - 1 \quad (7)$$
In this work, initially all the images were divided into 16 (= 4 × 4) blocks irrespective of the image dimension. The main purpose of this blocking was to preserve the local information of the image. The divisions were made uniformly over the entire image. Then, uLBP was applied to each sub-block. The feature vectors obtained from each sub-block were concatenated to get the final feature vector. Here, the feature dimension of uLBP for a single block is 59 (= 8 × (8 − 1) + 3). As the image was divided into 16 sub-blocks, the final feature dimension becomes 944.
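A minimal sketch of the per-block uLBP histogram described above (59 bins: 58 uniform patterns plus one catch-all bin), following the bit ordering of Figure 3 and the thresholding of Equation (6):

```python
import numpy as np

def transitions(code):
    """Circular 0->1 / 1->0 transition count of an 8-bit pattern."""
    bits = [(code >> k) & 1 for k in range(8)]
    return sum(bits[k] != bits[(k + 1) % 8] for k in range(8))

# 58 uniform patterns (<= 2 transitions) map to bins 0..57; bin 58 catches
# every non-uniform pattern, giving 59 bins in total.
UNIFORM_BIN = {c: i for i, c in
               enumerate(sorted(c for c in range(256) if transitions(c) <= 2))}

def ulbp_histogram(block):
    """59-bin uniform-LBP histogram of one image sub-block; bit 7 is the
    top-left neighbour, remaining bits taken clockwise (Figure 3)."""
    hist = np.zeros(59)
    for r in range(1, block.shape[0] - 1):
        for c in range(1, block.shape[1] - 1):
            cen = block[r, c]
            nbrs = [block[r-1, c-1], block[r-1, c], block[r-1, c+1],
                    block[r, c+1], block[r+1, c+1], block[r+1, c],
                    block[r+1, c-1], block[r, c-1]]
            # new(i_cen - i_k) = 1 when the centre exceeds the neighbour
            code = sum(1 << (7 - k) for k, n in enumerate(nbrs) if cen > n)
            hist[UNIFORM_BIN.get(code, 58)] += 1
    return hist

# Full descriptor: split the image into 4 x 4 sub-blocks and concatenate
# their histograms -> 16 * 59 = 944 features.
```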

3.2.5. Horizontal–Vertical Neighborhood Local Binary Pattern

hvnLBP is a very useful texture feature that was first proposed by Mistry et al. [34]. It captures better contrast information among the neighborhood pixels, such as edges and corners. Similar to uLBP, a 3 × 3 window containing eight neighboring pixels was considered for hvnLBP. The surrounding pixels are denoted as $L = \{l_0, l_1, l_2, l_3, l_4, l_5, l_6, l_7\}$. In the case of hvnLBP, the comparison is done among the neighboring pixels. By comparing those surrounding values, we get a binary string of 8 bits, which is further converted to its equivalent decimal value. The process is formulated in Equations (8) and (9). An example of calculating hvnLBP is also shown in Figure 4.
$$hvnLBP(x, y) = \{fun(\max(l_0, l_1, l_2)),\ fun(\max(l_7, l_3)),\ fun(\max(l_6, l_5, l_4)),\ fun(\max(l_0, l_7, l_6)),\ fun(\max(l_1, l_5)),\ fun(\max(l_2, l_3, l_4))\} \quad (8)$$

$$fun(\max(l_a, l_b, l_c)) = \begin{cases} 1 & \text{if maximum} \\ 0 & \text{otherwise} \end{cases} \quad (9)$$
In Equation (9), the term $l_b$ can also be absent; in that case, only two pixel intensities participate in the comparison.
As the binary string consists of 8 bits, the total number of possible values is 256. After obtaining the equivalent decimal values, the histogram was computed as described in Equation (7).
For generating a discriminative facial representation, hvnLBP was combined with the 2D Gabor filter. In this work, a total of 16 magnitude images of various wavelengths and orientations were first obtained from the Gabor filter. hvnLBP was then applied on those 16 magnitude images. The feature vectors obtained from the magnitude images were concatenated to obtain the final feature vector. In a typical hvnLBP, there are 256 features in total. As there were 16 magnitude images, the final feature dimension becomes 4096.
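The following sketch computes the hvnLBP code of a single 3 × 3 window per Equations (8) and (9); how the six group outputs are merged into one 8-bit code is our assumption about the intended construction.

```python
import numpy as np

def hvnlbp_code(win):
    """hvnLBP code of one 3x3 window: in each horizontal and vertical
    neighbour group the maximum pixel is marked 1, all others 0."""
    l = [win[0, 0], win[0, 1], win[0, 2], win[1, 2],     # l0..l3 clockwise
         win[2, 2], win[2, 1], win[2, 0], win[1, 0]]     # l4..l7
    groups = [(0, 1, 2), (7, 3), (6, 5, 4),              # horizontal groups
              (0, 7, 6), (1, 5), (2, 3, 4)]              # vertical groups
    bits = [0] * 8
    for g in groups:
        bits[max(g, key=lambda k: l[k])] = 1             # mark the group maximum
    return sum(b << (7 - k) for k, b in enumerate(bits))
```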

4. Proposed Work

In this paper, we have proposed an FS technique based on HSA, which we have named the supervised filter harmony search algorithm (SFHSA). SFHSA uses cosine similarity and mRMR combined with PCC. HSA has proved to be an efficient technique [52,53,54,55] for providing optimal solutions to real-life problems in terms of feasible computation time and memory usage, as proposed by Lee and Geem in [53]. It simulates the procedure adopted by musicians to find the finest tune by selecting a particular combination of frequencies produced by various musical instruments. It optimizes an objective function by selecting an appropriate combination of solutions from an existing set of solutions by employing random search. In [56], an improved version of HSA was formulated, which fine-tunes the parameters using mathematical techniques to enhance the performance of HSA. Due to the efficiency of HSA in finding the global optimum solution, it has been exploited in various works related to FS. The traditional approach of HSA involves adjustment of the pitch considering a parameter named bandwidth (BW). In this phase of the algorithm, we have made a modification to the pitch adjustment: in our case, it is the adjustment of the features by replacing their values with the values of their cosine-similar features, subject to the satisfaction of certain conditions, rather than selecting adjacent values as in the conventional approach. The goodness of a subset was determined using mRMR.
The proposed algorithm is a supervised filter method. As mentioned before, the proposed algorithm is used to optimize the features extracted for FER; the objective of SFHSA is to reduce the feature dimensions while maintaining or increasing the accuracy score. Prior to execution of the algorithm, each feature vector was divided into training and testing sets in the ratio of 2:1. The training set was then used to find the optimal feature subset. The algorithm used is provided in Algorithm 1.
Algorithm 1 Selection of Optimal Feature Subset using SFHSA
Input: Original feature set
Output: Reduced feature subset
User-defined parameters: HMS = 15, HMCR = 0.8 and PAR = 0.5
Initialize HM with HMS randomly generated feature subsets
Determine the worst feature subset in HM
while (t < max_iterations) {
  while (i <= HMS) {
    while (j <= total_no_of_features_in_subset) {
      Generate a random value for P1 in [0, 1]
      if (P1 < HMCR) {
        Choose a feature f_j from the subset
        Generate a random value for P2 in [0, 1]
        if (P2 < PAR) {
          Generate a random value for ε in [−1, 1]
          Randomly choose a feature f_k from the subset such that cosine_similarity(f_j, f_k) is in (−ε, ε)
        }
      }
      else {
        Select any feature f_r randomly from the original feature set
      }
      j = j + 1
    }
    if (mRMR value of new subset > mRMR value of worst subset) {
      Replace the worst subset with the new one
      Find the worst feature subset in the updated HM
    }
    i = i + 1
  }
  t = t + 1
}
The parameters that we have used include:
  • Harmony Memory Size (HMS)
  • Harmony Memory Consideration Rate (HMCR)
  • Pitch Adjustment Rate (PAR)
  • Number of Iterations
The HMS determines the size of the harmony memory (HM), i.e., the number of feature subsets present in the HM. Throughout the process, the HMCR was applied in order to decide whether or not a feature is to be selected from a feature subset in the HM. Its value lies in the range [0, 1] and, based on experiments, we have initialized it to 0.8. The parameter PAR was used (upon satisfaction of its condition) to randomly select a feature whose cosine similarity with the pre-selected feature is within the range (−ε, ε), where ε is a randomly generated value in the range [−1, 1]. The value of PAR lies in the range [0, 1] and we have fixed it to 0.5 on an experimental basis, in order to set the probability of PAR being satisfied equal to the probability of PAR not being satisfied. At the commencement of SFHSA, all the parameter values were initialized. The value of HMS was set to 15 in order to keep 15 feature subsets in the HM and thereby have more diverse feature subsets for obtaining better results, and the maximum number of iterations was set to 20. The initialization phase was followed by the random selection of feature subsets from the training set. In this phase, we randomly created m feature subsets, where m = HMS. The feature subsets were considered to have a dimension ranging from 80% to 90% of the dimension n of the actual feature set, which is to be reduced in the subsequent phases. These m feature subsets were used to populate the HM initially.
To find the cosine similarity of the features, each of the feature sets was normalized. The normalization was performed column-wise for each feature present in the feature set. The normalized value $N(x)$ was calculated using Equation (10), where $x$ is an attribute, $curr(x)$ denotes the value of the attribute corresponding to the current instance, and $min(x)$ and $max(x)$ denote the minimum and maximum values of the attribute, respectively, over all the instances.
$$N(x) = \frac{curr(x) - min(x)}{max(x) - min(x)} \quad (10)$$
Thereafter, the features whose maximum and minimum values were equal over all instances were filtered out, as they were not considered to make a significant contribution in the classification stage. The normalized values of the features were utilized to find the cosine similarity [57] between every pair of features in the feature set. Equation (11) measures the similarity between features $p$ and $q$, where $p$ and $q$ represent any two features from a feature set, $p_i$ and $q_i$ denote the values of features $p$ and $q$, respectively, corresponding to the $i$-th instance, and $n$ denotes the total number of instances. The values of the cosine similarities were stored in the form of a matrix for faster computation.
$$similarity(p, q) = \cos\theta = \frac{p \cdot q}{\|p\|\,\|q\|} = \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2}\,\sqrt{\sum_{i=1}^{n} q_i^2}} \quad (11)$$
For instance, let there be a feature vector $F = \{f_1, f_2, f_3, \ldots, f_n\}$, where $f_1, f_2, f_3, \ldots, f_n$ are the features in a feature set of dimension $n$; then the cosine similarity matrix is represented as given in Equation (12), where $\cos\theta_{a,b}$ is the cosine similarity between the $a$-th and $b$-th features and $a, b \leq n$.
$$S_{n \times n} = \begin{pmatrix} \cos\theta_{1,1} & \cdots & \cos\theta_{1,n} \\ \vdots & \ddots & \vdots \\ \cos\theta_{n,1} & \cdots & \cos\theta_{n,n} \end{pmatrix} \quad (12)$$
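A compact sketch of Equations (10)–(12), assuming the feature matrix is a NumPy array with one row per instance:

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Column-wise min-max normalization (Equation (10)) followed by the
    feature-by-feature cosine-similarity matrix (Equations (11)-(12)).
    X has shape (instances, features); constant features are filtered out."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    keep = mx > mn                              # drop max(x) == min(x) columns
    Xn = (X[:, keep] - mn[keep]) / (mx[keep] - mn[keep])
    norms = np.linalg.norm(Xn, axis=0)
    S = (Xn.T @ Xn) / np.outer(norms, norms)    # S[a, b] = cos(theta_{a,b})
    return S, keep
```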
For each feature subset, a new feature subset was created using HMCR and PAR: either a feature is selected from the feature subset in the HM (based on its cosine similarity value) or a random feature is selected from the existing feature set. For example, let there be a feature subset $S = \{f_2, f_8, f_6, f_5, f_3\}$ in the HM, having a combination of features from the existing feature set, say $F = \{f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8, f_9, f_{10}\}$. If the HMCR condition is satisfied in an iteration, we select a feature, say $f_5$. Again, if the PAR condition is satisfied, then the feature $f_5$ is replaced by selecting a cosine-similar feature, say $f_2$, and removing $f_5$. The other option, if neither condition is satisfied, is to select a completely random feature, say $f_7$, from the global feature set. Suppose $f_2$ has been used to replace $f_5$, and for another feature, say $f_3$, again $f_2$ is found to be its cosine-similar feature. In this case, both $f_5$ and $f_3$ are replaced by the single feature $f_2$, thus reducing the feature dimension. This selection of features for replacement is done $n$ times to improvise a new feature subset from an existing feature subset in the HM, where $n$ is the dimension of $S$, and the improvisation of new feature subsets carries on for the maximum number of iterations. Thus, in each iteration a new improvised feature subset (with reduced or equal dimension) was generated. The selection of an optimal feature subset using SFHSA is explained in Algorithm 1, and the flowchart of the same is provided in Figure 5.
A decision is then made as to whether the improvised subset is better in quality than the previous feature subset. For determining the quality of the feature subsets, mRMR [58] was applied. In the evaluation of feature subsets, mRMR has proved to be an efficient technique [33,59,60,61]. The concept of mRMR involves maximizing the relevance R (the PCC [62] between a feature and the class) and minimizing the redundancy D (the PCC between the features in the subset). Features that were highly correlated with the class were considered relevant, and features that were highly correlated with each other were considered redundant. The PCC is calculated using Equation (13), where $x$ and $y$ are two sets of values and $\bar{x}$ and $\bar{y}$ are their mean values, respectively. The values of R and D are calculated using Equations (14) and (15), respectively, where $x_i$ and $x_j$ represent features from the feature subset $S$ and $c$ represents the facial class label.
$$PCC(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \quad (13)$$

$$R = \frac{1}{|S|} \sum_{x_i \in S} PCC(x_i, c) \quad (14)$$

$$D = \frac{1}{|S|^2} \sum_{x_i,\, x_j \in S} PCC(x_i, x_j) \quad (15)$$
The mRMR value is the quality score of the feature subset, $V(S)$, defined in Equation (16) as follows:

$$V(S) = R - D \quad (16)$$
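A direct sketch of Equations (13)–(16), assuming the class labels are numerically encoded; in the HM update step, an improvised subset replaces the worst one whenever its score is higher.

```python
import numpy as np

def pcc(x, y):
    """Pearson's correlation coefficient (Equation (13))."""
    xd, yd = x - x.mean(), y - y.mean()
    return np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2))

def mrmr_score(X, labels, subset):
    """Quality score V(S) = R - D (Equations (14)-(16)) of a feature subset:
    relevance R averages PCC with the class labels; redundancy D averages
    PCC over all feature pairs in the subset (self-pairs included, per the
    1/|S|^2 normalization)."""
    S = list(subset)
    R = np.mean([pcc(X[:, i], labels) for i in S])
    D = np.mean([pcc(X[:, i], X[:, j]) for i in S for j in S])
    return R - D
```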
During the decision-making process, if the quality score of the newly generated feature subset is found to be better than that of the worst subset in the HM, it replaces the worst subset, and the worst subset is then determined anew from the updated HM. The entire process is iterated until the stopping criterion is met. The finishing stage of the process includes calculating the accuracy score using a classifier and generating the results, which is discussed in the next section. Therefore, the proposed SFHSA is a filter-based FS method based on mRMR. The use of PCC in mRMR makes the algorithm quite effective in selecting the best subsets. The extracted feature descriptors were refined using SFHSA, and the finally selected features were chosen from the testing feature sets and passed through a classifier to find the recognition accuracy of the classification problem under consideration (here, FER). A schematic diagram of the proposed model is shown in Figure 6.

5. Results and Discussion

The SFHSA-based FS technique was applied to the features obtained from the images of two standard FER datasets, namely JAFFE and RaFD. The five feature sets obtained include uLBP, hvnLBP, Gabor-filter-based, HOG and PHOG features. We considered facial images of three dimensions: 32 × 32, 48 × 48 and 64 × 64. As a result, for the present FS problem, the total number of feature sets taken under consideration becomes 30 (= 2 × 5 × 3): 2 FER datasets, 5 feature sets and facial images of 3 dimensions. Table 2 reports the sizes of the 5 feature vectors produced using the 3 different dimensions of the facial images along with their recognition accuracies. This table also highlights the sizes of the reduced feature vectors with the corresponding recognition accuracies on the two standard FER datasets.
As mentioned previously, each feature set was segregated into training and testing sets in a ratio of 2:1, and SFHSA was applied on the training set only. The testing set was used to obtain the accuracy score by selecting only the attributes (feature indices) that were present in the reduced version of the training feature set, defined as follows:
$$Accuracy\ score\ (\%) = \frac{\#\ \text{facial images successfully recognized}}{\#\ \text{total facial images in the test set}} \times 100$$
The initial HM consists of HMS (= 15) randomly generated feature subsets having dimensions ranging from 80% to 90% of the original feature vector. The detailed evaluation of the reduced feature subsets obtained by using SFHSA on the uLBP, hvnLBP, Gabor-filter-based, HOG and PHOG feature descriptors, along with their corresponding accuracy scores, is presented in Table 3, Table 4, Table 5, Table 6 and Table 7, respectively. We have used the sequential minimal optimization (SMO) classifier with a linear kernel [33] to evaluate the recognition performance on the reduced feature sets. This was done with the aim of achieving higher accuracy scores and also making comparison with past experimental results convenient. In Table 3, detailed outcomes of different FS techniques on the feature sets extracted using the uLBP method are provided in terms of both reduced feature dimensions and recognition accuracy. Table 4, Table 5, Table 6 and Table 7 show similar comparisons of the different FS techniques on the hvnLBP, Gabor-filter-based, HOG and PHOG feature vectors, respectively.
In the case of the uLBP, hvnLBP, Gabor-filter-based, HOG and PHOG features, our proposed SFHSA algorithm produces reduced feature sets that were 65%, 67%, 82%, 69% and 60% smaller than the original feature vectors, respectively, and also increased the recognition accuracies by up to 17%, 24%, 20%, 18% and 25%, respectively, over the original ones. Thus, it can be concluded that SFHSA demonstrates the best performance in the case of the PHOG features in terms of accuracy score and dimension reduction for both the JAFFE and RaFD datasets. Table 7 reflects the detailed outcomes after applying different optimization techniques on the PHOG features. The content of the HM is presented to provide a better understanding of the reduced feature subsets obtained at the end of the execution of the algorithm; it shows how the HM appears after SFHSA was executed on the feature vectors. In this regard, it is worth mentioning that for all other feature vectors, we obtained similar content in the HM.
Table 3, Table 4, Table 5, Table 6 and Table 7 also highlight the detailed comparative results observed in the present experiment with some other standard optimization algorithms, such as simulated annealing (SA), GA, memetic algorithm (MA), mutation enhanced binary particle swarm optimization (ME-BPSO) [63], whale optimization algorithm–crossover mutation (WOA-CM) [64] and LHCMA [33]. The achieved reduced feature sets along with the highest accuracies are marked in bold in the tables. Analyzing the observed outcomes, it can be said that our SFHSA-based FS technique has surpassed the above-mentioned techniques. It can be observed that there was a significant increment in accuracy score as compared to previous techniques. Out of 30 cases, our proposed technique achieves better results in 16 cases as compared to all other techniques (2 cases for the Gabor-filter-based feature vectors (up to 8% increment), 3 cases for the HOG feature vectors (up to 2.60% increment), 4 cases for the PHOG feature vectors (up to 3.60% increment), 3 cases for the hvnLBP feature vectors (up to 9% increment) and 4 cases for the uLBP feature vectors (up to 2.40% increment)). Thus, it is evident that our feature optimization technique reduced the feature dimensionality and improved the recognition accuracy. The proposed SFHSA performs quite well against wrapper-based algorithms, which supports the effectiveness of the proposed method. It was observed from the preceding experiment that the second and third best performing algorithms (in terms of accuracy scores) were LHCMA and WOA-CM. Therefore, we have also presented the performance comparison of these techniques with our proposed SFHSA (which achieves the highest accuracy scores) in terms of three well-known statistical measures, namely precision, recall and F-measure. This is done in order to enhance the clarity of the comparison. Table 8, Table 9, Table 10, Table 11 and Table 12 present the performance comparison of SFHSA with LHCMA and WOA-CM for the uLBP, hvnLBP, Gabor filter, HOG and PHOG features, respectively. The comparison demonstrates that SFHSA has outperformed the other two techniques in 16 out of 30 cases, from which a very high FS capability of our technique can be inferred.
For each feature vector, we have presented the best reduced feature set in terms of both achieved recognition accuracy and reduced dimensionality out of all the HMS (= 15) feature subsets in the HM. Since we obtained the best results for the PHOG features considering facial images of dimension 32 × 32, we show, through the visual presentations of Figure 7 and Figure 8, the different reduced feature sets with recognition accuracies for the JAFFE and RaFD databases, respectively. In Figure 7, different colors denote the dimensions (numbers of features) of the reduced feature sets obtained by applying SFHSA on the PHOG features extracted from images of dimension 32 × 32 in the JAFFE database, and the corresponding recognition accuracies are presented on top of each bar, which represents a specific reduced feature set in the HM. Similarly, in Figure 8, distinguishable colors specify the dimensions (numbers of features) of the reduced feature sets in the HM obtained by applying SFHSA on the PHOG features extracted from images of dimension 32 × 32 in the RaFD database, along with the corresponding recognition accuracies, represented in the same fashion as in Figure 7.

6. Conclusions

In this paper, we have focused our attention on reducing the dimension of the feature sets obtained from facial expression images using five feature descriptors: uLBP, hvnLBP, Gabor filters, HOG and PHOG. The evaluation of the proposed methodology, called SFHSA, was done on two benchmark FER datasets, namely RaFD and JAFFE. The proposed algorithm was applied on the extracted feature sets, and reduced feature subsets were obtained with higher accuracy scores. It is evident from the results that our FS technique has effectively filtered out redundant/irrelevant features and also outperformed many existing FS techniques like SA, GA and MA. Following the primary backbone of the traditional HSA, we have proposed a filter variant of HSA. Cosine similarity was used for the adjustment of features, while the mRMR and PCC values were used to determine the feasibility of the optimal feature subsets. The performance of SFHSA was evaluated using the SMO classifier. The presented comparison has enabled us to conclude that our algorithm can be applied for FS in domains where the curse of dimensionality poses a serious challenge to researchers.

Author Contributions

S.S. (Soumyajit Saha), M.G. and S.G. conceived and designed the experiments; S.S. (Soumyajit Saha) performed the experiments; P.K.S. and S.S. (Shibaprasad Sen) analyzed the data; R.S. contributed reagents/materials/analysis tools; S.S. (Soumyajit Saha), M.G., S.G., P.K.S., S.S. (Shibaprasad Sen) and R.S. wrote the paper; writing—review & editing, Z.W.G.; supervision, Z.W.G. and R.S.; funding acquisition, Z.W.G. All authors have read and agree to the published version of the manuscript.

Funding

This research was supported by the Energy Cloud R&D Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (2019M3F2A1073164).

Acknowledgments

The authors are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER) of Computer Science and Engineering Department, Jadavpur University, for providing infrastructure facilities during progress of the work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shan, C.; Gong, S.; Mcowan, P.W. Facial expression recognition based on Local Binary Patterns: A comprehensive study. Image Vis. Comput. 2009, 27, 803–816. [Google Scholar] [CrossRef] [Green Version]
  2. Ekman, P.; Rosenberg, E. What The Face Reveals: Basic and Applied Studies of Spontaneous Expression Using The Facial Action Coding Systems (FACS); Oxford University Press: New York, NY, USA, 1997. [Google Scholar]
  3. Pantic, M.; Rothkrantz, L.J.M. Automatic analysis of facial expressions: The state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1424–1445. [Google Scholar] [CrossRef] [Green Version]
  4. Happy, S.L.; George, A.; Routray, A. A real time facial expression classification system using Local Binary Patterns. In Proceedings of the 4th International Conference on Intelligent Human Computer Interaction: Advancing Technology for Humanity, IHCI, Kharagpur, India, 27–29 December 2012. [Google Scholar] [CrossRef] [Green Version]
  5. Silva, L.C.D.E.; Miyasato, I.T. Facial Emotion Recognition Using Multi-modal Information. Electr. Eng. 1997, 1, 9–12. [Google Scholar]
  6. Zhang, S.; Zhao, X.; Lei, B. Facial Expression Recognition Based on Local Binary Patterns and Local Fisher Discriminant Analysis 2 Local Binary Patterns. Wseas Trans. Signal Process. 2012, 8, 21–31. [Google Scholar]
  7. Ghosh, M.; Guha, R.; Mondal, R.; Singh, P.K.; Sarkar, R.; Nasipuri, M. Feature selection using histogram-based multi-objective GA for handwritten Devanagari numeral recognition. Adv. Intell. Syst. Comput. 2018, 695, 471–479. [Google Scholar] [CrossRef]
  8. Malakar, S.; Ghosh, M.; Bhowmik, S.; Sarkar, R.; Nasipuri, M. A GA based hierarchical feature selection approach for handwritten word recognition. Neural Comput. Appl. 2019. [Google Scholar] [CrossRef]
  9. Belanche, L.A.; González, F.F. Review and Evaluation of Feature Selection Algorithms in Synthetic Problems. arXiv 2011, arXiv:1101.2320. [Google Scholar]
  10. Dash, M.; Liu, H. Feature selection for classification. Intell. Data Anal. 1997, 1, 131–156. [Google Scholar] [CrossRef]
  11. Mitra, P.; Murthy, C.A.; Pal, S.K. Unsupervised feature selection using feature similarity. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 301–312. [Google Scholar] [CrossRef]
  12. Pudil, P.; Novovičová, J.; Kittler, J. Floating search methods in feature selection. Pattern Recognit. Lett. 1994, 15, 1119–1125. [Google Scholar] [CrossRef]
  13. Sen, S.; Mitra, M.; Bhattacharyya, A.; Sarkar, R.; Schwenker, F.; Roy, K. Feature Selection for Recognition of Online Handwritten Bangla Characters. Neural Process. Lett. 2019. [Google Scholar] [CrossRef]
  14. Liwicki, M.; Bunke, H. Feature Selection for HMM and BLSTM Based Handwriting Recognition of Whiteboard Notes. Int. J. Pattern Recognit. Artif. Intell. 2009, 23, 907–923. [Google Scholar] [CrossRef]
  15. Blum, L.; Langley, P. Artificial Intelligence Selection of relevant features and examples in machine. Artif. Intell. 1997, 97, 245–271. [Google Scholar] [CrossRef] [Green Version]
  16. Guha, R.; Ghosh, M.; Singh, P.K.; Sarkar, R.; Nasipuri, M. M-HMOGA: A new multi-objective feature selection algorithm for handwritten numeral classification. J. Intell. Syst. 2020, 29, 1453–1467. [Google Scholar] [CrossRef]
  17. Kundu, S.; Paul, S.; Singh, P.K.; Sarkar, R.; Nasipuri, M. Understanding NFC-Net: A deep learning approach to word-level handwritten Indic script recognition. Neural Comput. Appl. 2019, 4. [Google Scholar] [CrossRef]
  18. Ghosh, M.; Guha, R.; Singh, P.K.; Bhateja, V.; Sarkar, R. A histogram based fuzzy ensemble technique for feature selection. Evol. Intell. 2019, 12, 713–724. [Google Scholar] [CrossRef]
  19. Das, S. Filters, wrappers and a boosting-based hybrid for feature selection. Engineering 2001, 1, 74–81. [Google Scholar]
  20. Chatterjee, I.; Ghosh, M.; Singh, P.K.; Nasipuri, M. A clustering-based feature selection framework for handwritten Indic script classification. Expert Syst. 2019, 36, e12459. [Google Scholar] [CrossRef]
  21. Hall, M.A. Correlation-Based Feature Selection for Machine Learning. Ph.D. Thesis, The University of Waikato, Hamilton, New Zealand, 1999. [Google Scholar]
  22. Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. In Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003, Stanford, CA, USA, 11–14 August 2003. [Google Scholar]
  23. Diao, R.; Shen, Q. Feature selection with harmony search. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2012, 42, 1509–1523. [Google Scholar] [CrossRef]
  24. Awada, W.; Khoshgoftaar, T.M.; Dittman, D.; Wald, R.; Napolitano, A. A review of the stability of feature selection techniques for bioinformatics data. In Proceedings of the 2012 IEEE 13th International Conference on Information Reuse & Integration (IRI), Las Vegas, NV, USA, 8–10 August 2012. [Google Scholar]
  25. Lajevardi, S.M.; Hussain, Z.M. Feature selection for facial expression recognition based on optimization algorithm. In Proceedings of the INDS 2009: 2nd International Workshop on Nonlinear Dynamics and Synchronization, Klagenfurt, Austria, 20–21 July 2009; pp. 182–185. [Google Scholar] [CrossRef]
  26. Guo, G.; Dyer, C.R. Simultaneous feature selection and classifier training via linear programming: A case study for face expression recognition. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2003, Madison, WI, USA, 18–20 June 2003; Volume 1. [Google Scholar] [CrossRef]
  27. Gharsalli, S.; Emile, B.; Laurent, H.; Desquesnes, X. Feature Selection for Emotion Recognition based on Random Forest. Visigrapp 2016, 4, 610–617. [Google Scholar] [CrossRef] [Green Version]
  28. Li, P.; Phung, S.L.; Bouzerdom, A.; Tivive, F.H.C. Feature Selection for Facial Expression Recognition. In Proceedings of the 2010 2nd European Workshop on Visual Information Processing (EUVIP), Paris, France, 5–6 July 2010; pp. 35–40. [Google Scholar] [CrossRef] [Green Version]
  29. Lajevardi, S.M.; Hussain, Z.M. Automatic facial expression recognition: Feature extraction and selection. Signal Image Video Process. 2012, 6, 159–169. [Google Scholar] [CrossRef]
  30. Lyons, M.; Akamatsu, S.; Kamachi, M.; Gyoba, J. Coding facial expressions with Gabor wavelets. In Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition, FG, Nara, Japan, 14–16 April 1998; pp. 200–205. [Google Scholar] [CrossRef] [Green Version]
  31. Langner, O.; Dotsch, R.; Bijlstra, G.; Wigboldus, D.H.J.; Hawk, S.T.; van Knippenberg, A. Presentation and validation of the radboud faces database. Cogn. Emot. 2010, 24, 1377–1388. [Google Scholar] [CrossRef]
  32. Wang, G.; Yang, Y.; Kong, H. Self-Learning facial emotional feature selection based on rough set theory. Math. Probl. Eng. 2009. [Google Scholar] [CrossRef]
  33. Ghosh, M.; Kundu, T.; Ghosh, D.; Sarkar, R. Feature selection for facial emotion recognition using late hill-climbing based memetic algorithm. Multimed. Tools Appl. 2019, 78, 25753–25779. [Google Scholar] [CrossRef]
  34. Mistry, L.; Zhang, S.; Neoh, C.; Lim, P.; Fielding, B. A Micro-GA Embedded PSO Feature Selection Approach to Intelligent Facial Emotion Recognition. IEEE Trans. Cybern. 2017, 47, 1496–1509. [Google Scholar] [CrossRef] [Green Version]
  35. Mlakar, U.; Fister, I.; Brest, J.; Potočnik, B. Multi-Objective Differential Evolution for feature selection in Facial Expression Recognition systems. Expert Syst. Appl. 2017, 89, 129–137. [Google Scholar] [CrossRef]
  36. Das, S.; Singh, P.K.; Bhowmik, S.; Sarkar, R.; Nasipuri, M. A Harmony Search Based Wrapper Feature Selection Method for Holistic Bangla Word Recognition. arXiv 2017, arXiv:1707.08398. [Google Scholar] [CrossRef] [Green Version]
  37. Sarkar, S.; Ghosh, M.; Chatterjee, A.; Malakar, S.; Sarkar, R. An advanced particle swarm optimization based feature selection method for tri-script handwritten digit recognition. In Proceedings of the International conference on computational intelligence, communications, and business analytics, Kalyani, India, 27–28 July 2018; pp. 82–94. [Google Scholar]
  38. Wang, Y.; Liu, Y.; Feng, L.; Zhu, X. Novel feature selection method based on harmony search for email classification. Knowl. Based Syst. 2015, 73, 311–323. [Google Scholar] [CrossRef]
  39. Zainuddin, Z.; Lai, K.H.; Ong, P. An enhanced harmony search based algorithm for feature selection: Applications in epileptic seizure detection and prediction. Comput. Electr. Eng. 2016, 53, 143–162. [Google Scholar] [CrossRef]
  40. Bagyamathi, M.; Inbarani, H.H. A Novel Hybridized Rough Set and Improved Harmony Search Based Feature Selection for Protein Sequence Classification. In Big Data in Complex System; Springer: Berlin, Germany, 2015; pp. 173–204. [Google Scholar]
  41. Wang, Y.; Perez, L. The Effectiveness of Data Augmentation in Image Classification using Deep Learning. arXiv 2017, arXiv:1712.04621. [Google Scholar]
  42. Viola, P.; Jones, M.J. Robust Real-Time Face Detection. Int. J. Comput. Vis. 2004, 57, 137–154. [Google Scholar] [CrossRef]
  43. Ghosh, S.; Bhowmik, S.; Ghosh, K.; Sarkar, R.; Chakraborty, S. A filter ensemble feature selection method for handwritten numeral recognition. EMR 2019, 2016, 007213. [Google Scholar]
  44. Bosch, A.; Zisserman, A.; Munoz, X. Representing shape with a spatial pyramid kernel. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval, CIVR 2007, Amsterdam, The Netherlands, 8 July 2007; pp. 401–408. [Google Scholar] [CrossRef]
  45. Li, Z.; Imai, J.I.; Kaneko, M. Facial-component-based bag of words and PHOG descriptor for facial expression recognition. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Toronto, ON, Canada, 7–10 October 2009; pp. 1353–1358. [Google Scholar] [CrossRef]
  46. Ali, M.; Clausi, D. Using The Canny Edge Detector for Feature Extraction and Enhancement of Remote Sensing Images. In Proceedings of the IEEE 2001 International Geoscience and Remote Sensing Symposium, Sydney, Australia, 3–13 July 2001; pp. 2298–2300. [Google Scholar]
  47. Jana, P.; Ghosh, S.; Sarkar, R.; Nasipuri, M. A Fuzzy C-Means Based Approach Towards Efficient Document Image Binarization. In Proceedings of the Ninth International Conference on Advances in Pattern Recognition, ICAPR 2017, Bangalore, India, 27–30 December 2017; pp. 1–6. [Google Scholar]
48. Jain, A.K.; Farrokhnia, F. Unsupervised texture segmentation using Gabor filters. Pattern Recognit. 1991, 24, 1167–1186. [Google Scholar] [CrossRef] [Green Version]
  49. Liu, X.; Wechsler, H. Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Trans. Image Process. 2002, 11, 467–476. [Google Scholar] [CrossRef] [Green Version]
50. Ou, J.; Bai, X.-B.; Pei, Y.; Ma, L.; Liu, W. Automatic Facial Expression Recognition Using Gabor Filter and Expression Analysis. In Proceedings of the 2010 Second International Conference on Computer Modeling and Simulation, Sanya, China, 22–24 January 2010; pp. 215–218. [Google Scholar] [CrossRef]
  51. Ojala, T.; Pietikäinen, M.; Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
  52. Kim, J.H. Harmony Search Algorithm: A Unique Music-inspired Algorithm. Procedia Eng. 2016, 154, 1401–1405. [Google Scholar] [CrossRef] [Green Version]
  53. Lee, K.S.; Geem, Z.W. A new structural optimization method based on the harmony search algorithm. Comput. Struct. 2004, 82, 781–798. [Google Scholar] [CrossRef]
54. Geem, Z.W.; Kim, J.H.; Loganathan, G.V. A New Heuristic Optimization Algorithm: Harmony Search. Simulation 2001, 76, 60–68. [Google Scholar] [CrossRef]
  55. Geem, Z.W. Optimal cost design of water distribution networks using harmony search. Eng. Optim. 2006, 38, 259–277. [Google Scholar] [CrossRef]
  56. Mahdavi, M.; Fesanghary, M.; Damangir, E. An improved harmony search algorithm for solving optimization problems. Appl. Math. Comput. 2007, 188, 1567–1579. [Google Scholar] [CrossRef]
  57. Pratap, V.; Tomar, S.; Dwivedi, D.; Gwalior, M. Ansys Modelling and Simulation of Temperature. Int. J. Adv. Eng. Res. Dev. 2015, 2015, 1–4. [Google Scholar]
58. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef]
  59. Radovic, M.; Ghalwash, M.; Filipovic, N.; Obradovic, Z. Minimum redundancy maximum relevance feature selection approach for temporal gene expression data. BMC Bioinform. 2017, 18, 1–14. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  60. Sakar, C.O.; Kursun, O.; Gurgen, F. A feature selection method based on kernel canonical correlation analysis and the minimum Redundancy-Maximum Relevance filter method. Expert Syst. Appl. 2012, 39, 3432–3437. [Google Scholar] [CrossRef]
  61. Senawi, A.; Wei, H.L.; Billings, S.A. A new maximum relevance-minimum multicollinearity (MRmMC) method for feature selection and ranking. Pattern Recognit. 2017, 67, 47–61. [Google Scholar] [CrossRef]
  62. Pearson’s Correlation Coefficient Definition. In Encyclopedia of Public Health; Springer: Berlin, Germany, 2008; p. 1172. [CrossRef]
  63. Wei, J.; Zhang, R.; Yu, Z.; Hu, R.; Tang, J.; Gui, C.; Yuan, Y. A BPSO-SVM algorithm based on memory renewal and enhanced mutation mechanisms for feature selection. Appl. Soft Comput. J. 2017, 58, 176–192. [Google Scholar] [CrossRef]
  64. Mafarja, M.; Mirjalili, S. Whale optimization approaches for wrapper feature selection. Appl. Soft Comput. J. 2018, 62, 441–453. [Google Scholar] [CrossRef]
Figure 1. Images taken from the Japanese female facial expression (JAFFE) dataset showing various emotions: (a) anger, (b) disgust, (c) happy, (d) fear, (e) sad, (f) surprise and (g) neutral.
Figure 2. Sample images taken from the Radboud faces database (RaFD) dataset displaying various emotions: (a) anger, (b) disgust, (c) happy, (d) fear, (e) neutral, (f) sad, (g) surprise and (h) contempt.
Figure 3. Reading the thresholded neighbors in the clockwise direction gives (01110100)₂ = (116)₁₀ as the new center-pixel value after local binary pattern (LBP) computation.
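To make the Figure 3 example concrete, here is a minimal Python sketch of the standard 3 × 3 LBP computation. The sample patch, the ≥ threshold and the clockwise most-significant-bit-first read-out are illustrative assumptions chosen to reproduce the (01110100)₂ = (116)₁₀ example; they are not taken from the authors' implementation.

```python
import numpy as np

def lbp_code(patch: np.ndarray) -> int:
    """3x3 LBP: threshold the 8 neighbors against the center pixel
    and read the bits clockwise from the top-left neighbor (MSB first)."""
    center = patch[1, 1]
    # Clockwise order: top-left, top, top-right, right,
    # bottom-right, bottom, bottom-left, left.
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2],
                 patch[1, 2], patch[2, 2], patch[2, 1],
                 patch[2, 0], patch[1, 0]]
    bits = ["1" if n >= center else "0" for n in neighbors]
    return int("".join(bits), 2)

# Hypothetical neighborhood whose clockwise bit string is 01110100,
# matching the (116)_10 example of Figure 3.
patch = np.array([[4, 9, 8],
                  [3, 6, 7],
                  [2, 6, 5]])
print(lbp_code(patch))  # 116
```

The uniform variant (uLBP) then keeps only the 58 patterns with at most two 0/1 transitions, plus a single bin for all remaining patterns, i.e., 59 histogram bins per region; this is consistent with the 944-dimensional uLBP vectors reported in Tables 2 and 3 (944 = 16 × 59), presumably histograms over a 4 × 4 grid of regions.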
Figure 4. Calculation of the horizontal–vertical neighborhood local binary pattern (hvnLBP) feature descriptor, where reading in the clockwise direction gives (00100111)₂ = (39)₁₀ as the new center-pixel value after evaluation.
Figure 5. Diagrammatic representation of the feature selection process using the supervised filter harmony search algorithm (SFHSA).
Figure 6. Schematic diagram of the proposed feature selection model, SFHSA, developed for the classification of facial emotions.
Figure 7. Dimensions and accuracy scores (%) of the reduced-feature subsets in the harmony memory (HM) for the PHOG-based feature vector extracted from 32 × 32 images in the JAFFE dataset. Distinct colors denote different feature dimensions of the reduced-feature subsets.
Figure 8. Dimensions and accuracy scores (%) of the reduced-feature subsets in the harmony memory (HM) for the PHOG-based feature vector extracted from 32 × 32 images in the RaFD dataset. Distinct colors denote different feature dimensions of the reduced-feature subsets.
Table 1. Estimation of the feature vector size of the Gabor filter bank (applied in 8 different orientations and 5 distinct scales) for various image dimensions, with a down-sampling factor of 8.

| Image Dimension | Feature Dimension | Final Feature Dimension after Down-Sampling |
|---|---|---|
| 32 × 32 | 40,960 (= 40 × 32 × 32) | 640 (= 40,960 / (8 × 8)) |
| 48 × 48 | 92,160 (= 40 × 48 × 48) | 1440 (= 92,160 / (8 × 8)) |
| 64 × 64 | 163,840 (= 40 × 64 × 64) | 2560 (= 163,840 / (8 × 8)) |
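A quick arithmetic check of Table 1, assuming only what the caption states (8 orientations × 5 scales = 40 filter responses per image, down-sampled by a factor of 8 along each axis):

```python
# Gabor bank: 8 orientations x 5 scales = 40 responses per image;
# each response has one value per pixel, then is down-sampled 8x8.
for side in (32, 48, 64):
    full = 40 * side * side
    reduced = full // (8 * 8)
    print(f"{side} x {side}: {full:,} -> {reduced}")
# 32 x 32: 40,960 -> 640
# 48 x 48: 92,160 -> 1440
# 64 x 64: 163,840 -> 2560
```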
Table 2. Original-feature size and accuracy score, together with the reduced-feature size and accuracy score obtained by SFHSA (the reduced size expressed as a percentage of the original size, and the absolute change in accuracy), for 5 feature vectors extracted from facial images of 3 dimensions on the two popular FER datasets.

| Feature Descriptor | Dataset | Image Dimension | Original Size | Accuracy (%) | SFHSA-Reduced Size [% of Original] | Accuracy (%) [Change] |
|---|---|---|---|---|---|---|
| uLBP | JAFFE | 32 × 32 | 944 | 62.34 | 570 [60.38%] | 78.95 [+16.61%] |
| uLBP | JAFFE | 48 × 48 | 944 | 62.34 | 339 [35.91%] | 75.32 [+12.98%] |
| uLBP | JAFFE | 64 × 64 | 944 | 59.74 | 541 [57.31%] | 74.13 [+14.39%] |
| uLBP | RaFD | 32 × 32 | 944 | 83.58 | 608 [64.41%] | 87.75 [+4.17%] |
| uLBP | RaFD | 48 × 48 | 944 | 88.62 | 445 [47.14%] | 85.48 [−3.14%] |
| uLBP | RaFD | 64 × 64 | 944 | 86.38 | 573 [60.70%] | 92.16 [+5.78%] |
| hvnLBP | JAFFE | 32 × 32 | 4096 | 57.14 | 1380 [33.69%] | 67.41 [+10.27%] |
| hvnLBP | JAFFE | 48 × 48 | 4096 | 46.75 | 1416 [34.57%] | 55.84 [+12.09%] |
| hvnLBP | JAFFE | 64 × 64 | 4096 | 44.16 | 1433 [34.99%] | 67.42 [+23.26%] |
| hvnLBP | RaFD | 32 × 32 | 4096 | 66.42 | 1513 [36.94%] | 72.39 [+5.97%] |
| hvnLBP | RaFD | 48 × 48 | 4096 | 74.07 | 1493 [36.45%] | 75.74 [+1.37%] |
| hvnLBP | RaFD | 64 × 64 | 4096 | 69.40 | 1494 [36.47%] | 75.81 [+6.41%] |
| Gabor | JAFFE | 32 × 32 | 640 | 67.53 | 197 [30.78%] | 81.82 [+14.29%] |
| Gabor | JAFFE | 48 × 48 | 1440 | 72.73 | 560 [38.89%] | 92.21 [+19.48%] |
| Gabor | JAFFE | 64 × 64 | 2560 | 71.43 | 818 [31.95%] | 90.91 [+19.48%] |
| Gabor | RaFD | 32 × 32 | 640 | 90.49 | 241 [37.66%] | 91.91 [+1.42%] |
| Gabor | RaFD | 48 × 48 | 1440 | 95.71 | 341 [23.68%] | 96.51 [+0.80%] |
| Gabor | RaFD | 64 × 64 | 2560 | 98.51 | 462 [18.04%] | 97.79 [−0.72%] |
| HOG | JAFFE | 32 × 32 | 324 | 71.43 | 189 [58.33%] | 87.94 [+16.51%] |
| HOG | JAFFE | 48 × 48 | 900 | 74.03 | 403 [44.78%] | 92.21 [+18.18%] |
| HOG | JAFFE | 64 × 64 | 1764 | 71.43 | 1411 [79.99%] | 85.71 [+14.28%] |
| HOG | RaFD | 32 × 32 | 324 | 88.43 | 205 [63.27%] | 89.15 [+0.72%] |
| HOG | RaFD | 48 × 48 | 900 | 94.22 | 385 [42.78%] | 95.40 [+1.18%] |
| HOG | RaFD | 64 × 64 | 1764 | 93.66 | 544 [30.83%] | 96.32 [+2.66%] |
| PHOG | JAFFE | 32 × 32 | 680 | 53.25 | 321 [47.21%] | 76.32 [+20.07%] |
| PHOG | JAFFE | 48 × 48 | 680 | 66.23 | 408 [60.00%] | 85.27 [+19.04%] |
| PHOG | JAFFE | 64 × 64 | 680 | 59.74 | 409 [60.15%] | 84.39 [+24.65%] |
| PHOG | RaFD | 32 × 32 | 680 | 78.54 | 429 [63.09%] | 85.01 [+6.17%] |
| PHOG | RaFD | 48 × 48 | 680 | 85.45 | 271 [39.85%] | 89.03 [+3.58%] |
| PHOG | RaFD | 64 × 64 | 680 | 88.81 | 300 [44.12%] | 88.30 [−0.51%] |
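The bracketed figures in Table 2 can be reproduced as follows, using the first uLBP/JAFFE row as an example; per the caption, the size bracket is the reduced size as a percentage of the original, and the accuracy bracket is the absolute change:

```python
# First uLBP/JAFFE row of Table 2 (32 x 32 images).
orig_dim, reduced_dim = 944, 570
acc_before, acc_after = 62.34, 78.95

print(f"{100 * reduced_dim / orig_dim:.2f}%")  # 60.38% of the original size
print(f"{acc_after - acc_before:+.2f}%")       # +16.61% accuracy change
```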
Table 3. Performance of SFHSA with respect to No FS, SA, GA, MA, ME-BPSO, WAO-CM and LHCMA for uLBP features.

| Dataset | Image Size | No FS: Dim. | No FS: Acc. (%) | SA: Dim. | SA: Acc. (%) | GA: Dim. | GA: Acc. (%) | MA: Dim. | MA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 944 | 62.34 | 481 | 37.66 | 661 | 71.43 | 542 | 76.62 |
| JAFFE | 48 × 48 | 944 | 62.34 | 487 | 45.45 | 678 | 79.22 | 672 | 76.62 |
| JAFFE | 64 × 64 | 944 | 59.74 | 493 | 45.45 | 611 | 70.13 | 703 | 71.43 |
| RaFD | 32 × 32 | 944 | 83.58 | 485 | 69.40 | 791 | 86.38 | 716 | 86.75 |
| RaFD | 48 × 48 | 944 | 88.62 | 441 | 78.73 | 696 | 92.16 | 696 | 91.47 |
| RaFD | 64 × 64 | 944 | 86.38 | 487 | 76.49 | 677 | 90.11 | 554 | 91.42 |

| Dataset | Image Size | ME-BPSO: Dim. | ME-BPSO: Acc. (%) | WAO-CM: Dim. | WAO-CM: Acc. (%) | LHCMA: Dim. | LHCMA: Acc. (%) | SFHSA: Dim. | SFHSA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 576 | 64.93 | 911 | 61.04 | 594 | 76.62 | 570 | 78.95 |
| JAFFE | 48 × 48 | 472 | 62.34 | 670 | 74.03 | 574 | 76.62 | 339 | 75.32 |
| JAFFE | 64 × 64 | 603 | 72.34 | 869 | 63.64 | 570 | 74.03 | 541 | 74.13 |
| RaFD | 32 × 32 | 620 | 78.17 | 915 | 82.28 | 600 | 87.13 | 608 | 87.75 |
| RaFD | 48 × 48 | 533 | 84.89 | 883 | 92.72 | 552 | 91.61 | 445 | 85.48 |
| RaFD | 64 × 64 | 638 | 83.40 | 821 | 86.01 | 555 | 90.30 | 573 | 92.16 |
Table 4. Performance of SFHSA with respect to No FS, SA, GA, MA, ME-BPSO, WAO-CM and LHCMA for hvnLBP features.

| Dataset | Image Size | No FS: Dim. | No FS: Acc. (%) | SA: Dim. | SA: Acc. (%) | GA: Dim. | GA: Acc. (%) | MA: Dim. | MA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 4096 | 57.14 | 2059 | 51.95 | 2613 | 70.13 | 2232 | 70.13 |
| JAFFE | 48 × 48 | 4096 | 46.75 | 2011 | 42.86 | 2284 | 61.04 | 2451 | 58.44 |
| JAFFE | 64 × 64 | 4096 | 44.16 | 2132 | 38.96 | 2254 | 57.14 | 2208 | 58.44 |
| RaFD | 32 × 32 | 4096 | 66.42 | 2024 | 61.19 | 2721 | 70.15 | 2081 | 72.01 |
| RaFD | 48 × 48 | 4096 | 74.07 | 2049 | 68.10 | 2580 | 75.00 | 2584 | 76.49 |
| RaFD | 64 × 64 | 4096 | 69.40 | 2026 | 64.18 | 2457 | 72.76 | 2090 | 74.44 |

| Dataset | Image Size | ME-BPSO: Dim. | ME-BPSO: Acc. (%) | WAO-CM: Dim. | WAO-CM: Acc. (%) | LHCMA: Dim. | LHCMA: Acc. (%) | SFHSA: Dim. | SFHSA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 2118 | 61.04 | 3894 | 61.04 | 2235 | 72.73 | 1380 | 67.41 |
| JAFFE | 48 × 48 | 1975 | 50.65 | 2921 | 51.95 | 2158 | 63.64 | 1416 | 55.84 |
| JAFFE | 64 × 64 | 2595 | 50.65 | 1469 | 54.55 | 2060 | 58.44 | 1433 | 67.42 |
| RaFD | 32 × 32 | 2758 | 66.98 | 3772 | 70.71 | 2211 | 70.34 | 1513 | 72.39 |
| RaFD | 48 × 48 | 2615 | 72.57 | 3489 | 77.61 | 2279 | 75.19 | 1493 | 75.74 |
| RaFD | 64 × 64 | 2584 | 69.22 | 3914 | 75.56 | 2383 | 73.69 | 1494 | 75.81 |
Table 5. Performance of SFHSA with respect to No FS, SA, GA, MA, ME-BPSO, WAO-CM and LHCMA for Gabor filter-based features.

| Dataset | Image Size | No FS: Dim. | No FS: Acc. (%) | SA: Dim. | SA: Acc. (%) | GA: Dim. | GA: Acc. (%) | MA: Dim. | MA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 640 | 67.53 | 323 | 66.23 | 375 | 79.22 | 377 | 84.42 |
| JAFFE | 48 × 48 | 1440 | 72.73 | 704 | 68.83 | 910 | 80.52 | 836 | 84.42 |
| JAFFE | 64 × 64 | 2560 | 71.43 | 1293 | 71.43 | 1541 | 81.82 | 1408 | 83.12 |
| RaFD | 32 × 32 | 640 | 90.49 | 320 | 83.21 | 400 | 93.28 | 429 | 94.03 |
| RaFD | 48 × 48 | 1440 | 95.71 | 683 | 91.60 | 851 | 98.32 | 894 | 98.88 |
| RaFD | 64 × 64 | 2560 | 98.51 | 1333 | 95.90 | 1613 | 98.32 | 1414 | 98.75 |

| Dataset | Image Size | ME-BPSO: Dim. | ME-BPSO: Acc. (%) | WAO-CM: Dim. | WAO-CM: Acc. (%) | LHCMA: Dim. | LHCMA: Acc. (%) | SFHSA: Dim. | SFHSA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 301 | 84.03 | 217 | 79.22 | 319 | 84.42 | 197 | 81.82 |
| JAFFE | 48 × 48 | 701 | 84.42 | 630 | 83.12 | 767 | 83.12 | 560 | 92.21 |
| JAFFE | 64 × 64 | 1557 | 83.12 | 419 | 89.61 | 1428 | 83.12 | 818 | 90.91 |
| RaFD | 32 × 32 | 344 | 92.36 | 557 | 93.74 | 337 | 94.59 | 241 | 91.91 |
| RaFD | 48 × 48 | 770 | 95.52 | 1074 | 96.46 | 758 | 98.88 | 341 | 96.51 |
| RaFD | 64 × 64 | 1300 | 98.13 | 1186 | 97.01 | 1271 | 99.25 | 462 | 97.79 |
Table 6. Performance of SFHSA with respect to No FS, SA, GA, MA, ME-BPSO, WAO-CM and LHCMA for HOG features.

| Dataset | Image Size | No FS: Dim. | No FS: Acc. (%) | SA: Dim. | SA: Acc. (%) | GA: Dim. | GA: Acc. (%) | MA: Dim. | MA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 324 | 71.43 | 178 | 70.13 | 195 | 85.71 | 206 | 83.12 |
| JAFFE | 48 × 48 | 900 | 74.03 | 444 | 67.53 | 530 | 89.61 | 507 | 87.01 |
| JAFFE | 64 × 64 | 1764 | 71.43 | 887 | 58.44 | 1097 | 80.52 | 1105 | 83.12 |
| RaFD | 32 × 32 | 324 | 88.43 | 143 | 85.74 | 186 | 92.16 | 167 | 92.16 |
| RaFD | 48 × 48 | 900 | 94.22 | 538 | 91.54 | 480 | 97.01 | 390 | 97.01 |
| RaFD | 64 × 64 | 1764 | 93.66 | 816 | 92.66 | 867 | 96.27 | 675 | 96.27 |

| Dataset | Image Size | ME-BPSO: Dim. | ME-BPSO: Acc. (%) | WAO-CM: Dim. | WAO-CM: Acc. (%) | LHCMA: Dim. | LHCMA: Acc. (%) | SFHSA: Dim. | SFHSA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 195 | 82.32 | 281 | 81.03 | 182 | 87.01 | 189 | 87.94 |
| JAFFE | 48 × 48 | 440 | 85.32 | 371 | 88.22 | 482 | 89.61 | 403 | 92.21 |
| JAFFE | 64 × 64 | 923 | 82.12 | 1446 | 81.62 | 1008 | 83.12 | 1411 | 85.71 |
| RaFD | 32 × 32 | 211 | 85.07 | 295 | 86.94 | 160 | 92.35 | 205 | 89.15 |
| RaFD | 48 × 48 | 530 | 93.91 | 605 | 93.47 | 455 | 97.20 | 385 | 95.40 |
| RaFD | 64 × 64 | 1039 | 95.15 | 1041 | 94.96 | 800 | 97.57 | 544 | 96.32 |
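The HOG dimensionalities in Table 6 (324, 900 and 1764) match a common configuration of 9 orientation bins, 8 × 8-pixel cells and overlapping 2 × 2-cell blocks with a one-cell stride. This configuration is inferred from the numbers rather than stated here, so the sketch below is a plausibility check, not the authors' exact setup:

```python
# Assumed HOG layout: 9 bins, 8x8-pixel cells, 2x2-cell blocks, stride 1 cell.
def hog_dim(side: int, cell: int = 8, block: int = 2, bins: int = 9) -> int:
    cells = side // cell                   # cells along one image side
    blocks = (cells - block + 1) ** 2      # overlapping block positions
    return blocks * block * block * bins   # block positions x bins per block

for side in (32, 48, 64):
    print(side, hog_dim(side))  # 32 324, 48 900, 64 1764
```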
Table 7. Performance of SFHSA with respect to No FS, SA, GA, MA, ME-BPSO, WAO-CM and LHCMA for PHOG features.

| Dataset | Image Size | No FS: Dim. | No FS: Acc. (%) | SA: Dim. | SA: Acc. (%) | GA: Dim. | GA: Acc. (%) | MA: Dim. | MA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 680 | 53.25 | 357 | 46.75 | 419 | 64.94 | 374 | 68.83 |
| JAFFE | 48 × 48 | 680 | 66.23 | 344 | 58.44 | 396 | 81.82 | 405 | 80.52 |
| JAFFE | 64 × 64 | 680 | 59.74 | 342 | 62.34 | 423 | 79.22 | 412 | 79.22 |
| RaFD | 32 × 32 | 680 | 78.54 | 351 | 75.19 | 489 | 82.46 | 416 | 84.14 |
| RaFD | 48 × 48 | 680 | 85.45 | 354 | 84.89 | 398 | 90.49 | 344 | 91.98 |
| RaFD | 64 × 64 | 680 | 88.81 | 366 | 87.87 | 411 | 91.23 | 364 | 93.84 |

| Dataset | Image Size | ME-BPSO: Dim. | ME-BPSO: Acc. (%) | WAO-CM: Dim. | WAO-CM: Acc. (%) | LHCMA: Dim. | LHCMA: Acc. (%) | SFHSA: Dim. | SFHSA: Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 361 | 63.64 | 633 | 59.74 | 359 | 72.73 | 321 | 76.32 |
| JAFFE | 48 × 48 | 440 | 71.43 | 441 | 67.53 | 373 | 80.52 | 408 | 85.27 |
| JAFFE | 64 × 64 | 334 | 74.03 | 178 | 71.43 | 374 | 81.82 | 409 | 84.39 |
| RaFD | 32 × 32 | 420 | 78.92 | 491 | 80.41 | 396 | 83.21 | 429 | 85.01 |
| RaFD | 48 × 48 | 391 | 87.87 | 541 | 90.11 | 331 | 91.04 | 271 | 89.03 |
| RaFD | 64 × 64 | 352 | 90.67 | 503 | 92.35 | 395 | 93.10 | 300 | 88.30 |
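Likewise, the image-size-independent 680-dimensional PHOG vectors in Tables 2 and 7 are consistent with K = 8 orientation bins accumulated over a spatial pyramid with levels l = 0, …, 3, where level l contributes 4^l cells. Again, this is an inference from the dimensionality, not a configuration stated in this section:

```latex
\dim(\mathrm{PHOG}) \;=\; K \sum_{l=0}^{L} 4^{l}
\;=\; 8 \times (1 + 4 + 16 + 64) \;=\; 680 .
```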
Table 8. Performance of SFHSA with respect to WAO-CM and LHCMA for uLBP features in terms of Precision (P), Recall (R) and F-measure (F).

| Dataset | Image Size | WAO-CM (P) | WAO-CM (R) | WAO-CM (F) | LHCMA (P) | LHCMA (R) | LHCMA (F) | SFHSA (P) | SFHSA (R) | SFHSA (F) |
|---|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 0.613 | 0.610 | 0.611 | 0.767 | 0.766 | 0.766 | 0.797 | 0.789 | 0.791 |
| JAFFE | 48 × 48 | 0.738 | 0.740 | 0.739 | 0.765 | 0.766 | 0.765 | 0.757 | 0.753 | 0.753 |
| JAFFE | 64 × 64 | 0.634 | 0.636 | 0.635 | 0.743 | 0.740 | 0.741 | 0.750 | 0.741 | 0.742 |
| RaFD | 32 × 32 | 0.826 | 0.823 | 0.825 | 0.870 | 0.871 | 0.871 | 0.875 | 0.878 | 0.877 |
| RaFD | 48 × 48 | 0.930 | 0.927 | 0.928 | 0.918 | 0.916 | 0.917 | 0.853 | 0.855 | 0.853 |
| RaFD | 64 × 64 | 0.857 | 0.860 | 0.859 | 0.906 | 0.903 | 0.905 | 0.928 | 0.921 | 0.923 |
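For reference, Tables 8–12 report the standard per-class metrics, presumably macro-averaged over the expression classes; note that an averaged F-measure need not equal the harmonic mean of the averaged precision and recall:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} .
```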
Table 9. Performance of SFHSA with respect to WAO-CM and LHCMA for hvnLBP features in terms of Precision (P), Recall (R) and F-measure (F).

| Dataset | Image Size | WAO-CM (P) | WAO-CM (R) | WAO-CM (F) | LHCMA (P) | LHCMA (R) | LHCMA (F) | SFHSA (P) | SFHSA (R) | SFHSA (F) |
|---|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 0.609 | 0.610 | 0.609 | 0.725 | 0.727 | 0.727 | 0.680 | 0.674 | 0.675 |
| JAFFE | 48 × 48 | 0.523 | 0.520 | 0.521 | 0.635 | 0.636 | 0.635 | 0.561 | 0.558 | 0.559 |
| JAFFE | 64 × 64 | 0.548 | 0.546 | 0.546 | 0.587 | 0.584 | 0.585 | 0.667 | 0.674 | 0.672 |
| RaFD | 32 × 32 | 0.710 | 0.707 | 0.709 | 0.708 | 0.703 | 0.706 | 0.721 | 0.724 | 0.723 |
| RaFD | 48 × 48 | 0.773 | 0.776 | 0.775 | 0.748 | 0.752 | 0.749 | 0.766 | 0.757 | 0.760 |
| RaFD | 64 × 64 | 0.757 | 0.756 | 0.756 | 0.740 | 0.737 | 0.737 | 0.757 | 0.758 | 0.757 |
Table 10. Performance of SFHSA with respect to WAO-CM and LHCMA for Gabor-based features in terms of Precision (P), Recall (R) and F-measure (F).

| Dataset | Image Size | WAO-CM (P) | WAO-CM (R) | WAO-CM (F) | LHCMA (P) | LHCMA (R) | LHCMA (F) | SFHSA (P) | SFHSA (R) | SFHSA (F) |
|---|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 0.797 | 0.792 | 0.794 | 0.847 | 0.844 | 0.845 | 0.821 | 0.818 | 0.819 |
| JAFFE | 48 × 48 | 0.827 | 0.831 | 0.831 | 0.836 | 0.831 | 0.833 | 0.927 | 0.922 | 0.924 |
| JAFFE | 64 × 64 | 0.898 | 0.896 | 0.897 | 0.827 | 0.831 | 0.830 | 0.913 | 0.909 | 0.910 |
| RaFD | 32 × 32 | 0.939 | 0.937 | 0.938 | 0.949 | 0.946 | 0.946 | 0.923 | 0.919 | 0.920 |
| RaFD | 48 × 48 | 0.971 | 0.965 | 0.968 | 0.991 | 0.989 | 0.989 | 0.973 | 0.965 | 0.966 |
| RaFD | 64 × 64 | 0.968 | 0.970 | 0.969 | 0.989 | 0.992 | 0.990 | 0.986 | 0.978 | 0.982 |
Table 11. Performance of SFHSA with respect to WAO-CM and LHCMA for HOG features in terms of Precision (P), Recall (R) and F-measure (F).

| Dataset | Image Size | WAO-CM (P) | WAO-CM (R) | WAO-CM (F) | LHCMA (P) | LHCMA (R) | LHCMA (F) | SFHSA (P) | SFHSA (R) | SFHSA (F) |
|---|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 0.808 | 0.810 | 0.809 | 0.864 | 0.870 | 0.868 | 0.875 | 0.879 | 0.876 |
| JAFFE | 48 × 48 | 0.886 | 0.882 | 0.883 | 0.898 | 0.896 | 0.897 | 0.929 | 0.922 | 0.924 |
| JAFFE | 64 × 64 | 0.817 | 0.816 | 0.816 | 0.835 | 0.831 | 0.832 | 0.863 | 0.857 | 0.859 |
| RaFD | 32 × 32 | 0.865 | 0.869 | 0.868 | 0.926 | 0.924 | 0.924 | 0.894 | 0.891 | 0.892 |
| RaFD | 48 × 48 | 0.938 | 0.935 | 0.937 | 0.970 | 0.972 | 0.971 | 0.960 | 0.954 | 0.956 |
| RaFD | 64 × 64 | 0.953 | 0.950 | 0.951 | 0.973 | 0.976 | 0.974 | 0.961 | 0.963 | 0.962 |
Table 12. Performance of SFHSA with respect to WAO-CM and LHCMA for PHOG features in terms of Precision (P), Recall (R) and F-measure (F).

| Dataset | Image Size | WAO-CM (P) | WAO-CM (R) | WAO-CM (F) | LHCMA (P) | LHCMA (R) | LHCMA (F) | SFHSA (P) | SFHSA (R) | SFHSA (F) |
|---|---|---|---|---|---|---|---|---|---|---|
| JAFFE | 32 × 32 | 0.601 | 0.597 | 0.599 | 0.725 | 0.727 | 0.726 | 0.762 | 0.763 | 0.762 |
| JAFFE | 48 × 48 | 0.672 | 0.675 | 0.674 | 0.802 | 0.805 | 0.803 | 0.856 | 0.853 | 0.853 |
| JAFFE | 64 × 64 | 0.718 | 0.714 | 0.716 | 0.820 | 0.818 | 0.818 | 0.845 | 0.844 | 0.844 |
| RaFD | 32 × 32 | 0.806 | 0.804 | 0.804 | 0.835 | 0.832 | 0.834 | 0.861 | 0.850 | 0.851 |
| RaFD | 48 × 48 | 0.900 | 0.901 | 0.900 | 0.912 | 0.910 | 0.911 | 0.894 | 0.890 | 0.892 |
| RaFD | 64 × 64 | 0.927 | 0.924 | 0.925 | 0.937 | 0.931 | 0.933 | 0.885 | 0.883 | 0.883 |
