Abstract

Face recognition (FR) is a technique for identifying individuals from face photographs. FR technology is widely applicable in fields including security, biometrics, authentication, law enforcement, smart cards, and surveillance. Recent advances in deep learning (DL) models, particularly convolutional neural networks (CNNs), have shown promising results in FR, and pretrained CNN models can be used to extract features for effective FR. In this regard, this research introduces the GWOECN-FR approach, a grey wolf optimization with an enhanced capsule network-based deep transfer learning model for real-time face recognition. The proposed GWOECN-FR approach aims to recognize faces in input photos reliably and rapidly. The input images are preprocessed in two steps, namely, data augmentation and noise reduction by bilateral filtering (BF). An enhanced capsule network (ECN) model is then used for feature vector extraction. Finally, a stacked autoencoder (SAE) model, whose weight and bias settings are optimized by the grey wolf optimization (GWO) algorithm, identifies and classifies the faces in the images. The performance of the GWOECN-FR technique is validated on benchmark datasets, and the results are analyzed from several aspects. The GWOECN-FR approach achieved a testing time (TST) of 0.03 s on the FEI dataset, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet models achieved TSTs of 0.125 s, 0.0051 s, and 0.0062 s, respectively. The experimental results indicate that the GWOECN-FR technique outperforms recent approaches.

1. Introduction

Face recognition (FR) systems are commonly presented with faces in which the essential features, including the eyes, nose, and mouth, are visible, that is, nonoccluded faces [1]. However, a wide variety of circumstances require people to wear masks, which leaves the face partially occluded or hidden. Common situations include laboratories, pandemics, heavy pollution, and medical operations [2]. For example, according to the WHO and the Centers for Disease Control and Prevention (CDC), the best ways to protect people from COVID-19 infection and to prevent the spread of the disease are practicing social distancing and wearing face masks. Countries around the world have therefore required people to wear protective face masks in public places, which creates the need to understand and investigate how FR systems perform with face masks. However, following this safety guideline seriously challenges the existing authentication and security schemes that depend on FR [3]. State-of-the-art techniques from illumination and pose research are widely used to address illumination and posture issues. Building on this line of work, we survey as much of the literature as possible so that the reader can understand the variations caused by changes in illumination and pose, as well as the approaches taken so far to continuously improve existing systems. Most current approaches only determine whether a face is occluded, that is, masked face detection. Although masks are effective in saving lives, there is a crucial need to authenticate persons wearing masks without requiring them to uncover their faces. For example, premises access control and immigration points are among the locations in which a subject makes a cooperative presentation to a camera; a mask increases the difficulty of face detection there because the occluded part is essential for FR and detection [4, 5].

The FR pipeline comprises four phases: face detection, alignment, representation (facial feature extraction), and classification [6]. The major problem in FR is the feature representation scheme used for extracting features, that is, finding the best representation for a given biometric trait. Feature extraction is the fundamental step for image classification; it preserves the crucial information needed for classification. Several feature extraction methods have been introduced for biometric schemes, including independent component analysis (ICA), principal component analysis (PCA), the histogram method, and local binary patterns (LBP) [7, 8]. More recently, the convolutional neural network (CNN) has shown significant benefits.

With the development of DL, face detection methods now achieve remarkable results. The CNN, the deep neural network most commonly used in computer vision applications, offers the significant benefit of automated visual feature extraction [9]. There are two types of approaches for training a CNN for face detection systems, namely, metric learning and a classification layer. There are also three distinct ways of using a CNN. The first is to learn the model from scratch; in this case, the architecture of a pretrained model is adopted and trained on the target dataset. The second is transfer learning (TL) that uses the features of a pretrained CNN, which suits larger datasets. The third is TL that keeps the convolutional base in its original form and feeds its output to a classifier; here the pretrained model serves as a fixed feature extractor, which suits smaller datasets [10].

This study designs an effective grey wolf optimization with an enhanced capsule network-based deep transfer learning model for real-time face recognition, named the GWOECN-FR technique. The proposed GWOECN-FR technique performs data augmentation and bilateral filtering- (BF-) based noise elimination at the preprocessing step. In addition, the enhanced capsule network (ECN) model is used to extract feature vectors. In the current investigation, ECNs are used to construct an appropriate diagnostic system. In the ECN, the segmented pixel set of the X-ray image is treated as a set of neurons, and the capsule is the focus of the analysis. For an X-ray image, the system uses a pixel vector as an activation vector; this vector is enclosed by an active capsule and may represent a certain category, such as healthy or COVID-19. In this example, routing within a layer multiplies the capsule output by the coupling coefficient, whose value is determined by the resistance of the parent capsule to routing. The low-level COVID-19 diagnosis is passed to the high-level capsule activation through the routing-by-agreement method. Furthermore, grey wolf optimization (GWO) with a stacked autoencoder (SAE) model is applied for the identification and classification of faces, where the weight and bias values of the SAE model are chosen by the GWO algorithm. Mirjalili et al. developed the grey wolf optimizer in 2014 as a heuristic swarm intelligence optimization technique. As an apex predator in the food chain, the grey wolf has a remarkable ability to capture prey, and the guiding premise of GWO is to mimic the cooperative hunting behavior of grey wolves in nature. In terms of model structure, GWO differs from other metaheuristics. To demonstrate the performance of the GWOECN-FR technique, a comprehensive result analysis is carried out on benchmark datasets.

2. Related Works

Gwyn et al. [1] provided a comprehensive review of several advanced DL-based facial detection techniques to determine which are most efficient in terms of accuracy and other metrics. In that work, VGG-16 and VGG-19 showed the highest image detection performance and F1 score. Zhu and Jiang [11] studied face detection driven by big data and, to improve on existing methods, presented a DL multifeature fusion face detection method. Their study employs the local binary pattern (LBP) approach to extract the texture features of the face and fuses them with the global features extracted through 2DPCA; the fused feature thus accounts for both local and global information, which yields good detection performance.

Al-Waisy et al. [12] proposed an architecture that merges the advantages of local handcrafted feature descriptors with the deep belief network (DBN) to address face detection in unconstrained conditions. First, a novel multimodal local feature extraction method that combines the curvelet transform with the fractal dimension, called the curvelet-fractal method, was presented. Mao et al. [13] used a differentially private method to enable privacy-preserving edge-based training of a DNN face detection model. During training, the DNN is split between the edge server and the user device so that the model parameters and private data are protected at a small local computation cost.

Sharma and Kumar [14] developed a three-dimensional face reconstruction and a sequential DL-based architecture for face detection. It employs the reflection principle to generate the reconstructed 3D points about the midface plane. From the reconstructed face, a sequential DL architecture recognizes the person, gender, emotion, and occlusion. The method uses triplet loss training, a variational autoencoder (VAE), and a bidirectional long short-term memory (BiLSTM) network. Anand et al. [15] focused on applying an advanced ML method to face detection to achieve maximum performance. They generated their own dataset and trained the GoogLeNet (Inception) DL model on it using Caffe and Nvidia DIGITS.

3. The Proposed Model

In this study, a new GWOECN-FR technique has been developed for the rapid identification of facial images. The GWOECN-FR technique first performs data augmentation and BF-based noise elimination to preprocess the facial input images. Afterward, the ECN model is applied to produce a useful set of feature vectors. Finally, the SAE model is employed to recognize the faces, and its performance is improved by tuning its parameters with the GWO algorithm. Figure 1 illustrates the overall process of the GWOECN-FR technique.

3.1. Data Augmentation and Noise Removal

Since DL models require a large number of training instances, data augmentation becomes essential. In this work, geometric transformations (rotation, zooming, and translation), brightness adjustment, and filtering operations are applied to increase the number of variations of each image. Then, the BF technique is employed to remove the noise present in the images.
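As a rough illustration of the augmentation step, the following minimal sketch applies three of the listed operations (rotation, translation, and brightness adjustment) to a small grayscale image stored as a list of rows. The function names, the fixed 90-degree rotation, and the chosen shift and brightness factor are assumptions made for illustration; the paper does not state the exact transformation parameters.

```python
def rotate90(img):
    """Rotate a 2-D grayscale image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def translate(img, dx, dy, fill=0):
    """Shift the image dx pixels right and dy pixels down, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def adjust_brightness(img, factor):
    """Scale intensities by `factor`, clipping to the 8-bit range [0, 255]."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment(img):
    """Return the original image plus a few transformed variants.

    The particular variants (one rotation, one shift, one brightness change)
    are illustrative assumptions, not the paper's configuration.
    """
    return [img, rotate90(img), translate(img, 1, 0), adjust_brightness(img, 1.2)]
```

Each call to `augment` turns one training image into four, so the size of the training set grows by the number of variants generated per image.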

Let $F$ denote a multichannel image, and let $W$ be a sliding window of fixed size $n \times n$. The pixels in $W$ are represented in Cartesian coordinates, so that $\mathbf{i} = (i_1, i_2) \in Y^2$ denotes the position of a pixel in $W$, where $Y = \{0, 1, \ldots, n-1\}$ is endowed with the usual order. The BF replaces the central pixel of each filter window by a weighted average of its neighboring color pixels. The weighting function is designed to smooth regions of similar colors while keeping edges intact, by heavily weighting those pixels that are both spatially close and photometrically similar to the central pixel [16]. Denote by $\|\cdot\|_2$ the Euclidean norm and by $F_{\mathbf{i}}$ the central pixel under consideration. The weight $W(F_{\mathbf{i}}, F_{\mathbf{j}})$ of any pixel $F_{\mathbf{j}}$ with respect to $F_{\mathbf{i}}$ is the product of two components, one spatial and one photometric, $W(F_{\mathbf{i}}, F_{\mathbf{j}}) = W_s(F_{\mathbf{i}}, F_{\mathbf{j}})\, W_p(F_{\mathbf{i}}, F_{\mathbf{j}})$. The spatial component is given by

$$W_s(F_{\mathbf{i}}, F_{\mathbf{j}}) = \exp\left(-\frac{\|\mathbf{i} - \mathbf{j}\|_2^2}{2\sigma_s^2}\right),$$

and the photometric component is given by

$$W_p(F_{\mathbf{i}}, F_{\mathbf{j}}) = \exp\left(-\frac{\Delta E_{Lab}(F_{\mathbf{i}}, F_{\mathbf{j}})^2}{2\sigma_p^2}\right),$$

where $\Delta E_{Lab}$ denotes the perceptual color error in the $L^*a^*b^*$ color space, and $\sigma_s, \sigma_p > 0$. The color vector output $\tilde{F}_{\mathbf{i}}$ of the filter is computed using the normalized weights, so it is given by

$$\tilde{F}_{\mathbf{i}} = \frac{\sum_{F_{\mathbf{j}} \in W} W(F_{\mathbf{i}}, F_{\mathbf{j}})\, F_{\mathbf{j}}}{\sum_{F_{\mathbf{j}} \in W} W(F_{\mathbf{i}}, F_{\mathbf{j}})}.$$

The weight decreases as the spatial distance between $\mathbf{i}$ and $\mathbf{j}$ in the image increases, and it also decreases as the perceptual color difference between the color vectors increases. The spatial component reduces the influence of distant pixels, which limits blurring, while the photometric component reduces the influence of pixels that are perceptually different from the one under processing. In this way, only perceptually similar regions of pixels are averaged together, and the sharpness of edges is preserved. The parameters $\sigma_s$ and $\sigma_p$ adjust the influence of the spatial and photometric components, respectively. They can be regarded as rough thresholds for identifying pixels that are sufficiently close or similar to the central one. Note that if $\sigma_p \to \infty$, the BF approaches a Gaussian filter, and if $\sigma_s \to \infty$, it approaches a range filter without any spatial notion. If both $\sigma_p \to \infty$ and $\sigma_s \to \infty$, the BF behaves as the arithmetic mean filter (AMF).
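The weighting scheme described above can be sketched in a few lines. The version below is a simplified illustration, not the paper's implementation: it assumes a single-channel (grayscale) image and uses the absolute intensity difference in place of the perceptual color error, and the names `bilateral_filter`, `radius`, `sigma_s`, and `sigma_p` are hypothetical.

```python
import math


def bilateral_filter(img, radius=1, sigma_s=1.0, sigma_p=25.0):
    """Bilateral filter for a 2-D grayscale image given as a list of rows.

    Assumption: intensity difference stands in for the perceptual colour
    error used for multichannel images.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = 0.0
            den = 0.0
            # Visit the window around (y, x), truncated at the image border.
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    dist_sq = (j - y) ** 2 + (i - x) ** 2
                    diff_sq = (img[j][i] - img[y][x]) ** 2
                    spatial = math.exp(-dist_sq / (2.0 * sigma_s ** 2))
                    photometric = math.exp(-diff_sq / (2.0 * sigma_p ** 2))
                    weight = spatial * photometric
                    num += weight * img[j][i]
                    den += weight
            # den >= 1 because the centre pixel always has weight 1.
            out[y][x] = num / den
    return out
```

With a small `sigma_p`, pixels across a strong edge receive almost zero weight, so the edge survives filtering; with a very large `sigma_p`, the photometric term tends to 1 and the filter reduces to Gaussian smoothing.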

3.2. Feature Extraction Using the ECN Model

During feature extraction, the preprocessed facial image is passed to the ECN model to generate feature vectors. In this method, the segmented pixel set of an image is treated as a group of neurons belonging to a capsule [17]. The pixel vectors are used as an activation vector enclosed by an active capsule, whose length represents the presence of a certain class, such as tumor or healthy, for the image-related pixel vector. Routing within a layer multiplies the capsule output by the coupling coefficient (CC), whose value can be regarded as the resistance of the parent capsule to routing. Lower-level tumor diagnoses are passed to higher-level capsule activations via a top-down feedback method named “routing-by-agreement” [17]. Let $u_i$ be the output of capsule $i$ and $W_{ij}$ the weight matrix; then

$$\hat{u}_{j|i} = W_{ij}\, u_i,$$

where $\hat{u}_{j|i}$ is the prediction vector that capsule $i$ produces for the output of parent capsule $j$, and the pixel range is used to evaluate the weight. The weight is increased as long as the value is reduced or the pixel is likely to belong to the positive group. A softmax over the possible parent capsules gives the coupling coefficient of a capsule in the previous layer:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})},$$

where the logits $b_{ij}$ are the log prior probabilities that capsule $i$ in the previous layer is routed to capsule $j$ in the subsequent layer. In general, routing-by-agreement is implemented by updating the logits of the capsules in each layer:

$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j.$$

The preceding layer provides the essential component in computing the input $s_j$ of parent capsule $j$, that is,

$$s_j = \sum_i c_{ij}\, \hat{u}_{j|i}.$$

The length of the pixel vector is compressed to a value between 0 and 1 using a nonlinear process named squashing:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\, \frac{s_j}{\|s_j\|},$$

where $s_j$ is the total input of capsule $j$ and $v_j$ is its output, which is passed on to the capsules of the following layer.
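To make the routing procedure concrete, here is a minimal pure-Python sketch of routing-by-agreement with the squashing nonlinearity. It follows the standard capsule-network formulation rather than any detail specific to the ECN; the function names, the nested-list data layout, and the default of three routing iterations are assumptions for illustration.

```python
import math


def squash(s):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, iterations=3):
    """Routing-by-agreement.

    u_hat[i][j] is the prediction vector of child capsule i for parent j.
    Returns the parent outputs v and the coupling coefficients c used in
    the last iteration.
    """
    n_child, n_parent = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_parent for _ in range(n_child)]  # routing logits
    v, c = [], []
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients per child
        v = []
        for j in range(n_parent):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_child))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a child's prediction agrees with the parent output.
        for i in range(n_child):
            for j in range(n_parent):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c
```

When the children agree on one parent and disagree on another, the coupling coefficients drift toward the parent they agree on, and that parent's output vector becomes longer.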

The overall capsule classification is trained with the margin loss on the class capsules of the capsule network:

$$L_k = T_k \max\left(0,\, m^+ - \|v_k\|\right)^2 + \lambda\,(1 - T_k) \max\left(0,\, \|v_k\| - m^-\right)^2,$$

where $T_k$ indicates the presence of an instance of class capsule $k$ ($T_k = 1$ if it is present and 0 otherwise), and $m^+$, $m^-$, and $\lambda$ are hyperparameters. The ECN is trained for six hundred iterations using the Adam optimizer, with the learning rate tuned as a hyperparameter. Adam is an optimization technique used in place of plain stochastic gradient descent for training DL models. It combines the best features of the AdaGrad and RMSProp methods to handle sparse gradients on noisy problems. Adam combines two gradient descent methodologies: momentum, which accelerates gradient descent by taking an exponentially weighted average of the gradients into account so that the algorithm converges faster toward the minima, and RMSProp, which scales each step by an exponentially weighted average of the squared gradients. Adam often produces better results than other optimization algorithms, takes less time to compute, and requires fewer parameters to tune. For these reasons, Adam is suggested as the default optimizer for the majority of applications.
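The two ingredients of this paragraph, the margin loss and the Adam update, can be sketched as follows. The default values `m_plus=0.9`, `m_minus=0.1`, and `lam=0.5` come from the original capsule-network literature and the Adam defaults from its standard formulation; the paper does not report its own settings, so all of them should be read as assumptions, as should the function names.

```python
import math


def margin_loss(lengths, target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss summed over class capsules.

    lengths[k] is the length of the output vector of class capsule k and
    `target` is the index of the true class. Hyperparameter defaults are
    assumptions, not values reported in the paper.
    """
    total = 0.0
    for k, length in enumerate(lengths):
        t_k = 1.0 if k == target else 0.0
        present = max(0.0, m_plus - length) ** 2
        absent = max(0.0, length - m_minus) ** 2
        total += t_k * present + lam * (1.0 - t_k) * absent
    return total


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad         # momentum: average of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad  # average of squared gradients
    m_hat = m / (1.0 - beta1 ** t)               # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

The loss is zero only when the true class capsule is longer than `m_plus` and every other class capsule is shorter than `m_minus`, which is what pushes the correct capsule to be long and the others to be short.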

3.3. Image Classification

At the final stage, the SAE model receives the feature vectors as input and assigns class labels to the facial images. An autoencoder (AE) is an unsupervised neural network architecture with three layers: input, hidden, and output [17]. It learns an efficient data representation (encoding) by training the network to disregard signal “noise.” Autoencoders can be used for image denoising, image compression, and, in some situations, image data synthesis. An autoencoder is made up of three major components: an encoder, a code, and a decoder. The input data is converted into a code, and the subsequent layers of the network expand the code into the final output. A denoising autoencoder illustrates the idea: it is given a noisy version of the input and is trained to reconstruct the original. Autoencoders are useful in image processing, classification, and other areas of machine learning. Training an AE involves two parts, encoding and decoding. The encoder maps the input data to a hidden representation, and the decoder reconstructs the input from the hidden representation. Given the unlabeled input dataset $\{x_n\}_{n=1}^{N}$, where $x_n \in \mathbb{R}^{m \times 1}$, $h_n$ denotes the hidden encoded vector computed from $x_n$, and $\hat{x}_n$ denotes the decoded vector of the output layer. The encoding process is

$$h_n = f(W_1 x_n + b_1),$$

where $f$ is the encoding function, $W_1$ is the weight matrix of the encoder, and $b_1$ is the bias vector.

The decoding process is

$$\hat{x}_n = g(W_2 h_n + b_2),$$

where $g$ is the decoding function, $W_2$ is the weight matrix of the decoder, and $b_2$ is the bias vector. The parameter set $\Theta = \{W_1, b_1, W_2, b_2\}$ of the AE is optimized to minimize the reconstruction error:

$$\Theta^{*} = \arg\min_{\Theta} \frac{1}{N} \sum_{n=1}^{N} L(x_n, \hat{x}_n),$$

where $L$ stands for a loss function, $L(x, \hat{x}) = \|x - \hat{x}\|^2$.
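The encoding, decoding, and reconstruction-error steps above can be written out directly. This is a minimal sketch under stated assumptions: the encoder activation is taken to be a sigmoid and the decoder is taken to be linear, since the text does not specify either function, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    """Compute W x + b for a weight matrix W (list of rows) and vectors x, b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def encode(x, W1, b1):
    """Hidden representation h = f(W1 x + b1), with f assumed to be a sigmoid."""
    return [sigmoid(z) for z in affine(W1, x, b1)]


def decode(h, W2, b2):
    """Reconstruction x_hat = g(W2 h + b2), with g assumed to be the identity."""
    return affine(W2, h, b2)


def reconstruction_error(data, W1, b1, W2, b2):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, W1, b1), W2, b2)
        total += sum((a - r) ** 2 for a, r in zip(x, x_hat))
    return total / len(data)
```

Training an AE then amounts to adjusting `W1`, `b1`, `W2`, and `b2` so that `reconstruction_error` decreases on the unlabeled data.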

The SAE stacks several AEs so that the hidden layer of one feeds the next; it is pretrained with an unsupervised layer-wise learning technique and then fine-tuned with a supervised approach. The SAE-based technique is therefore divided into three phases:
(i) Train the first AE on the input data and obtain the learned feature vectors
(ii) Use the feature vectors of the previous layer as the input to the next layer, and repeat this process until training ends
(iii) After every hidden layer has been trained, use the backpropagation (BP) technique to minimize the cost function and update the weights with a labeled training set to achieve fine-tuning
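Phases (i) and (ii) can be sketched as a greedy layer-wise loop. The code below is an illustrative assumption rather than the paper's implementation: each AE uses a sigmoid encoder and a linear decoder and is trained with plain stochastic gradient descent on the squared reconstruction error, and the supervised fine-tuning of phase (iii) is omitted. All names and default values are hypothetical.

```python
import math
import random


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _encode(x, W1, b1):
    return [_sigmoid(sum(w * xj for w, xj in zip(row, x)) + bk)
            for row, bk in zip(W1, b1)]


def train_autoencoder(data, n_hidden, epochs=200, lr=0.05, seed=0):
    """Train one AE with SGD and return its encoder parameters (W1, b1)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            h = _encode(x, W1, b1)
            x_hat = [sum(w * hk for w, hk in zip(row, h)) + bi
                     for row, bi in zip(W2, b2)]
            # Gradient of the squared error with respect to the reconstruction.
            err = [2.0 * (xh - xi) for xh, xi in zip(x_hat, x)]
            # Back-propagate to the hidden pre-activations before W2 changes.
            dz = [sum(err[i] * W2[i][k] for i in range(n_in)) * h[k] * (1.0 - h[k])
                  for k in range(n_hidden)]
            for i in range(n_in):
                for k in range(n_hidden):
                    W2[i][k] -= lr * err[i] * h[k]
                b2[i] -= lr * err[i]
            for k in range(n_hidden):
                for j in range(n_in):
                    W1[k][j] -= lr * dz[k] * x[j]
                b1[k] -= lr * dz[k]
    return W1, b1


def pretrain_stack(data, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining: each AE is trained on the codes of the previous one."""
    encoders, features = [], data
    for n_hidden in layer_sizes:
        W1, b1 = train_autoencoder(features, n_hidden, **kwargs)
        encoders.append((W1, b1))
        features = [_encode(x, W1, b1) for x in features]
    return encoders, features
```

For example, `pretrain_stack(data, [3, 2])` trains a first AE with three hidden units on the raw inputs and a second AE with two hidden units on the codes of the first; the returned encoders would then be fine-tuned jointly with a labeled training set.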

3.4. Parameter Tuning Using the GWO Algorithm

The GWO algorithm is applied to optimally adjust the parameters (weights and biases) of the SAE model. GWO is a swarm intelligence (SI) optimization technique established by Mirjalili et al. [18] that simulates the social hierarchy and hunting behavior of grey wolves (GWs). To model the social hierarchy, the wolves are categorized into four levels, $\alpha$, $\beta$, $\delta$, and $\omega$: $\alpha$ is regarded as the optimum solution, followed by $\beta$ and $\delta$, and the remaining solutions belong to $\omega$. The three fittest wolves, $\alpha$, $\beta$, and $\delta$, stay close to the prey and guide the $\omega$ wolves toward the food in promising regions. During the encircling step, the wolves update their positions around $\alpha$, $\beta$, or $\delta$ as follows:

$$\vec{D} = \left|\vec{C} \cdot \vec{X}_p(t) - \vec{X}(t)\right|,$$

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D},$$

where $t$ refers to the current iteration, $\vec{X}_p$ denotes the current position of the prey, $\vec{X}$ signifies the current position of a wolf, and $\vec{D}$ is the distance between the wolf and the prey. The coefficient vectors $\vec{A}$ and $\vec{C}$ are computed as follows [19]:

$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}, \qquad \vec{C} = 2\vec{r}_2,$$

where $\vec{r}_1$ and $\vec{r}_2$ are random vectors in $[0, 1]$, and the components of $\vec{a}$ decrease linearly from two to zero over the iterations. Figure 2 showcases the flowchart of the GWO technique, and its pseudocode is given in Algorithm 1. At this point, $\alpha$, $\beta$, and $\delta$ are assumed to be the positions nearest to the prey. During hunting, the top three solutions are retained, and the remaining wolves update their positions according to these three best wolves:

$$\vec{D}_\alpha = \left|\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}\right|, \tag{17}$$

$$\vec{D}_\beta = \left|\vec{C}_2 \cdot \vec{X}_\beta - \vec{X}\right|, \tag{18}$$

$$\vec{D}_\delta = \left|\vec{C}_3 \cdot \vec{X}_\delta - \vec{X}\right|, \tag{19}$$

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \tag{20}$$

$$\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \tag{21}$$

$$\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta, \tag{22}$$

$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}, \tag{23}$$

where $\vec{X}_\alpha$, $\vec{X}_\beta$, and $\vec{X}_\delta$ are the positions of $\alpha$, $\beta$, and $\delta$; $\vec{X}$ is the position of the current solution; $\vec{C}_1$, $\vec{C}_2$, $\vec{C}_3$ and $\vec{A}_1$, $\vec{A}_2$, $\vec{A}_3$ are random vectors; and $t$ denotes the iteration number. The step sizes of the wolves toward $\alpha$, $\beta$, and $\delta$ are given by Equations (17)–(19), and the resulting positions of the wolves are then evaluated on the basis of Equations (20)–(23).

The GWO approach derives a fitness function (FF) to attain higher classification performance. It assigns a positive value that represents the quality of a candidate solution. In this study, the classification error rate, which is to be minimized, is taken as the FF, as given in Equation (24):

$$\text{fitness}(x_i) = \text{ClassifierErrorRate}(x_i) = \frac{\text{number of misclassified samples}}{\text{total number of samples}} \times 100. \tag{24}$$

The best solution has the lowest error rate, and the worst solution has the highest.
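A fitness function of this kind is straightforward to compute from predicted and true labels. The sketch below assumes the error rate is expressed as a percentage; the function name is hypothetical, and how a candidate solution is turned into predictions (loading the candidate weights and biases into the SAE and classifying a validation set) is not shown.

```python
def error_rate_fitness(predicted, actual):
    """Classification error rate in percent; lower values mean fitter solutions."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)
```

For example, one misclassified sample out of four gives a fitness of 25.0, and a candidate that classifies every sample correctly gives 0.0.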

Algorithm 1: Pseudocode of GWO.
Population initialization: grey wolves $X_i$ ($i = 1, 2, \ldots, n$)
Parameter initialization: $a$, $A$, and $C$
Determine the fitness values of all search agents
While ($t$ < maximum number of iterations)
 For every search agent
  Update the position of the current search agent
 End for
 Update $a$, $A$, and $C$
 Determine the fitness values of all search agents
 Update $X_\alpha$, $X_\beta$, and $X_\delta$
 Increment $t$
End while
Return $X_\alpha$
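The pseudocode can be turned into a short runnable sketch. The version below minimizes an arbitrary fitness function over real-valued vectors; it makes a few assumptions that the pseudocode does not state: positions are clamped to a box given by `bounds`, the three leaders are kept as the best solutions found so far, and the random coefficients are drawn independently per dimension. The function and parameter names are hypothetical.

```python
import random


def gwo(fitness, dim, bounds, n_wolves=20, max_iter=100, seed=0):
    """Grey wolf optimization sketch; returns the best position and its fitness.

    Requires n_wolves >= 3 so that alpha, beta, and delta can be selected.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]

    def top3(pool):
        # Copy the three fittest positions so later moves do not alter them.
        return [list(p) for p in sorted(pool, key=fitness)[:3]]

    alpha, beta, delta = top3(wolves)
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter  # decreases linearly from 2 toward 0
        for w in wolves:
            for d in range(dim):
                candidates = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])       # distance to this leader
                    candidates.append(leader[d] - A * D)
                # Average the three leader-guided positions and clamp (assumption).
                w[d] = min(hi, max(lo, sum(candidates) / 3.0))
        # Leaders are the best of the new population and the previous leaders.
        alpha, beta, delta = top3(wolves + [alpha, beta, delta])
    return alpha, fitness(alpha)
```

In the setting of this paper, the position vector would hold the flattened weights and biases of the SAE and `fitness` would return the classification error rate; a simple test function such as the sum of squares is enough to check that the loop converges.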

4. Performance Validation

In this section, the GWOECN-FR model is evaluated on four benchmark datasets. The first, the GTAV face dataset [20], includes images of 44 persons. The second, the Georgia Tech face database [21], includes a collection of images of 50 persons. The third, the FEI face database [22], comprises 14 images for each of 200 people. Finally, the Labeled Faces in the Wild (LFW) database [23] includes about 13,000 facial images gathered from the Internet. A few sample images are shown in Figure 3.

Figure 4 portrays the accuracy analysis of the GWOECN-FR technique on the four datasets. The results show that the GWOECN-FR algorithm attains higher validation accuracy than training accuracy [24–30]. It can also be observed that the accuracy values saturate as the number of epochs increases.

Figure 5 depicts the loss analysis of the GWOECN-FR system on the four datasets. The figure reveals that the GWOECN-FR approach yields a lower validation loss than training loss. It can also be noticed that the loss values saturate as the number of epochs increases.

Table 1 and Figure 6 compare the accuracy of the GWOECN-FR model with recent methods on the four datasets [31–34]. The results indicate that the GWOECN-FR model outperforms the existing methods on all test datasets. For instance, on the GTAV dataset, the GWOECN-FR model obtained a higher accuracy of 99.82%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet models attained lower accuracy values of 99.42%, 99.54%, and 99.78%, respectively.

Meanwhile, on the FEI dataset, the GWOECN-FR model reached an accuracy of 99.56%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet models reached lower accuracy values of 97.33%, 98.37%, and 98.75%, respectively. Finally, on the LFW dataset, the GWOECN-FR model achieved a better accuracy of 98.38%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet models achieved lower accuracy values of 93.80%, 93.99%, and 95.42%, respectively.

Table 2 and Figure 7 compare the precision of the GWOECN-FR technique with recent methods on the four datasets. The results show that the GWOECN-FR method outperforms the existing techniques on all test datasets [35, 36]. For instance, on the GTAV dataset, the GWOECN-FR method obtained a higher precision of 99.32%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet techniques reached lower precision values of 96.81%, 98.84%, and 98.75%, respectively. On the FEI dataset, the GWOECN-FR model reached a precision of 98.97%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet techniques reached lower precision values of 94.26%, 95.40%, and 95.96%, respectively. Finally, on the LFW dataset, the GWOECN-FR approach achieved a better precision of 98.74%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet algorithms obtained lower precision values of 92.18%, 92.18%, and 93.13%, respectively.

Table 3 and Figure 8 compare the recall of the GWOECN-FR model with recent techniques on the four datasets. The results show that the GWOECN-FR technique outperforms the existing techniques on all test datasets. For instance, on the GTAV dataset, the GWOECN-FR approach attained a superior recall of 99.51%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet techniques attained lower recall values of 98.52%, 98.14%, and 98.89%, respectively. On the FEI dataset, the GWOECN-FR model reached a recall of 99.42%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet methodologies reached lower recall values of 98.89%, 98.33%, and 98.05%, respectively. Finally, on the LFW dataset, the GWOECN-FR approach achieved a better recall of 98.36%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet techniques achieved lower recall values of 93.97%, 92.84%, and 95.28%, respectively.

Table 4 and Figure 9 compare the F1 score of the GWOECN-FR system with recent algorithms on the four datasets. The outcomes show that the GWOECN-FR method outperforms the existing methods on all test datasets. For instance, on the GTAV dataset, the GWOECN-FR method achieved a higher F1 score of 99.31%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet models reached lower F1 score values of 96.69%, 98.05%, and 98.99%, respectively. Likewise, on the FEI dataset, the GWOECN-FR system reached a higher F1 score of 99.52%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet models reached lower F1 score values of 94.90%, 97.91%, and 98.01%, respectively. Lastly, on the LFW dataset, the GWOECN-FR methodology achieved a better F1 score of 99.16%, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet techniques achieved lower F1 score values of 90.18%, 93.95%, and 94.05%, respectively.
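For reference, the four measures reported in Tables 1 to 4 can be computed from true and predicted labels as sketched below. Because FR is a multiclass problem, precision, recall, and F1 score have to be aggregated over classes; macro-averaging is used here as an assumption, since the text does not state which averaging scheme was applied. The function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2.0 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Multiplying each value by 100 gives percentages comparable in form to those in the tables.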

Finally, a detailed testing time (TST) comparison of the GWOECN-FR technique with recent methods [24] is given in Table 5 and Figure 10. The results indicate that the GWOECN-FR technique obtains effective outcomes with a low TST on all datasets. For instance, on the GTAV dataset, the GWOECN-FR technique recorded a TST of 0.04 s, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet models recorded TSTs of 0.100 s, 0.0361 s, and 0.080 s, respectively.

Similarly, on the FEI dataset, the GWOECN-FR technique recorded a TST of 0.03 s, whereas the AlexNet-SVM, ResNet-SVM, and AlexNet models recorded TSTs of 0.125 s, 0.0051 s, and 0.0062 s, respectively. From the above result analysis, it is concluded that the GWOECN-FR technique accomplished the best overall FR outcome among the compared techniques.
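A testing time of this kind can be measured with a monotonic high-resolution clock. The sketch below reports the mean wall-clock time per sample; whether the TST values in Table 5 are per image or per test set is not stated in the text, so the per-sample convention, the `repeats` parameter, and the function name are assumptions.

```python
import time


def mean_testing_time(predict, samples, repeats=5):
    """Mean wall-clock seconds per call of `predict`, averaged over several passes."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            predict(sample)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples))
```

Repeating the pass several times reduces the effect of one-off delays such as caching or lazy initialization on the reported figure.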

5. Conclusion

In this study, a new GWOECN-FR technique has been developed for the rapid identification of facial images. The GWOECN-FR technique comprises several stages, namely, data augmentation, BF-based noise elimination, ECN-based feature extraction, SAE-based classification, and GWO-based parameter tuning, in which the GWO algorithm optimally adjusts the weight and bias values of the SAE model. To demonstrate the performance of the GWOECN-FR technique, a comprehensive result analysis was carried out on benchmark datasets. On the FEI dataset, the GWOECN-FR method recorded a TST of 0.03 s, while the AlexNet-SVM, ResNet-SVM, and AlexNet models recorded TSTs of 0.125 s, 0.0051 s, and 0.0062 s, respectively. The experimental results indicate that the GWOECN-FR technique improves on recent approaches. Therefore, the GWOECN-FR technique can be applied as an effective tool for FR. In the future, hybrid DL models with hyperparameter optimizers can be explored to improve the recognition performance.

Data Availability

The manuscript contains all of the data.

Conflicts of Interest

The authors state that they do not have any conflicts of interest.

Authors’ Contributions

The contributions of the authors involved in this study are as follows: conceptualization, KS and CP; methodology, SNK and SC; validation, RW; data curation, EOM; writing—original draft preparation, KS and SNK; writing—review and editing, CP and SC; and supervision, RW and EOM.