Abstract

To address the intensive background noise, low accuracy, and high computational complexity of current salient object detection methods, this paper proposes a visual saliency detection algorithm based on Hierarchical Principal Component Analysis (HPCA). Firstly, the original RGB image is converted to a grayscale image, and the grayscale image is divided into eight layers by the bit surface stratification technique. Each image layer contains the salient object information that matches the features of that layer. Secondly, taking the color structure of the original image as the reference, color is reassigned to the grayscale layers by the grayscale-to-color conversion method, so that each layered image not only reflects the original structural features but also preserves the color features of the original image. Thirdly, Principal Component Analysis (PCA) is performed on the layered images to obtain the structural difference features and color difference features of each layer along the principal component directions. Fourthly, the two features are integrated to obtain a robust saliency map; to further refine the results, a known prior on image organization is incorporated, namely, that the subject of a photograph tends to be placed near the center of the image. Finally, an entropy calculation is used to select the optimal map from the layered saliency maps; the optimal map contains the least background information and the most prominent salient object. The detection results of the proposed model are closer to the ground truth and compare favorably in terms of precision rate (PRE), recall rate (REC), and F-measure (FME). The HPCA model reduces the interference of redundant information, effectively separates the salient object from the background, and achieves higher detection accuracy than the compared methods.

1. Introduction

The human visual attention mechanism enables people to locate, in real time, the positions of important information in complex scenes and to prioritize different objects, which effectively narrows the range of visual processing and thus greatly saves computing resources. Studying this mechanism and applying it to computer vision and image processing is therefore of great significance. Saliency region detection based on the human visual attention mechanism has attracted wide attention from researchers at home and abroad. It has become an important research topic in computer vision and has been successfully applied to image cropping, multiple object tracking and recognition, and thumbnail generation.

Researchers in computer vision often use a bottom-up process to simulate the mechanism of visual attention, which is called the bottom-up saliency model. For example, Itti et al. [1] simulated how neurons in the visual cortex of the human brain fuse color, brightness, and orientation features, built a visual saliency model based on the center-surround principle, and effectively detected the saliency area. The calculation process of the model is simple, but the detection of the target area is not accurate. Yang et al. [2] improved Itti’s model [1] using graph theory and proposed the GBVS model. Its calculation is similar to that of Itti’s model [1] and uses the same color, brightness, and orientation features. The GBVS model [2] computes the saliency map with a Markov random field and detects image saliency from a global perspective, but it is inefficient and cannot identify the target contour. Hou and Zhang [3] put forward the Spectral Residual (SR) algorithm. Liao et al. [4] considered that when the amplitude spectrum of prior knowledge is subtracted from the amplitude spectrum of the image, the remainder is the salient part of the amplitude spectrum; the target saliency map can then be obtained through a transformation in the frequency domain. The algorithm is fast, but its accuracy is difficult to guarantee. The Low-Rank (LR) algorithm was proposed by Zhou et al. [5]. It incorporates high-level priors into the low-rank framework to extract more notable features, but the computation is heavy and the resulting saliency map is poor. Generally speaking, bottom-up saliency methods are basic, fast, and simple, but their results are often dense highlighted regions that cannot show the outline of salient objects.

The other class of visual saliency detection methods is the top-down model. According to a specific task, Hou et al. [6] adjusted the shape, size, number of features, threshold, and so on of the bottom-up detection results. Achanta et al. [7] proposed the Frequency-Tuned (FT) algorithm. It takes the Euclidean distance between the average pixel value of the image and the Gaussian low-pass-filtered value of each pixel as the saliency value of that pixel, forming a saliency measure based on global contrast. The Region Contrast (RC) algorithm was proposed by Cheng et al. [8]; it calculates the saliency value of each segmented region and builds the saliency map based on local contrast. Dalal and Triggs [9] proposed a human detection method based on histograms of oriented gradients. The method uses gradient orientation histograms to express human characteristics and extracts human shape and motion information, forming rich feature sets. These top-down models [10–14] analyze image features through local contrast; because they extract many kinds of features, they run slowly and are easily affected by illumination and other objective factors, which greatly reduces the object detection accuracy.

In recent years, many researchers have applied machine learning to saliency detection and made great progress. For example, Yu et al. [15] and Chen et al. [16] built Deep Convolutional Neural Network (DCNN) models based on the principle of human vision. Combined with superpixel clustering to obtain image region features, these models achieve effective saliency detection by learning the features. Zhou and Tang [17] examined the effectiveness and robustness of machine learning with sparse coding. This method is highly robust, but it runs slowly. To this end, Principal Component Analysis (PCA) has been applied to saliency detection, which preserves the efficiency of machine learning [18, 19]. However, when the principal components extracted from an image with rich background information cannot effectively represent the salient object, the detection results contain considerable background noise [20, 21].

In visual saliency detection tasks, the saliency map produced by a single-level detection method is often unclear because of the complexity of the image [22]. In order to reduce the impact of image complexity, Wang et al. [23] proposed the Hierarchical Saliency (HS) algorithm. Chen et al. [24] effectively suppressed the interference of background noise with object detection by stratifying the image and computing on the stratified images.

Based on the above analysis, in order to weaken the impact of redundant information on the detection results while retaining the efficiency of machine learning, this paper proposes a salient object detection algorithm based on the Hierarchical PCA model. The layered PCA method divides the image into multiple layers that lack background information to different degrees. As a result, extracting the principal component information requires less computation and suffers less interference from background information, which retains the efficiency of machine learning and increases the robustness of the algorithm. Figure 1 shows the detection results of the proposed algorithm.

The main contributions of this paper are as follows: (1) the impact of redundant information on the detection results is attenuated while the efficiency of machine learning is preserved; (2) the image is divided into multiple layers with different amounts of background information, which reduces both the computational complexity and the interference of background information when the principal component information is extracted; and (3) the robustness of the proposed algorithm is increased without sacrificing the efficiency of machine learning.

Section 2 describes the details of the proposed algorithm. Section 3 presents the generation of the saliency map with Hierarchical PCA. Section 4 describes the experiments on the MRAS-1000, ASD-1000, and ECSSD-1000 datasets and compares the proposed method with several methods such as IT, GBVS, SR, LC, HS, BSCA, HDCT, and DCRR. Section 5 summarizes the research work and discusses future work.

2. The Proposed Algorithm Details

The flowchart of the Hierarchical PCA visual saliency detection algorithm is shown in Figure 2. The image information contained in different bit surface layers differs considerably. The eighth-layer image loses much of the information of the salient object, so that the salient object area in the image is missing. The other layers, because of their missing bit layers, reduce the background information to a certain extent and thereby highlight the information contained in the salient objects.

The basic procedure is as follows: (1) the original image is stratified, and layers reconstructed from different bit data yield images that highlight the salient object information; (2) in order to integrate multiple features, the original color structure is transferred to the gray-level images after stratification, so that each layer has the color structure of the original image; (3) PCA is used to extract the structural features and color features; (4) the two distinct features are fused to obtain multiple saliency maps; and (5) the optimal result is selected through the information entropy. Figure 3 shows an example of the proposed algorithm.

Input: original image
Output: saliency map
Initialization: adjust the size of the input image
Step 1: bit surface stratification
 Step 1.1: convert the original image to a gray image and use it as the first layer of the image
 Step 1.2: set the least significant bit layer of the first-layer image to zero to obtain an image with seven bit layers, which is output as the second-layer image. Set the least significant remaining bit layer of the second-layer image to zero to obtain the third-layer image, and so on
 Step 1.3: convert the binary data of each layer back to decimal pixel values to obtain the multilayer images matching the numbers of bits
Step 2: color conversion
 Step 2.1: calculate Mark1 according to formula (3)
 Step 2.2: calculate Mark2 according to formula (4)
Step 3: feature extraction using PCA
 Step 3.1: calculate the structural feature according to formula (7)
 Step 3.2: calculate the distance between each image block and the average image block according to formula (6)
 Step 3.3: calculate the color feature of each image block according to formula (8)
Step 4: saliency map fusion
 Step 4.1: calculate the fused feature map according to formula (9), and limit the range of the fused features to [0, 1] by normalization
 Step 4.2: combine the fused feature map and the Gaussian weight map to obtain the visual saliency map according to formula (10)
Step 5: calculate the information entropy of the saliency map of each layer according to formula (11), and select the saliency map of the best scale according to formula (12)
2.1. The Principle of Bit Surface Image Stratification

An eight-bit gray-level image can be regarded as composed of eight one-bit planes, each of which contains the saliency information that matches it. The four high-order bit planes, especially the two highest, contain most of the information of the salient object. The low-order bit planes contribute finer gray-level details. This means that the image can be rebuilt from the bit planes that carry more saliency information, which raises the proportion of the salient target in the whole image. Therefore, different bits of information can be used to represent the layered images. The algorithm steps are as follows:
(1) The original image is converted to a gray image, which is used as the first layer.
(2) The least significant bit layer of the first-layer image is set to zero, giving an image with seven bit layers, which is output as the second-layer image. The least significant remaining bit layer of the second-layer image is then set to zero to give the third-layer image, and so on.
(3) The binary data of each layer are converted back to decimal pixel values to obtain the multilayer images matching the numbers of bits.
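The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function name `bit_plane_layers` and the representation of the gray image as a list of rows of integers in 0..255 are assumptions made for the example.

```python
def bit_plane_layers(gray):
    """Return eight layers of an 8-bit gray image (hypothetical helper).

    Layer 1 is the gray image itself; layer k has its k-1 least
    significant bit planes set to zero, so layer 8 keeps only the
    most significant bit plane.
    """
    layers = []
    for k in range(8):
        # Mask that keeps the (8 - k) highest bits and clears the k lowest.
        mask = (0xFF << k) & 0xFF
        layers.append([[pixel & mask for pixel in row] for row in gray])
    return layers


if __name__ == "__main__":
    image = [[183, 255], [0, 1]]
    for index, layer in enumerate(bit_plane_layers(image), start=1):
        print(index, layer)
```

Because the masked values are ordinary integers, the conversion from binary data back to decimal pixel values in step (3) is implicit here.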

Image stratification is achieved by removing binary data bit level by bit level. The purpose is to produce multiple images in which the dominant target is the main information and to reduce the interference of background information. The results are shown in Figure 4.

It can be seen from Figure 4 that different bit planes contain different image information. The eighth image obviously loses information of the salient objects, so that the salient object areas in the image are missing. The other images, because of their missing bit layers, also reduce the background information to a certain extent and highlight the information contained in the salient objects.

2.2. The Color Conversion

The bit-surface-based stratification is carried out on the gray-level image. In order to retain the original color features in the layered images, the color structure of the original image is used as a template to transfer color to the gray-level images after stratification.

In color conversion, the conventional technique is the gray-to-color transformation: the gray value of each pixel, from black to white, is sent to three channels that apply different intensity transformations, generating the corresponding red, green, and blue values, namely, the color value of the corresponding pixel in the color image. This not only retains the pattern difference between the object and the background of the original image but also significantly enhances the contrast of the color-coded target, making detection more convenient. The color conversion in this paper is implemented as follows.

Firstly, the original image is taken as the reference image and each layered gray image as the image to be processed. The gray image is extended to three channels, and both the extended image and the reference image are transformed into the YCbCr space (where Y is the luminance component, Cb is the blue color component, and Cr is the red color component).

Secondly, the maximum and minimum values of every column of each image matrix are computed. For an image of a given resolution, each column of image pixels is considered in turn, and the maximum and minimum values of every column in the matrix can be expressed as follows:

Then, the maximum values of the two images and the minimum values of the two images are calculated. The two images are normalized to obtain the reconstructed color image models Mark1 and Mark2, as given in formulas (3) and (4).

Finally, the color map is transferred: the pixel values in Mark2 are transferred to the corresponding pixel points in Mark1, so that each hierarchical gray image has the same color structure as the original image. The result of the transformation is shown in Figure 5.

3. Generating the Saliency Map with Hierarchical PCA

PCA is a model used to analyze data in multivariate statistical analysis. It describes the samples with a small number of features and thereby reduces the dimension of the feature space. The algorithm proposed in this paper reconstructs the saliency map from two features, based on the distinctive structure and color characteristics of the pixels near the salient object in each layer.

Because of the stratification, the structural pattern and color pattern of each layer differ from those of the other layers: some layers lose part of the salient target, while others retain more background information. Among the layered images, however, there is always a layer that suppresses the background information well. Therefore, the saliency of each layer is calculated, and the output closest to the ground truth is then selected. The experimental results are shown in Figure 6, in which Figures 6(a)–6(h) show the saliency maps of the corresponding stratified images above them. The specific calculation process of the algorithm is described as follows.

3.1. Extraction of Structural Features

In order to improve the efficiency of structural feature computation, the PCA model based on Wang et al. [23] is adopted in this paper.

Firstly, each layered color image is analyzed by the PCA model: each layer is divided into image blocks, each centered on a pixel point. For a single-layer image, the average image block is defined as the sum of all of its image blocks divided by the total number of blocks.

The distance between each image block and the average image block is then calculated along the principal component directions, and whether an image block has significant structural characteristics is determined from this distance. Here, the position of each image block is represented by its central pixel, and the average image block has its own representative position. The distance is defined in formula (6), which involves the variance between the two image blocks.

The rule of judgment is as follows: when the distance is larger than the threshold of the dataset, the image block is considered a salient area; otherwise, it is a common image block. Mathematically, the extraction of structural features amounts to computing the norm of the image block in the PCA coordinate system. Therefore, the structural feature is further defined by formula (7).

Formula (7) takes the norm of the coordinates of the image block in the PCA coordinate system.
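As an illustration only, the average image block and the structural feature described above might be written as follows. All symbols here are assumptions introduced for this sketch and are not taken from the text: $p_x$ for the vectorized block centered on pixel $x$, $N$ for the total number of blocks, $p_A$ for the average block, and $\tilde{p}_x$ for the coordinates of $p_x$ in the PCA coordinate system. The choice of the $L_1$ norm is also an assumption, since the text only says "the norm".

```latex
% Average image block: mean of all N blocks of one layer (assumed notation).
p_A = \frac{1}{N} \sum_{x=1}^{N} p_x

% Structural feature: norm of the block in PCA coordinates.
% The L1 norm is an assumption; another norm would change only the subscript.
P(p_x) = \left\lVert \tilde{p}_x \right\rVert_1
```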

3.2. The Extraction of Color Features

Although the extraction of structural features can find the most distinctive blocks in the image, it is not suitable for all images. As shown in Figure 7, the spheres have the same structural characteristics but different colors. In this case, the color features are more distinctive, so the extraction of color features is essential.

Here, two steps are used to detect the color difference of an image block. In the first step, each layer of the image is divided into several blocks by the simple linear iterative clustering (SLIC) superpixel segmentation method, in order to determine which blocks have unique color characteristics. In the second step, the sum of the distances between an image block and all other image blocks in the color space is defined as the color difference of that block. Each block is represented by its position in the color space. Mathematically, the color feature of an image block is computed from norms in the PCA coordinate system. Therefore, the color feature of an image block is defined by formula (8).

In formula (8), the norm of the difference between two block positions gives the distance between the two blocks, and the sum runs over the total number of blocks obtained by superpixel segmentation.
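The second step can be sketched as below, assuming that the superpixel segmentation has already produced one representative color per block. The function name `color_distinctness`, the use of the Euclidean distance, and the tuple representation of a color are assumptions made for this example; the text does not state which norm is used.

```python
import math


def color_distinctness(block_colors):
    """Sum of color-space distances from each block to all other blocks.

    block_colors: list of equal-length tuples, one representative color
    per superpixel block (hypothetical input format).
    The Euclidean distance is an assumption of this sketch.
    """
    scores = []
    for i, color_i in enumerate(block_colors):
        total = 0.0
        for j, color_j in enumerate(block_colors):
            if i != j:
                total += math.dist(color_i, color_j)
        scores.append(total)
    return scores


if __name__ == "__main__":
    # Two identical blocks and one block with a different color.
    print(color_distinctness([(0, 0, 0), (0, 0, 0), (3, 4, 0)]))
```

A block whose color differs from most of the others receives the largest score, which matches the idea of the spheres in Figure 7 that share a structure but differ in color.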

3.3. The Saliency Fusion by Structural Features and Color Features

A single structural feature or color feature cannot effectively characterize all the information of the salient object. In order to obtain accurate salient objects, the structural and color features of each layer are combined to detect the salient regions of the different layers. The fused feature is obtained by formula (9).

After that, the fused features are limited to the range [0, 1] by normalization. Salient pixels are usually clustered because they usually correspond to objects in real scenes. In order to further refine the saliency model, the center prior is commonly used, which assumes that the target is located near the center of the image. By defining the center prior weight with a Gaussian function peaked at the center, the saliency of objects near the image center is highlighted according to the weight distribution. Here, different target regions are represented by the sets of pixels obtained under different thresholds, and the thresholds are uniformly distributed in the interval [0, 1]. The center prior is therefore calculated as follows:
(1) The pixel sets of each layer are detected by thresholding the fused feature map, and the center of gravity of each threshold result is calculated.
(2) A Gaussian distribution whose spread parameter is 10,000 is placed at each center of gravity, and the corresponding Gaussian weight is calculated for each threshold.
(3) A Gaussian distribution with a weight of five is added at the image center of each layer to raise the weight of the center position.

The Gaussian weight map is the weighted sum of all these Gaussian distributions and assigns different saliency priorities according to the weight distribution. The saliency map is then defined by combining the fused feature map and the Gaussian weight map, as given in formula (10).
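The fusion, normalization, and center prior steps can be sketched as follows. Several choices in this sketch are assumptions, not statements of the method: the two features are fused by an element-wise product, the value 10,000 is treated as the variance of each Gaussian, the weight of the Gaussian placed for a threshold is the threshold value itself, and the final map is the element-wise product of the weight map and the fused map. The function names are hypothetical.

```python
import math


def normalize01(m):
    """Rescale a 2-D list of numbers linearly to the range [0, 1]."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def gaussian_weight_map(fused, thresholds, variance=10000.0, center_weight=5.0):
    """Weighted sum of Gaussians placed at threshold centroids and at the image center."""
    h, w = len(fused), len(fused[0])
    gaussians = []  # entries are (row, column, weight)
    for t in thresholds:
        pts = [(r, c) for r in range(h) for c in range(w) if fused[r][c] >= t]
        if pts:
            row_c = sum(p[0] for p in pts) / len(pts)
            col_c = sum(p[1] for p in pts) / len(pts)
            gaussians.append((row_c, col_c, t))  # assumption: weight = threshold
    # Extra Gaussian at the image center with the stated weight of five.
    gaussians.append(((h - 1) / 2, (w - 1) / 2, center_weight))
    return [[sum(wt * math.exp(-((r - gr) ** 2 + (c - gc) ** 2) / (2 * variance))
                 for gr, gc, wt in gaussians)
             for c in range(w)] for r in range(h)]


def saliency_map(structure, color, thresholds):
    """Fuse two feature maps (assumed product) and apply the center prior."""
    fused = normalize01([[s * k for s, k in zip(s_row, k_row)]
                         for s_row, k_row in zip(structure, color)])
    weights = gaussian_weight_map(fused, thresholds)
    return normalize01([[g * d for g, d in zip(g_row, d_row)]
                        for g_row, d_row in zip(weights, fused)])
```

With a variance as large as 10,000 and small images, the weight map is almost flat, so the center prior mainly matters for images whose side lengths are comparable to or larger than the standard deviation.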

3.4. The Decision of the Optimal Results

After the above steps, the saliency map corresponding to each layer is obtained, and the best detection result among them is taken as the final output image.

In information theory, entropy is a basic concept that represents the average amount of information in random events. The information entropy of an image reflects the distribution of the foreground and the background noise in the image signal. Generally speaking, the more obvious the salient area of an image, the more prominent it is in the whole image and the more the repeated background area is suppressed. The values of the saliency map are then concentrated in a particular region of the histogram, which yields a small information entropy. The general rule is that the minimum information entropy corresponds to the best saliency map. For an image signal, the information entropy is defined by formula (11).

In formula (11), the gray value of the image at a given row and column is considered together with the probability of that gray value occurring in the image, and Ens denotes the entropy of the image. The saliency map of the best scale is then the one with the minimum entropy, as expressed in formula (12).
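The entropy rule can be illustrated with the standard Shannon entropy of the gray-level histogram. This is a minimal sketch under stated assumptions: the saliency maps are assumed to be quantized to integer gray levels, the logarithm is taken in base 2, and the function names `image_entropy` and `select_best_map` are hypothetical.

```python
import math
from collections import Counter


def image_entropy(image):
    """Shannon entropy (in bits) of the gray-level histogram of a 2-D list."""
    values = [v for row in image for v in row]
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_best_map(saliency_maps):
    """Return the index of the map with minimum entropy, plus all entropies."""
    entropies = [image_entropy(m) for m in saliency_maps]
    best = min(range(len(entropies)), key=entropies.__getitem__)
    return best, entropies


if __name__ == "__main__":
    noisy = [[0, 255], [0, 255]]   # two gray levels, equally likely
    clean = [[0, 0], [0, 0]]       # a single gray level
    print(select_best_map([noisy, clean]))
```

A map whose values are concentrated in few histogram bins has a low entropy, so the selection favors layers with little scattered background response.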

The information entropy is calculated for the multilayer saliency maps after Hierarchical PCA processing. The information entropy of each stratified image is shown in Table 1.

The data in Table 1 are the information entropies of the images shown in Figure 8. Using the above entropy decision rule on the eight layer images, the image with the smallest information entropy is selected as the optimal output, which gives the saliency map with the least background information, that is, the final result in Table 1.

4. The Experimental Results and Analysis

The experiments use MATLAB as the programming platform, and the algorithm is run on a ThinkPad-E40 laptop. The Hierarchical PCA model is tested on the MRAS-1000, ASD-1000, and ECSSD-1000 datasets and compared with several methods, such as ITTI (IT) [1], GBVS (GB) [2], SR [3], LC [25], HS [23], BSCA [26], HDCT [27], and DCRR [28]. The results of Itti et al. [1] and Yang et al. [2] on each dataset are those provided by Hou and Zhang [3], Fang et al. [25], and Wang et al. [23]. For CHS [29], the original results generated on the ECSSD dataset are used. The visual comparison is shown in Figure 9. In addition, in order to evaluate the detection results objectively, several metrics are used, namely, the precision rate (PRE), recall rate (REC), and F-measure (FME). The definitions of PRE, REC, and FME are shown in formulas (13)–(15) [30].

Among them, TP is the number of pixels correctly detected as salient object. TN is the number of background pixels correctly classified as background. FP is the number of background pixels wrongly extracted as salient object. FN is the number of salient object pixels wrongly classified as background. The AUC indicator is defined as the area enclosed by the ROC curve and the coordinate axis, and its maximum value is 1. The larger the AUC, the better the method predicts the gaze points of the human eye.
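The three metrics can be computed from a binarized saliency map and a binary ground-truth mask as in the sketch below. The precision and recall follow directly from the TP, FP, and FN counts defined above. The weighted form of the F-measure and the default value of 0.3 for the squared weight are assumptions of this sketch (this weighting is common in saliency evaluation, but the text does not state which form is used); the function names are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN between two binary masks given as 2-D lists."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def pre_rec_fme(pred, truth, beta2=0.3):
    """Precision, recall, and weighted F-measure (beta2 is an assumed default)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    denom = beta2 * pre + rec
    fme = (1 + beta2) * pre * rec / denom if denom else 0.0
    return pre, rec, fme


if __name__ == "__main__":
    prediction = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [1, 0]]
    print(pre_rec_fme(prediction, ground_truth))
```

Setting `beta2=1.0` gives the balanced harmonic mean of precision and recall, which is the other common convention.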

Figure 10 shows the precision-recall curves [11, 31] of the different saliency detection algorithms on three typical public datasets. It can be seen that, because of the high recognition rate on the ECSSD dataset, the accuracy of the HS and LC algorithms is more than 90%, but their precision rate is low. On the ASD dataset, the recall of every algorithm drops to a certain extent, and the precision rate of algorithms such as IT, GB, and LC is obviously reduced. On the ASD dataset, the GB algorithm has a higher recall rate, and the HS algorithm has a certain advantage in accuracy and F-measure. In general, the accuracy of the proposed algorithm on the different datasets is over 90%, and its average F-measure is higher than that of the other algorithms. The detection results are stable and robust.

Figure 11 is a comparison histogram of the F-measure values of the various algorithms shown in Figure 12. It can be seen that, because of the high image recognition rate on the ECSSD dataset, the accuracy of the HS and LC algorithms exceeds 90%, but their F-measure is low. On the more complex MRAS dataset, the recall rate of every algorithm drops to a certain extent, and the F-measure values of IT, GB, LC, and other algorithms are significantly reduced. On the relatively simple ASD dataset, the GB algorithm has a higher recall rate, and the HS algorithm has certain advantages in accuracy and F-measure. In general, the accuracy of the proposed algorithm on the different datasets exceeds 90%, and its average F-measure is higher than that of the other algorithms. The detection effect is stable, and the robustness is better than that of the other solutions.

Table 2 shows the AUC score of each method. The proposed method has the highest AUC score, indicating that it still detects well in natural images with complex backgrounds and can effectively label salient targets. It also shows that the proposed method has higher accuracy and that the saliency map obtained is closer to the ground truth.

The calculation speed is an important index for evaluating a method, because it determines whether the method can be applied to a real-time system. Since saliency detection serves as a preprocessing step in various image processing fields, its calculation speed is very important. Provided that the accuracy of the method meets the expected requirements, the faster the calculation, the better the overall performance of the method. The average calculation time of each method is shown in Table 3. The proposed method is fast and can meet basic application requirements.

5. Conclusions

In this paper, a salient object detection algorithm based on the Hierarchical PCA model was proposed. The experimental results show that the proposed algorithm reduces the interference of background noise and that its separation of the background and the target has certain advantages in precision, recall, and F-measure, while retaining the strengths of machine learning methods, which improves the saliency detection effect. Therefore, Hierarchical PCA saliency detection is an effective method for object detection under complex backgrounds. However, the Hierarchical PCA algorithm cannot analyze all the information in the image at the same time. When objects in the background have the same level of brightness and resolution as the target, it is difficult to extract the complete object information. Future work will therefore study the problem of incomplete object information and further improve the use of information from the whole image to obtain more accurate salient object information.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by grants of the National Natural Science Foundation of China (Nos. 61972056, 61972212, 61402053, and 61981340416), the Open Research Fund of Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation (No. 2015TP1005), the Changsha Science and Technology Planning (Nos. KQ1703018, KQ1706064, KQ1703018-01, and KQ1703018-04), the Research Foundation of Education Bureau of Hunan Province (Nos. 17A007 and 19B005), the Changsha Industrial Science and Technology Commissioner (No. 2017-7), the Natural Science Foundation of Jiangsu Province (No. BK20190089), and the Junior Faculty Development Program Project of Changsha University of Science and Technology (No. 2019QJCZ011).