Elsevier

Neurocomputing

Volume 453, 17 September 2021, Pages 801-811

JWSAA: Joint weak saliency and attention aware for person re-identification

https://doi.org/10.1016/j.neucom.2020.05.106

Abstract

Attention mechanisms can extract salient features from images and have proven effective for person re-identification. However, focusing on the saliency of an image is not enough. On the one hand, the salient features extracted by a model are not necessarily the features needed; for example, a similar background may be mistaken for a salient feature. On the other hand, a variety of salient features is often more conducive to improving model performance. Based on these observations, this paper proposes a joint weak saliency and attention-aware model, which obtains more complete global features by weakening salient features. The model then obtains diversified salient features via attention diversity to further improve performance. Experiments on commonly used datasets demonstrate the effectiveness of the proposed method.

Introduction

Person re-identification (re-ID) is a technology hotspot in the face recognition field, and is primarily used to determine whether people appearing in different images taken in non-overlapping areas are the same person. Person re-ID involves many fields, such as computer vision and pattern recognition, and has broad application prospects in security, criminal investigation, and smart cities. It has recently received increasing attention in both academic and industrial circles.

Person re-ID technology originated from image recognition. However, because of low image resolution, multiple shooting angles, complex backgrounds, and large changes in a person’s posture, accompanied by changes in lighting, occlusion, and other factors, person re-identification is a very challenging task. Nevertheless, with the application of deep learning in computer vision, person re-ID technology has been pushed to a new stage.

At present, person re-ID methods based on deep learning primarily focus on two aspects: representation learning and metric learning. Representation learning automatically extracts the representative features of images via models according to different tasks, and is commonly used in classification and verification problems. In a previous study [1], the identity of a person was used as a label, the classification and verification sub-networks were fused, and the network determined whether two images showed the same person. However, existing research [2], [3], [4], [5], [6], [7] has asserted that the ID information of a person alone is not sufficient to train a model with strong generalization performance. Therefore, several personal attributes (such as hair color, gender, and clothing color), posture information, and other information are used as guidance to extract more auxiliary features and enhance the generalization performance of the model. When global feature representation learning encountered a bottleneck, some researchers [3] developed methods based on local feature representation. The common methods of extracting local features are image segmentation, skeleton key point positioning, and pose correction, which have been found to be effective in recent research and have achieved good results. In addition, unsupervised learning methods and graph neural networks have also been applied successfully to person re-identification. Ye et al. [8] designed a positive re-weighting strategy to refine the intermediate labels and learned a better similarity measurement from them, which improved the label estimation process; to make full use of rich video information and reduce inaccurate matching, a dynamic graph matching framework was further integrated. Other scholars [9] have studied cross-modal pedestrian feature representation and successfully applied it to the night-time person re-identification task.

Unlike representation learning, metric learning aims to learn the similarity of two images through a network. In the person re-ID task, the similarity of images of the same person is expected to be greater than that of images of different people. This requires the network to make the distance between images of the same person (positive samples) as small as possible and the distance between images of different people (negative samples) as large as possible. Commonly used metric learning losses include contrastive loss [10], triplet loss [11], [12], [13], triplet hard loss with batch hard mining (TriHard loss) [14], and margin sample mining loss (MSML) [15].
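
To make the distance constraint concrete, the following is a minimal sketch of a batch-hard triplet loss in plain Python. It is an illustration rather than the formulation used in the cited works: the Euclidean distance, the margin of 0.3, the toy feature values, and the function names `euclidean` and `batch_hard_triplet_loss` are all assumptions chosen for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(features, labels, margin=0.3):
    """Mean hinge loss over anchors, using the hardest positive and negative.

    For each anchor, the hardest positive is the farthest sample with the same
    identity and the hardest negative is the closest sample with a different
    identity. Anchors lacking either a positive or a negative are skipped.
    The margin value is an assumption for this sketch.
    """
    losses = []
    for i, (anchor, anchor_label) in enumerate(zip(features, labels)):
        positives = [
            euclidean(anchor, feat)
            for j, (feat, lab) in enumerate(zip(features, labels))
            if lab == anchor_label and j != i
        ]
        negatives = [
            euclidean(anchor, feat)
            for feat, lab in zip(features, labels)
            if lab != anchor_label
        ]
        if not positives or not negatives:
            continue
        losses.append(max(0.0, max(positives) - min(negatives) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: four images with made-up 2-D features.
features = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]

# Identities that match the geometry: same-person images are close together.
loss_separated = batch_hard_triplet_loss(features, [0, 0, 1, 1])

# Identities that contradict the geometry: same-person images are far apart.
loss_mixed = batch_hard_triplet_loss(features, [0, 1, 0, 1])
```

In this toy batch the loss is zero when same-identity images are already closer than different-identity images by more than the margin, and positive when the labels contradict the feature geometry, which is the signal that pushes a network to pull positives together and push negatives apart.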

By combining representation learning with metric learning, many excellent person re-ID methods have been proposed. However, an important and easily overlooked problem arises in person re-ID, namely, background bias caused by saliency detection bias. The model fails to focus its saliency on the person, so background features are wrongly treated as salient features. As a result, some images rank higher in the query results simply because their backgrounds resemble that of the query image, as shown in Fig. 1. This phenomenon illustrates the effect of saliency detection errors on person re-identification.

To achieve better performance, attention mechanisms are widely used in person re-ID, where they serve two main purposes: obtaining more discriminative features and obtaining local features of the person. In [16], spatial and channel attention were combined to extract the most discriminative features. In other existing research [3], [17], [18], [19], image segmentation, attribute recognition, pose recognition, and an attention mechanism were combined to extract the local features of a person. In addition, other works [20], [21] have directly diversified the attention mechanism, obtaining varied attention maps for the extraction of more local features.
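
As a rough illustration of what diversifying attention maps can mean in practice, the sketch below scores a set of attention maps by their average pairwise cosine similarity, so that maps attending to the same region are penalized. This is only one plausible way to measure overlap and is not claimed to be the loss used in [20], [21] or in the proposed method; the function names `cosine_similarity` and `diversity_penalty` and the toy maps are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two flattened attention maps (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversity_penalty(attention_maps):
    """Average pairwise cosine similarity between flattened attention maps.

    A value near 0 means the maps attend to different regions; a value near 1
    means they largely overlap. With fewer than two maps the penalty is 0.
    """
    pairs = [
        (i, j)
        for i in range(len(attention_maps))
        for j in range(i + 1, len(attention_maps))
    ]
    if not pairs:
        return 0.0
    total = sum(
        cosine_similarity(attention_maps[i], attention_maps[j]) for i, j in pairs
    )
    return total / len(pairs)


# Flattened 2x2 attention maps with made-up values.
overlapping = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
disjoint = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

penalty_overlapping = diversity_penalty(overlapping)
penalty_disjoint = diversity_penalty(disjoint)
```

Adding such a penalty to a training objective would favor attention branches that cover different parts of the person, which is the intuition behind extracting more local features from varied attention maps.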

Inspired by the existing research, the joint weak saliency and attention-aware (JWSAA) method is proposed in this work for person re-ID. The weak saliency mechanism weakens feature areas with a high response, which lets the network attend to more of the useful features of a person and reduces the influence of the background. Once the full set of person features has been extracted, an attention-aware module is used to extract different discriminative features, which makes the final features of the network more representative, robust, and discriminative.

In summary, the following contributions to the research on person re-ID are made in this paper.

  • A new end-to-end trainable model is proposed to solve background interference and extract diverse features of a person in the person re-ID task;

  • A novel concept of a weak saliency mechanism that can adaptively change the weakening factor to extract all the useful information is formulated for the optimization of person re-ID in deep learning;

  • A reference method of attention diversity is designed to obtain different local features via a variety of attention maps;

  • Extensive ablation studies and convolution visualization demonstrate that the proposed method can substantially improve person re-ID performance.

Related work

Deep learning has resulted in the explosive growth of person re-ID methods. In this section, the representative methods that are closely related to the present work are reviewed.

In previous work, few studies paid attention to the impact of the image background on algorithm performance. However, some excellent unsupervised object segmentation methods [23], [24], [25] can be used to remove background interference. Wang et al. [26] emphasized the importance of the internal correlation between video frames,

Method

The aim of the proposed method is to remove the background information from an image and focus on the person in the image using the weak saliency mechanism. Once the person features have been obtained with the background noise removed, the features of different parts of the person are extracted adaptively by the attention-aware module. These features are then fused with the main features of the person, so that the two complement each other, to form the final features for person re-ID.
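
The idea of weakening high-response features can be sketched in a few lines of plain Python. The snippet below is a simplified illustration under stated assumptions, not the mechanism defined in the paper: it uses a fixed threshold ratio and a fixed weakening factor, whereas the proposed mechanism is described as adapting the weakening factor, and the function name `weaken_salient_responses` and its parameters are hypothetical.

```python
def weaken_salient_responses(response_map, threshold_ratio=0.8, factor=0.5):
    """Scale down responses at or above threshold_ratio * max by `factor`.

    response_map is a 2D list of non-negative activations. Both
    threshold_ratio and factor are hypothetical fixed parameters used only
    for this sketch. A new map is returned; the input is left unchanged.
    """
    peak = max(max(row) for row in response_map)
    threshold = threshold_ratio * peak
    return [
        [value * factor if value >= threshold else value for value in row]
        for row in response_map
    ]


# Made-up 3x3 response map with a strong peak in the middle row.
response_map = [
    [0.1, 0.2, 0.1],
    [0.2, 1.0, 0.9],
    [0.1, 0.3, 0.2],
]
weakened = weaken_salient_responses(response_map)
```

After weakening, the strongest responses no longer dominate the map, so the lower-response regions carry relatively more weight. That is the intuition behind encouraging a network to look beyond the single most salient area.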

Experiment and results

To demonstrate the superior performance of the proposed method, the datasets Market-1501 [46] and DukeMTMC-reID [47], [48], which are commonly used in re-ID tasks, were chosen for experiments.

The Market-1501 dataset contains 1501 identities observed under six camera viewpoints, 19,732 gallery images, and 12,936 training images detected by [49]. The DukeMTMC-reID dataset contains 1404 identities, 16,522 training images, 2228 queries, and 17,661 gallery images. Because so many images were

Conclusions

In this paper, a new method of person re-ID, called the joint weak saliency mechanism and attention-aware model (JWSAA), was introduced, and the weak saliency mechanism was proposed. By weakening the high-response features, the model gradually pays attention to all the valuable areas of the features rather than only the most significant features in an image. A multi-branch attention sub-network is then constructed via a diversity loss to make the model pay attention to different regional

CRediT authorship contribution statement

Xin Ning: Conceptualization, Methodology, Software, Investigation, Writing - original draft. Ke Gong: Investigation, Validation, Formal analysis, Visualization, Software. Weijun Li: Validation, Formal analysis, Visualization. Liping Zhang: Resources, Writing - review & editing, Supervision, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 61901436).

References (60)

  • M. Ye et al., Dynamic graph co-matching for unsupervised video-based person re-identification, IEEE Trans. Image Process. (2019)
  • M. Ye et al., Bi-directional center-constrained top-ranking for visible thermal person re-identification, IEEE Trans. Inf. Forensic Secur. (2020)
  • R.R. Varior, M. Haloi, G. Wang, Gated siamese convolutional neural network architecture for human re-identification,...
  • B. Huang, [Read] FaceNet: A Unified Embedding for Face Recognition and Clustering, (2017) 1–2....
  • H. Liu et al., End-to-end comparative attention networks for person re-identification, IEEE Trans. Image Process. (2017)
  • D. Cheng, Y. Gong, S. Zhou, J. Wang, N. Zheng, Person re-identification by multi-channel parts-based CNN with improved...
  • A. Hermans, L. Beyer, B. Leibe, In defense of the triplet loss for person re-identification, CoRR. abs/1703.0 (2017)....
  • Q. Xiao, H. Luo, C. Zhang, Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification, CoRR....
  • W. Li, X. Zhu, S. Gong, Harmonious Attention Network for Person Re-identification, Proc. IEEE Comput. Soc. Conf....
  • H. Guo, H. Wu, C. Zhao, H. Zhang, J. Wang, H. Lu, Cascade Attention Network for Person Re-identification, National...
  • J. Xu, R. Zhao, F. Zhu, H. Wang, W. Ouyang, Attention-Aware Compositional Network for Person Re-identification, Proc....
  • T. Chen, S. Ding, J. Xie, Y. Yuan, W. Chen, Y. Yang, Z. Ren, Z. Wang, ABD-net: Attentive but diverse person...
  • S. Li, S. Bak, P. Carr, X. Wang, Diversity Regularized Spatiotemporal Attention for Video-Based Person...
  • W. Wang et al., Saliency-aware video object segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2018)
  • W. Wang, J. Shen, L. Shao, Consistent Video Saliency Using Local Gradient Flow Optimization and...
  • W. Wang et al., Video salient object detection via fully convolutional networks, IEEE Trans. Image Process. (2018)
  • X. Lu, W. Wang, C. Ma, J. Shen, L. Shao, F. Porikli, See More, Know More: Unsupervised Video Object Segmentation With...
  • W. Wang, X. Lu, J. Shen, D. Crandall, L. Shao, Zero-shot video object segmentation via attentive graph neural networks,...
  • M. Tian, S. Yi, H. Li, S. Li, X. Zhang, J. Shi, J. Yan, X. Wang, Eliminating background-bias for robust person...
  • K. Li, Z. Wu, K.C. Peng, J. Ernst, Y. Fu, Tell me where to look: guided attention inference network, Proc. IEEE Comput....

    Xin Ning received his Ph.D. in 2017 from the Institute of Semiconductors, Chinese Academy of Sciences. He is currently an Assistant Professor of Artificial Intelligence at the Institute of Semiconductors, Chinese Academy of Sciences. His research interests include deep learning, machine art, pattern recognition, and image cognitive computation. He is a member of IEEE.

    Ke Gong received his bachelor's degree from China University of Petroleum (Beijing) in 2018. He now works at Beijing Wave Security Technology Company Limited, Cognitive Computing Technology Joint Laboratory, Wave Group.

    Weijun Li received his Ph.D. in 2004 from the Institute of Semiconductors, Chinese Academy of Sciences. He is currently a Professor of Artificial Intelligence at the Institute of Semiconductors, Chinese Academy of Sciences (ISCAS) and the University of Chinese Academy of Sciences. He is in charge of the Artificial Intelligence Research Center of ISCAS and is also the Director of the Lab of Highspeed Circuits & Neural Networks of ISCAS. His research interests include deep modeling, machine art, pattern recognition, artificial neural networks, and intelligent systems. He is a senior member of IEEE.

    Liping Zhang received her Ph.D. from the Institute of Semiconductors, Chinese Academy of Sciences in 2018. Currently, she is an assistant research fellow in the Laboratory of High-speed Circuit and Artificial Neural networks at the Institute of Semiconductors, Chinese Academy of Sciences. Her research interests include biometrics and pattern analysis. She is a member of IEEE.
