ABSTRACT
Person re-identification (re-ID) aims to quickly locate and retrieve all images of a specified person across multiple cameras in complex scenes, and it has been widely applied in intelligent video surveillance and security systems. ResNet, a widely used state-of-the-art backbone, exhibits promising performance on person re-identification. However, because it lacks an intermediate fully connected layer, ResNet fails to fully capture global information during feature extraction. To overcome this problem, this paper proposes a person re-identification method named RG-BoTNet, which fuses the Relation-aware Global Attention (RGA) mechanism into the Bottleneck Transformer (BoTNet). Since relation-aware global attention is good at capturing the global information of an image, RG-BoTNet is effective at extracting person features. Experimental results on the CUHK03 dataset in terms of mean Average Precision (mAP) and Rank-1 accuracy demonstrate the effectiveness of RG-BoTNet for the person re-identification task.
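The abstract reports results as mAP and Rank-1 but does not describe how they are computed. The following is a minimal sketch of how these two re-ID metrics are commonly derived from a query-by-gallery distance matrix; it is not the paper's evaluation code. The helper name `evaluate_reid` and its parameters are hypothetical. It assumes distances are already computed (smaller means more similar) and, when camera IDs are supplied, applies the common re-ID convention of ignoring gallery images that share both identity and camera with the query. The exact CUHK03 protocol used by the authors is not stated in the abstract.

```python
from typing import Optional, Sequence, Tuple


def evaluate_reid(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    query_cams: Optional[Sequence[int]] = None,
    gallery_cams: Optional[Sequence[int]] = None,
) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mAP) for a query x gallery distance matrix.

    Hypothetical helper for illustration. If camera IDs are given, gallery
    entries with the same identity AND same camera as the query are dropped
    (an assumed, commonly used re-ID convention).
    """
    rank1_hits = 0
    ap_sum = 0.0
    valid_queries = 0
    for q, row in enumerate(dist):
        # Rank gallery indices by ascending distance (best match first).
        order = sorted(range(len(row)), key=lambda g: row[g])
        if query_cams is not None and gallery_cams is not None:
            order = [
                g for g in order
                if not (gallery_ids[g] == query_ids[q]
                        and gallery_cams[g] == query_cams[q])
            ]
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        num_relevant = sum(matches)
        if num_relevant == 0:
            continue  # no true match left in the gallery; skip this query
        valid_queries += 1
        # Rank-1: is the top-ranked remaining gallery image a correct match?
        if matches[0]:
            rank1_hits += 1
        # Average precision: mean of precision@k taken at every correct match.
        hits = 0
        precision_sum = 0.0
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / k
        ap_sum += precision_sum / num_relevant
    if valid_queries == 0:
        raise ValueError("no query has a matching gallery entry")
    return rank1_hits / valid_queries, ap_sum / valid_queries
```

For example, with two queries (identities 1 and 2) and three gallery images (identities 1, 2, 2), `evaluate_reid([[0.2, 0.5, 0.9], [0.1, 0.4, 0.3]], [1, 2], [1, 2, 2])` yields a Rank-1 of 0.5, because the second query's nearest neighbour has the wrong identity, and an mAP of 19/24: the first query has AP 1.0 and the second has AP (1/2 + 2/3) / 2 = 7/12. Rank-1 only checks the top result, while mAP also rewards ranking every correct gallery image early, which is why re-ID papers usually report both.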
Index Terms
- A Person Re-identification Method Fusing Bottleneck Transformer and Relation-aware Global Attention