Abstract
Most existing graph convolutional network-based action recognition methods use an adaptive mechanism to learn action features from a skeleton sequence. Although this mechanism improves recognition accuracy to some extent, its performance is still limited by the initial skeleton topology, which connects skeletal joints according to the natural human body structure. In addition, the semantic information of skeletal joints is naturally informative and discriminative for action recognition tasks, but its inclusion has rarely been investigated in existing methods. To address these problems, we propose a novel multistream, effective skeleton topology and semantics-guided adaptive graph convolution network for action recognition. By comparing several different topological graphs, we design an elbow- and knee-centric topology that forms the input to the adaptive graph convolutional network. Moreover, we explicitly embed high-level semantic skeletal information into this network to enhance its feature representation capabilities. Finally, we study the positional relationships between the joints and the center of gravity within each frame to generate relative position data. These data are combined with the joint data, bone data, and their corresponding motion information in a multistream network to further improve action recognition accuracy. Extensive experiments show that the proposed method achieves state-of-the-art performance.
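The multistream input described above can be illustrated with a short sketch. The helper names below are hypothetical, and the center of gravity is approximated as the mean joint position per frame, which is an assumption rather than the paper's exact definition:

```python
import numpy as np

def relative_position_stream(joints):
    """joints: array of shape (T, V, C) -- T frames, V joints, C=3 coordinates.
    Returns each joint's position relative to the per-frame center of gravity,
    here approximated as the mean of all joint positions (an assumption)."""
    center = joints.mean(axis=1, keepdims=True)   # (T, 1, C)
    return joints - center                        # (T, V, C)

def bone_stream(joints, parents):
    """Bone vectors: each joint minus its parent joint in the skeleton topology.
    parents: array of shape (V,) giving each joint's parent index."""
    return joints - joints[:, parents, :]

def motion_stream(data):
    """Temporal motion: frame-to-frame differences, zero-padded at the last frame."""
    motion = np.zeros_like(data)
    motion[:-1] = data[1:] - data[:-1]
    return motion
```

In a multistream design of this kind, each stream (joint, bone, relative position, and their motions) is typically fed to its own network branch, and the classification scores are fused at the end.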
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable and insightful comments on an earlier version of this manuscript. This work was supported by the Natural Science Foundation of China (No. 61871196, 61902330, 62001176); the National Key Research and Development Program of China (No. 2019YFC1604700); the Natural Science Foundation of Fujian Province of China (No. 2019J01082 and 2020J01085); and the Promotion Program for Young and Middle-aged Teachers in Science and Technology Research of Huaqiao University (ZQN-YX601).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Qiu, ZX., Zhang, HB., Deng, WM. et al. Effective skeleton topology and semantics-guided adaptive graph convolution network for action recognition. Vis Comput 39, 2191–2203 (2023). https://doi.org/10.1007/s00371-022-02473-7