Effective skeleton topology and semantics-guided adaptive graph convolution network for action recognition

  • Original article
  • Published in: The Visual Computer

Abstract

Most existing graph convolutional network-based action recognition methods use an adaptive mechanism to learn action features from a skeleton sequence. Although this mechanism improves recognition accuracy to some extent, its performance is still limited by the initial skeleton topology, which connects skeletal joints according to the natural structure of the human body. In addition, the semantic information of skeletal joints is informative and discriminative for action recognition tasks, but its inclusion has rarely been investigated in existing methods. To solve these problems, we propose a novel multistream network that combines an effective skeleton topology with a semantics-guided adaptive graph convolution network for action recognition. By comparing several different topological graphs, we design an elbow- and knee-centric topology structure that forms the input to the adaptive graph convolutional network. Moreover, we explicitly embed high-level semantic skeletal information into this network to enhance its feature representation capabilities. Finally, we study the positional relationships between different joints and the center of gravity in the same frame to generate relative position data. These data are combined with the joint data, bone data and their corresponding motion information by a multistream network to further improve the action recognition accuracy. Extensive experiments show that the proposed method achieves state-of-the-art performance.
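
The input streams named in the abstract (joint data, bone data, relative position data and their motion information) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `(frames, joints, coordinates)` array layout, the `bone_pairs` list and the use of the per-frame mean of all joints as a stand-in for the center of gravity are assumptions made only for illustration.

```python
import numpy as np


def temporal_difference(x):
    """Frame-to-frame difference along the time axis; the last frame stays zero."""
    motion = np.zeros_like(x)
    motion[:-1] = x[1:] - x[:-1]
    return motion


def build_streams(joints, bone_pairs):
    """Derive illustrative input streams from raw joint coordinates.

    joints: array of shape (frames, joints, coordinates).
    bone_pairs: hypothetical list of (child, parent) joint indices.
    """
    joints = np.asarray(joints, dtype=float)

    # Bone stream: vector pointing from the parent joint to the child joint.
    bones = np.zeros_like(joints)
    for child, parent in bone_pairs:
        bones[:, child] = joints[:, child] - joints[:, parent]

    # Relative position stream: offset of each joint from a per-frame center.
    # Assumption: the center of gravity is approximated by the mean of all joints.
    center = joints.mean(axis=1, keepdims=True)
    relative = joints - center

    streams = {"joint": joints, "bone": bones, "relative": relative}
    # Motion streams: temporal difference of each static stream.
    for name in ("joint", "bone", "relative"):
        streams[name + "_motion"] = temporal_difference(streams[name])
    return streams


# Toy sequence: 2 frames, 3 joints on a line, shifted by one unit in y.
toy = np.array([[[0, 0, 0], [1, 0, 0], [2, 0, 0]],
                [[0, 1, 0], [1, 1, 0], [2, 1, 0]]])
streams = build_streams(toy, bone_pairs=[(1, 0), (2, 1)])
print(sorted(streams))
```

In a multistream model, each of these arrays would typically feed its own network and the per-stream class scores would then be fused, for example by summation. The abstract does not state how the streams are merged in the proposed method, so the fusion step is left out of the sketch.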

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable and insightful comments on an earlier version of this manuscript. This work was supported by the Natural Science Foundation of China (No. 61871196, 61902330 and 62001176); the National Key Research and Development Program of China (No. 2019YFC1604700); the Natural Science Foundation of Fujian Province of China (No. 2019J01082 and 2020J01085); and the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (ZQN-YX601).

Author information

Corresponding author

Correspondence to Hong-Bo Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qiu, ZX., Zhang, HB., Deng, WM. et al. Effective skeleton topology and semantics-guided adaptive graph convolution network for action recognition. Vis Comput 39, 2191–2203 (2023). https://doi.org/10.1007/s00371-022-02473-7

  • DOI: https://doi.org/10.1007/s00371-022-02473-7
