Abstract
In the era of Industry 4.0, video analysis plays a vital role in a variety of industrial applications, and video-based action detection has achieved promising performance in the computer vision community. However, in complex factory environments, detecting the workflow of both machines and workers during production remains an open problem. To address this issue, we propose generic proposal-based Graph Attention Networks for workflow detection. Specifically, an efficient and effective action proposal method is first employed to generate workflow proposals. These proposals and their relations are then exploited to construct a proposal graph. Two types of relationships are considered for identifying workflow phases: contextual relations, which capture context information, and surrounding relations, which characterize the correlations between different workflow instances. To improve recognition accuracy, within-category and between-category attention are incorporated to learn long-range and dynamic dependencies, respectively, greatly enhancing the feature representation for workflow detection. Experimental results show that the proposed approach considerably improves upon the state of the art on THUMOS'14 and a practical workflow dataset, achieving 6.7% and 3.9% absolute improvement over the advanced GTAN detector at a tIoU threshold of 0.4, respectively. Moreover, additional experiments on ActivityNet1.3 confirm the effectiveness of modeling workflow proposal relationships.
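To make the pipeline described above concrete, the following is a minimal NumPy sketch of the two core steps: building a graph over temporal action proposals and applying a graph-attention layer (in the style of Veličković et al., reference [49]) to refine their features. This is not the authors' implementation: the function names are hypothetical, edges here are formed only from temporal overlap (the paper distinguishes contextual and surrounding relations), and a single attention head with random parameters stands in for the trained within-/between-category attention.

```python
import numpy as np

def temporal_iou(a, b):
    """Temporal IoU between two proposals given as (start, end) pairs."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def build_proposal_graph(segments, iou_thresh=0.1):
    """Adjacency over proposals: connect pairs whose segments overlap.
    Self-loops are kept so each node attends to its own feature."""
    n = len(segments)
    adj = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            if temporal_iou(segments[i], segments[j]) > iou_thresh:
                adj[i, j] = adj[j, i] = 1.0
    return adj

def gat_layer(feats, adj, W, a, leaky=0.2):
    """One graph-attention layer: project features, score each edge with
    e_ij = LeakyReLU(a^T [h_i || h_j]), softmax over neighbours, aggregate."""
    h = feats @ W                                  # (n, d') projected features
    n = h.shape[0]
    e = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            z = np.concatenate([h[i], h[j]]) @ a
            e[i, j] = z if z > 0 else leaky * z    # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                 # mask non-neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)      # row-wise softmax
    return alpha @ h                               # attention-weighted sum

# Usage: three proposals, the first two overlap, the third is isolated.
segs = [(0.0, 2.0), (1.0, 3.0), (10.0, 12.0)]
adj = build_proposal_graph(segs)
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
out = gat_layer(feats, adj, W=rng.normal(size=(4, 5)), a=rng.normal(size=10))
```

In the full model, `W` and `a` would be learned end-to-end and separate attention mechanisms would operate within and between workflow categories; the sketch only illustrates the message-passing structure over proposal nodes.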
References
Jalal A, Kamal S, Kim DS (2018) Detecting complex 3D human motions with body model low-rank representation for real-time smart activity monitoring system. KSII Trans Internet Inform Syst 12(3)
Chen Y, Zhao D, Lv L et al (2018) Multi-task learning for dangerous object detection in autonomous driving. Inform Sci 432:559–571
SalazarAutores DRC, Maldonado CBG, Alvarado HFG et al (2018) Patterns for semantic human behavior analysis. In: Iberian Conference on Information Systems and Technologies (CISTI), pp 1–5
Voulodimos A, Kosmopoulos D, Vasileiou G et al (2011) A dataset for workflow recognition in industrial scenes. In: IEEE International Conference on Image Processing, pp 3249–3252
Voulodimos A, Kosmopoulos D, Vasileiou G et al (2012) A threefold dataset for activity and workflow recognition in complex industrial environments. IEEE MultiMedia 19(3):42–52
Li Z, Hu H, Hu H et al (2018) Multi-objective scheduling for scientific workflow in multicloud environment. J Netw Comput Appl 114:108–122
Li Z, Ge J, Hu H et al (2015) Cost and energy aware scheduling algorithm for scientific workflows with deadline constraint in clouds. IEEE Trans Serv Comput 11(4):713–726
Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1933–1941
Karpathy A, Toderici G, Shetty S et al (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp 1725–1732
Shou Z, Wang D, Chang SF (2016) Temporal action localization in untrimmed videos via multi-stage CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1049–1058
Chao YW, Vijayanarasimhan S, Seybold B et al (2018) Rethinking the faster r-cnn architecture for temporal action localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1130–1139
Protopapadakis EE, Doulamis AD, Doulamis ND (2013) Tapped delay multiclass support vector machines for industrial workflow recognition. In: International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), pp 1–4
Veres G, Grabner H, Middleton L et al (2010) Automatic workflow monitoring in industrial environments. In: Asian Conference on Computer Vision, pp 200–213
Zolfaghari M, Singh K, Brox T (2018) ECO: Efficient convolutional network for online video understanding. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 695–712
Li C, Zhong Q, Xie D (2019) Collaborative Spatiotemporal Feature Learning for Video Action Recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7872–7881
Pinhanez CS, Bobick AF (1997) Intelligent studios: modeling space and action to control TV cameras. Appl Artif Intell 11(4):285–305
Koile K, Tollmar K, Demirdjian D et al (2003) Activity zones for context-aware computing. In: International Conference on Ubiquitous Computing, pp 90–106
Xiang T, Gong S (2008) Optimising dynamic graphical models for video content analysis. Comput Vision Image Understand 112(3):310–323
Vu VT, Brémond F, Thonnat M (2003) Automatic video interpretation: a novel algorithm for temporal scenario recognition. In: International joint conference on artificial intelligence, pp 1295–1300
Shi Y, Bobick A, Essa I (2006) Learning temporal sequence model from partially labeled data. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 1631–1638
Jin Y, Dou Q, Chen H et al (2017) SV-RCNet: workflow recognition from surgical videos using recurrent convolutional network. IEEE Trans Med Imag 37(5):1114–1126
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. arXiv:1406.2199
Ji S, Xu W, Yang M (2012) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
Ma S, Sigal L, Sclaroff S (2016) Learning activity progression in lstms for activity detection and early detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1942–1950
Yu G, Yuan J (2015) Fast action proposals for human action detection and search. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1302–1311
Caba Heilbron F, Carlos Niebles J, Ghanem B (2016) Fast temporal activity proposals for efficient detection of human actions in untrimmed videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1914–1923
Escorcia V, Heilbron FC, Niebles JC et al (2016) Daps: Deep action proposals for action understanding. In: European Conference on Computer Vision, pp 768–784
Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907
Tan M, Shi Q, van den Hengel A et al (2015) Learning graph structure for multi-label image classification via clique generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4100–4109
Wang X, Gupta A (2018) Videos as space-time region graphs. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 399–417
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-Second AAAI Conference on Artificial Intelligence
Zhao L, Peng X, Tian Y et al (2019) Semantic Graph Convolutional Networks for 3D Human Pose Regression. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3425–3435
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Polosukhin I (2017) Attention is all you need. arXiv:1706.03762
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–2057
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Tang X (2017) Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
Wang L, Huang Y, Hou Y, Zhang S, Shan J (2019) Graph attention convolution for point cloud semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10296–10305
Yu J, Tao D, Wang M, Rui Y (2014) Learning to rank using user clicks and visual features for image retrieval. IEEE Trans Cybern 45(4):767–779
Hong C, Yu J, Tao D, Wang M (2014) Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval. IEEE Trans Indus Electron 62(6):3742–3751
Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670
Hong C, Yu J, Zhang J, Jin X, Lee KH (2018) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Indus Inform 15(7):3952–3961
Yu J, Tan M, Zhang H, Tao D, Rui Y (2019) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2019.2932058
Carreira J, Zisserman A (2017) Quo vadis, action recognition? a new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6299–6308
Dai X, Singh B, Zhang G et al (2017) Temporal context network for activity localization in videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp 5793–5802
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv:1710.10903
Zhao Y, Xiong Y, Wang L et al (2017) Temporal action detection with structured segment networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2914–2923
Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. arXiv:1706.02216
Jiang YG, Liu J, Roshan Zamir A, Toderici G, Laptev I, Shah M, Sukthankar R (2014) THUMOS challenge: action recognition with a large number of classes. http://crcv.ucf.edu/THUMOS14/
Zhang L, Wang QW (2018) XIOLIFT Database, https://pan.baidu.com/s/lySILNURWD-N40q5TpAvGKUA
Caba Heilbron F, Escorcia V, Ghanem B et al (2015) Activitynet: A large-scale video benchmark for human activity understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 961–970
Lin T, Zhao X, Su H et al (2018) Bsn: Boundary sensitive network for temporal action proposal generation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19
Li J, Liu X, Zong Z, Zhao W, Zhang M, Song J (2020) Graph Attention Based Proposal 3D ConvNets for Action Detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 4626–4633
Oneata D, Verbeek J, Schmid C (2014) The LEAR submission at THUMOS 2014. https://hal.inria.fr/hal-01074442/
Richard A, Gall J (2016) Temporal action detection using a statistical language model. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3131–3140
Shou Z, Chan J, Zareian A et al (2017) Cdc: Convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5734–5743
Yuan Z, Stroud JC, Lu T et al (2017) Temporal action localization by structured maximal sums. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3684–3692
Long F, Yao T, Qiu Z, Tian X, Luo J, Mei T (2019) Gaussian temporal awareness networks for action localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 344–353
Wang L, Xiong Y, Lin D (2017) Untrimmednets for weakly supervised action recognition and detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4325–4334
Oyedotun OK, Aouada D (2020) Why do deep neural networks with skip connections and concatenated hidden representations work? In: International Conference on Neural Information Processing, pp 380–392
Acknowledgements
This research is based upon work partially supported by the National Natural Science Foundation of China (Grants no. 61572251, 61572162, 61702144 and 61802095), the Natural Science Foundation of Zhejiang Province (LQ17F020003), and the Key Science and Technology Project Foundation of Zhejiang Province (2018C01012).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhang, M., Hu, H., Li, Z. et al. Proposal-Based Graph Attention Networks for Workflow Detection. Neural Process Lett 54, 101–123 (2022). https://doi.org/10.1007/s11063-021-10622-7