
Adversarial Multi-Grained Embedding Network for Cross-Modal Text-Video Retrieval

Published: 16 February 2022

Abstract

Cross-modal retrieval between texts and videos has received consistent research interest in the multimedia community. Existing studies follow a trend of learning a joint embedding space to measure the distance between text and video representations. In common practice, the video representation is constructed by feeding clips into 3D convolutional neural networks to extract coarse-grained global visual features. Several studies have also attempted to align local objects in the video with the text. However, these representations share a common drawback: they neglect the rich fine-grained relation features that capture spatial-temporal object interactions, which benefit the mapping of textual entities in real-world retrieval systems. To tackle this problem, we propose the adversarial multi-grained embedding network (AME-Net), a novel cross-modal retrieval framework that adopts both fine-grained local relation features and coarse-grained global features to bridge the text and video modalities. In addition, with the newly proposed visual representation, we integrate an adversarial learning strategy into AME-Net to further narrow the domain gap between the text and video representations. In summary, we contribute AME-Net with an adversarial learning strategy for learning a better joint embedding space, and experimental results on the MSR-VTT and YouCook2 datasets demonstrate that our proposed framework consistently outperforms state-of-the-art methods.
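
For readers who want a concrete picture of the approach summarized above, the following is a minimal, illustrative PyTorch sketch rather than the authors' AME-Net implementation: it fuses coarse-grained global clip features with fine-grained relation features into a video embedding, matches it against a text embedding with a triplet ranking loss, and uses a gradient-reversal modality discriminator as the adversarial component that narrows the domain gap. All module names, feature dimensions, and the fusion and loss details are assumptions made for illustration.

```python
# Hedged sketch of a joint text-video embedding with an adversarial modality
# discriminator. Dimensions and architecture are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, reversed (scaled) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class JointEmbeddingSketch(nn.Module):
    def __init__(self, global_dim=2048, relation_dim=1024, text_dim=768, embed_dim=512):
        super().__init__()
        # Project coarse 3D-CNN clip features and fine-grained relation features,
        # then fuse them into a single video embedding.
        self.global_proj = nn.Linear(global_dim, embed_dim)
        self.relation_proj = nn.Linear(relation_dim, embed_dim)
        self.video_fuse = nn.Linear(2 * embed_dim, embed_dim)
        # Text features (e.g., sentence encodings) projected into the same space.
        self.text_proj = nn.Linear(text_dim, embed_dim)
        # Modality discriminator: predicts whether an embedding is from video or text.
        self.discriminator = nn.Sequential(
            nn.Linear(embed_dim, embed_dim // 2), nn.ReLU(), nn.Linear(embed_dim // 2, 2)
        )

    def forward(self, global_feat, relation_feat, text_feat, lam=1.0):
        v = torch.cat([self.global_proj(global_feat), self.relation_proj(relation_feat)], dim=-1)
        v = F.normalize(self.video_fuse(v), dim=-1)          # fused video embedding
        t = F.normalize(self.text_proj(text_feat), dim=-1)   # text embedding
        sim = v @ t.t()                                       # cosine similarity matrix

        # Adversarial branch: gradient reversal pushes the encoders to make
        # video and text embeddings indistinguishable to the discriminator.
        reversed_feats = GradReverse.apply(torch.cat([v, t], dim=0), lam)
        modality_logits = self.discriminator(reversed_feats)
        return sim, modality_logits


def retrieval_and_adversarial_loss(sim, modality_logits, margin=0.2):
    """Triplet ranking loss over the similarity matrix plus a modality-classification loss."""
    n = sim.size(0)
    pos = sim.diag().view(n, 1)
    cost_t = (margin + sim - pos).clamp(min=0)       # video-to-text direction
    cost_v = (margin + sim - pos.t()).clamp(min=0)   # text-to-video direction
    mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    rank_loss = cost_t.masked_fill(mask, 0).sum() + cost_v.masked_fill(mask, 0).sum()

    # First n embeddings are video, last n are text.
    labels = torch.cat([torch.zeros(n, dtype=torch.long), torch.ones(n, dtype=torch.long)])
    adv_loss = F.cross_entropy(modality_logits, labels.to(modality_logits.device))
    return rank_loss / n + adv_loss


if __name__ == "__main__":
    model = JointEmbeddingSketch()
    sim, logits = model(torch.randn(4, 2048), torch.randn(4, 1024), torch.randn(4, 768))
    print(retrieval_and_adversarial_loss(sim, logits))
```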



    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 2
      May 2022
      494 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3505207


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 16 February 2022
      • Accepted: 1 August 2021
      • Revised: 1 June 2021
      • Received: 1 February 2021
      Published in TOMM Volume 18, Issue 2


      Qualifiers

      • research-article
      • Refereed
