DOI: 10.1145/3462244.3479963
short-paper

Conversational Group Detection with Graph Neural Networks

Published: 18 October 2021

ABSTRACT

We study conversational group detection in varied social scenes using a message-passing Graph Neural Network (GNN) in combination with the Dominant Sets clustering algorithm. Our approach first describes a scene as an interaction graph, where nodes encode individual features and edges encode pairwise relationship data. Then, it uses a GNN to predict pairwise affinity values that represent the likelihood of two people interacting together, and computes non-overlapping group assignments based on these affinities. We evaluate the proposed approach on the Cocktail Party and MatchNMingle datasets. Our results suggest that using GNNs to leverage both individual and relationship features when computing groups is beneficial, especially when more features are available for each individual.
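The final step described above, turning pairwise affinities into non-overlapping groups, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the common replicator-dynamics formulation of Dominant Sets with iterative peeling, the affinity matrix is hand-written rather than predicted by a GNN, and the function names, tolerance, and support threshold are all hypothetical choices made for the example.

```python
import numpy as np


def dominant_set(A, tol=1e-6, max_iter=1000):
    """Run replicator dynamics on a symmetric, non-negative affinity matrix.

    Returns a weight vector on the simplex; its support (the entries that
    stay well above zero) is taken as one dominant set.
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)  # start from the barycenter of the simplex
    for _ in range(max_iter):
        Ax = A @ x
        denom = x @ Ax
        if denom <= 0:  # no affinity mass left; nothing to update
            break
        x_new = x * Ax / denom
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return x


def group_by_dominant_sets(A, support_frac=1e-3):
    """Peel off dominant sets one at a time to get non-overlapping groups.

    `support_frac` (an assumed value) decides which people count as members:
    those whose weight exceeds that fraction of the largest weight.
    """
    A = np.array(A, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(A, 0.0)      # Dominant Sets assumes no self-affinity
    remaining = list(range(A.shape[0]))
    groups = []
    while remaining:
        sub = A[np.ix_(remaining, remaining)]
        if len(remaining) == 1 or sub.sum() == 0:
            # Nobody left is related to anyone else: singleton groups.
            groups.extend([[i] for i in remaining])
            break
        x = dominant_set(sub)
        keep = np.flatnonzero(x > support_frac * x.max())
        members = [remaining[k] for k in keep]
        groups.append(members)
        remaining = [i for i in remaining if i not in members]
    return groups


# Hand-written stand-in for predicted affinities: people 0-2 interact
# strongly with each other, as do people 3-4; cross-group affinity is low.
affinity = np.array([
    [0.00, 0.90, 0.90, 0.05, 0.05],
    [0.90, 0.00, 0.90, 0.05, 0.05],
    [0.90, 0.90, 0.00, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.00, 0.90],
    [0.05, 0.05, 0.05, 0.90, 0.00],
])
print(group_by_dominant_sets(affinity))  # [[0, 1, 2], [3, 4]]
```

Peeling guarantees that each person lands in exactly one group, which matches the non-overlapping assignment the abstract describes. The stopping rule used here (continue until everyone is assigned) is the simplest option and may differ from the criterion used in the paper.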

Supplemental Material

conversational_group_detection_presentation.mp4 (mp4, 13.4 MB)

  • Published in

    ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction
    October 2021
    876 pages
    ISBN: 9781450384810
    DOI: 10.1145/3462244

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • short-paper
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 453 of 1,080 submissions, 42%
