ABSTRACT
We study conversational group detection in varied social scenes using a message-passing Graph Neural Network (GNN) in combination with the Dominant Sets clustering algorithm. Our approach first describes a scene as an interaction graph, where nodes encode individual features and edges encode pairwise relationship data. It then uses a GNN to predict pairwise affinity values, each representing the likelihood that two people are interacting, and computes non-overlapping group assignments from these affinities. We evaluate the proposed approach on the Cocktail Party and MatchNMingle datasets. Our results suggest that using GNNs to leverage both individual and relationship features when computing groups is beneficial, especially when more features are available for each individual.
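The clustering stage can be illustrated with a minimal sketch. It assumes the pairwise affinities are already available as a symmetric matrix (here hand-written, not produced by the GNN) and extracts groups with the standard replicator-dynamics formulation of Dominant Sets, peeling off one group at a time. The function names, the thresholds `support_thresh` and `min_cohesion`, and the stopping rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def dominant_set_weights(A, tol=1e-6, max_iter=1000):
    """Run replicator dynamics on a non-negative affinity matrix A.

    Returns a weight vector on the simplex; entries that stay clearly
    above zero mark the members of one dominant set.
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)  # start from the barycenter of the simplex
    for _ in range(max_iter):
        Ax = A @ x
        denom = x @ Ax
        if denom <= 0:  # no affinity left among the remaining people
            break
        x_new = x * Ax / denom
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return x


def extract_groups(affinity, support_thresh=1e-3, min_cohesion=0.2):
    """Peel off dominant sets until the remaining people are not cohesive.

    support_thresh and min_cohesion are hypothetical values chosen for
    this example only.
    """
    A = np.array(affinity, dtype=float)
    np.fill_diagonal(A, 0.0)  # Dominant Sets assumes no self-affinity
    remaining = list(range(A.shape[0]))
    groups = []
    while remaining:
        if len(remaining) == 1:
            groups.append(remaining)
            break
        sub = A[np.ix_(remaining, remaining)]
        x = dominant_set_weights(sub)
        cohesion = float(x @ sub @ x)
        members = [remaining[i] for i in np.flatnonzero(x > support_thresh)]
        if cohesion < min_cohesion or len(members) < 2:
            # Everyone left is treated as a singleton group.
            groups.extend([p] for p in remaining)
            break
        groups.append(members)
        remaining = [p for p in remaining if p not in members]
    return groups


# Toy affinities: people 0-2 interact with each other, as do people 3-4.
affinity = np.full((5, 5), 0.05)
affinity[:3, :3] = 0.9
affinity[3:, 3:] = 0.9
groups = extract_groups(affinity)
print(groups)
```

Because each extracted group is removed before the next pass, the resulting assignments cannot overlap, which matches the non-overlapping grouping described above. In practice the affinity matrix would come from the learned model rather than being written by hand.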