On the relationship between head pose, social attention and personality prediction for unstructured and dynamic group interactions

ABSTRACT
Correlations between social attention and personality traits have been widely acknowledged in social psychology studies. Head pose is commonly employed as a proxy for the direction of social attention in small-group interactions. However, to our knowledge, the impact of head pose estimation errors on personality estimates has not been studied.
In this work, we consider the unstructured and dynamic cocktail-party scenario, where the scene is captured by multiple large field-of-view cameras. Head pose estimation is challenging under these conditions: persons move about freely, so facial appearance varies with perspective and scale, and the captured faces are of low resolution. Using proxemic and social attention features computed from position and head pose annotations, we first demonstrate that social attention features are excellent predictors of the Extraversion and Neuroticism personality traits. We then repeat the classification experiments with behavioral features computed from automated estimates; the results show that while prediction performance for both traits is affected by head pose estimation errors, the impact is more adverse for Extraversion.
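To make the head-pose-as-attention-proxy idea concrete, the following sketch assigns each person a visual attention target by checking which other person falls closest to their head orientation. This is an illustration only, not the paper's method: the function name `attention_target`, the 2-D geometry, and the 60-degree attention cone are our assumptions, since the abstract does not specify the feature computation.

```python
import numpy as np

def attention_target(pos, yaw, positions, fov_deg=60.0):
    """Return the index of the person in `positions` whose direction
    best matches the head orientation `yaw` (radians, world frame),
    or None if nobody lies within the assumed attention cone.

    pos       : (x, y) position of the observed person
    yaw       : head pan angle in radians (0 = +x axis)
    positions : iterable of (x, y) positions of the other persons
    fov_deg   : assumed width of the attention cone, in degrees
    """
    gaze = np.array([np.cos(yaw), np.sin(yaw)])
    # Only directions within half the cone width count as "attended".
    best, best_cos = None, np.cos(np.deg2rad(fov_deg) / 2)
    for j, q in enumerate(positions):
        d = np.asarray(q, dtype=float) - np.asarray(pos, dtype=float)
        n = np.linalg.norm(d)
        if n == 0:
            continue  # co-located point: direction undefined
        c = gaze @ (d / n)  # cosine between gaze ray and direction to j
        if c > best_cos:
            best, best_cos = j, c
    return best
```

Aggregating such per-frame targets over time (e.g., the fraction of time each person is attended to) yields social attention statistics of the kind the paper feeds to personality classifiers; noise in the estimated `yaw` then propagates directly into these features.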