ABSTRACT
This paper presents and evaluates several original techniques for the latent classification of biographic attributes such as gender, age and native language in diverse genres (conversation transcripts, email) and languages (Arabic, English). First, we present a novel partner-sensitive model for extracting biographic attributes in conversations, given the differences in lexical usage and discourse style such as those observed between same-gender and mixed-gender conversations. Then, we explore a rich variety of novel sociolinguistic and discourse-based features, including mean utterance length, passive/active usage, percentage domination of the conversation, speaking rate and filler word usage. Cumulatively, up to 20% error reduction is achieved relative to the standard Boulis and Ostendorf (2005) algorithm for classifying individual conversations on Switchboard, and accuracy for gender detection on the Switchboard corpus (aggregate) and Gulf Arabic corpus exceeds 95%.
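As a rough illustration of the discourse-based features named in the abstract, the sketch below computes four of them for one speaker in a transcribed conversation. It is a minimal sketch under stated assumptions, not the paper's implementation: the `Utterance` record, the `discourse_features` function, the `FILLERS` word list, whitespace tokenization and the exact feature definitions are all hypothetical choices made for this example. Passive/active usage is omitted because it would need a syntactic parser.

```python
from dataclasses import dataclass

# Hypothetical filler inventory; the paper's actual list is not given here.
FILLERS = {"uh", "um", "er", "hmm"}


@dataclass
class Utterance:
    speaker: str    # speaker label, e.g. "A" or "B"
    text: str       # transcript text, tokenized on whitespace below
    seconds: float  # duration of the utterance in seconds


def discourse_features(utterances, speaker):
    """Return simple discourse features for one speaker in one conversation."""
    own = [u for u in utterances if u.speaker == speaker]
    # Lowercase and strip edge punctuation so "Um," matches "um".
    own_tokens = [t.lower().strip(".,?!") for u in own for t in u.text.split()]
    total_words = sum(len(u.text.split()) for u in utterances)
    own_seconds = sum(u.seconds for u in own)
    n = len(own_tokens)
    return {
        # Average number of words per utterance by this speaker.
        "mean_utterance_length": n / len(own) if own else 0.0,
        # Share of all words in the conversation spoken by this speaker.
        "domination_pct": 100.0 * n / total_words if total_words else 0.0,
        # Words per second over this speaker's own talk time.
        "speaking_rate_wps": n / own_seconds if own_seconds else 0.0,
        # Fraction of this speaker's words that are fillers.
        "filler_rate": sum(t in FILLERS for t in own_tokens) / n if n else 0.0,
    }


if __name__ == "__main__":
    conversation = [
        Utterance("A", "um I think so", 2.0),
        Utterance("B", "yeah", 0.5),
        Utterance("A", "uh right", 1.0),
    ]
    print(discourse_features(conversation, "A"))
```

Features of this kind could be appended to a lexical feature vector before classification. In a partner-sensitive setting, the same function could be called for the conversational partner as well, so that both speakers' features are available to the model.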
References
- S. Argamon, M. Koppel, J. Fine, and A. R. Shimoni. 2003. Gender, genre, and writing style in formal written texts. Text - Interdisciplinary Journal for the Study of Discourse, 23(3):321--346.
- T. Bocklet, A. Maier, and E. Nöth. 2008. Age Determination of Children in Preschool and Primary School Age with GMM-Based Supervectors and Support Vector Machines/Regression. In Proceedings of Text, Speech and Dialogue; 11th International Conference, volume 1, pages 253--260.
- C. Boulis and M. Ostendorf. 2005. A quantitative analysis of lexical differences between genders in telephone conversations. Proceedings of ACL, pages 435--442.
- J. D. Burger and J. C. Henderson. 2006. An exploration of observable features related to blogger age. In Computational Approaches to Analyzing Weblogs: Papers from the 2006 AAAI Spring Symposium, pages 15--20.
- C. Cieri, D. Miller, and K. Walker. 2004. The Fisher Corpus: a resource for the next generations of speech-to-text. In Proceedings of LREC.
- J. Coates. 1998. Language and Gender: A Reader. Blackwell Publishers.
- Linguistic Data Consortium. 2006. Gulf Arabic Conversational Telephone Speech Transcripts.
- P. Eckert and S. McConnell-Ginet. 2003. Language and Gender. Cambridge University Press.
- J. L. Fischer. 1958. Social influences on the choice of a linguistic variant. Word, 14:47--56.
- J. J. Godfrey, E. C. Holliman, and J. McDaniel. 1992. Switchboard: Telephone speech corpus for research and development. Proceedings of ICASSP, 1.
- S. C. Herring and J. C. Paolillo. 2006. Gender and genre variation in weblogs. Journal of Sociolinguistics, 10(4):439--459.
- J. Holmes and M. Meyerhoff. 2003. The Handbook of Language and Gender. Blackwell Publishers.
- H. Jing, N. Kambhatla, and S. Roukos. 2007. Extracting social networks and biographical facts from conversational speech transcripts. Proceedings of ACL, pages 1040--1047.
- B. Klimt and Y. Yang. 2004. Introducing the Enron corpus. In First Conference on Email and Anti-Spam (CEAS).
- M. Koppel, S. Argamon, and A. R. Shimoni. 2002. Automatically Categorizing Written Texts by Author Gender. Literary and Linguistic Computing, 17(4):401--412.
- W. Labov. 1966. The Social Stratification of English in New York City. Center for Applied Linguistics, Washington DC.
- H. Liu and R. Mihalcea. 2007. Of Men, Women, and Computers: Data-Driven Gender Modeling for Improved User Interfaces. In International Conference on Weblogs and Social Media.
- R. K. S. Macaulay. 2005. Talk that Counts: Age, Gender, and Social Class Differences in Discourse. Oxford University Press, USA.
- S. Nowson and J. Oberlander. 2006. The identity of bloggers: Openness and gender in personal weblogs. Proceedings of the AAAI Spring Symposia on Computational Approaches to Analyzing Weblogs.
- J. Schler, M. Koppel, S. Argamon, and J. Pennebaker. 2006. Effects of age and gender on blogging. Proceedings of the AAAI Spring Symposia on Computational Approaches to Analyzing Weblogs.
- I. Shafran, M. Riley, and M. Mohri. 2003. Voice signatures. Proceedings of ASRU, pages 31--36.
- S. Singh. 2001. A pilot study on gender differences in conversational speech on lexical richness measures. Literary and Linguistic Computing, 16(3):251--264.
Index Terms
- Modeling latent biographic attributes in conversational genres