DOI: 10.5555/1690219.1690245
research-article

Modeling latent biographic attributes in conversational genres

Published: 02 August 2009

ABSTRACT

This paper presents and evaluates several original techniques for the latent classification of biographic attributes such as gender, age and native language in diverse genres (conversation transcripts, email) and languages (Arabic, English). First, we present a novel partner-sensitive model for extracting biographic attributes in conversations, motivated by differences in lexical usage and discourse style such as those observed between same-gender and mixed-gender conversations. Then, we explore a rich variety of novel sociolinguistic and discourse-based features, including mean utterance length, passive/active usage, percentage domination of the conversation, speaking rate and filler word usage. Cumulatively, up to 20% error reduction is achieved relative to the standard Boulis and Ostendorf (2005) algorithm for classifying individual conversations on Switchboard, and accuracy for gender detection on the Switchboard corpus (aggregate) and the Gulf Arabic corpus exceeds 95%.
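As a rough illustration of the discourse-based features named above, the following sketch computes per-speaker mean utterance length, share of the conversation, and filler word rate from a transcript. It is not the paper's implementation: the `speaker_features` function, the `FILLERS` list, whitespace tokenization, and measuring domination as a share of words are all assumptions made for the example. Speaking rate and passive/active usage are left out because they would need timing information and syntactic analysis.

```python
from collections import defaultdict

# Hypothetical filler inventory, chosen only for illustration.
FILLERS = {"um", "uh", "er"}


def speaker_features(turns):
    """Compute simple discourse features per speaker.

    turns: list of (speaker, utterance_text) pairs for one conversation.
    Returns a dict mapping each speaker to a dict of feature values.
    """
    words = defaultdict(int)    # total tokens per speaker
    utts = defaultdict(int)     # number of utterances per speaker
    fillers = defaultdict(int)  # filler tokens per speaker

    for speaker, text in turns:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        words[speaker] += len(tokens)
        utts[speaker] += 1
        fillers[speaker] += sum(t in FILLERS for t in tokens)

    total_words = sum(words.values())
    return {
        s: {
            # average number of tokens per utterance
            "mean_utterance_length": words[s] / utts[s],
            # fraction of all tokens in the conversation spoken by s
            "domination": words[s] / total_words if total_words else 0.0,
            # fraction of s's tokens that are fillers
            "filler_rate": fillers[s] / words[s] if words[s] else 0.0,
        }
        for s in utts
    }


if __name__ == "__main__":
    demo = [("A", "um I think so"), ("B", "yes"), ("A", "uh okay then")]
    for speaker, feats in speaker_features(demo).items():
        print(speaker, feats)
```

Feature values of this kind could then be fed to any classifier alongside lexical features; the choice of classifier is outside the scope of this sketch.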

References

  1. S. Argamon, M. Koppel, J. Fine, and A. R. Shimoni. 2003. Gender, genre, and writing style in formal written texts. Text-Interdisciplinary Journal for the Study of Discourse, 23(3):321--346.
  2. T. Bocklet, A. Maier, and E. Nöth. 2008. Age Determination of Children in Preschool and Primary School Age with GMM-Based Supervectors and Support Vector Machines/Regression. In Proceedings of Text, Speech and Dialogue; 11th International Conference, volume 1, pages 253--260.
  3. C. Boulis and M. Ostendorf. 2005. A quantitative analysis of lexical differences between genders in telephone conversations. Proceedings of ACL, pages 435--442.
  4. J. D. Burger and J. C. Henderson. 2006. An exploration of observable features related to blogger age. In Computational Approaches to Analyzing Weblogs: Papers from the 2006 AAAI Spring Symposium, pages 15--20.
  5. C. Cieri, D. Miller, and K. Walker. 2004. The Fisher Corpus: a resource for the next generations of speech-to-text. In Proceedings of LREC.
  6. J. Coates. 1998. Language and Gender: A Reader. Blackwell Publishers.
  7. Linguistic Data Consortium. 2006. Gulf Arabic Conversational Telephone Speech Transcripts.
  8. P. Eckert and S. McConnell-Ginet. 2003. Language and Gender. Cambridge University Press.
  9. J. L. Fischer. 1958. Social influences on the choice of a linguistic variant. Word, 14:47--56.
  10. J. J. Godfrey, E. C. Holliman, and J. McDaniel. 1992. Switchboard: Telephone speech corpus for research and development. Proceedings of ICASSP, 1.
  11. S. C. Herring and J. C. Paolillo. 2006. Gender and genre variation in weblogs. Journal of Sociolinguistics, 10(4):439--459.
  12. J. Holmes and M. Meyerhoff. 2003. The Handbook of Language and Gender. Blackwell Publishers.
  13. H. Jing, N. Kambhatla, and S. Roukos. 2007. Extracting social networks and biographical facts from conversational speech transcripts. Proceedings of ACL, pages 1040--1047.
  14. B. Klimt and Y. Yang. 2004. Introducing the Enron corpus. In First Conference on Email and AntiSpam (CEAS).
  15. M. Koppel, S. Argamon, and A. R. Shimoni. 2002. Automatically Categorizing Written Texts by Author Gender. Literary and Linguistic Computing, 17(4):401--412.
  16. W. Labov. 1966. The Social Stratification of English in New York City. Center for Applied Linguistics, Washington DC.
  17. H. Liu and R. Mihalcea. 2007. Of Men, Women, and Computers: Data-Driven Gender Modeling for Improved User Interfaces. In International Conference on Weblogs and Social Media.
  18. R. K. S. Macaulay. 2005. Talk that Counts: Age, Gender, and Social Class Differences in Discourse. Oxford University Press, USA.
  19. S. Nowson and J. Oberlander. 2006. The identity of bloggers: Openness and gender in personal weblogs. Proceedings of the AAAI Spring Symposia on Computational Approaches to Analyzing Weblogs.
  20. J. Schler, M. Koppel, S. Argamon, and J. Pennebaker. 2006. Effects of age and gender on blogging. Proceedings of the AAAI Spring Symposia on Computational Approaches to Analyzing Weblogs.
  21. I. Shafran, M. Riley, and M. Mohri. 2003. Voice signatures. Proceedings of ASRU, pages 31--36.
  22. S. Singh. 2001. A pilot study on gender differences in conversational speech on lexical richness measures. Literary and Linguistic Computing, 16(3):251--264.

Published in

ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
August 2009
595 pages
ISBN: 9781932432466
General Chair: Keh-Yih Su

Publisher: Association for Computational Linguistics, United States

Overall Acceptance Rate: 85 of 443 submissions, 19%
