DOI: 10.1145/2020408.2020477
research-article

Democrats, republicans and starbucks afficionados: user classification in twitter

Published: 21 August 2011

ABSTRACT

More and more technologies are taking advantage of the explosion of social media (Web search, content recommendation services, marketing, ad targeting, etc.). This paper focuses on the problem of automatically constructing user profiles, which can significantly benefit such technologies. We describe a general and robust machine learning framework for large-scale classification of social media users according to dimensions of interest. We report encouraging experimental results on 3 tasks with different characteristics: political affiliation detection, ethnicity identification and detecting affinity for a particular business.


        • Published in

          KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
          August 2011
          1446 pages
          ISBN: 9781450308137
          DOI: 10.1145/2020408

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
