DOI: 10.1145/3328526.3329582
Research article, EC '19 Conference Proceedings

The Congressional Classification Challenge: Domain Specificity and Partisan Intensity

Published: 17 June 2019

ABSTRACT

In this paper, we study the effectiveness and generalizability of techniques for classifying partisanship and ideology from text in the context of US politics. In particular, we are interested in how well measures of partisanship transfer across domains as well as the potential to rely upon measures of partisan intensity as a proxy for political ideology. We construct novel datasets of English texts from (1) the Congressional Record, (2) prominent conservative and liberal media websites, and (3) conservative and liberal wikis, and apply text classification algorithms to evaluate domain specificity via a domain adaptation technique. Surprisingly, we find that the cross-domain learning performance, benchmarking the ability to generalize from one of these datasets to another, is in general poor, even though the algorithms perform very well in within-dataset cross-validation tests. While party affiliation of legislators is not predictable based on models learned from other sources, we do find some ability to predict the leanings of the media and crowdsourced websites based on models learned from the Congressional Record. This predictivity is different across topics, and itself a priori predictable based on within-topic cross-validation results. Temporally, phrases tend to move from politicians to the media, helping to explain this predictivity. Finally, when we compare legislators themselves across different media (the Congressional Record and press releases), we find that while party affiliation is highly predictable, within-party ideology is completely unpredictable. Legislators are communicating different messages through different channels while clearly signaling party identity systematically across all channels. Choice of language is a clearly strategic act, among both legislators and the media, and we must therefore proceed with extreme caution in extrapolating from language to partisanship or ideology across domains.
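To make the evaluation design concrete, here is a minimal sketch in Python of the two quantities the abstract contrasts: within-dataset cross-validation accuracy versus cross-domain transfer accuracy, where a model fit on one corpus (for example, the Congressional Record) is scored on another (for example, media websites). This is an assumed setup for illustration, not the authors' exact pipeline: the bag-of-words features, logistic-regression classifier, and variable names are placeholders, and the paper's domain adaptation technique is not shown.

# Minimal sketch (assumed setup, not the paper's exact pipeline): compare
# within-domain cross-validation with cross-domain transfer for
# partisanship classification from text, using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def make_model():
    # Unigram + bigram bag-of-words features and a linear classifier;
    # both choices are illustrative assumptions.
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=5),
        LogisticRegression(max_iter=1000),
    )


def within_domain_accuracy(texts, labels, folds=5):
    # Cross-validation accuracy inside a single corpus,
    # e.g., party affiliation from Congressional Record speeches.
    return cross_val_score(make_model(), texts, labels, cv=folds).mean()


def cross_domain_accuracy(src_texts, src_labels, tgt_texts, tgt_labels):
    # Train on one corpus and test on another, e.g., train on Congress,
    # test on conservative/liberal media articles.
    model = make_model()
    model.fit(src_texts, src_labels)
    return model.score(tgt_texts, tgt_labels)


# Hypothetical usage: congress_texts/media_texts are lists of documents,
# congress_party/media_leaning are binary (left/right) labels.
# print(within_domain_accuracy(congress_texts, congress_party))
# print(cross_domain_accuracy(congress_texts, congress_party,
#                             media_texts, media_leaning))

Per the abstract, the first quantity tends to be high for these corpora while the second is generally poor, which is the core caution about extrapolating measures of partisanship or ideology across domains.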


Supplemental Material

p71-yan.mp4 (MP4, 1.2 GB)


      • Published in

        EC '19: Proceedings of the 2019 ACM Conference on Economics and Computation
        June 2019
        947 pages
ISBN: 978-1-4503-6792-9
DOI: 10.1145/3328526

        Copyright © 2019 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Acceptance Rates

EC '19 paper acceptance rate: 106 of 382 submissions, 28%. Overall acceptance rate: 664 of 2,389 submissions, 28%.

