ABSTRACT
In this paper, we study the effectiveness and generalizability of techniques for classifying partisanship and ideology from text in the context of US politics. In particular, we are interested in how well measures of partisanship transfer across domains as well as the potential to rely upon measures of partisan intensity as a proxy for political ideology. We construct novel datasets of English texts from (1) the Congressional Record, (2) prominent conservative and liberal media websites, and (3) conservative and liberal wikis, and apply text classification algorithms to evaluate domain specificity via a domain adaptation technique. Surprisingly, we find that the cross-domain learning performance, benchmarking the ability to generalize from one of these datasets to another, is in general poor, even though the algorithms perform very well in within-dataset cross-validation tests. While party affiliation of legislators is not predictable based on models learned from other sources, we do find some ability to predict the leanings of the media and crowdsourced websites based on models learned from the Congressional Record. This predictivity is different across topics, and itself a priori predictable based on within-topic cross-validation results. Temporally, phrases tend to move from politicians to the media, helping to explain this predictivity. Finally, when we compare legislators themselves across different media (the Congressional Record and press releases), we find that while party affiliation is highly predictable, within-party ideology is completely unpredictable. Legislators are communicating different messages through different channels while clearly signaling party identity systematically across all channels. Choice of language is a clearly strategic act, among both legislators and the media, and we must therefore proceed with extreme caution in extrapolating from language to partisanship or ideology across domains.
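The contrast between within-dataset cross-validation and cross-domain generalization can be illustrated with a minimal sketch. It is not the authors' pipeline: the classifier (a small multinomial naive Bayes with add-one smoothing), the helper names (`train_nb`, `predict`, `within_domain_cv`, `cross_domain`), and the toy `CONGRESS` and `MEDIA` corpora are all assumptions made purely for illustration. The abstract does not specify the algorithms, and the domain adaptation step is omitted here.

```python
import math
from collections import Counter

def train_nb(docs):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    word_counts = {}          # label -> Counter of token counts
    label_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest smoothed log-posterior score."""
    word_counts, label_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / n_docs)
        for tok in text.lower().split():
            if tok in vocab:  # tokens unseen in training are ignored
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def within_domain_cv(docs):
    """Leave-one-out cross-validation accuracy inside a single domain."""
    hits = 0
    for i, (text, label) in enumerate(docs):
        model = train_nb(docs[:i] + docs[i + 1:])
        hits += predict(model, text) == label
    return hits / len(docs)

def cross_domain(source, target):
    """Accuracy on the target domain of a model trained only on the source."""
    model = train_nb(source)
    return sum(predict(model, t) == y for t, y in target) / len(target)

# Hypothetical toy corpora; the labels "R" and "D" stand in for party.
CONGRESS = [
    ("we must cut taxes and reduce federal spending", "R"),
    ("lower taxes help small business owners", "R"),
    ("reduce spending and secure the border", "R"),
    ("expand health care for working families", "D"),
    ("protect health care and raise the minimum wage", "D"),
    ("working families deserve a higher minimum wage", "D"),
]
MEDIA = [
    ("liberal elites ignore the border crisis", "R"),
    ("mainstream outlets hide the spending scandal", "R"),
    ("corporate greed hurts working families", "D"),
    ("climate denial threatens our future", "D"),
]

print("within Congress (leave-one-out):", within_domain_cv(CONGRESS))
print("within media (leave-one-out):", within_domain_cv(MEDIA))
print("Congress -> media:", cross_domain(CONGRESS, MEDIA))
print("media -> Congress:", cross_domain(MEDIA, CONGRESS))
```

Comparing the two within-domain numbers against the two transfer numbers mirrors the comparison the abstract describes. With corpora this small the values themselves carry no evidential weight.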
Index Terms
- The Congressional Classification Challenge: Domain Specificity and Partisan Intensity