research-article

The Congressional Classification Challenge: Domain Specificity and Partisan Intensity

Authors:
Hao Yan, Washington University in St. Louis, St. Louis, MO, USA
Sanmay Das, Washington University in St. Louis, St. Louis, MO, USA
Allen Lavoie, Washington University in St. Louis, St. Louis, MO, USA
Sirui Li, Washington University in St. Louis, St. Louis, MO, USA
Betsy Sinclair, Washington University in St. Louis, St. Louis, MO, USA

EC '19: Proceedings of the 2019 ACM Conference on Economics and Computation, June 2019, Pages 71–89
https://doi.org/10.1145/3328526.3329582

Published: 17 June 2019

ABSTRACT

In this paper, we study the effectiveness and generalizability of techniques for classifying partisanship and ideology from text in the context of US politics. In particular, we ask how well measures of partisanship transfer across domains, and whether measures of partisan intensity can serve as a proxy for political ideology. We construct novel datasets of English texts from (1) the Congressional Record, (2) prominent conservative and liberal media websites, and (3) conservative and liberal wikis, and apply text classification algorithms to evaluate domain specificity via a domain adaptation technique. Surprisingly, we find that cross-domain performance, which benchmarks the ability to generalize from one of these datasets to another, is generally poor, even though the algorithms perform very well in within-dataset cross-validation tests. While the party affiliation of legislators is not predictable from models learned on other sources, we do find some ability to predict the leanings of the media and crowdsourced websites from models learned on the Congressional Record. This predictivity varies across topics and is itself predictable a priori from within-topic cross-validation results. Temporally, phrases tend to move from politicians to the media, which helps to explain this predictivity. Finally, when we compare legislators themselves across different media (the Congressional Record and press releases), we find that while party affiliation is highly predictable, within-party ideology is completely unpredictable. Legislators communicate different messages through different channels while systematically signaling party identity across all of them. Choice of language is clearly a strategic act, among both legislators and the media, and we must therefore proceed with extreme caution when extrapolating from language to partisanship or ideology across domains.
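The abstract contrasts strong within-dataset cross-validation with poor cross-domain transfer. The minimal Python sketch below only illustrates the mechanics of those two evaluation protocols; it is not the authors' pipeline, and it does not model their domain adaptation technique. It assumes a bag-of-words multinomial naive Bayes classifier written from scratch, and the toy snippets, the word list, and the names `NaiveBayes`, `cross_val_accuracy`, and `cross_domain_accuracy` are all hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization (a real pipeline would also handle
    # punctuation, stop words, n-grams, and so on)
    return text.lower().split()


class NaiveBayes:
    """Hypothetical multinomial naive Bayes over word counts, add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        # Log prior for each class from label frequencies
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:  # ties go to the first class in sorted order
                best, best_score = c, score
        return best


def accuracy(model, docs, labels):
    preds = [model.predict_one(d) for d in docs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def cross_val_accuracy(docs, labels, k=5, seed=0):
    # Within-domain protocol: k-fold cross-validation on a single dataset
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = NaiveBayes().fit([docs[i] for i in train], [labels[i] for i in train])
        scores.append(accuracy(model, [docs[i] for i in fold], [labels[i] for i in fold]))
    return sum(scores) / k


def cross_domain_accuracy(src_docs, src_labels, tgt_docs, tgt_labels):
    # Cross-domain protocol: train on the source domain, test on the target domain
    model = NaiveBayes().fit(src_docs, src_labels)
    return accuracy(model, tgt_docs, tgt_labels)


# Made-up toy snippets (NOT data from the paper): each domain has its own
# partisan vocabulary plus one neutral word shared across classes and domains.
WORDS = ["families", "workers", "seniors", "children", "veterans",
         "farmers", "students", "patients", "taxpayers", "voters"]
src_docs = ([f"affordable care coverage expansion {w}" for w in WORDS]
            + [f"tax relief spending cuts {w}" for w in WORDS])
tgt_docs = ([f"climate justice green jobs {w}" for w in WORDS]
            + [f"border security law order {w}" for w in WORDS])
src_labels = ["D"] * len(WORDS) + ["R"] * len(WORDS)
tgt_labels = list(src_labels)

print(cross_val_accuracy(src_docs, src_labels))                             # 1.0
print(cross_domain_accuracy(src_docs, src_labels, tgt_docs, tgt_labels))    # 0.5
```

Because the two made-up domains share no partisan vocabulary, five-fold cross-validation within the source domain is perfect, while the source-trained model gives every target document identical scores for both classes, so the tie resolves to the first class and accuracy on the balanced target set is 0.5. Real corpora overlap only partially, so actual transfer need not fall all the way to chance.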

Supplemental Material

p71-yan.mp4 (mp4, 1.2 GB)

References

Amr Ahmed and Eric P Xing. 2010. Staying informed: Supervised and semi-supervised multi-view topical analysis of ideological perspective. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1140--1150.
Michael A Bailey. 2007. Comparable preference estimates across time and institutions for the Court, Congress, and Presidency. American Journal of Political Science, Vol. 51, 3 (2007), 433--448.
Eytan Bakshy, Solomon Messing, and Lada A Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science, Vol. 348, 6239 (2015), 1130--1132.
Pablo Barbera. 2015. Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. Political Analysis, Vol. 23, 1 (2015), 76.
Matthew A Baum and Tim Groeling. 2008. New media and the polarization of American political discourse. Political Communication, Vol. 25, 4 (2008), 345--365.
Marianne Bertrand and Emir Kamenica. 2018. Coming apart? Cultural distances in the United States over time. Technical Report. National Bureau of Economic Research.
David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, Vol. 3, Jan (2003), 993--1022.
Adam Bonica. 2014. Mapping the ideological marketplace. American Journal of Political Science, Vol. 58, 2 (2014), 367--386.
Adam R Brown. 2011. Wikipedia as a data source for political scientists: Accuracy and completeness of coverage. PS: Political Science & Politics, Vol. 44, 2 (2011), 339--343.
Ceren Budak, Sharad Goel, and Justin M Rao. 2016. Fair and balanced? Quantifying media bias through crowdsourced content analysis. Public Opinion Quarterly, Vol. 80, S1 (2016), 250--271.
Barry C Burden, Gregory A Caldeira, and Tim Groseclose. 2000. Measuring the ideologies of US senators: The song remains the same. Legislative Studies Quarterly (2000), 237--258.
Minmin Chen, Zhixiang Xu, Kilian Q Weinberger, and Fei Sha. 2012. Marginalized denoising autoencoders for domain adaptation. In Proceedings of the International Conference on Machine Learning. 767--774.
Joshua Clinton, Simon Jackman, and Douglas Rivers. 2004. The statistical analysis of roll call data. American Political Science Review, Vol. 98, 2 (2004), 355--370.
Raviv Cohen and Derek Ruths. 2013. Classifying political orientation on Twitter: It's not easy! In Proceedings of the 7th International AAAI Conference on Web and Social Media.
Sanmay Das and Allen Lavoie. 2014. Automated inference of point of view from user interactions in collective intelligence venues. In Proceedings of the International Conference on Machine Learning. 82--90.
Sanmay Das, Allen Lavoie, and Malik Magdon-Ismail. 2016. Manipulation among the arbiters of collective intelligence: How Wikipedia administrators mold public opinion. ACM Transactions on the Web (TWEB), Vol. 10, 4 (2016), 24:1--24:25.
Sanmay Das and Malik Magdon-Ismail. 2010. Collective wisdom: Information growth in wikis and blogs. In Proceedings of the ACM Conference on Electronic Commerce. 231--240.
Stefano DellaVigna and Ethan Kaplan. 2007. The Fox News effect: Media bias and voting. The Quarterly Journal of Economics, Vol. 122, 3 (2007), 1187--1234.
Daniel Diermeier, Jean-François Godbout, Bei Yu, and Stefan Kaufmann. 2012. Language and ideology in Congress. British Journal of Political Science, Vol. 42, 1 (2012), 31--55.
Robert M Entman. 1989. How the media affect what people think: An information processing approach. The Journal of Politics, Vol. 51, 2 (1989), 347--370.
Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, Vol. 17, 1 (2016), 2096--2130.
Matthew Gentzkow and Jesse M Shapiro. 2010. What drives media slant? Evidence from U.S. daily newspapers. Econometrica, Vol. 78, 1 (2010), 35--71.
Matthew Gentzkow, Jesse M Shapiro, and Matt Taddy. 2019. Measuring polarization in high-dimensional data: Method and application to Congressional speech. Econometrica (2019). Forthcoming.
Sean Gerrish and David M Blei. 2011. Predicting legislative roll calls from text. In Proceedings of the 28th International Conference on Machine Learning. 489--496.
Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning. 513--520.
Shane Greenstein and Feng Zhu. 2012. Is Wikipedia biased? American Economic Review, Vol. 102, 3 (2012), 343--348.
J. Grimmer and B. M. Stewart. 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis (2013), 1--31.
Tim Groseclose and Jeffrey Milyo. 2005. A measure of media bias. The Quarterly Journal of Economics, Vol. 120, 4 (2005), 1191--1237.
Justin H Gross, Brice Acree, Yanchuan Sim, and Noah A Smith. 2013. Testing the Etch-a-Sketch hypothesis: A computational analysis of Mitt Romney's ideological makeover during the 2012 Primary vs. General Elections. In APSA Annual Meeting.
Daniel E Ho, Kevin M Quinn, et al. 2008. Measuring explicit political positions of media. Quarterly Journal of Political Science, Vol. 3, 4 (2008), 353--377.
Kosuke Imai, James Lo, and Jonathan Olmsted. 2016. Fast estimation of ideal points with massive data. American Political Science Review, Vol. 110, 4 (2016), 631--656.
Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. 2014. Political ideology detection using recursive neural networks. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol. 1. 1113--1122.
Keith Krehbiel. 2010. Pivotal politics: A theory of US lawmaking. University of Chicago Press.
Jure Leskovec, Lars Backstrom, and Jon Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 497--506.
Wei-Hao Lin, Eric Xing, and Alexander Hauptmann. 2008. A joint topic and perspective model for ideological discourse. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 17--32.
Nolan McCarty, Keith T Poole, and Howard Rosenthal. 2016. Polarized America: The Dance of Ideology and Unequal Riches. MIT Press.
T. Mikolov, K. Chen, G. S. Corrado, and J. Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of the 1st International Conference on Learning Representations. http://arxiv.org/abs/1301.3781
Sendhil Mullainathan and Jann Spiess. 2017. Machine learning: An applied econometric approach. Journal of Economic Perspectives, Vol. 31, 2 (2017), 87--106.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, Vol. 12 (October 2011), 2825--2830.
Keith T Poole and Howard Rosenthal. 1997. Congress: A political-economic history of roll call voting. Oxford University Press.
Richard Socher, Brody Huval, Christopher D Manning, and Andrew Y Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 1201--1211.
Richard Socher, Jeffrey Pennington, Eric H Huang, Andrew Y Ng, and Christopher D Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 151--161.
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1631--1642.
Matt Thomas, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 327--335.

Index Terms

The Congressional Classification Challenge: Domain Specificity and Partisan Intensity
1. Applied computing
  1. Law, social and behavioral sciences
    1. Economics
2. Computing methodologies
  1. Machine learning

Recommendations

E-State: Realistic or Utopian?

Information and Communication Technology ICT is known to facilitate governance and citizen participation in States' decision making processes. However, e-governance researchers have argued that beyond the current use of ICT to facilitate already ...
The representative's problem: how legislators use communication to secure constituent support
PLEAD '13: Proceedings of the 2nd workshop on Politics, elections and data

In a representative democracy elected officials face what I call the representative's problem. Elected officials work in Washington to provide representation, yet constituents lack the incentive and capacity to track what their representative does in ...
Institutional Investor Activism and Employee Safety: The Role of Activist and Board Political Ideology
Although prior research on shareholder activism has highlighted how such activism can economically benefit the shareholders of targeted firms, recent studies also suggest that shareholder activism can economically disadvantage nonshareholder stakeholders, ...


Published in
EC '19: Proceedings of the 2019 ACM Conference on Economics and Computation
June 2019
947 pages
ISBN: 9781450367929
DOI: 10.1145/3328526
General Chair: Anna Karlin, University of Washington, USA
Program Chairs: Nicole Immorlica, Microsoft Research, USA; Ramesh Johari, Stanford University, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 June 2019
Author Tags
domain adaptation
partisanship
political ideology
political science
text classification
Qualifiers
- research-article
Acceptance Rates
EC '19 Paper Acceptance Rate: 106 of 382 submissions, 28%
Overall Acceptance Rate: 664 of 2,389 submissions, 28%
Article Metrics
- Total Citations: 2
- Total Downloads: 259
- Downloads (Last 12 months): 31
- Downloads (Last 6 weeks): 4
