Introduction

We propose a general method for identifying cohesive groups in signed networks (networks with positive and negative edges), and apply it to political networks, which have become a common focus in complex network analysis1,2,3. Specifically, we examine signed networks of political collaboration and opposition to identify the members of polarized coalitions in the US Congress, then use these coalitions to examine the impact of polarization on effectiveness in passing bills.

In legislative bodies where most pairs of legislators co-sponsor bills, a network of who co-sponsors with whom becomes a highly dense network which would not be suitable for studying political alliances, coalitions, and polarization. Instead, we use signed networks4 created based on significantly many and significantly few co-sponsorships as two types of edges with opposite nature where a stochastic degree sequence model (SDSM)5 is used as the null model, to define thresholds of “many” and “few”. Previous research6 on the same data has shown an increase in polarization in the US Congress when measured by the triangle index, which provides a locally-aggregated index of polarization based on structural balance7. However, the triangle index only measures the level of balance and polarization, but does not identify the members of the political coalitions that are polarized. For this we turn to the frustration index8,9,10 (also known as the line index of balance11), which optimally partitions a signed graph into two opposing but internally cohesive “coalitions”12. Substantively, these coalitions13 represent groups of legislators who sponsor significantly many bills with each other (i.e. are political allies), but who sponsor significantly few bills with those in the other coalition (i.e. are political enemies). In our analyses of legislative effectiveness, we focus on the level of partisanship within the largest, and therefore controlling, coalition.

Computing the frustration index is an NP-hard problem14, and so is the equivalent partitioning problem that deals with minimizing the total number of intra-group negative and inter-group positive edges. The optimality of a numerical solution to an instance of an optimization problem depends on the function under optimization. Most studies on this topic use heuristic methods for partitioning signed networks under similar objectives15,16,17,18. These methods are not guaranteed to provide the optimal solution or even its approximation within a constant factor14,19, but can potentially be implemented on larger networks.

Computing the exact value of the frustration index, in principle, involves searching among all possible ways to partition a given signed network into \(k\le 2\) groups in order to find the partitioning which minimizes the total number of intra-group negative and inter-group positive edges. We propose a new method for tackling the intensive computations by providing upper and lower bounds for this number, then solving an optimization model which closes the gap between the two bounds and returns the exact value of frustration index alongside the optimal partitioning of vertices.

Signed Graph and Balance Theory Preliminaries

In this section, we recall some basic definitions of signed graphs and balance theory.

Signed graphs

We consider an undirected signed graph \(G=(V,E,\sigma )\) where V and E are the sets of vertices and edges respectively, and σ is the sign function \(\sigma :E\to \{\,-\,1,+\,1\}\). Graph G contains \(|V|=n\) nodes. The set E of edges contains \({m}^{-}\) negative edges and \({m}^{+}\) positive edges adding up to a total of \(|E|=m={m}^{+}+{m}^{-}\) edges. The signed adjacency matrix and the unsigned adjacency matrix are denoted by A and |A| respectively. Their entries are defined in Eqs. (1) and (2).

$${a}_{uv}=\left\{\begin{array}{ll}{\sigma }_{(u,v)} & {\rm{if}}\,(u,v)\in E\\ 0 & {\rm{if}}\,(u,v)\notin E\end{array}\right.$$
(1)
$$|{a}_{uv}|=\left\{\begin{array}{ll}1 & {\rm{if}}\,(u,v)\in E\\ 0 & {\rm{if}}\,(u,v)\notin E\end{array}\right.$$
(2)

Balance and cycles

A cycle of length \(k\) in \(G\) is a sequence of nodes \({v}_{0},{v}_{1},\ldots ,{v}_{k-1},{v}_{k}\) such that for each \(i=1,2,\ldots ,k\) there is an edge from \({v}_{i-1}\) to \({v}_{i}\) and the nodes in the sequence except for \({v}_{0}={v}_{k}\) are distinct. The sign of a cycle is the product of the signs of its edges. A cycle with negative (positive) sign is unbalanced (balanced). A balanced network (graph) is one with no negative cycles.

Balance theory is conceptualized by Heider in the context of social psychology20. It was then formulated as a set of graph-theoretic conditions by Cartwright and Harary21 which define a signed graph to be balanced if all its cycles are positive. Cartwright and Harary also introduce measuring the level of balance using, among other indices, the fraction of positive cycles [21, page 288]. Three years later, Harary suggested using frustration index11 (under a different name); a measure which satisfies key axiomatic properties10, but has been underused for decades due to the complexity involving its computation19,22,23.

Evaluating Balance and Frustration

In this section, we explain our computational approach to analyzing signed networks by providing brief definitions and discussions on measuring balance, frustration and partitioning, and graph optimization models.

Measuring partial balance

Signed networks representing real data are often unbalanced, which motivates measuring the intermediate level of partial balance10. The first measure we use is triangle index denoted by \(T(G)\) which equals the fraction of positive cycles of length 321,24. We use Eq. (3) suggested in7 for computing triangle index, \(T(G)\), in which \({\rm{Tr}}({\bf{A}})\) denotes the trace (sum of diagonal entries) of A.

$$T(G)=\frac{{\rm{Tr}}({{\bf{A}}}^{3})+{\rm{Tr}}(|{\bf{A}}{|}^{3})}{2{\rm{Tr}}(|{\bf{A}}{|}^{3})}$$
(3)

The other measure we use is the normalized frustration index10 denoted by \(F(G)\) which is based on normalizing the minimum number of edges whose removal results in a balanced graph8,11,25.

Figure 1(A) shows an example signed graph in which the three dotted lines represent negative edges and the four solid lines represent positive edges. The level of balance in this signed graph can be evaluated using triangles (B) or frustration (C). The former approach, (B), involves identifying triangle 1-4-5 as unbalanced and triangle 1-3-4 as balanced leading to the numeric index \(T(G)=1\)/2. The latter approach, (C), involves finding a partitioning of vertices \(\{\{1,2,3\},\{4,5\}\}\) (shown by green and purple colors in Fig. 1) which minimizes the total number of intra-group negative and inter-group positive edges to 1 (only edge \((1,5)\) according to this partitioning). Note that removing edge \((1,5)\) leads to a balanced signed graph.

Figure 1
figure 1

(A) An example signed network. (B) Evaluating balance using triangles. (C) Evaluating balance using frustration.

Frustration and partitioning

Given signed graph \(G=(V,E,\sigma )\), we can partition V into two subsets: X and \(V\)\\(X\). We let binary variable \({x}_{i}\) denote the subset which node \(i\) belongs to under partitioning \(\{X,V\)\\(X\}\), where \({x}_{i}=1\) if \(i\in X\) and \({x}_{i}=0\) otherwise.

A positive edge \((i,j)\in {E}^{+}\) is said to be frustrated if its endpoints \(i\) and \(j\) belong to different subsets (\({x}_{i}\ne {x}_{j}\)). A negative edge \((i,j)\in {E}^{-}\) is said to be frustrated if its endpoints \(i\) and \(j\) belong to the same subset (\({x}_{i}={x}_{j}\)). We define the frustration count \({f}_{G}(X)\) as the number of frustrated edges of G under partitioning \(\{X,V\)\\(X\}\):

$${f}_{G}(X)=\sum _{(i,j)\in E}\,{f}_{ij}(X)$$

where \({f}_{ij}(X)\) is the frustration state of edge \((i,j)\), given by

$${f}_{ij}(X)=\left\{\begin{array}{ll}0, & {\rm{if}}\,{x}_{i}={x}_{j}\,{\rm{and}}\,(i,j)\in {E}^{+}\\ 1, & {\rm{if}}\,{x}_{i}={x}_{j}\,{\rm{and}}\,(i,j)\in {E}^{-}\\ 0, & {\rm{if}}\,{x}_{i}\ne {x}_{j}\,{\rm{and}}\,(i,j)\in {E}^{-}\\ 1, & {\rm{if}}\,{x}_{i}\ne {x}_{j}\,{\rm{and}}\,(i,j)\in {E}^{+}.\end{array}\right.$$
(4)

The frustration index of a graph \(G\) can be computed exactly by finding partitioning \({X}^{\ast },V\)\\({X}^{\ast }\subseteq V\) of \(G\) that minimizes the frustration count \({f}_{G}(X)\), i.e. solving Eq. (5)19,22.

$$L(G)=\mathop{{\rm{\min }}}\limits_{X\subseteq V}\,{f}_{G}(X)$$
(5)

The normalized frustration index, \(F(G)\), is computed based on \(L(G)\) and according to Eq. (6) which allows measuring the level of partial balance based on numerical values within the unit interval (m denotes the number of edges).

$$F(G)=1-2L(G)/m$$
(6)

One may notice some similarities between the problem of finding communities in unsigned networks26,27,28,29,30,31 and that of partitioning signed networks to minimize the frustration count. One key difference is that in the latter problem for every pair of vertices there are three cases (as opposed to two): a positive edge, a negative edge, or no edge between the two vertices. Due to the differences between objectives of these two problems (minimizing frustration count as opposed to maximizing modularity or other quantities), the partitioning obtained from running community detection algorithms on positive edges of a signed graph will not generally minimize the frustration count.

Recent studies on frustration index and signed networks suggest19,22 and implement23 efficient graph optimization models to compute the frustration index of relatively large (up to 105 edges) sparse networks. However, the signed networks we analyze have substantially higher densities compared to the instances in19,22,23. This requires developing new computational models for tackling the intensive computations involved in obtaining the frustration index of dense graphs.

Bounding the frustration index

In this subsection, we discuss obtaining lower and upper bounds for the frustration index. Using these bounds is a way of substantially reducing the running time, but theoretically they are not required.

The linear programming relaxation (LP relaxation) of the binary optimization models in19,22 can be used to compute a lower bound for the frustration index. The linear programming model in Eq. (7) is developed for this purpose.

$$\begin{array}{ccc} & \mathop{min}\limits_{{x}_{i},{x}_{ij}}\,Y\,= & \sum _{(i,j)\in {E}^{+}}\,{x}_{i}+{x}_{j}-2{x}_{ij}+\,\sum _{(i,j)\in {E}^{-}}\,1-({x}_{i}+{x}_{j}-2{x}_{ij})\\ {\rm{s}}.{\rm{t}}. & {x}_{ij}\le ({x}_{i}+{x}_{j})/2 & {\rm{\forall }}(i,j)\in {E}^{+}\\ & {x}_{ij}\ge {x}_{i}+{x}_{j}-1 & {\rm{\forall }}(i,j)\in {E}^{-}\\ & {x}_{i}+{x}_{jk}\ge {x}_{ij}+{x}_{ik} & {\rm{\forall }}(i,j,k)\in T\\ & {x}_{j}+{x}_{ik}\ge {x}_{ij}+{x}_{jk} & {\rm{\forall }}(i,j,k)\in T\\ & {x}_{k}+{x}_{ij}\ge {x}_{ik}+{x}_{jk} & {\rm{\forall }}(i,j,k)\in T\\ & 1+{x}_{ij}+{x}_{ik}+{x}_{jk}\ge {x}_{i}+{x}_{j}+{x}_{k} & {\rm{\forall }}(i,j,k)\in T\\ & {x}_{i}\in [0,1] & {\rm{\forall }}i\in V\\ & {x}_{ij}\in [0,1] & {\rm{\forall }}(i,j)\in E\end{array}$$
(7)

In Eq. (7), \(T=\{(i,j,k)\in {V}^{3}\)|\((i,j),(i,k),(j,k)\in E\}\) is the set which contains ordered 3-tuples of nodes whose edges form a triangle in \(G\). The continuous linear programming model in Eq. (7) is developed by combining the LP relaxation of the 0/1 linear model in22 [Subsection 4.3] and the triangle constraints in22 [Subsection 4.4]. It follows from the LP relaxation that the optimal solution \({Y}^{\ast }\) to the model in Eq. (7) is a lower bound for the frustration index \({Y}^{\ast }\le L(G)\).

Any given partitioning \(\{X,V\)\\(X\}\) for signed graph \(G\) is associated with a frustration count \({f}_{G}(X)\) which is by definition (as in Eq. (5)) an upper bound for the frustration index

$${f}_{G}({X}^{\ast })=L(G)\le {f}_{G}(X)\,\forall X\subseteq V.$$

We use a specific partitioning \(\{X{\prime} ,V\)\\(X{\prime} \}\) as a starting point to “warm-start” the algorithm for computing the frustration index. Partitioning \(\{X{\prime} ,V\)\\(X{\prime} \}\) groups nodes into two subsets based on the party affiliation of legislators. To be more precise, for node \(i\) which represents a legislator, decision variable \({x}_{i}\) is given initial value 0 if the reciprocal legislator is a Democrat and \({x}_{i}\) is given initial value 1 otherwise.

Computing the frustration index

After bounding the frustration index, we use the binary linear programming model in Eq. (8) which minimizes the number of frustrated edges.

$$\begin{array}{rrl} & \mathop{{\rm{\min }}}\limits_{{x}_{i},{f}_{ij}}\,Z\,= & \sum _{(i,j)\in E}\,{f}_{ij}\\ {\rm{s}}.{\rm{t}}. & {f}_{ij}\ge {x}_{i}-{x}_{j} & \forall (i,j)\in {E}^{+}\\ & {f}_{ij}\ge {x}_{j}-{x}_{i} & \forall (i,j)\in {E}^{+}\\ & {f}_{ij}\ge {x}_{i}+{x}_{j}-1 & \forall (i,j)\in {E}^{-}\\ & {f}_{ij}\ge 1-{x}_{i}-{x}_{j} & \forall (i,j)\in {E}^{-}\\ & \sum _{(i,j)\in E}\,{f}_{ij}\ge {Y}^{\ast } & \\ & {x}_{i}\in \{0,1\} & \forall i\in V\\ & {f}_{ij}\in \{0,1\} & \forall (i,j)\in E\end{array}$$
(8)

The binary variables of the model are \({f}_{ij}\,\forall (i,j)\in E\) which denotes frustration of edge \((i,j)\) and \({x}_{i}\,\forall i\in V\) which denotes the subset of node i. To warm-start the algorithm which solves Eq. (8), we initialize \({x}_{i}\) variables based on partitioning \(\{X{\prime} ,V\)\\(X{\prime} \}\). The model in Eq. (8) is developed by combining the XOR model in19 [Subsection 3.2] with an additional constraint to incorporate the lower bound \({Y}^{\ast }\) obtained from Eq. (7). We implement the speed-up techniques discussed in19 and solve the binary linear programming model in Eq. (8) using Gurobi solver (version 8.0)32 on a virtual machine with 32 Intel Xeon CPU E5-2698 v3 @ 2.30 GHz processors and 32 GB of RAM running 64-bit Microsoft Windows Server 2012 R2 Standard.

Results

In this section, we provide the results of analyzing balance and frustration in signed networks of US Congress legislators.

Partial balance, frustration, and optimal partitioning

We evaluate the level of partial balance using two different methods. Figure 2 illustrates partial balance in the signed networks of the US Congress over time measured by the triangle index and normalized frustration index. Values of the two measures, the triangle index \(T(G)\) and the normalized frustration index \(F(G)\), are highly correlated (correlation coefficients are 0.95 and 0.91 respectively for House and Senate networks) and both show relatively high levels of partial balance which have increased in the time period 1979–2016. The results in Fig. 2 indicate an increase in the polarization of both chambers of US Congress, which is in accordance with the literature6,33,34,35,36. Although the triangle index T(G) and the normalized frustration index F(G) capture very similar information concerning the level of partial balance, only the computation of F(G) also provides the partitioning that minimizes the sum of intra-group negative and inter-group positive edges.

Figure 2
figure 2

Two measures of partial balance indicating an overall increase in political polarization in (A) US House of Representatives and (B) US Senate over the time period 1979–2016.

Solving the continuous optimization model in Eq. (7) and the discrete (binary) optimization model in Eq. (8) requires intensive computations for large instances such as signed networks of the House. Given the size and density of these instances, the models in Eqs. (7) and (8) have thousands of variables and possibly millions of constraints requiring a high performance computer taking advantage of parallel computing capabilities23.

For example, the signed graph instance of the 113th House session has m = 75,771 edges and |T| = 7,102,625 triangles which result in a total of m + 4|T| = 28,486,271 constraints for the model in Eq. (7). Gurobi solver takes 5300 seconds (around 1.5 hours) to solve the model in Eq. (7) to global optimality and return \({Y}^{\ast }\), the lower bound for the frustration index. For the same instance, the discrete optimization model in Eq. (8) has n + m = 76,218 binary variables and 2m + 1 = 151,543 constraints. This large instance takes 43,523 seconds (around 12 hours) for Gurobi to reach global optimality and return the frustration index and the partitioning of the nodes. In total for the 113th session of the House, it takes 48,823 seconds (around 13.5 hours) to compute the exact value of the frustration index which is the longest solve time among all instances. The average computation time for House instances is 17,763 seconds (around 5 hours) and the standard deviation is 15,411 seconds (around 4.5 hours). For Senate instances, the average computation time is 4 seconds and the standard deviation is 6 seconds.

Using the optimal values of the \({x}_{i}\) variables obtained by solving the discrete optimization model in Eq. (8), we partition nodes of each network into two groups (subsets \({X}^{\ast }\), \(V\)\\({X}^{\ast }\)). For each signed network, either \({X}^{\ast }\) or \(V\)\\({X}^{\ast }\) has the larger set cardinality and therefore represents the largest coalition for the corresponding session.

We evaluate the composition of the largest and therefore controlling coalitions in each session and chamber based on the party affiliation of its legislators. Figure 3 illustrates the number of legislators from the two main political parties in the controlling coalitions of the US Congress. As it can be seen in Fig. 3, the controlling coalitions have become more homogeneous (i.e. partisan) over the time period 1979–2016.

Figure 3
figure 3

The number of legislators from the two main parties who belong to the controlling coalition indicating an increase in the partisan homogeneity of the controlling coalition in (A) US House of Representatives and (B) US Senate over the time period 1979–2016.

Legislative effectiveness and polarization in the US congress

Within the field of comparative US politics, two topics attract particular attention at the federal level: legislative effectiveness and political polarization. Legislative effectiveness refers to the ability of individual legislators37,38, or of an entire legislative body39, to advance their agenda, typically by facilitating the passage of legislation. Political polarization (when applied to elected officials or “elites”) refers to the formation of non-overlapping ideologically homogeneous groups6,33. When these groups mirror political party affiliations, it is also called partisan polarization. For several decades, legislative effectiveness in the US has declined (as illustrated in Fig. 4), while partisan polarization has increased6. These trends have led many to hypothesize that they are related, and specifically that “unified party control has [not] been legislatively more productive than divided party control” [40, xii]. Based on the legislative process used by the US Congress, it might be expected that a chamber’s bills are more likely to become law when the controlling party holds a larger majority, because its members can form a voting bloc. However, the analysis in the next section suggests that that changes in bill passage rates are better explained by the partisanship of a chamber’s largest coalition.

Figure 4
figure 4

The fraction of bills that eventually become law (bill passage rate) indicating legislative effectiveness in (A) US House of Representatives and (B) US Senate over the time period 1979–2016.

Mediation in bill passage

Using a bivariate linear regression, we find that the percentage of bills introduced in a chamber that become law (passage rate) significantly declines over time. The passage rate has declined in the House by an average of 0.11 percentage points each session (\(\beta =-\,0.528,p < 0.05\)), and in the Senate by an average of 0.35 percentage points each session (\(\beta =-\,0.852,p < 0.01\); see Fig. 5(A)). We report standardized coefficients (β) to facilitate cross-model comparisons. The percentage point changes are unstandardized bivariate regression coefficients, reported here for context. These variations in passage rate are not simply a function of the total number of bills introduced (House: \(r=-\,0.29,p=0.22\); Senate: \(r=-\,0.08,p=0.74\)), which exhibit no association with time (House: \(r=-\,0.34,p=0.15\); Senate: \(r=0.19,p=0.43\); see Tables S2 and S4).

Figure 5
figure 5

Predicting the rate of bill passage in the US House of Representatives and Senate; Standardized coefficients are reported; **p < 0.01, *p < 0.05. (A) Over time, the passage rate has declined. (B) The decline is not mediated by changes in the size of a party’s majority in a chamber. (C) In the House, but not the Senate, it is mediated by the partisan homogeneity in the controlling coalition.

To investigate possible explanations for variations in passage rate, we estimate two separate structural equation models for each chamber. A commonsense model tests the expectation that when the majority party holds a larger numerical majority, they should have greater success passing bills41. The key variable in this model, party control, is defined as the absolute difference between the number of Republicans and Democrats. Computing party control does not require any information about the legislators’ network. We find no support for this model; party control does not mediate the relationship between time and passage rate (see Fig. 5(B)).

A more nuanced model tests the expectation that when the controlling coalition is more partisan and thus more ideologically unified, it will have greater success passing bills. The key variable in this model, coalition partisanship, is defined as the fraction of non-independent members in the largest coalition that affiliate with that coalition’s dominant political party (see Fig. 3). We compute coalition partisanship by applying the partition method described above to a signed network of political collaboration and opposition. We find support for this model in the US House, but not the US Senate (see Fig. 5(C)). Specifically, we find that in the House, the partisan homogeneity of the controlling coalition has increased over time (\(\beta =0.771,p < 0.01\)), which is consistent with past findings concerning polarization6, and that coalition homogeneity increases passage rates (\(\beta =0.661,p < 0.05\)), which is consistent with our expectation about the impact of ideological unity. Together, these effects imply a significant and positive indirect effect of time on the passage rate (\(\beta =0.510,p < 0.05\)), mediated by coalition partisanship. Thus, the observed decline in bill passage rates in the US House would have been worse (direct effect: \(\beta =-\,1.038,p < 0.01\)), but was mitigated by increasingly ideologically homogeneous coalitions, which are a protective factor against declines in legislative effectiveness.

Summary and Conclusions

In this study we proposed a general method for identifying internally cohesive opposing coalitions in signed networks of legislators based on structural balance theory, then applied this method to identify opposing coalitions in the US Congress, showing that these coalitions’ partisanship can explain changes in legislative effectiveness better than political parties. Based on this analysis, we offer a series of substantive and methodological conclusions.

Consistent with prior studies6,33,34,35,36, we find that polarization has increased in both the US Senate and US House of Representatives, and that this polarization has largely mirrored partisan divisions along political party lines. We operationalized polarization using the level of a signed graph’s structural balance, and therefore measure what6 calls “strong polarization,” but have used two different measures of balance. We find that the two measures are highly correlated and both support the conclusion of increasing polarization.

The triangle index is easy to compute, but provides only a locally-aggregated measure of a graph’s level of balance. In contrast, computing the frustration index is difficult, but it provides not only a global measure of a graph’s level of balance, but also the optimal partitioning of vertices into internally cohesive but mutually antagonist groups. We have demonstrated a practical method for computing the exact value of frustration index and identifying the optimal partition in dense graphs of \(|E|\gg 50000\) that involves first obtaining upper and lower bounds, using exogenous node properties (e.g. legislators’ political party affiliations), and solving a large-scale binary linear programming model. In the context of legislative networks, this method allows us to identify the most cohesive coalitions of legislators under conditions of balance theory.

Although our computational innovations make the identification of internally cohesive opposing coalitions practically feasible, we must also demonstrate that these coalitions are more informative than other simpler grouping possibilities. In the legislative context, we show that the partisan composition of these cohesive coalitions better explains the declining legislative effectiveness in the US House of Representatives than simply examining legislators’ political party affiliations. This affirms Mayhew’s claim that “no theoretical treatment of the United States Congress that posits parties as analytic units will go very far” [42, p.27] but goes a step further by identifying an alternative analytic unit – internally cohesive opposing coalitions – that does have explanatory power. Importantly, coalitions appear useful only for explaining the legislative effectiveness of the House, but not the Senate. However, this is also consistent with existing political science theory that “the lack of majority control of [procedural] processes in the Senate negates the possibility of significant party [or other group-based] effects in that body” [43, p.7]. Therefore, in general terms, our empirical findings suggest that in legislative bodies where a sufficiently large group of legislators can influence procedural processes, the composition of the largest coalition is more important than the size of the majority party’s majority. This is perhaps obvious in parliamentary systems where multi-party coalition forming is essential, but is noteworthy in the non-parliamentary US Congress.

These conclusions have some significant implications for both the future study of signed networks, and of the link between polarization and legislative effectiveness. First, by providing a practical method for computing the frustration index of relatively dense graphs, we hope to move the study of signed graphs beyond merely determining the level of balance, and toward the study of how the composition of mostly opposing groups impact other network dynamics. Second, our empirical findings suggest that research on polarization and its impact on the legislative process should look beyond political parties and partisanship to more subtle but influential forms of coordination, such as internally cohesive coalitions which are antagonist towards one another.

Materials and Methods

Relations of collaboration and opposition between elected officials are difficult to collect directly because politicians have limited time to participate in surveys and have good reasons to conceal their true political relations. Therefore, studies of elected officials’ political networks typically measure these relations indirectly, using bipartite projections focusing on their co-sponsorship of bills44, co-voting on bills36,45,46, co-membership on committees47, and co-attendance at press events48. For a range of substantive reasons noted by6 (e.g. relatively few bills are actually voted on, committee memberships are driven by such non-ideological factor such as seniority), we examine political relations from bill co-sponsorship.

Specifically, we use a signed network of inferred political relations among the members of the US House of Representatives, and among the members of the US Senate, in each session of Congress from 1979 to 2016 (96th session – 114th session). The process for creating these signed networks is described in detail by6 and they are available in a public Figshare data repository4.

Inferring signed networks from co-sponsorship data

Importantly, all pairs of legislators co-sponsor at least some of the same bills, so we know that the mere existence of some co-sponsorships does not imply they collaborate, and that some number of co-sponsorships can actually indicate avoidance. In previous work6, a stochastic degree sequence model (SDSM)5 is used to define thresholds of significantly few and significantly many co-sponsorships by building the empirical sampling distribution of two legislators’ joint co-sponsorships under a null model in which each legislator co-sponsored approximately the same number of bills and each bill received approximately the same number of co-sponsorships (i.e. holding approximately constant the legislator and bill degree sequence). To be more specific, given a bipartite graph B, Monte Carlo methods can be used to generate probability distributions \(B{B{\prime} }_{ij}\) when \({\Pr }({B}_{ij}=1)\) is a function of the row and column marginals of B6. Decisions about whether a given dyad represents significantly few or significantly many co-sponsorships are made by comparing their observed number of joint co-sponsorships to the empirical sampling distribution using a two-tailed \(\alpha =0.05\) threshold. For example, Fig. 6 shows that Rep. Earl Blumenhauer (D-OR3) and Rep. Sheila Jackson-Lee (D-TX18) were observed to have co-sponsored 242 of the same bills (dashed vertical line). The magnitude of joint co-sponsorships, and the fact that both representatives are Democrats, might lead one to conclude that they are collaborating. However, the shaded distribution shows the expected number of joint co-sponsorships under the SDSM null model in which each representative randomly chooses which bills to co-sponsor. Comparing these representatives’ observed number of joint co-sponsorships to the null model expectation, we find that they co-sponsor significantly fewer of the same bills than would be expected at random and therefore define the edge between them as negative.

Figure 6
figure 6

The observed number of co-sponsorships and the null model sampling distribution obtained using the SDSM for Rep. Earl Blumenhauer (D-OR3) and Rep. Sheila Jackson-Lee (D-TX18).

This approach differs from other methods of reducing weighted graphs to binary or signed graphs49,50 because it explicitly incorporates information from the original bipartite data (i.e. legislators linked to bills), thereby ensuring it is not lost when these data are projected as a unipartite graph. Additionally,6 extracts signed backbone networks rather than the weighted bipartite projections because the weights in those projections are distorted by heterogeneity in the bipartite degree sequences (i.e. some legislators sponsor many bills, others sponsor few5,51).

Although data on earlier sessions are available, they were excluded because prior to the 96th session, House rules imposed a limit of 25 co-sponsors per bill, which artificially distorts co-sponsorship patterns and limits the usefulness of these data for inferring political networks52. These data do not distinguish between a bill’s “sponsor” and its “co-sponsors” because the former is simply the legislator whose name appears first in a potentially long list of legislators responsible for the bill’s introduction.

Although these data represent a time-series of legislative interactions, we examine the networks cross-sectionally for two reasons. First, there are a large number of joiners and leavers in each new session as incumbents lose their seats, freshmen join Congress, or representatives become senators, making most dynamic models impractical to estimate. Second, although some political relationships develop over long periods of time, the effectiveness of any particular session of Congress can be evaluated independently.