Paper The following article is Open access

Geometric randomization of real networks with prescribed degree sequence


Published 29 May 2019 © 2019 The Author(s). Published by IOP Publishing Ltd on behalf of the Institute of Physics and Deutsche Physikalische Gesellschaft
Citation: Michele Starnini et al 2019 New J. Phys. 21 053039, DOI 10.1088/1367-2630/ab1e1c



Abstract

We introduce a model for the randomization of complex networks with geometric structure. The geometric randomization (GR) model assumes a homogeneous distribution of the nodes in a hidden similarity space and uses rewirings of the links to find configurations that maximize a connection probability akin to that of the popularity-similarity geometric network models. The rewiring preserves exactly the original degree sequence, thus preventing fluctuations in the degree cutoff. The GR model is manifestly simple as it relies upon a single free parameter controlling the clustering of the rewired network, and it does not require the explicit estimation of hidden degree variables. We demonstrate the applicability of GR by implementing it as a null model for the analysis of community structure. As a result, we find that geometric and topological communities detected in real networks are consistent, while topological communities are also detected in randomized counterparts as an effect of structural constraints.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

The practice of testing hypotheses against a properly specified control case, or null model, is at the heart of the scientific method. In network science [1], null models typically take the form of generative models that produce maximally random graph ensembles given some specific features [2, 3]. Beyond the unrealistic Erdős–Rényi random graph [4], these models were designed to generate random networks that replicate specific features of real systems, like heterogeneous degree sequences [5–9], high levels of clustering [8, 10], communities [11–13], and other properties of unweighted [14–17] and weighted networks [18–23]. Such models have played a major role in discerning relevant patterns in the fabric of networks that are not attributable to specific constraints. Successful applications include the detection of over-represented motifs in networks [24], the quantification of communities using modularity [25], the detection of rich-club ordering [21, 26] and other degree–degree and higher-order correlations [27], and the characterization of structural correlations in weighted networks [22].

However, null models for networks that incorporate geometric information are scarce and mainly focused on spatial networks [28–30]. In fact, a geometric approach to the structure of complex networks has only recently begun to be developed. A class of models in hidden metric spaces [31, 32] simultaneously explains many pivotal features of real networks (including the small-world property, heterogeneous degree distributions, high levels of clustering, and self-similarity) based on only three parameters, which control the average degree, the exponent of the power-law degree distribution, and the clustering coefficient. In those models, the probability of connecting two nodes is determined by their distance in an underlying latent space. This distance is defined along two dimensions representing the popularity and similarity of the nodes, such that the more popular and the more similar two nodes are, the greater their chance of interacting and being linked. Specifically, in the ${{ \mathcal S }}^{1}$ model [31], the hidden degree of a node is a proxy for its popularity, and nodes are assigned angular positions on a one-dimensional sphere (a circle) such that the angular separation between nodes provides a measure of similarity. The hidden degree can be reinterpreted as a radial coordinate in a hyperbolic plane [33], leading to an isomorphic, purely geometric formulation, the ${{ \mathcal H }}^{2}$ model. In the ${{ \mathcal H }}^{2}$ model, networks are represented in the hyperbolic disk, where higher-degree nodes are placed closer to the center, the angular coordinate remains as in the ${{ \mathcal S }}^{1}$ similarity space, and the probability of connection decreases with the hyperbolic distance.

However, in both the ${{ \mathcal S }}^{1}$ and ${{ \mathcal H }}^{2}$ models the angular coordinate is uniformly distributed, at odds with the heterogeneous angular distributions observed in hyperbolic maps of real networks [34–36]. Indeed, clusters of nodes lying close together in the similarity space form geometric communities [35, 36] (named soft or latent communities), which can be modeled in the geometric framework [37, 38]. This observation opens the door to using geometric models with a homogeneous similarity distribution as null models for investigating the community organization and other structural properties of real networks.

In this paper, we present a rewiring procedure [39] based on the well-known popularity-similarity ${{ \mathcal S }}^{1}$ network model. The geometric randomization (GR) model, as we named it, preserves exactly the degree sequence of the input network while completely randomizing the angular coordinates of the nodes. This randomization of the similarity coordinate supports the use of GR as a null model for the analysis of the topological properties of real networks, including community structure. The GR model assumes the same form of the connection probability as the ${{ \mathcal S }}^{1}$ model, as well as a uniform distribution of the similarity coordinate. Unlike the ${{ \mathcal S }}^{1}$ model, however, it is fitted to a given degree sequence. Advantageously, using prescribed degrees makes it possible to skip the delicate task of estimating hidden degree variables from real data. This can help, for instance, in the analysis of features that are especially sensitive to fluctuations in the degree cutoff, like the behavior of dynamical processes such as epidemic spreading or synchronization, or in the high-fidelity reproduction of real network topologies. Based on these premises, we propose an algorithm that homogenizes the angular distribution and rewires the links of a network, preserving the given degrees while maximizing the likelihood that the new topology was generated by the geometric model. We then analyze the effects of the GR model on the topological properties of real and synthetic geometric networks, including community structure.

2. The GR model

The GR model operates on networks where nodes have an observed degree and exist in a similarity space. The similarity space is taken to be a circle, as in the ${{ \mathcal S }}^{1}$ or ${{ \mathcal H }}^{2}$ models (see appendix A). In those models every node i is characterized by a popularity-similarity pair of coordinates (κi, θi), where κi is the node's hidden degree (expected to be proportional to the observed degree ki) and θi its angular or similarity coordinate.

In the GR model, instead, only angular coordinates are assigned to the nodes, chosen uniformly at random in [0, 2π]. The network is then rewired to maximize the likelihood that the new topology was generated by the ${{ \mathcal S }}^{1}$ model while preserving the observed degrees, and thus the total number of edges E. The rewiring is performed with a Metropolis–Hastings algorithm that searches for the network connectivity, i.e. the adjacency matrix aij, that preserves the observed degrees while maximizing the congruency between the rewired topology and the ${{ \mathcal S }}^{1}$ model, measured in terms of the likelihood function (see appendix B). The rewiring algorithm repeats the following steps.

  • Choose two links at random (between nodes i and j and between nodes l and m).
  • Compute the probability of rewiring (connecting i with l and j with m) as the ratio ${p}_{r}={{ \mathcal L }}_{n}/{{ \mathcal L }}_{c}$, where ${{ \mathcal L }}_{c}$ and ${{ \mathcal L }}_{n}$ are the values of the likelihood function before and after the swap, respectively. Notice that pr can be calculated as ${p}_{r}={\left(\tfrac{{\rm{\Delta }}{\theta }_{{ij}}{\rm{\Delta }}{\theta }_{{lm}}}{{\rm{\Delta }}{\theta }_{{il}}{\rm{\Delta }}{\theta }_{{jm}}}\right)}^{\beta }$ using only the angular coordinates of the nodes (see also appendix B).
  • If ${p}_{r}\geqslant 1$, accept the link swap.
  • Otherwise, accept the link swap with probability pr.

The rewiring algorithm is terminated after ${E}^{2}$ edges have been chosen for swapping, ensuring that the likelihood has reached a plateau. Notice that at the end of the rewiring procedure the degrees of the nodes are unchanged, but the resulting network might not be connected. Moreover, the GR model does not require estimating the hidden degrees of the nodes, because they do not enter any step of the algorithm. Therefore, the GR model only needs uniformly distributed angular coordinates and a value for the clustering parameter β, as discussed in detail in the next section.
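The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `geometric_randomization`, the edge-list input, and the rejection of proposals that would create self-loops or multi-edges (a constraint the text does not spell out) are our own choices. The acceptance rule uses the angular ratio pr of the second step, evaluated in log space to avoid overflow for large β.

```python
import math
import random


def angular_distance(t1, t2):
    """Shortest separation between two angles on the circle, in [0, pi]."""
    d = abs(t1 - t2) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def geometric_randomization(edges, n, beta, steps, seed=None):
    """Sketch of the degree-preserving GR rewiring (hypothetical helper).

    edges: list of (u, v) pairs of a simple undirected graph on nodes 0..n-1
    beta:  clustering parameter of the S1 connection probability
    steps: number of proposed swaps
    Returns the rewired edge list and the uniformly drawn angles.
    """
    rng = random.Random(seed)
    theta = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(steps):
        a, b = rng.sample(range(len(edges)), 2)  # two distinct links (i, j), (l, m)
        i, j = edges[a]
        l, m = edges[b]
        if rng.random() < 0.5:  # links are unordered: allow both possible pairings
            l, m = m, l
        # Assumption: skip swaps that would create self-loops or multi-edges
        if i == l or j == m:
            continue
        if frozenset((i, l)) in present or frozenset((j, m)) in present:
            continue
        old = angular_distance(theta[i], theta[j]) * angular_distance(theta[l], theta[m])
        new = angular_distance(theta[i], theta[l]) * angular_distance(theta[j], theta[m])
        # p_r = (dtheta_ij * dtheta_lm / (dtheta_il * dtheta_jm)) ** beta
        if new == 0.0:
            accept = True
        elif old == 0.0:
            accept = False
        else:
            log_p = beta * math.log(old / new)
            accept = log_p >= 0.0 or rng.random() < math.exp(log_p)
        if accept:
            present -= {frozenset((i, j)), frozenset((l, m))}
            present |= {frozenset((i, l)), frozenset((j, m))}
            edges[a], edges[b] = (i, l), (j, m)
    return edges, theta
```

Every accepted swap removes one link from and adds one link to each of the four nodes involved, so all degrees are preserved by construction. Following the stopping rule above, `steps` would be set on the order of E², which is slow in pure Python for large networks.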

Geometric randomizations of networks can also be obtained by using the ${{ \mathcal S }}^{1}$ model with parameters γ, β and μ (controlling the exponent of the power-law hidden degree distribution, the clustering coefficient, and the average degree, respectively) estimated from the empirical network. This alternative, however, requires the explicit estimation of the hidden degree distribution P(κ) or of its exponent, and may therefore introduce undesired fluctuations in the degree cutoff, which can induce relevant differences between the topological properties of real and ${{ \mathcal S }}^{1}$-generated networks.

3. Tuning clustering through parameter β

In order to apply the GR model to a real or synthetic network, one simply needs to fix the parameter β, which controls the level of clustering in the network [31]. Clustering is a signature of the metricity of geometric networks [40]: as a reflection of the triangle inequality, it connects the observed topology to the underlying metric space.

Note that the value of β affects the probability of accepting a link swap (see equation (B.3)), so it determines the structure of the final network. We address the role of β by applying the GR model to synthetic networks generated by the geometric preferential attachment (GPA) model [37] and the soft communities in similarity space (SCSS) model [38]. Both models are intended to produce synthetic networks with tunable community structure.

The GPA model generates geometric networks with soft communities using a growing mechanism in the hyperbolic plane. The probability of connection depends on a parameter Λ controlling the initial attractiveness of the different angular regions, such that the heterogeneity of the angular coordinate decreases with Λ, and ${\rm{\Lambda }}\to \infty $ recovers the homogeneous distribution. Notice that the degree distribution and the clustering coefficient of networks generated by the GPA model are independent of Λ. However, in the GPA model $\beta \to \infty $ by construction, so the level of clustering is always the maximum possible. The SCSS model is an ${{ \mathcal S }}^{1}$-based model for the generation of soft communities in which the level of clustering can be tuned through β.

Figure 1(a) shows the average clustering coefficient $\langle c\rangle $ of a GPA network compared with the randomizations obtained by applying the GR model using different values of β. As expected, the average clustering of the rewired networks strongly depends on the value of β: the lower β, the lower $\langle c\rangle $ in the resulting network. A level of clustering similar to GPA values can be obtained in GR networks by using large values of β, such as $\beta =10$.


Figure 1. (a) Average clustering $\langle c\rangle $ of a network generated by the GPA model (dashed line) and of rewired versions (orange) obtained by applying the GR model with different values of β. The networks have size $N={10}^{3}$, exponent of the degree distribution γ = 2.5, number of links per node m = 4, and initial attractiveness Λ = 0.1. (b), (c) Average clustering of two networks generated with the SCSS model (dashed line) with attractiveness Λ = 0.1, and β0 = 1.5 in (b) and β0 = 3.5 in (c). Green bars indicate the $\langle c\rangle $ of networks obtained by applying the GR with β0 and with β, respectively.


In figures 1(b), (c), we report the average clustering coefficient obtained by applying the GR model to synthetic networks generated with the SCSS model. The SCSS networks are produced using two different generating values, referred to as β0. Figures 1(b), (c) show that the value of β used for GR can be fine-tuned so that the GR networks reproduce the average clustering $\langle c\rangle $ of the original networks. If the generating value β0 is used for the rewiring, the clustering of the GR instances remains below that of the original networks. This observation can be understood from the following two points. First, for SCSS networks $\langle c\rangle $ is independent of the level of angular clusterization, so any two SCSS networks with equal β0 and the same distribution of hidden degrees, P(κ), have equal $\langle c\rangle $. Second, a GR instance of an SCSS network obtained using β0 is a network with homogeneous P(θ) and the same observed degree distribution P(k) as the SCSS network. That is, if P(k) = P(κ) held exactly, the average clustering $\langle c\rangle $ reached by the GR instance with β0 would match that of the SCSS network. Since we do not observe this match in figures 1(b), (c), we attribute the mismatch to differences between the distributions of observed and hidden degrees in the SCSS network.
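The fine-tuning of β described above can be illustrated with a bisection search. This is a sketch under two assumptions of ours: that $\langle c\rangle $ of GR networks increases monotonically with β (consistent with figure 1(a)), and that the caller supplies a hypothetical function `clustering_of_beta(beta)` that runs GR with that β and returns the resulting average clustering, ideally averaged over several realizations since single runs are noisy. The lower bound starts just above β = 1, the minimum value in the geometric models. The helper `average_clustering` gives nodes with fewer than two neighbours a local clustering of zero, which is one common convention.

```python
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    Nodes with fewer than two neighbours contribute zero (one common convention).
    """
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the neighbours of u (each pair checked once)
        links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def tune_beta(target_c, clustering_of_beta, lo=1.01, hi=20.0, tol=1e-3, max_iter=60):
    """Bisection for the beta whose GR clustering matches target_c.

    Assumes clustering_of_beta is increasing in beta; if target_c lies outside
    [clustering_of_beta(lo), clustering_of_beta(hi)] the result sticks to a bound.
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if clustering_of_beta(mid) < target_c:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

A call such as `tune_beta(average_clustering(adj), clustering_of_beta)` then returns an estimate of the β that reproduces the observed clustering. If the target lies below what β slightly above 1 can produce, the search stops at the lower bound, which mirrors the situation discussed for the Words network.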

4. Effects of GR in empirical networks

In the following, we apply the GR model to real networks. We consider six empirical networks from different domains: the network of flows of goods and services exchanged between USA industrial sectors in 2007 (Commodities) [48], the network of chord transitions in western popular music (Music) [41], the one-mode projection onto metabolites of the human metabolic network at the cell level (Metabolic) [35], the word adjacency network in Darwin's book On the Origin of Species (Words) [42], the email communication network within the Enron company (Enron) [43], and the Internet at the autonomous system level (Internet) [34, 44]; see table 1 and appendix C for details.

Table 1. Properties of the data sets under consideration: N, size of the network; G, size of the giant connected component of the GR network as a fraction of N; γ, exponent of the power-law fit to the degree distribution, $P(k)\sim {k}^{-\gamma }$; β0, parameter estimated from the embedding of the real network; β, parameter that preserves the level of clustering in the GR network; $\langle k\rangle $, average degree; $\langle c\rangle $, average clustering coefficient; and DKS, the D score (95% CI) of the KS test performed between the P(θ) distributions of the original networks and of the networks obtained by applying the GR model (see main text).

Data set      N       G      γ     β0    β     ⟨k⟩    ⟨c⟩   DKS
Enron         33696   0.998  2.14  2.70  2.60  10.73  0.71  0.027
Commodities   374     0.994  2.50  1.06  1.25  5.83   0.22  0.144
Metabolic     1436    0.999  2.60  2.13  2.50  6.57   0.54  0.092
Words         7377    0.999  2.25  1.01  1.00  11.98  0.47  0.116
Internet      23748   0.977  2.16  1.88  2.20  4.92   0.61  0.123
Music         2476    0.999  2.27  2.50  2.65  16.66  0.82  0.072

As described in the previous section, β is the only free parameter of the model, and it can be used to tune the clustering coefficient. In what follows, we show results obtained with a value of β that makes the average clustering of the rewired network equal to that of the real one. Another possible choice for β is the value estimated when embedding the real network into the underlying metric space [34], which we denote β0 in table 1. The embedding method estimates the coordinates of the nodes in the underlying geometry by maximizing the likelihood that the observed topology was produced by the model. In the process, β0 is estimated such that the expected clustering coefficient of the embedded network matches the observed clustering coefficient of the network topology. As explained in the previous section for synthetic networks, using β0 as the input of GR does not, in general, produce rewired networks with the same average clustering $\langle c\rangle $ as the original networks. For real networks, the two values of β are very similar but not always identical, see table 1. The small difference is related to the fact that, for some real networks, the GR model cannot simultaneously fit the empirical connection probability and the observed clustering with a single value of β, see figure 2. The rewired networks obtained by applying the GR model are mostly connected: the disconnected parts are extremely small compared with the rest. In table 1, we report the size of the giant connected component of the GR networks as a fraction of the original network size N, averaged over 10 realizations. In what follows, the analysis is carried out on the giant connected component of the GR networks.
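The quantity G reported in table 1 can be computed with a breadth-first search. A minimal sketch, assuming the rewired network is stored as a dictionary of neighbour sets (our choice of data structure; the function name is hypothetical); averaging over the 10 realizations is left to the caller.

```python
from collections import deque


def giant_component_fraction(adj, n=None):
    """Fraction of nodes in the largest connected component.

    adj: dict mapping each node to the set of its neighbours.
    n:   normalizing size (defaults to the number of nodes in adj).
    """
    seen = set()
    largest = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over the component containing `start`
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        largest = max(largest, size)
    return largest / (n if n is not None else len(adj))
```

Passing the original N explicitly through `n` keeps the normalization correct even if isolated nodes were dropped from `adj`.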


Figure 2. Empirical connection probability for original (blue dots) and GR (orange dots) networks. Fraction of connected pairs of nodes as a function of χij = Δθij R/(μ κi κj). The black line shows the theoretical curve, equation (B.2).


4.1. Clustering and degree correlations

Figure 3 shows the average clustering $\langle c\rangle $ of the empirical networks under consideration compared with that of their randomized versions obtained by the GR model. We consider both β and β0 (the corresponding networks are denoted GR and GR0, respectively), and we also include a comparison with real network replicas generated by the S1 model [31] (see appendix A). As expected, GR networks show an average clustering practically identical to that of the original data, while GR0 networks present mild deviations, and differences are usually larger for S1 networks due to deviations in the obtained degrees. One exception to the preservation of clustering in GR instances is the Words data set. This empirical network has a β0 extremely close to the minimal threshold β = 1 defined in hidden metric space network models. The value of β needed for the GR network to reach the same level of clustering as the empirical one cannot be used, since it would have to be lower than 1. In general, an embedding value β0 ≃ 1 suggests that clustering is due to finite-size effects, since β = 1 corresponds to the absence of clustering in the thermodynamic limit of geometric network models.


Figure 3. Average clustering $\langle c\rangle $ of empirical networks (blue), networks obtained from the GR (red) and S1 (light blue) models. GR networks obtained with β0 (green) are indicated as GR0. Error bars are calculated over 10 realizations of the GR and S1 models.


The top row of figure 4 shows the clustering spectrum $c(k)$ of the empirical networks and of the networks obtained by the GR and S1 models. In all cases, c(k) has a similar functional form: a decreasing function of k with a broad tail. The clustering spectrum of the GR networks is always very close to the original data, while the S1 networks depart substantially in some systems as a result of not preserving the empirical degrees. This is especially evident for the S1 versions of the Music and Words networks, whose clustering spectra are much lower than those of the original data.


Figure 4. Clustering c(k) (top) and average degree of nearest neighbors ${\bar{k}}_{{nn}}(k)$ (bottom) as a function of the degree, for empirical networks (dots), and networks obtained from the GR (continuous orange line) and S1 (black dashed line) models.


On the other hand, the real networks under consideration are generally disassortative, as revealed by the decreasing form of the average nearest-neighbor degree function ${\bar{k}}_{{nn}}(k)$, figure 4 (bottom). Internet, Music and Words show a power-law decay, while the other data sets show milder degree correlations. In all cases, the ${\bar{k}}_{{nn}}(k)$ functions of GR networks are very similar to those of the original data, while S1 networks exhibit strong deviations, with the exception of the Internet.
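Both degree-resolved quantities in figure 4 can be computed in one pass over the nodes. A minimal sketch, assuming a dictionary of neighbour sets as input; the function name `degree_spectra` is hypothetical, and no logarithmic binning of degree classes is applied, although binning is common for plotting.

```python
from collections import defaultdict
from itertools import combinations


def degree_spectra(adj):
    """Return (c_of_k, knn_of_k) for an undirected graph.

    c_of_k:   mean local clustering of nodes with degree k (k >= 2)
    knn_of_k: mean degree of the neighbours of nodes with degree k (k >= 1)
    """
    c_sum = defaultdict(float)
    knn_sum = defaultdict(float)
    count = defaultdict(int)
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue
        count[k] += 1
        # Average degree of the neighbours of u
        knn_sum[k] += sum(len(adj[v]) for v in nbrs) / k
        if k >= 2:
            links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
            c_sum[k] += 2.0 * links / (k * (k - 1))
    c_of_k = {k: c_sum[k] / count[k] for k in count if k >= 2}
    knn_of_k = {k: knn_sum[k] / count[k] for k in count}
    return c_of_k, knn_of_k
```

A decreasing `knn_of_k` signals disassortative mixing, as observed for the empirical networks.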

4.2. Community structure

So far, GR randomized versions of real and synthetic geometric networks appear to preserve topological features beyond the degree distribution, including clustering and the average nearest-neighbor degree. However, the GR randomization homogenizes the distribution of nodes in similarity space, whereas nodes in real networks are typically heterogeneously distributed, being more concentrated in some specific regions [35, 36]. This signals the presence of communities of similar nodes, named soft communities [37]. The top row of figure 5 shows the empirical networks embedded in the hyperbolic plane, with coordinates (r, θ) (see appendix A for the relationship between r and the degree, and appendix C for references to the sources of the empirical maps). The angular coordinates θ are clearly heterogeneously distributed in [0, 2π]. The bottom row of figure 5 offers a different perspective, displaying the probability density function P(θ) of the similarity coordinate of the nodes for the six empirical networks.


Figure 5. Top row: empirical networks embedded in the hyperbolic disk. Distinct communities are indicated by different colors. Bottom row: probability distribution of the angular coordinate, P(θ), of the empirical networks.


The heterogeneity of the angular coordinate can be quantified by performing a Kolmogorov–Smirnov (KS) test between the empirical angular distribution P(θ) and the distribution PGR(θ) of the GR networks. The KS statistic measures the difference between two probability distributions and is defined as the maximum absolute difference between their cumulative distribution functions. Since PGR(θ) is uniform by construction, the larger the KS score, the more heterogeneous the empirical angular distribution. The test can thus be used to reject the null hypothesis that the empirical P(θ) and the synthetic PGR(θ) samples come from the same angular distribution. The KS distance DKS for the empirical networks under consideration is reported in table 1. The null hypothesis is strongly rejected for all real networks.
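A minimal pure-Python sketch of the two-sample KS statistic DKS, comparing the angles of an empirical network with those of a GR network (or any uniform sample). It evaluates both empirical cumulative distribution functions at every pooled sample point, where the maximum gap is attained. The function name is our own; in practice SciPy's `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical cumulative distribution functions; the
    supremum is attained at one of the pooled sample points.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / len(a)  # fraction of a that is <= x
        f_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d
```

Because θ lives on a circle, the KS statistic depends on where the origin θ = 0 is placed; Kuiper's variant of the test is invariant under rotations, should that matter for a given data set.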

Soft communities in the geometric domain can then be detected using geometric methods. We use the definition of soft communities given in [37], where they are defined as groups of nodes in similarity space separated from the rest by two angular gaps exceeding a critical value, Δθc. The critical gap Δθc is computed as the expected value of the largest gap between two consecutive nodes when the angular coordinates are distributed uniformly at random: Δθc ≃ 2π ln(N)/N. In the top row of figure 5, different colors highlight the deterministic partition into soft communities detected by the critical gap method in the real networks.
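The critical gap partition can be sketched as follows, assuming the angular coordinates are given as a dictionary from node to angle in [0, 2π). Nodes are sorted by angle, the gaps between consecutive nodes (including the one that wraps around 2π) are compared with Δθc = 2π ln(N)/N, and each run of nodes between two supercritical gaps becomes one community. The function name and labelling scheme are our own.

```python
import math


def critical_gap_communities(theta):
    """Soft communities from the critical gap criterion.

    theta: dict node -> angular coordinate in [0, 2*pi).
    Returns dict node -> community label (0, 1, ...).
    """
    n = len(theta)
    gap_c = 2 * math.pi * math.log(n) / n  # expected largest gap for uniform angles
    order = sorted(theta, key=theta.get)
    # gaps[s]: angular gap between order[s] and the next node (wrapping around 2*pi)
    gaps = [(theta[order[(s + 1) % n]] - theta[order[s]]) % (2 * math.pi) for s in range(n)]
    cuts = [s for s in range(n) if gaps[s] > gap_c]
    if not cuts:
        return {u: 0 for u in theta}  # no supercritical gap: a single community
    labels = {}
    label = 0
    start = (cuts[0] + 1) % n  # begin right after a large gap so no community is split
    for step in range(n):
        s = (start + step) % n
        labels[order[s]] = label
        if gaps[s] > gap_c:
            label += 1
    return labels
```

For uniformly spread angles no gap exceeds Δθc and all nodes end up in one community, which is the behavior expected for GR networks.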

Next, we compare the community structure of the real networks with that of their randomized counterparts. To quantify their topological community structure, we apply the widely used Louvain method (LM) [45], which maximizes the modularity Q ∈ [−1/2, 1]. Modularity compares the fraction of links inside communities with the expected fraction for a random distribution of edges that preserves the degrees of the given network. Interestingly, figure 6(a) shows that in real networks, although the Louvain method identifies topological communities with higher modularity, the soft communities found by the critical gap (CG) method also display large Q values, in some cases (e.g. the Metabolic and Music data sets) comparable to the modularities given by the purely topological LM.
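For reference, the modularity of a given partition (for example, the CG partition) can be computed from the edge list. A minimal sketch using the standard Newman–Girvan form $Q=\sum_c [e_c/E-(d_c/2E)^2]$, where $e_c$ is the number of links inside community c and $d_c$ the sum of the degrees of its nodes; the Louvain optimization itself is not reproduced here, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of an undirected simple graph.

    edges:  list of (u, v) pairs
    labels: dict node -> community label
    """
    n_edges = len(edges)
    internal = defaultdict(int)    # e_c: links inside community c
    degree_sum = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / n_edges - (degree_sum[c] / (2 * n_edges)) ** 2
               for c in degree_sum)
```

Placing all nodes in one community always gives Q = 0, which is the reference against which the non-significant CG modularities of GR networks can be read.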


Figure 6. (a), (b) Modularity Q of the partitions detected by the Louvain method (purple) and the critical gap method (yellow), for real (a) and GR (b) networks. Error bars in (b) are obtained from 10 realizations of the GR model. (c) Normalized mutual information between the partitions detected by the Louvain and critical gap methods, for empirical (blue) and GR (red) networks. Error bars are obtained from 10 realizations of the GR model.


The picture is completely different for GR networks, reported in figure 6(b). GR networks show strong community organization at the topological level, with large values of Q as measured by the Louvain method; this organization is induced by structural constraints imposed by the geometric models [46]. However, as expected, the critical gap method does not detect soft communities, as shown by the non-significant modularity values, compatible with zero, over different realizations of the randomization process.

We study the relationship between soft and topological communities in more detail by comparing the partition obtained by the Louvain method with that generated by the critical gap method. The overlap between the two partitions can be quantified by the normalized mutual information [47]. Figure 6(c) shows that the overlap between geometric and topological communities is quite large for real networks, especially for the Metabolic and Internet data sets, meaning that communities identified by purely (deterministic) geometric methods are meaningful, though subject to the degree of congruency of the real network with the hidden metric space. In contrast, figure 6(c) shows that the overlap between soft and topological communities in GR networks is very low, owing to the complete randomization of the angular coordinate performed by GR.
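The NMI between two partitions can be sketched as below. Several normalizations of mutual information are in use; this sketch assumes the arithmetic-mean form $2I(A;B)/[H(A)+H(B)]$ and is not necessarily the exact variant of [47]. Both partitions are dictionaries from node to community label over the same node set, and the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI = 2 I(A;B) / (H(A) + H(B)) between two partitions of the same nodes."""
    nodes = list(labels_a)
    n = len(nodes)
    count_a = Counter(labels_a[u] for u in nodes)
    count_b = Counter(labels_b[u] for u in nodes)
    joint = Counter((labels_a[u], labels_b[u]) for u in nodes)
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mutual = sum(c / n * math.log(c * n / (count_a[x] * count_b[y]))
                 for (x, y), c in joint.items())
    if h_a + h_b == 0.0:
        return 1.0  # convention: both partitions are a single community
    return 2.0 * mutual / (h_a + h_b)
```

The measure is invariant under relabelling of communities, so a Louvain partition and a CG partition can be compared directly even though their labels are unrelated.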

5. Conclusions

The degree-preserving rewiring process at the core of the GR of real networks offers an alternative to replicating them directly with the popularity-similarity model as a topology generator. GR has the advantage of avoiding the delicate task of estimating the hidden degree distribution, and it can be especially useful in problems sensitive to fluctuations of the degree cutoff, like the behavior of some dynamical processes, including epidemic spreading.

As a model, GR depends on a single parameter controlling the level of clustering in the resulting networks, so one can choose whether or not to replicate the clustering coefficient of real networks. Interestingly, the discrepancies between hidden and observed degrees in embedded networks affect the clustering level achieved by GR. In particular, the parameter value suggested by the embedding of the original data is, in general, close to but not identical with the value needed to replicate the clustering coefficient of the original network. Our results also indicate that, in some networks, degree–degree correlations can only be replicated by geometric network models if the observed degrees are preserved.

As a null model, GR can be used to investigate the relevance of geometric communities in real networks. Taken together, our results indicate that geometric communities are meaningful in the real networks analyzed here. At the same time, topological communities, like those detected in GR networks, are not always reliable and can be a result of constraints induced by the underlying geometric architecture. The fact that an underlying geometric organization imposes structural constraints on complex networks, which are strong enough for recreating detectable topological communities even in the absence of geometric ones, is an interesting subject by itself and will be investigated in future work.

Acknowledgments

We thank Marián Boguñá and Guillermo García-Pérez for helpful discussions. We acknowledge support from a James S McDonnell Foundation Scholar Award in Complex Systems; Ministerio de Ciencia, Innovación y Universidades of Spain project no. FIS2016-76830-C2-2-P (AEI/FEDER, UE); and the project Mapping Big Data Systems: embedding large complex networks in low-dimensional hidden metric spaces—Ayudas Fundación BBVA a Equipos de Investigación Científica 2017.

Appendix A. The ${{ \mathcal S }}^{1}$ and ${{ \mathcal H }}^{2}$ models

In the ${{ \mathcal S }}^{1}$ model [31], every node i is characterized by a hidden degree κi and an angular coordinate θi, representing its popularity (related to its degree) and similarity dimensions, respectively. The N nodes of the network are distributed uniformly at random in the similarity space, which is taken to be a one-dimensional sphere, or circle, of radius ${R}_{{{ \mathcal S }}^{1}}=N/2\pi $, adjusted to have a density of nodes equal to 1. Every pair of nodes is connected with probability

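$${p}_{ij}=\frac{1}{1+{\left(\frac{{R}_{{{ \mathcal S }}^{1}}\,{\rm{\Delta }}{\theta }_{ij}}{\mu \,{\kappa }_{i}\,{\kappa }_{j}}\right)}^{\beta }},\tag{A1}$$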

where Δθij stands for the angular separation between nodes i and j in the similarity circle, and the parameters μ and β control the average degree of the network and the level of clustering, respectively.

There exists an isomorphism between the ${{ \mathcal S }}^{1}$ model and a version in hyperbolic space, the ${{ \mathcal H }}^{2}$ model [33], where the hidden degrees κ are transformed into a radial coordinate, r, in a hyperbolic disk of radius ${R}_{{{ \mathcal H }}^{2}}$ such that

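$${r}_{i}={R}_{{{ \mathcal H }}^{2}}-2\ln \frac{{\kappa }_{i}}{{\kappa }_{0}},\qquad {R}_{{{ \mathcal H }}^{2}}=2\ln \frac{2{R}_{{{ \mathcal S }}^{1}}}{\mu \,{\kappa }_{0}^{2}},\tag{A2}$$

where κ0 denotes the smallest hidden degree in the network.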

Consequently, nodes closer to the center of the hyperbolic disk have a higher expected degree, and every node i then has a radial and an angular coordinate $({r}_{i},{\theta }_{i})$. A link between two nodes i and j exists with a probability p(dij) that depends on their distance dij in the hyperbolic hidden metric space, such that nodes with a higher probability of being connected are placed close to each other in that space. The connection probability must therefore be a decreasing function of the distance between nodes; specifically, it can be chosen to be

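$$p({d}_{ij})=\frac{1}{1+{{\rm{e}}}^{\frac{\beta }{2}\left({d}_{ij}-{R}_{{{ \mathcal H }}^{2}}\right)}},\tag{A3}$$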

where the parameter β still controls the network's clustering coefficient. The distance dij in the hyperbolic plane is calculated using the hyperbolic law of cosines,

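$$\cosh {d}_{ij}=\cosh {r}_{i}\cosh {r}_{j}-\sinh {r}_{i}\sinh {r}_{j}\cos {\rm{\Delta }}{\theta }_{ij},\tag{A4}$$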

where ${\rm{\Delta }}{\theta }_{{ij}}$ is the minimum angular distance between nodes i and j.
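Equations (A3) and (A4) translate directly into code. A minimal sketch; the function names are ours, and the overflow guard is an implementation detail not discussed in the text. The distance uses the algebraically equivalent form $\cosh d_{ij}=\cosh(r_i-r_j)+2\sinh r_i\sinh r_j\sin^2(\Delta\theta_{ij}/2)$ of (A4), which avoids cancellation errors for very small angular separations.

```python
import math


def hyperbolic_distance(r1, t1, r2, t2):
    """Distance between (r1, t1) and (r2, t2) in the hyperbolic plane of
    curvature -1, from the hyperbolic law of cosines."""
    d = abs(t1 - t2) % (2 * math.pi)
    dtheta = min(d, 2 * math.pi - d)  # minimum angular distance
    # cosh r1 cosh r2 - sinh r1 sinh r2 cos(dtheta), rewritten for numerical stability
    x = math.cosh(r1 - r2) + 2 * math.sinh(r1) * math.sinh(r2) * math.sin(dtheta / 2) ** 2
    return math.acosh(max(x, 1.0))


def h2_connection_probability(d, radius, beta):
    """Connection probability 1 / (1 + exp(beta * (d - radius) / 2))."""
    z = beta * (d - radius) / 2.0
    if z > 700.0:  # math.exp would overflow; the probability is numerically zero
        return 0.0
    return 1.0 / (1.0 + math.exp(z))
```

As a consistency check of the isomorphism, with the standard radial mapping $r_i = R_{{\mathcal H}^2} - 2\ln(\kappa_i/\kappa_0)$ and $R_{{\mathcal H}^2} = 2\ln[2R_{{\mathcal S}^1}/(\mu\kappa_0^2)]$ (κ0 being the smallest hidden degree), `h2_connection_probability` evaluated on `hyperbolic_distance` closely reproduces the ${{ \mathcal S }}^{1}$ probability of equation (A1) whenever the radii are large.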

To produce replicas of the real networks using the ${{ \mathcal S }}^{1}$ model, we extracted from the empirical networks the size N and the exponent γ of the degree distribution, and used the exponent β0 given by the embedding of the network into the hyperbolic disk. When generating the hidden degree sequence from P(κ), we adjusted the parameter μ to obtain the observed average degree $\langle k\rangle $, see table 1.
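A minimal sketch of how such an ${{ \mathcal S }}^{1}$ replica could be generated, under assumptions of ours: hidden degrees are drawn from a pure power law with exponent γ > 2 and minimum κ0 = ⟨k⟩(γ − 2)/(γ − 1), so that their mean equals ⟨k⟩, and μ is set to the standard large-N expression μ = β sin(π/β)/(2π⟨k⟩) (valid for β > 1) instead of being adjusted numerically on the finite network. The function names are hypothetical, and the pairwise loop costs O(N²).

```python
import math
import random


def s1_probability(dtheta, kappa_i, kappa_j, R, mu, beta):
    """S1 probability 1 / (1 + chi**beta), chi = R*dtheta / (mu*kappa_i*kappa_j)."""
    chi = R * dtheta / (mu * kappa_i * kappa_j)
    if chi == 0.0:
        return 1.0
    t = beta * math.log(chi)  # log space avoids overflow of chi**beta
    return 0.0 if t > 700.0 else 1.0 / (1.0 + math.exp(t))


def s1_network(n, gamma, beta, mean_degree, seed=None):
    """Sketch of an S1 replica generator (assumes gamma > 2 and beta > 1)."""
    rng = random.Random(seed)
    # Assumption: pure power-law hidden degrees whose mean equals mean_degree
    kappa_0 = mean_degree * (gamma - 2) / (gamma - 1)
    kappa = [kappa_0 * (1.0 - rng.random()) ** (-1.0 / (gamma - 1)) for _ in range(n)]
    theta = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    R = n / (2 * math.pi)  # circle radius giving unit node density
    mu = beta * math.sin(math.pi / beta) / (2 * math.pi * mean_degree)  # large-N value
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(theta[i] - theta[j])
            dtheta = min(d, 2 * math.pi - d)
            if rng.random() < s1_probability(dtheta, kappa[i], kappa[j], R, mu, beta):
                edges.append((i, j))
    return edges, kappa, theta
```

For example, `s1_network(374, 2.5, 1.06, 5.83, seed=1)` would draw one replica with the N, γ, β0 and ⟨k⟩ of the Commodities network in table 1. Its realized degrees fluctuate around the hidden ones, including near the cutoff, which is exactly the source of discrepancy that GR avoids.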

Appendix B. Likelihood maximization preserving degrees

To produce network topologies that are maximally congruent with the ${{ \mathcal S }}^{1}$ geometric network model one has to maximize the standard likelihood function defined in terms of the probability of connection

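$${ \mathcal L }=\prod _{i<j}{p({\kappa }_{i},{\kappa }_{j},{\rm{\Delta }}{\theta }_{ij})}^{{a}_{ij}}{\left[1-p({\kappa }_{i},{\kappa }_{j},{\rm{\Delta }}{\theta }_{ij})\right]}^{1-{a}_{ij}},\tag{B.1}$$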

where Δθij stands for the angular distance between nodes i and j, and the ${{ \mathcal S }}^{1}$ connection probability p(κi, κj, Δθij) reads

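$$p({\kappa }_{i},{\kappa }_{j},{\rm{\Delta }}{\theta }_{ij})=\frac{1}{1+{\left(\frac{R\,{\rm{\Delta }}{\theta }_{ij}}{\mu \,{\kappa }_{i}\,{\kappa }_{j}}\right)}^{\beta }}.\tag{B.2}$$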

The parameter μ depends on the observed average degree $\langle k\rangle $ of the network, and R is the radius of the circle (adjusted to have a density of nodes equal to 1, see appendix A).

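During the rewiring process the hidden degrees are kept constant (whatever their values), and a swap only replaces the links (i, j) and (l, m) with (i, l) and (j, m). As a result, the hidden degrees, together with μ and R, cancel out of the likelihood ratio, and the probability of swapping links between nodes i and j and between nodes l and m simply reads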

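$${p}_{r}=\frac{{{ \mathcal L }}_{n}}{{{ \mathcal L }}_{c}}={\left(\frac{{\rm{\Delta }}{\theta }_{ij}\,{\rm{\Delta }}{\theta }_{lm}}{{\rm{\Delta }}{\theta }_{il}\,{\rm{\Delta }}{\theta }_{jm}}\right)}^{\beta }.\tag{B.3}$$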

Therefore, as shown by equation (B.3), the GR model is independent of the hidden degrees and relies on a single free parameter, β, which controls the resulting level of clustering in the randomized network.
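The cancellation behind equation (B.3) can be made explicit. Only the four pairs (i, j), (l, m), (i, l) and (j, m) change state during a swap, so all other factors of the likelihood (B.1) cancel in the ratio. Writing $p_{ij}$ for $p(\kappa_i,\kappa_j,\Delta\theta_{ij})$ and $\chi_{ij}=R\,\Delta\theta_{ij}/(\mu\,\kappa_i\,\kappa_j)$, as in figure 2, equation (B.2) gives $(1-p_{ij})/p_{ij}=\chi_{ij}^{\beta}$, and hence

$$\frac{{\mathcal L}_{n}}{{\mathcal L}_{c}}=\frac{p_{il}\,p_{jm}\,(1-p_{ij})\,(1-p_{lm})}{p_{ij}\,p_{lm}\,(1-p_{il})\,(1-p_{jm})}=\left(\frac{\chi_{ij}\,\chi_{lm}}{\chi_{il}\,\chi_{jm}}\right)^{\beta}=\left(\frac{\Delta\theta_{ij}\,\Delta\theta_{lm}}{\Delta\theta_{il}\,\Delta\theta_{jm}}\right)^{\beta},$$

where the last step uses the fact that the product $\kappa_i\kappa_j\kappa_l\kappa_m$ and the factor $(R/\mu)^2$ appear identically in the numerator and the denominator.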

Appendix C. Empirical data sets

US Commodities. This network represents the flows of goods and services (in USD) exchanged between industrial sectors in the USA during 2007. The hyperbolic embedding was obtained from [48].

Enron. This is the network of email messaging activity among employees of the Enron company. We use the network obtained in [43, 49] and the hyperbolic embedding constructed in [50].

Internet. This network consists of the connectivity data of the Internet at the autonomous system level, collected by the Archipelago project [44] in June 2009 and embedded in hyperbolic space in [34].

Human metabolic. This network is the one-mode projection onto metabolites of the bipartite metabolic network of human cell metabolism, as spatially embedded in [35].

Music. In this network, nodes are chords (sets of musical notes played in a single beat) and links represent observed transitions between them, see [41]. We use the hyperbolic embedding of a sparser, undirected version of this network, as reconstructed in [50].

Words. This is the network of adjacency between words in Darwin's book On the Origin of Species, see [51]. We use the embedding presented in [50].
