Full reconstruction of simplicial complexes from binary contagion and Ising data

Wang, Huan; Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng

doi:10.1038/s41467-022-30706-9

Download PDF

Article
Open access
Published: 01 June 2022

Full reconstruction of simplicial complexes from binary contagion and Ising data

Nature Communications volume 13, Article number: 3043 (2022) Cite this article

6361 Accesses
30 Citations
4 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 27 June 2022

This article has been updated

Abstract

Previous efforts on data-based reconstruction focused on complex networks with pairwise or two-body interactions. There is a growing interest in networks with higher-order or many-body interactions, raising the need to reconstruct such networks based on observational data. We develop a general framework combining statistical inference and expectation maximization to fully reconstruct 2-simplicial complexes with two- and three-body interactions based on binary time-series data from two types of discrete-state dynamics. We further articulate a two-step scheme to improve the reconstruction accuracy while significantly reducing the computational load. Through synthetic and real-world 2-simplicial complexes, we validate the framework by showing that all the connections can be faithfully identified and the full topology of the 2-simplicial complexes can be inferred. The effects of noisy data or stochastic disturbance are studied, demonstrating the robustness of the proposed framework.

Simplicial complex entropy for time series analysis

Article Open access 20 December 2023

Hypergraph reconstruction from uncertain pairwise observations

Article Open access 04 December 2023

Autonomous inference of complex network dynamics from incomplete and noisy data

Article 24 March 2022

Introduction

In network science and engineering, a subfield of research is to find the network topology and nodal dynamical equations from data¹. This is important because networks are ubiquitous in the real world but the details of their connection topology and the intrinsic dynamical systems governing the properties and physical observables of the network are often unknown. The details are desired not only for understanding but also for protecting, disabling, or controlling the network dynamical behaviors (depending on the specific applications), and a viable way is to solve the inverse problem of determining the network details through observational data if they are available. As for any inverse problems in mathematics and physical sciences, the network inverse problem is challenging. Previous works in this area focused on “conventional” networks with pairwise interactions only^{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}. Existing methods include those which are based on drive-response^3,5, adaptive synchronization^2,11, noise correlation^6,15, compressive sensing^7,9,17, maximum likelihood estimation^13,14,16, and Granger causality^4,8. The data can be from continuous- or discrete-time dynamical processes. For example, the drive-response and adaptive synchronization methods use data from continuous-time nonlinear coupled systems^2,3,5,11, while the maximum likelihood estimation method is suitable for data from discrete-time dynamics^13,14,16. In this paper, motivated by the fact that higher-order networks have become a state-of-the-art subfield of research in network science^{18,19,20,21,22,23,24}, we develop a reconstruction framework for finding from time-series data network topology with higher-order interactions.

While pairwise or node-to-node interactions are the familiar type in networks, it has been recognized that higher-order interactions are also ubiquitous and important. For example, in a social network, the collective recommendation of multiple friends can often be more persuasive than the recommendation of a single friend to convince the individual to buy a new product. In a rumor spreading process, a piece of false news is likely to be accepted by an individual if it is shared or promoted simultaneously by many people^25,26,27. A similar situation occurs in neuronal networks, where a firing event is often the result of excitatory and inhibitory interactions among many neurons. In all these cases, the interaction arises simultaneously among a group of nodes in the network, and to describe the network by the conventional pairwise interactions is no longer adequate²⁸: higher-order interactions beyond the pairwise relationship must be taken into account. Mathematically, higher-order interactions can be described as hypergraphs or simplicial complexes²⁹, i.e., networks containing higher-order simplexes. In particular, a k-simplex describes the simultaneous interaction among (k + 1) nodes, where a zero-simplex specifies an isolated node (i.e., without any interaction), a 1-simplex represents the conventional pairwise interaction, a 2-simplex underlies the simultaneous interaction among three nodes, and so on.

The past three years have witnessed a growing interest in higher-order networks³⁰. For example, random walks on hypergraphs were studied, where a walker chooses the next destination depending on the number and the size of the shared hyperedges³¹. A family of random walks on hypergraphs with a parameter controlling the bias of the dynamics towards hyperedges of small or large size was constructed and the impacts of walk strategy and walk time on community detection were elucidated³². The stability conditions of the general dynamical processes on hypergraphs were found¹⁸, and a social contagion model on hypergraphs was constructed which presents dynamical phenomena such as first- and second-order transitions, bistability and hysteresis³³. A simplicial model of social contagion was proposed and it was demonstrated that the reinforcement mechanisms in 2-simplex can lead to a discontinuous phase transition³⁴. The impacts of the heterogeneity of simplicial complexes on the SIS (susceptible-infected-susceptible) spreading model with collective and individual contagion were analyzed³⁵, and a pair approximation theory to study the SIS dynamics in simplicial complexes was developed, which was argued to be more accurate than the Markov-chain and mean field methods³⁶. A social communication model including idea integration and information transmission in simplicial complexes was proposed and the critical condition leading to the outbreak of information was identified³⁷. A simplicial activity driven model was proposed and the impact of both simplicial and temporally evolving interactions were analyzed³⁸. In terms of network reconstruction, a statistical method to detect higher-order interactions from network data of pairwise links has recently been developed²¹.

In this paper, we develop a framework to reconstruct complex networks with higher-order interactions from data. To be concrete, we focus on networks with 2-simplexes and assume that the dynamical processes on the network are social contagion and simplicial Ising dynamics that generate binary time-series data. Our method is of the statistical inference type pivoted on maximum likelihood estimation, with the aim to fully reconstruct both pairwise interactions (links) and 2-simplexes at the same time, thereby distinguishing our work from the recent method based on link data²¹. In particular, the central task is to estimate the probabilities of each node connecting to the reconstruction or target node (pairwise interaction) and of any two nodes forming a three-body 2-simplex with the target node. We articulate a two-step process to greatly enhance the computational efficiency and an effective truncation process to determine the final reconstructed structure of the simplicial complex. Using three synthetic and four real-world simplicial complexes, we demonstrate the accuracy of our reconstruction method and establish its robustness with respect to variations in the average degree of the network and stochastic fluctuations. Our work represents an initial effort in reconstructing complex networks with higher-order interactions based on observed time-series data.

Results

Simplicial complexes

A k-simplex σ is formed by a filled clique of a set of k + 1 nodes $\left[{v}_{0},\cdots \,,{v}_{k}\right]$, which defines a (k + 1)-body interaction³⁹. A 1-simplex is two nodes connected by an edge, a 2-simplex is three nodes connected pairwisely by edges and with an additional single face, i.e., a triangle, and a 3-simplex is four vertices connected pairwisely by edges and joined by four faces, which are filled in to form a solid tetrahedron, and so on. A simplicial complex ${{{{{{{\mathcal{K}}}}}}}}$ composed of a set of nodes ${{{{{{{\mathcal{V}}}}}}}}$ is a collection of simplexes, with the additional requirement^39,40 that if a simplex is in ${{{{{{{\mathcal{K}}}}}}}}$ ($\sigma \,\in\, {{{{{{{\mathcal{K}}}}}}}}$), then any simplex ϱ composed of subsets of simplex σ should also be included in ${{{{{{{\mathcal{K}}}}}}}}$. For example, a 2-simplicial complex ${{{{{{{\mathcal{K}}}}}}}}$ is a collection of 0-, 1- and 2-simplexes.

Social contagion dynamics

Peer influence and reinforcement mechanisms are ubiquitous in the dynamical process of social contagion⁴¹, from which higher-order interactions in the network are originated. A social contagion model taking reinforcement into account on 2-simplicial complexes was proposed³⁴, which exploits the SIS type of spreading dynamics with binary-state dynamical variables. In particular, let ${S}_{i}^{t}$ be the state of node i at time t. Each node has two possible states: susceptible (${S}_{i}^{t}\,=\,0$) or infected (${S}_{i}^{t}\,=\,1$). At the initial time, a fraction ρ₀ of nodes is infected. A susceptible node i can get infection from an infected neighbor j through their pairwise interaction (i, j) with probability β₁. Node i can also be infected through a 2-simplex (i, j, k), where both j and k have already been infected, with the probability β₂, and this event can be understood as a synergistic reinforcement effect. For convenience, we set β₁ = α/k₁ and β₂ = ω/k₂, where α and ω are two non-zero positive constants, k₁ and k₂ are the average degrees of the two-body and three-body interactions in a 2-simplicial complex, respectively. In general, we have α < ω so as to ensure that the role of 2-simplex is embodied in the spreading dynamics. Each infected node recovers to the susceptible state with the probability μ. In our work, the values of β₁ and β₂ are selected near their respective thresholds to facilitate efficient and accurate reconstruction of 2-simplicial complexes. The effects of varying the values of β₁ and β₂ on the reconstruction accuracy are also studied (Sec. I of Supplementary Information (SI)).

Simplicial Ising dynamics

The Ising model arises in many fields due to its fundamental role in phase transitions in statistical physics. It has also been applied to many social systems^42,43. While the Ising dynamics on networks have been extensively studied^44,45,46, previous studies were exclusively conducted for networks with pairwise interactions only. To our knowledge, Ising dynamics on networks with higher-order interactions have not been studied.

To address the synergistic reinforcement effect of 2-simplex, we define a simplicial Ising dynamics on 2-simplicial complexes. Each node has two possible states: spin-down (${S}_{i}^{t}\,=\,-1$) or spin-up (${S}_{i}^{t}\,=\,+1$). At the initial time, the state of each node i is randomly assigned as +1 or −1 with equal probability. Defining the Hamiltonian as:

$$H(t)\,=\,-{J}_{1}\mathop{\sum}\limits_{(i,j)}{S}_{i}^{t}{S}_{j}^{t}\,-\,{J}_{2}\mathop{\sum}\limits_{(i,j,k)}{S}_{i}^{t}{S}_{j}^{t}{S}_{k}^{t},$$

(1)

where J₁ and J₂ are the strengths of two-body and three-body interactions, and (i, j) and (i, j, k) denote the two-body and the three-body connections in the 2-simplicial complex, respectively. The first term in the Hamiltonian characterizes the interaction between the edges (i.e., two-body connections) and the second term contains three-body interactions from the 2-simplex. At each time step, the spin-flipping probability of each node i is given by ${(1\,+\,{e}^{\delta {{\Delta }}{E}_{i}^{t}})}^{-1}$, where δ is the inverse temperature. The quantity

$${{\Delta }}{E}_{i}^{t}\,=\,2{J}_{1}\mathop{\sum}\limits_{(i,j)\,\in\, {\partial }_{i}}{S}_{i}^{t}{S}_{j}^{t}\,+\,2{J}_{2}\mathop{\sum}\limits_{(i,j,k)\,\in\, {\nabla }_{i}}{S}_{i}^{t}{S}_{j}^{t}{S}_{k}^{t}$$

represents the change in the energy caused by a flipping of node i at time t, where ∂_i and ∇_i are the 1-simplex set and the 2-simplex set containing node i, respectively.

Statistical inference framework

For SIS and Ising processes taking place on a 2-simplicial complex of size N, the available time-series data representing the states of nodes at different time steps can be stored in a data matrix S, where each row is a time string representing all nodes’ states at that time step and each column is a node’s state at different time steps. We reconstruct 2-simplicial complexes from the data matrix S with our statistical inference framework. This task consists of three steps: (1) establishing the likelihood function based on the available data matrix S; (2) obtaining the connection probabilities of two- and three-body interactions by maximizing the likelihood function according to the idea of the expectation maximization (EM) method, and (3) executing an improved two-step reconstruction strategy to significantly increase the computational efficiency. The details of the framework are described in Methods.

Quantification of reconstruction performance

We use F1 score to quantify the reconstruction accuracy⁴⁷, a global performance indicator defined as

$${{{{{{{\rm{F}}}}}}}}1\,=\,\frac{2P\,*\, R}{P\,+\,R},$$

(2)

where P = TP/(TP + FP) and R = TP/(TP + FN), with TP, FP, TN, FN being the numbers of true positive, false positive, true negative and false negative classes, respectively. A larger value of F1 corresponds to a higher accuracy and F1 = 1 indicates that the original network structure has been fully reconstructed with zero error.

Reconstructing synthetic and real-world simplicial complexes

For readability, the results from the social contagion dynamics are presented in the main text, while those from the simplicial Ising dynamics are presented in Sec. III of SI.

Figure 1 presents results of reconstructing three synthetic 2-simplicial complexes (see Sec. “Construction of synthetic and real-world 2-simplicial complexes” in Methods on how these networks are constructed), where squares, diamonds and circles denote the performance of reconstructing the two-body connections while triangles with different orientations demonstrate the performance of reconstructing three-body connections. Several features can be seen from Fig. 1. First, the reconstruction accuracy increases with the length T of the time series and can reach the unity value for T ≳ 8000. Second, the average degrees k₁ and k₂ of two-body and three-body simplexes, respectively, have different effects on the reconstruction accuracy. In particular, as shown in Fig. 1a–c, a small value of k₁ tends to increase the reconstruction accuracy of both types of simplexes. This can be understood by noting that a small value of k₁ means that there are fewer two-body connections that need to be reconstructed, thereby enhancing the accuracy of the two-body connections for the same length of the time series. At the same time, fewer two-body connections reduce the complexity in reconstructing three-body connections and thereby improving the reconstruction accuracy. Regarding the effects of k₂, Fig. 1d–f reveal that its value affects only the reconstruction accuracy of three-body connections and has little effect on the accuracy of reconstructing two-body connections that have no dependence on the three-body connections in a 2-simplicial complex. Third, the reconstruction accuracy of three-body interactions is lower than that of two-body interactions owing to the complicated structure of former and its dependence on the latter.

**Fig. 1: Reconstruction performance for synthetic 2-simplicial complexes.**

Figure 2 shows the results of reconstructing four real-world 2-simplicial complexes: Hypertext2009⁴⁸, Thiers12⁴⁹, InVS15⁵⁰, and LyonSchool^51,52 (see Sec. “Construction of synthetic and real-world 2-simplicial complexes” in Methods for the details of these real-world networks). The basic parameters of these 2-simplicial complexes constructed from the datasets are listed in Table 1. It can be seen from Fig. 2 that, as for the real-world networks, the reconstruction accuracies for both the two-body and three-body interactions increase with the length of the time series. Remarkably, these network structures are quite irregular, complicating the reconstruction. Nonetheless, for T = 20,000, the F1 score can exceed 80%.

**Fig. 2: Visualization and reconstruction performance for real-world 2-simplicial complexes.**

Table 1 Basic parameters of the four 2-simplicial complexes constructed from real-world datasets.

Full size table

An issue of practical significance is the robustness of our reconstruction framework against random perturbations. To address this issue, we randomly flip a fraction f of infected states and the same number of susceptible states in the data matrix S (see Sec. “Details of the statistical inference framework” in Methods) and investigate the effect of f on the reconstruction accuracy as characterized by F1. The results are shown in Fig. 3 for three synthetic 2-simplicial complexes and three real-world 2-simplicial complexes. It can be seen that increasing the fraction f of flipping leads to a reduction of F1. In particular, the value of F1 for the two-body connections can be as high as 50% even when 30% of the infected states have been flipped (f = 0.3), attesting to the robustness of our framework in reconstructing pairwise links against stochastic fluctuations in the data.

**Fig. 3: Effect of random flipping ratio f on reconstruction performance for synthetic and real-world simplicial complexes.**

Discussion

To find the network structure from observational data has been an active research field for more than fifteen years¹. In previous studies, the term “network structure” is largely referred to as the collection of pairwise connections as characterized by the adjacency matrix of the network. Since the goal is to figure out whether there is a link between any two nodes, the existing methods focused on measures that are suitable for ascertaining the “two-body” interactions, such as those based on pairwise correlation or synchronization. From the beginning of modern network science and engineering slightly over two decades ago, networks with only pairwise connections have represented the standard setting of study. Likewise, the inverse problem of data-based discovery of the network structure has been exclusively carried out in this setting. To our knowledge, in the current literature, the problem of finding higher-order connections in complex networks from time-series data has not been addressed.

Higher-order interactions are nonetheless ubiquitous in complex networks and its importance has been gradually recognized with an accumulating interest, eventually generating an explosive growth of research recently^{18,19,20,21,22,23,24}. The structure of networks with higher-order connections, also known as simplicial complexes, are represented by tensors of high orders. For example, three-body interactions or 2-simplexes in a network can be described by a tensor of rank 3. Structurally, simplicial complexes are significantly more sophisticated than the conventional networks with pairwise links only, and richer dynamics can be anticipated in the former, which have begun to be studied. From the point of view of inverse problem, to reconstruct simplicial complexes from time-series data represents a great challenge.

We have taken the first step to address this inverse problem. Focusing on complex networks with 2-simplexes, we have developed a statistical inference framework by which all two-body and three-body interactions in the network can be found simultaneously from binary time-series data only, i.e., no prior knowledge about the network to be reconstructed is required. The backbone of our reconstruction framework is maximum likelihood estimation that yields the probabilities of all the possible pairwise and three-body connections and a criterion to associate the probabilities with the actual interactions. To significantly increase the computational efficiency, we have proposed and tested a two-step process and a truncation process to determine the true structure of the simplicial complexes. The reconstruction framework has withstood tests on synthetic and real-world simplicial complexes with respect to accuracy and robustness against random fluctuations.

Many open problems remain. For example, our reconstruction framework is formulated in terms of binary time-series data from social contagion dynamics and simplicial Ising dynamics (Sec. III of SI). How to reconstruct higher-order networks from data generated by different dynamical processes needs to be studied. Also, our statistical inference method is developed for 2-simplicial complexes that are perhaps the “simplest” network structure beyond the conventional networks with pairwise interactions. To reconstruct networks with higher-order interactions such as 3-simplicial complexes and hypergraphs is worth pursuing. It is also necessary to develop methods to improve the reconstruction accuracy with shorter time series. We hope our work will stimulate further research in this emerging subfield of data-based reconstruction of complex networks with higher-order interactions.

Methods

Details of the statistical inference framework

We describe the details of our statistical inference framework through an illustrative example, as shown in Fig. 4, where a 2-simplicial complex with N = 30 nodes and its data matrix are illustrated in Fig. 4a, b, respectively. For such a network hosting SIS dynamics, the probability of a susceptible node i (i.e., ${S}_{i}^{t}\,=\,0$) being infected (i.e., ${S}_{i}^{t\,+\,1}\,=\,1$) is determined only by the infected neighbors and the infected 2-simplexes in which two other nodes in the 2-simplex are both infected, at time t. The transition probability from the infected state to the susceptible state does not depend on the states of the neighbors, so it is only necessary to consider the transition probability from the susceptible state to the infected state for constructing the network. We stress that the details of the dynamical process, such as the infection probabilities β₁ and β₂ as well as the recovery probability μ, are assumed to be unknown but only the binary time series of the nodal states are available. Figure 4 presents an illustrative example to describe the details of our method.

**Fig. 4: Schematic illustration of reconstructing a 2-simplicial complex based on binary social contagion.**

Establishing the likelihood function

Let j → i denote the event that node j has a direct impact on the state of node i. For example, node j can directly spread the virus or send a piece of information to node i, which means that node j is one of immediate neighbors of node i. Nodes i and j thus form a 1-simplex, a property that is independent of time t. Similarly, let jk → i denote the event that the synergistic reinforcement effect coming from nodes j and k has a direct impact on the state of node i, which is also independent of time. In the following, we determine the probabilities of node i and node j being connected and of three nodes i, j, k forming a three-body connection (i, j, k).

The conditional probability of ${S}_{i}^{t\,+\,1}\,=\,1$ and j → i given ${S}_{j}^{t}\,=\,1$ and ${S}_{i}^{t}\,=\,0$ can be written as

$$ {{{{{{{\rm{P}}}}}}}}\left({S}_{i}^{t\,+\,1}\,=\,1,j\to i\left|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}\,=\,1\right.\right)\\ \,=\, {{{{{{{\rm{P}}}}}}}}\left(j\to i\left|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}\,=\,1,{S}_{i}^{t\,+\,1}\,=\,1\right.\right)\,*\, {{{{{{{\rm{P}}}}}}}}\left({S}_{i}^{t\,+\,1}\,=\,1\left|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}\,=\,1\right.\right)\,\\ =\, {{{{{{{{\rm{P}}}}}}}}}_{j\to i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i},$$

(3)

where ${{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}\triangleq {{{{{{{\rm{P}}}}}}}}(j\to i|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}\,=\,1,{S}_{i}^{t\,+\,1}\,=\,1)$ is the probability of node i being infected by node j under the conditions ${S}_{i}^{t}\,=\,0$, ${S}_{j}^{t}\,=\,1$ and ${S}_{i}^{t\,+\,1}\,=\,1$, P_j→i > 0 indicates that node j is a neighbor of node i, and ${{{{{{{{\rm{P}}}}}}}}}_{j}^{i}\triangleq {{{{{{{\rm{P}}}}}}}}({S}_{i}^{t\,+\,1}\,=\,1|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}\,=\,1)$ is the probability of ${S}_{i}^{t\,+\,1}\,=\,1$ under the conditions ${S}_{i}^{t}\,=\,0$ and ${S}_{j}^{t}\,=\,1$, which can be estimated from the data matrix S. Take the matrix in Fig. 4b as an example and suppose we wish to estimate the value of ${{{{{{{{\rm{P}}}}}}}}}_{7}^{13}$, where nodes 13 (i.e., node 13 is the target node) and 7 are highlighted by red and green frames, respectively. It is necessary to extract each pair including the time string with ${S}_{13}^{t}\,=\,0$ and its next time strings at t + 1. It can be seen that seven pairs of time strings can be extracted: (t, t + 1), (t + 1, t + 2), (t + 2, t + 3), (t + 3, t + 4), (t + 4, t + 5), (t + 6, t + 7), and (t + 8, t + 9). It can also be seen that two time moments: t + 1 and t + 8, satisfy the conditions that node 13 is in the susceptible state and node 7 is in the infected state. The only time at which node 13 can be infected at the next time step is t + 8. As a result, we have ${{{{{{{{\rm{P}}}}}}}}}_{7}^{13}\,=\,1/2$.

Similarly, the conditional probability of ${S}_{i}^{t+1}=1$ and jk → i given ${S}_{j}^{t}{S}_{k}^{t}\,=\,1$ and ${S}_{i}^{t}\,=\,0$ can be written as

$$ {{{{{{{\rm{P}}}}}}}}\left({S}_{i}^{t\,+\,1}\,=\,1,jk\to i\left|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}{S}_{k}^{t}\,=\,1\right.\right)\\ \, =\, {{{{{{{\rm{P}}}}}}}}\left(jk\to i\left|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}{S}_{k}^{t}\,=\,1\right.,{S}_{i}^{t\,+\,1}\,=\,1\right)\,*\, {{{{{{{\rm{P}}}}}}}}\left({S}_{i}^{t\,+\,1}\,=\,1\left|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}{S}_{k}^{t}\,=\,1\right.\right)\,\\ =\, {{{{{{{{\rm{P}}}}}}}}}_{jk\to i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i},$$

(4)

where ${{{{{{{{\rm{P}}}}}}}}}_{jk\to i}\,\triangleq\, {{{{{{{\rm{P}}}}}}}}(jk\to i|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}{S}_{k}^{t}\,=\,1,{S}_{i}^{t\,+\,1}\,=\,1)$ is the probability of node i being infected through the synergistic interaction from nodes j and k, under the conditions ${S}_{i}^{t}\,=\,0$, ${S}_{j}^{t}{S}_{k}^{t}\,=\,1$, and ${S}_{i}^{t\,+\,1}\,=\,1$, and P_jk→i > 0 indicates that the three nodes i, j, k form a 2-simplex. The probability ${{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}\,\triangleq\, ({S}_{i}^{t\,+\,1}\,=\,1|{S}_{i}^{t}\,=\,0,{S}_{j}^{t}{S}_{k}^{t}\,=\,1)$ can be estimated from the data matrix S in a similar way. Again, take the three nodes 13, 28, and 30 in Fig. 4b as an example. It can be seen that the time instants at which ${S}_{13}^{t}\,=\,0$, ${S}_{28}^{t}\,=\,1$ and ${S}_{30}^{t}\,=\,1$ are fulfilled are t + 6 and t + 8. Because ${S}_{13}^{t\,+\,7}\,=\,1$ and ${S}_{13}^{t\,+\,9}\,=\,1$, we have ${{{{{{{{\rm{P}}}}}}}}}_{28,30}^{13}\,=\,1$.

According to Eqs. 3, 4, the expected number of susceptible node i being infected at t_m + 1 is given by

$${{{{{{{{\rm{E}}}}}}}}}_{i}^{{t}_{m}\,+\,1}\, = \mathop{\sum}\limits_{j\left(j\,\ne\, i\right)}{{{{{{{\rm{P}}}}}}}}\left({S}_{i}^{{t}_{m}\,+\,1}\,=\,1,j\,\to\, i\left|{S}_{i}^{{t}_{m}}\,=\,0,{S}_{j}^{{t}_{m}}\,=\,1\right.\right){{{\Psi }}}_{j}^{{t}_{m}}\\ \quad\ +\,\mathop{\sum}\limits_{j,k\left(j\,\ne\, k\,\ne\, i\right)}{{{{{{{\rm{P}}}}}}}}\left({S}_{i}^{{t}_{m}\,+\,1}\,=\,1,jk\,\to\, i\left| {S}_{i}^{{t}_{m}}\,=\,0,\,\right.\left.{S}_{j}^{{t}_{m}}{S}_{k}^{{t}_{m}}\,=\,1\right.\right){{{\Psi }}}_{jk}^{{t}_{m}}\,+\,{\varepsilon }_{i}\\ \, = \mathop{\sum}\limits_{j\left(j\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\,+\,\mathop{\sum}\limits_{j,k\left(j\,\ne\, k\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}\,+\,{\varepsilon }_{i},$$

(5)

where ${{{\Psi }}}_{j}^{{t}_{m}}$ represents the events that nodes j are infected at time t_m, similarly, ${{{\Psi }}}_{jk}^{{t}_{m}}$ is the events that both nodes j and k are infected at time t_m, and their values are zero or one. For example, if ${{{\Psi }}}_{j}^{{t}_{m}}\,=\,1$, it means that node j is infected at time t_m; otherwise, ${{{\Psi }}}_{j}^{{t}_{m}}\,=\,0$ when it is not infected at time t_m. The quantity ε_i represents the noise due to the errors from the collected data.

In general, the probability of a given number of events occurring in a fixed interval of time is characterized by the Poisson distribution, so we use it to capture the random nature of the times that node i is infected. An advantage of the Poisson distribution is that it makes a mathematical analysis and computations with the EM algorithm feasible^53,54,55,56. Specifically, the likelihood function can be described as

$${{{{{{{\rm{P}}}}}}}}\left({\left\{{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}\right\}}_{m \,=\, 1,\cdots ,M}\left|{{\Theta }},{\left\{{{{\Psi }}}_{j}^{{t}_{m}}\right\}}_{m \,=\, 1,\cdots ,M;j \,=\, 1,\cdots ,N}\right.\right)\,=\,\mathop{\prod}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\frac{{e}^{-{{{{{{{{\rm{E}}}}}}}}}_{i}^{{t}_{m}\,+\,1}}{\left({{{{{{{{\rm{E}}}}}}}}}_{i}^{{t}_{m}\,+\,1}\right)}^{{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}}}{{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}!},$$

(6)

where Θ denotes the set of variables P_j→i, P_jk→i and ε_i. We have ${{{\Psi }}}_{i}^{{t}_{m}\,+\,1}!\,\equiv\, 1$ since ${{{\Psi }}}_{i}^{{t}_{m}\,+\,1}$ is either zero or one.

Maximizing the likelihood function based on EM algorithm

We next use the EM method to maximize the likelihood function⁵⁷ for determining the parameter Θ in Eq. 6. Taking the logarithm form of Eq. 6, we get

$$L\left({{\Theta }}\right)\,=\,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}\log {{{{{{{{\rm{E}}}}}}}}}_{i}^{{t}_{m}\,+\,1}\,-\,{{{{{{{{\rm{E}}}}}}}}}_{i}^{{t}_{m}\,+\,1}\right)\,=\,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left[\begin{array}{l}{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}\log \left(\mathop{\sum}\limits_{j\left(j\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\right.\\ \left.+\mathop{\sum}\limits_{j,k\left(j\ne k\ne i\right)}{{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}\,+\,{\varepsilon }_{i}\right)\\ \, -\,\left(\mathop{\sum}\limits_{j\left(j\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\right.\\ \left.\,+\,\mathop{\sum}\limits_{j,k\left(j\,\ne\, k\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}\,+\,{\varepsilon }_{i}\right)\end{array}\right].$$

(7)

Applying the Jensen’s inequality to the logarithmic term on the right side of Eq. 7 yields

$$ \log \left(\mathop{\sum}\limits_{j\left(j\ne i\right)}{{{{{{{{\rm{P}}}}}}}}}_{j\to i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}+\mathop{\sum}\limits_{j,k\left(j\ne k\ne i\right)}{{{{{{{{\rm{P}}}}}}}}}_{jk\to i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}+{\varepsilon }_{i}\right)\\ =\log \left(\mathop{\sum}\limits_{j\left(j\ne i\right)}{\rho }_{j}^{{t}_{m}}\frac{{{{{{{{{\rm{P}}}}}}}}}_{j\to i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}}{{\rho }_{j}^{{t}_{m}}}\right.\left.+\mathop{\sum}\limits_{j,k\left(j\ne k\ne i\right)}{\rho }_{jk}^{{t}_{m}}\frac{{{{{{{{{\rm{P}}}}}}}}}_{jk\to i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}}{{\rho }_{jk}^{{t}_{m}}}+{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\frac{{\varepsilon }_{i}}{{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}}\right)\\ \ge \mathop{\sum}\limits_{j\left(j\ne i\right)}{\rho }_{j}^{{t}_{m}}\log \frac{{{{{{{{{\rm{P}}}}}}}}}_{j\to i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}}{{\rho }_{j}^{{t}_{m}}}+\mathop{\sum}\limits_{j,k\left(j\ne k\ne i\right)}{\rho }_{jk}^{{t}_{m}}\log \frac{{{{{{{{{\rm{P}}}}}}}}}_{jk\to i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}}{{\rho }_{jk}^{{t}_{m}}}+{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\log \frac{{\varepsilon }_{i}}{{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}}\\ =\mathop{\sum}\limits_{j\left(j\ne i\right)}{\rho }_{j}^{{t}_{m}}\log \left({{{{{{{{\rm{P}}}}}}}}}_{j\to i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\right)+\mathop{\sum}\limits_{j,k\left(j\ne k\ne i\right)}{\rho }_{jk}^{{t}_{m}}\log \left({{{{{{{{\rm{P}}}}}}}}}_{jk\to i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}\right)\\ +{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\log {\varepsilon }_{i}-\mathop{\sum}\limits_{j\left(j\ne i\right)}{\rho }_{j}^{{t}_{m}}\log {\rho }_{j}^{{t}_{m}}-\mathop{\sum}\limits_{j,k\left(j\ne k\ne i\right)}{\rho }_{jk}^{{t}_{m}}\log {\rho }_{jk}^{{t}_{m}}-{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\log {\rho }_{{\varepsilon }_{i}}^{{t}_{m}},$$

(8)

where the equality holds if

$${\rho }_{j}^{{t}_{m}}\,=\,\frac{{{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}}{{\varepsilon }_{i}\,+\,\left(\begin{array}{l}\mathop{\sum}\limits_{{j}^{\prime}\left({j}^{\prime}\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}}^{i}{{{\Psi }}}_{{j}^{\prime}}^{{t}_{m}}\,+\,\\ \mathop{\sum}\limits_{{j}^{\prime},{k}^{\prime}\left({j}^{\prime}\,\ne\, {k}^{\prime}\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}{k}^{\prime}\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}{k}^{\prime}}^{i}{{{\Psi }}}_{{j}^{\prime}{k}^{\prime}}^{{t}_{m}}\end{array}\right)},$$

(9)

$${\rho }_{jk}^{{t}_{m}}\,=\,\frac{{{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}}{{\varepsilon }_{i}\,+\,\left(\begin{array}{l}\mathop{\sum}\limits_{{j}^{\prime}\left({j}^{\prime}\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}}^{i}{{{\Psi }}}_{{j}^{\prime}}^{{t}_{m}}\,+\,\\ \mathop{\sum}\limits_{{j}^{\prime},{k}^{\prime}\left({j}^{\prime}\ne {k}^{\prime}\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}{k}^{\prime}\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}{k}^{\prime}}^{i}{{{\Psi }}}_{{j}^{\prime}{k}^{\prime}}^{{t}_{m}}\end{array}\right)},$$

(10)

and

$${\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\,=\,\frac{{\varepsilon }_{i}}{{\varepsilon }_{i}\,+\,\left(\begin{array}{l}\mathop{\sum}\limits_{{j}^{\prime}\left({j}^{\prime}\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}}^{i}{{{\Psi }}}_{{j}^{\prime}}^{{t}_{m}}\,+\,\\ \mathop{\sum}\limits_{{j}^{\prime},{k}^{\prime}\left({j}^{\prime}\,\ne\, {k}^{\prime}\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}{k}^{\prime}\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}{k}^{\prime}}^{i}{{{\Psi }}}_{{j}^{\prime}{k}^{\prime}}^{{t}_{m}}\end{array}\right)}.$$

(11)

The maximization problem of Eq. 7 can then be transformed into maximizing the following equation:

$$\hat{L}\left({{\Theta }}\right)\, = \,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\mathop{\sum}\limits_{j\left(j\,\ne\, i\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{j}^{{t}_{m}}\log \left({{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\right)\right.\left.\,-\,{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{j}^{{t}_{m}}\log {\rho }_{j}^{{t}_{m}}\,-\,{{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\right)\\ \,+\,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\mathop{\sum}\limits_{j,k\left(j\,\ne\, k\,\ne\, i\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{jk}^{{t}_{m}}\log \left({{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}\right)\right.\left.\,-\,{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{jk}^{{t}_{m}}\log {\rho }_{jk}^{{t}_{m}}\,-\,{{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}\right)\\ \,+\,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\log {\varepsilon }_{i}\,-\,{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\log {\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\,-\,{\varepsilon }_{i}\right).$$

(12)

Calculating the partial derivative of $\hat{L}\left({{\Theta }}\right)$ with respect to P_j→i, P_jk→i and ε_i and setting them to zero, we get

$$\frac{\partial \hat{L}\left({{\Theta }}\right)}{\partial {{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}}\,=\,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left(\frac{{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{j}^{{t}_{m}}}{{{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}}\,-\,{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\right)\,=\,0,$$

(13)

$$\frac{\partial \hat{L}\left({{\Theta }}\right)}{\partial {{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}}\,=\,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left(\frac{{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{jk}^{{t}_{m}}}{{{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}}\,-\,{{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}\right)\,=\,0,$$

(14)

$$\frac{\partial \hat{L}\left({{\Theta }}\right)}{\partial {\varepsilon }_{i}}\,=\,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left(\frac{{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}}{{\varepsilon }_{i}}\,-\,1\right)\,=\,0,$$

(15)

which give

$${{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}\,=\,\frac{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{j}^{{t}_{m}}\right)}{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\right)},$$

(16)

$${{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}\,=\,\frac{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{jk}^{{t}_{m}}\right)}{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{{{{{{\rm{P}}}}}}}}}_{jk}^{i}{{{\Psi }}}_{jk}^{{t}_{m}}\right)},$$

(17)

$${\varepsilon }_{i}\,=\,\frac{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\right)}{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left(1\right)}.$$

(18)

The six equations Eqs. 9–11 and Eqs. 16–18 can be used to solve P_j→i, P_jk→i and ε_i. In particular, by initializing all values of ${{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}{{{{{{{\rm{,}}}}}}}}{{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}{{{{{{{\rm{,}}}}}}}}{\varepsilon }_{i}\left(\forall j\,\ne\, k\,\ne\, i\right)$ to be one and then calculating the values of ${\rho }_{j}^{{t}_{m}}$, ${\rho }_{jk}^{{t}_{m}}$ and ${\rho }_{{\varepsilon }_{i}}^{{t}_{m}}$ in Eqs. 9–11, we substitute them into Eqs. 16–18 to find the values of P_j→i, P_jk→i, and ε_i. We repeat this process until convergence is achieved. Since a single iterative process does not ensure global optimization, we carry out the above iteration process a number of times and choose the proper values that give the maximum of the quantity in Eq. 12.

As an example, as shown in Fig. 4c, the values of P_j→13 and P_jk→13 are given according to this iteration process, where P_j→13 > 0 and the top 10 values of P_jk→13 are demonstrated. Similarly, all the values of P_j→i and P_jk→i can be calculated for each node i. As presented in Fig. 4d, each column above the abscissa corresponds to the predicted 1-simplex probabilities (the left subgraph of Fig. 4d) and 2-simplex probabilities (the right subgraph of Fig. 4d) of a node, and the blue and red dots denote the actual and nonexistent two-body or three-body connections, respectively.

An improved two-step reconstruction strategy

For a 2-simplicial complex structure with N nodes, when predicting the 2-simplexes of a node i, we randomly choose two nodes (e.g., j and k) and calculate the probability P_jk→i, which requires calculating $\left(\genfrac{}{}{0.0pt}{}{N-1}{2}\right)$ values. To reduce the computational load and increase the reconstruction accuracy, we articulate an improved two-step strategy. The particularity of simplicial complexes stipulates that the other two nodes forming a 2-simplex with node i must be the neighbors of node i, so it is not necessary to calculate the probability P_jk→i if node j or node k is not a neighbor of node i. The reconstruction process can then be divided into two steps. At the first step, the “approximate” neighborhood of each node is predicted and their corresponding columns in the data matrix S are extracted, leading to a compressed data matrix. At the second step, based on the compressed data matrix, the values of P_j→i and P_jk→i for each node i are predicted by iterating Eqs. 9–11 and 16–18. Our two-step method was not designed for the general challenging task of consistently inferring all the subfaces for arbitrarily higher-order simplices. In fact, our method requires the closure condition of simplicial complexes: it is necessary to know in advance that the network under reconstruction is a 2-simplicial complex. Given this premise, the two-step strategy infers first the two-body and then the three-body interactions (i.e., 2-simplex) from the inferred two-body interactions. While the two-step method is efficient to reconstruct 2-simplicial complexes, at the present it cannot be used to reconstruct the hypergraphs because its second step is to find the triangles from the neighbors (i.e., edges).

For the first step, the predicted neighbors are not accurate because the three-body interactions have been ignored. In fact, the main purpose of this step is to determine an approximate range of neighbors to reduce the time for calculating ${{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}\left(\forall j\,\ne\, k\,\ne\, i\right)$. Without taking into account three-body interactions, the expected number of susceptible nodes being infected at t_m + 1 can simply be expressed as

$${\tilde{{{{{{{{\rm{E}}}}}}}}}}_{i}^{{t}_{m}\,+\,1}\, =\,\mathop{\sum}\limits_{j\left(j\,\ne\, i\right)}{{{{{{{\rm{P}}}}}}}}\left({S}_{i}^{{t}_{m}\,+\,1}\,=\,1,j\,\to\, i\left|{S}_{i}^{{t}_{m}}\,=\,0,{S}_{j}^{{t}_{m}}\,=\,1\right.\right)\,*\, {{{\Psi }}}_{j}^{{t}_{m}}\,+\,{\varepsilon }_{i}\,\\ =\,\mathop{\sum}\limits_{j\left(j\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}^{0}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\,+\,{\varepsilon }_{i},$$

(19)

where the notation ${{{{{{{{\rm{P}}}}}}}}}_{j\to i}^{0}$ is used to emphasize that node j is only an “approximate” neighbor of node i. Assuming that the number Ψ_i of times of node i being infected in each time period obeys the Poisson distribution, we obtain the likelihood function as

$$ {{{{{{{\rm{P}}}}}}}} \left({\left\{{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}\right\}}_{m \,=\, 1,\cdots ,M}\left|\tilde{{{\Theta }}},{\left\{{{{\Psi }}}_{j}^{{t}_{m}}\right\}}_{m \,=\, 1,\cdots ,M; \,j \,=\, 1,\cdots ,N}\right.\right)\\ =\,\mathop{\prod}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\frac{{e}^{-{\tilde{{{{{{{{\rm{E}}}}}}}}}}_{i}^{{t}_{m}\,+\,1}}{\left({\tilde{{{{{{{{\rm{E}}}}}}}}}}_{i}^{{t}_{m}\,+\,1}\right)}^{{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}}}{{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}!},$$

(20)

where $\tilde{{{\Theta }}}$ denotes the set of variables ${{{{{{{{\rm{P}}}}}}}}}_{j\to i}^{0}$ and ε_i. Taking the logarithm of Eq. 20, we have

$$L\left(\tilde{{{\Theta }}}\right)\,=\,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}\log {\tilde{{{{{{{{\rm{E}}}}}}}}}}_{i}^{{t}_{m}\,+\,1}\,-\,{\tilde{{{{{{{{\rm{E}}}}}}}}}}_{i}^{{t}_{m}\,+\,1}\right)\,=\,\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left[\begin{array}{l}{{{\Psi }}}_{i}^{{t}_{m}\,+\,1}\log \left(\mathop{\sum}\limits_{j\left(j\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{j\to i}^{0}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\,+\,{\varepsilon }_{i}\right)\\ \,-\,\left(\mathop{\sum}\limits_{j\left(j\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}^{0}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\,+\,{\varepsilon }_{i}\right)\end{array}\right].$$

(21)

Using the EM method to maximize the likelihood function, we obtain the final parameters $\tilde{{{\Theta }}}$ as

$${{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}^{0}\,=\,\frac{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{j}^{{t}_{m}}\right)}{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}\right)},$$

(22)

$${\varepsilon }_{i}\,=\,\frac{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left({{{\Psi }}}_{i}^{{t}_{m}\,+\,1}{\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\right)}{\mathop{\sum}\limits_{m\left({{{\Psi }}}_{i}^{{t}_{m}}\,=\,0\right)}\left(1\right)},$$

(23)

where

$${\rho }_{j}^{{t}_{m}}\,=\,\frac{{{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}^{0}{{{{{{{{\rm{P}}}}}}}}}_{j}^{i}{{{\Psi }}}_{j}^{{t}_{m}}}{\mathop{\sum}\limits_{{j}^{\prime}\left({j}^{\prime}\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}\,\to\, i}^{0}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}}^{i}{{{\Psi }}}_{{j}^{\prime}}^{{t}_{m}}\,+\,{\varepsilon }_{i}},$$

(24)

$${\rho }_{{\varepsilon }_{i}}^{{t}_{m}}\,=\,\frac{{\varepsilon }_{i}}{\mathop{\sum}\limits_{{j}^{\prime}\left({j}^{\prime}\,\ne\, i\right)}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}\,\to\, i}^{0}{{{{{{{{\rm{P}}}}}}}}}_{{j}^{\prime}}^{i}{{{\Psi }}}_{{j}^{\prime}}^{{t}_{m}}\,+\,{\varepsilon }_{i}}.$$

(25)

With the initial conditions for ${{{{{{{{\rm{P}}}}}}}}}_{j\to i}^{0}$ and ε_i, the values of ${{{{{{{{\rm{P}}}}}}}}}_{j\to i}^{0}$ and ε_i can be obtained by iterating Eqs. 22–25 until convergence is achieved. It is worth noting that ${{{{{{{{\rm{P}}}}}}}}}_{j\to i}^{0}$ is a probability and we need to determine the “approximate” neighbors of the node under reconstruction. Theoretically, the “approximate” neighbors can be determined by testing whether ${{{{{{{{\rm{P}}}}}}}}}_{j\to i}^{0}$ is non-zero. However, practically this is not feasible due to noise or deviations from the assumptions. For example, as shown in Fig. 4f, nodes 6 and 14 are not neighbors of node 13 even though ${{{{{{{{\rm{P}}}}}}}}}_{6\to 13}^{0}\,=\,0.0002$ and ${{{{{{{{\rm{P}}}}}}}}}_{14\to 13}^{0}\,=\,0.0006$. To overcome this difficulty, we articulate a truncation method for determining the neighbors of node i, as follows.

First, note that the time complexity of the second step can be significantly reduced when fewer neighbors are predicted, but too few predicted neighbors can lead to missing neighbors. On the contrary, too many neighbors would increase the time complexity. A solution is to use a reasonable truncation to determine the “approximate” neighbors of each node. To this end, we re-rank the probability ${{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}^{0}\left(\forall j\,\ne\, i\right)$ in a descending order and place a threshold Δ_i in the maximum gap defined as¹⁴:

$${{{\Delta}} _i} \,=\, \mathop{{{{{\mathrm{arg}}}}}\,{{{{{\mathrm{max}}}}}}}\limits_l \left[ {\frac{{{{{{{{\rm{P}}}}}}}_l^{'}}}{{{{{{{{\rm{P}}}}}}}_{l \,+\, 1}^{'}}}\left( {{{{{{{\rm{P}}}}}}}_l^{'} \,-\, {{{{{{\rm{P}}}}}}}_{l \,+\, 1}^{'}} \right)} \right].$$

(26)

Next, we use Eq. 26 again to find a new threshold ${\bar{{{\Delta }}}}_{i}$ which is smaller than Δ_i. Finally, node j is regarded as an “approximate” neighbor of node i if ${{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}^{0} \, > \, {\bar{{{\Delta }}}}_{i}$. The truncation method can ensure the detection of all real neighbors and 2-simplexes.

Once the “approximate” neighbors of node i have been obtained, the time series of these neighbors can be extracted (Fig. 4f, g). The neighbors of node i and its 2-simplexes can be quickly re-predicted based on the second step, i.e., by iterating Eqs. 9–11 and Eqs. 16–18 based on the compressed data matrix. For example, the prediction results for node 13 are shown in Fig. 4h and the values of ${{{{{{{{\rm{P}}}}}}}}}_{j\,\to\, i}\left(\forall j\,\ne\, i\right)$ and ${{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}\left(\forall j\,\ne\, k\,\ne\, i\right)$ for each node are presented in Fig. 4i. The actual two- and three-body connections of each node can then be determined based on the results in Fig. 4i. Because the identification of two-body connections has been refined in the second step, we simply assume that node j is a neighbor of node i if P_j→i > 0. Following previous work^14,58, we assume that nodes i and j are connected when P_j→i > 0 or P_i→j > 0.

The case of three-body interactions is more complicated and the solution is sensitive to noise or errors. In fact, using the condition P_jk→i > 0 as a criterion to detect (i, j, k) as a 2-simplex can lead to many false positives. Our solution is to re-rank ${{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}\left(\forall j\,\ne\, k\,\ne\, i\right)$ in a descending order and obtain a new threshold ${\hat{{{\Delta }}}}_{i}$ by using Eq. 26 again. As a result, an actual 2-simplex (i, j, k) is formed when ${{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}\,\ge\, {\hat{{{\Delta }}}}_{i}$. To remove the conflicts in the prediction, we assume that there exists a 2-simplex (i, j, k) when two of three conditions hold at least, e.g., ${{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}\,\ge\, {\hat{{{\Delta }}}}_{i}$, ${{{{{{{{\rm{P}}}}}}}}}_{ik\,\to\, j}\,\ge\, {\hat{{{\Delta }}}}_{j}$, and ${{{{{{{{\rm{P}}}}}}}}}_{ij\,\to\, k} \, < \, {\hat{{{\Delta }}}}_{k}$, but a three-body cannot form when ${{{{{{{{\rm{P}}}}}}}}}_{jk\,\to\, i}\,\ge\, {\hat{{{\Delta }}}}_{i}$, ${{{{{{{{\rm{P}}}}}}}}}_{ik\,\to\, j} \, < \, {\hat{{{\Delta }}}}_{j}$, and ${{{{{{{{\rm{P}}}}}}}}}_{ij\,\to\, k} \, < \, {\hat{{{\Delta }}}}_{k}$. Implementing the two-step strategy, we can reconstruct the whole 2-simplicial complexes. As shown in Fig. 4a, the 2-simplicial complex has been accurately reconstructed. Overall, the two-step strategy not only greatly reduces the computational time but also significantly improves the reconstruction accuracy (more details in Sec. II in SI).

Construction of synthetic and real-world 2-simplicial complexes

Synthetic 2-simplicial complexes

Here we describe the main steps of constructing synthetic 2-simplicial complexes of size N, average degrees of two-body and three-body interactions k₁ and k₂, respectively.

Random simplicial complex (ERSC)

First, a random graph is generated by connecting any two nodes with the probability p₁. We then add 2-simplexes between any three nodes with the probability p₂, where the formulas of p₁ and p₂ are³⁴:

$${p}_{1}\,=\,\frac{{k}_{1}\,-\,2{k}_{2}}{\left(N\,-\,1\right)\,-\,2{k}_{2}},$$

(27)

$${p}_{2}\,=\,\frac{2{k}_{2}}{\left(N\,-\,1\right)\left(N\,-\,2\right)}.$$

(28)

A random 2-simplicial complex with the specified average degrees can then be constructed using the probabilities p₁ and p₂.

Scale-free simplicial complex (SFSC)

First, a scale-free network is generated, in which each new node connects m edges to the old nodes with degree preference⁵⁹. We then add 2-simplexes between any three nodes according to probability p₂ in Eq. 28. The average degree of 1-simplexes can be calculated as

$${k}_{1}\,=\,2m\,+\,2{k}_{2}\left(1\,-\,\frac{2m}{N}\right).$$

(29)

Small-world simplicial complex (SWSC)

First, a small-world network⁶⁰ is generated from a regular lattice (all the nodes have the same degree 2m) with rewiring probability p. We then add 2-simplexes between any three nodes according to probability p₂ in Eq. 28. The average degree of 1-simplexes is given by Eq. 29.

2-simplicial complexes from real-world data

In each real-world data set, the face-to-face interactions have been measured with a temporal resolution of 20 s. First, we generate a weighted network according to the data, where a weight represents the number of interactions between a pair of nodes in the whole time window. Second, we remove any link whose weight is less than a given threshold ζ and set the weights of retained links to one to generate an unweighted network. Finally, we cut the data into multiple segments with a temporal window of 5 min and record all the 2-simplexes. In particular, if three nodes communicate with each other in a short time, they are regarded as constituting a three-body connection. We record the frequencies of the 2-simplexes in each segment. According to the total frequency in all segments, we retain the first 50% of the 2-simplexes with the highest frequencies and count them as the actual 2-simplexes. The visualization of four real-world 2-simplicial complexes is shown in Fig. 2a–d.

Data availability

The SocioPatterns datasets were downloaded from http://www.sociopatterns.org/datasets/⁶¹. The source data generated in this study have been deposited on GitHub at: https://github.com/HuanWang2022/reconstruct_simplicial_complex.

Code availability

The code and datasets are available at: https://github.com/HuanWang2022/reconstruct_simplicial_complex⁶².

Change history

27 June 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41467-022-31470-6

References

Wang, W.-X., Lai, Y.-C. & Grebogi, C. Data based identification and prediction of nonlinear and complex dynamical systems. Phys. Rep. 644, 1–76 (2016).
Article MathSciNet MATH ADS Google Scholar
Yu, D., Righero, M. & Kocarev, L. Estimating topology of networks. Phys. Rev. Lett. 97, 188701 (2006).
Article PubMed ADS Google Scholar
Timme, M. Revealing network connectivity from response dynamics. Phys. Rev. Lett. 98, 224101 (2007).
Article PubMed ADS Google Scholar
Marinazzo, D., Pellicoro, M. & Stramaglia, S. Kernel method for nonlinear granger causality. Phys. Rev. Lett. 100, 144103 (2008).
Article PubMed ADS Google Scholar
Yu, D. & Parlitz, U. Inferring local dynamics and connectivity of spatially extended systems with long-range links based on steady-state stabilization. Phys. Rev. E 82, 026108 (2010).
Article MathSciNet ADS Google Scholar
Ren, J., Wang, W.-X., Li, B. & Lai, Y.-C. Noise bridges dynamical correlation and topology in coupled oscillator networks. Phys. Rev. Lett. 104, 058701 (2010).
Article PubMed ADS Google Scholar
Wang, W.-X., Yang, R., Lai, Y.-C., Kovanis, V. & Harrison, M. A. F. Time-series based prediction of complex oscillator networks via compressive sensing. EPL 94, 48006 (2011).
Article ADS Google Scholar
Wu, X., Wang, W. & Zheng, W. X. Inferring topologies of complex networks with hidden variables. Phys. Rev. E 86, 046106 (2012).
Article ADS Google Scholar
Shen, Z., Wang, W.-X., Fan, Y., Di, Z. & Lai, Y.-C. Reconstructing propagation networks with natural diversity and identifying hidden sources. Nat. Commun. 5, 1–10 (2014).
Article ADS Google Scholar
Timme, M. & Casadiego, J. Revealing networks from dynamics: an introduction. J. Phys. A Math. Theo. 47, 343001 (2014).
Article MathSciNet MATH Google Scholar
Wu, X., Zhao, X., Lü, J., Tang, L. & Lu, J.-A. Identifying topologies of complex dynamical networks with stochastic perturbations. IEEE Trans. Control Net. Sys. 3, 379–389 (2015).
Article MathSciNet MATH Google Scholar
Casadiego, J., Nitzan, M., Hallerberg, S. & Timme, M. Model-free inference of direct network interactions from nonlinear collective dynamics. Nat. Commun. 8, 1–10 (2017).
Article CAS Google Scholar
Xiang, B.-B., Ma, C., Chen, H.-S. & Zhang, H.-F. Reconstructing signed networks via Ising dynamics. Chaos 28, 123117 (2018).
Article MathSciNet PubMed ADS Google Scholar
Ma, C., Chen, H.-S., Lai, Y.-C. & Zhang, H.-F. Statistical inference approach to structural reconstruction of complex networks from binary time series. Phys. Rev. E 97, 022301 (2018).
Article CAS PubMed ADS Google Scholar
Zhang, Z., Chen, Y., Mi, Y. & Hu, G. Reconstruction of dynamic networks with time-delayed interactions in the presence of fast-varying noises. Phys. Rev. E 99, 042311 (2019).
Article CAS PubMed ADS Google Scholar
Ma, C., Chen, H.-S., Li, X., Lai, Y.-C. & Zhang, H.-F. Data based reconstruction of duplex networks. SIAM J. Appl. Dyn. Sys. 19, 124–150 (2020).
Article MathSciNet MATH Google Scholar
Candes, E. J. & Tao, T. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inf. Theor. 52, 5406–5425 (2006).
Article MathSciNet MATH Google Scholar
de Arruda, G. F., Tizzani, M. & Moreno, Y. Phase transitions and stability of dynamical processes on hypergraphs. Commun. Phys. 4, 24 (2021).
Article Google Scholar
Skardal, P. S. & Arenas, A. Higher order interactions in complex networks of phase oscillators promote abrupt synchronization switching. Commun. Phys. 3, 218 (2021).
Article Google Scholar
Burgio, G., Arenas, A., Gómez, S. & Matamalas, J. T. Network clique cover approximation to analyze complex contagions through group interactions. Commun. Phys. 4, 111 (2021).
Article Google Scholar
Young, J.-G., Petri, G. & Peixoto, T. P. Hypergraph reconstruction from network data. Commun. Phys. 4, 135 (2021).
Article Google Scholar
Eriksson, A., Edler, D., Rojas, A., Domenico, M. D. & Rosvall, M. How choosing random-walk model and network representation matters for flow-based community detection in hypergraphs. Commun. Phys. 4, 133 (2021).
Article Google Scholar
Kovalenko, K. et al. Growing scale-free simplices. Commun. Phys. 4, 43 (2021).
Article Google Scholar
Ghorbanchian, R., Restrepo, J. G., Torres, J. J. & Bianconi, G. Higher-order simplicial synchronization of coupled topological signals. Commun. Phys. 4, 120 (2021).
Article Google Scholar
Watts, D. J. & Dodds, P. S. Influentials, networks, and public opinion formation. J. Consum. Res. 34, 441–458 (2007).
Article Google Scholar
Chierichetti, F., Lattanzi, S. & Panconesi, A. Rumor spreading in social networks. Theo. Comp. Sci. 412, 2602–2610 (2011).
Article MathSciNet MATH Google Scholar
Lehmann, S. & Ahn, Y.-Y. Complex Spreading Phenomena in Social Systems (Springer, 2018).
Zhang, Z.-K. et al. Dynamics of information diffusion and its applications on complex networks. Phys. Rep. 651, 1–34 (2016).
Article MathSciNet ADS Google Scholar
Battiston, F. et al. Networks beyond pairwise interactions: structure and dynamics. Phys. Rep. 874, 1–92 (2020).
Article MathSciNet MATH ADS Google Scholar
Battiston, F. et al. The physics of higher-order interactions in complex systems. Nat. Phys. 17, 1093–1098 (2021).
Article CAS Google Scholar
Carletti, T., Battiston, F., Cencetti, G. & Fanelli, D. Random walks on hypergraphs. Phys. Rev. E 101, 022308 (2020).
Article MathSciNet CAS PubMed ADS Google Scholar
Carletti, T., Fanelli, D. & Lambiotte, R. Random walks and community detection in hypergraphs. J. Phys. Complex. 2, 015011 (2021).
Article ADS Google Scholar
de Arruda, G. F., Petri, G. & Moreno, Y. Social contagion models on hypergraphs. Phys. Rev. Research 2, 023032 (2020).
Article ADS Google Scholar
Iacopini, I., Petri, G., Barrat, A. & Latora, V. Simplicial models of social contagion. Nat. Commun. 10, 1–9 (2019).
Article Google Scholar
Landry, N. W. & Restrepo, J. G. The effect of heterogeneity on hypergraph contagion models. Chaos 30, 103117 (2020).
Article MathSciNet CAS PubMed PubMed Central MATH ADS Google Scholar
Matamalas, J. T., Gómez, S. & Arenas, A. Abrupt phase transition of epidemic spreading in simplicial complexes. Phys. Rev. Res. 2, 012049 (2020).
Article CAS Google Scholar
Wang, D., Zhao, Y., Leng, H. & Small, M. A social communication model based on simplicial complexes. Phys. Lett. A 384, 126895 (2020).
Article MathSciNet CAS MATH Google Scholar
Petri, G. & Barrat, A. Simplicial activity driven model. Phys. Rev. Lett. 121, 228301 (2018).
Article CAS PubMed ADS Google Scholar
Hatcher, A. Algebraic Topology (Cambridge University Press, 2005).
Zhao, Y. & Maletić, S. Simplicial Complexes in Complex Systems: In Search for Alternatives (World Scientific, 2021).
Dakin, R. & Ryder, T. B. Dynamic network partnerships and social contagion drive cooperation. Proc. R. Soc. B 285, 20181973 (2018).
Article PubMed PubMed Central Google Scholar
Cipra, B. A. An introduction to the Ising model. Am. Ma. Mon. 94, 937–959 (1987).
Article MathSciNet Google Scholar
Stauffer, D. Social applications of two-dimensional Ising models. Am. J. Phys 76, 470–473 (2008).
Article ADS Google Scholar
Aleksiejuk, A., Hołyst, J. A. & Stauffer, D. Ferromagnetic phase transition in Barabási–Albert networks. Physica A 310, 260–266 (2002).
Article MATH ADS Google Scholar
Bianconi, G. Mean field solution of the Ising model on a barabási–albert network. Phys. Letts. A 303, 166–168 (2002).
Article CAS MATH ADS Google Scholar
Biswas, S. & Sen, P. Effect of the nature of randomness on quenching dynamics of the Ising model on complex networks. Phy. Rev. E 84, 066107 (2011).
Article ADS Google Scholar
Powers, D. M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Tech. 2, 37–63 (2011).
Google Scholar
Isella, L. et al. What’s in a crowd? Analysis of face-to-face behavioral networks. J. Theo. Biol. 271, 166–180 (2011).
Article MathSciNet MATH ADS Google Scholar
Fournet, J. & Barrat, A. Contact patterns among high school students. PloS One 9, e107878 (2014).
Article PubMed PubMed Central ADS Google Scholar
Génois, M. & Barrat, A. Can co-location be used as a proxy for face-to-face contacts? EPJ Data Sci. 7, 11 (2018).
Article Google Scholar
Gemmetto, V., Barrat, A. & Cattuto, C. Mitigation of infectious disease at school: targeted class closure vs school closure. BMC Infect. Dis. 14, 1–10 (2014).
Article Google Scholar
Stehlé, J. et al. High-resolution measurements of face-to-face contact patterns in a primary school. PloS ONE 6, e23176 (2011).
Article PubMed PubMed Central ADS Google Scholar
Karrer, B. & Newman, M. E. Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011).
Article MathSciNet ADS Google Scholar
Newman, M. E. Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys. Rev. E 94, 052315 (2016).
Article CAS PubMed ADS Google Scholar
Newman, M. E. & Reinert, G. Estimating the number of communities in a network. Phys. Rev. Lett. 117, 078301 (2016).
Article CAS PubMed ADS Google Scholar
De Bacco, C., Power, E. A., Larremore, D. B. & Moore, C. Community detection, link prediction, and layer interdependence in multilayer networks. Phys. Rev. E 95, 042317 (2017).
Article PubMed ADS Google Scholar
Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Method. 39, 1–22 (1977).
MathSciNet MATH Google Scholar
Ma, L., Han, X., Shen, Z., Wang, W.-X. & Di, Z. Efficient reconstruction of heterogeneous networks from time series via compressed sensing. PloS ONE 10, e0142837 (2015).
Article PubMed PubMed Central Google Scholar
Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
Article MathSciNet PubMed MATH ADS Google Scholar
Watts, D. J. & Strogatz, S. H. Collective dynamics of small-world networks. Nature 393, 440–442 (1998).
Article CAS PubMed MATH ADS Google Scholar
SocioPatterns Collaboration. http://www.sociopatterns.org/. Accessed Dec 2020.
Wang, H., Ma, C., Chen, H.-S., Lai, Y.-C.& Zhang, H.-F. Full reconstruction of simplicial complexes from binary contagion and Ising data. HuanWang2022/reconstruct_simplicial_complex, https://doi.org/10.5281/zenodo.6501187 (2022).

Download references

Acknowledgements

H.W., C.M., H.-S.C. and H.-F.Z. acknowledge supports from the National Natural Science Foundation of China (61973001, 12005001, 11875069), the Natural Science Foundation of Anhui Province (2008085QF299), and the University Synergy Innovation Program of Anhui Province (GXXT-2021-032). Y.-C.L. acknowledges support from the Office of Naval Research under Grant No. N00014-21-1-2323.

Author information

These authors contributed equally: Huan Wang, Chuang Ma.

Authors and Affiliations

The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Mathematical Science, Anhui University, Hefei, 230601, China
Huan Wang & Hai-Feng Zhang
School of Internet, Anhui University, Hefei, 230601, China
Chuang Ma
School of Physics and Material Science, Anhui University, Hefei, 230601, China
Han-Shuang Chen
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, 85287, USA
Ying-Cheng Lai

Authors

Huan Wang
View author publications
You can also search for this author in PubMed Google Scholar
Chuang Ma
View author publications
You can also search for this author in PubMed Google Scholar
Han-Shuang Chen
View author publications
You can also search for this author in PubMed Google Scholar
Ying-Cheng Lai
View author publications
You can also search for this author in PubMed Google Scholar
Hai-Feng Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

H.W. and C.M. have contributed equally to this work. H.W., C.M., H.-S.C., Y.-C.L., and H.-F.Z. designed the research project, the models and methods. H.W. and C.M. performed the numerical analysis. H.-F.Z. and Y.-C.L. wrote the paper.

Corresponding author

Correspondence to Hai-Feng Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communicationsthanks Giovanni Petri, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wang, H., Ma, C., Chen, HS. et al. Full reconstruction of simplicial complexes from binary contagion and Ising data. Nat Commun 13, 3043 (2022). https://doi.org/10.1038/s41467-022-30706-9

Download citation

Received: 23 August 2021
Accepted: 13 May 2022
Published: 01 June 2022
DOI: https://doi.org/10.1038/s41467-022-30706-9

This article is cited by

Hypergraph reconstruction from uncertain pairwise observations
- Simon Lizotte
- Jean-Gabriel Young
- Antoine Allard
Scientific Reports (2023)
Higher-order organization of multivariate time series
- Andrea Santoro
- Federico Battiston
- Enrico Amico
Nature Physics (2023)
Higher-order temporal interactions promote the cooperation in the multiplayer snowdrift game
- Yan Xu
- Juan Wang
- Zhen Wang
Science China Information Sciences (2023)
Tackling the subsampling problem to infer collective properties from limited data
- Anna Levina
- Viola Priesemann
- Johannes Zierenberg
Nature Reviews Physics (2022)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.