Influence maximization across heterogeneous interconnected networks based on deep learning
Introduction
People usually prefer to gather information from their acquaintances by “word of mouth” rather than from other media such as TV (Bakshy, Rosenn, Marlow & Adamic, 2012). With the growing popularity of Online Social Networks (OSNs), a piece of information can spread quickly among people, so OSNs can serve as a platform for viral marketing. Companies target a small number of users, known as the seed set, to recommend and advertise new products to their friends so that the maximum number of people adopt the products. The idea of spreading information through “word of mouth” on OSNs was first proposed as influence maximization (IM) by Domingos and Richardson (2001). IM has many applications, such as viral marketing (Domingos & Richardson, 2001), rumor control (Budak, Agrawal & El Abbadi, 2011) and recommendation (Ye, Liu & Lee, 2012).
In the traditional IM problem, a single network is explored to find the optimal seed set. In the real world, however, users usually join multiple social networks simultaneously, and information can spread across multiple OSNs via bridge users who are members of several networks at once (Gaeta, 2018; Nguyen et al., 2013; Zhan et al., 2015; Zhang, Nguyen, Zhang & Thai, 2016). Hence, users can be influenced by users of other OSNs (Zhan, Zhang, Yu, Emery & Xie, 2016), and this external influence is an important factor that should be considered in the IM problem. Unfortunately, most previous research on IM proposes methods for a single network and ignores both the external influence and the bridge users across multiple OSNs.
Kempe, Kleinberg and Tardos (2003) were the first to formulate IM as an optimization problem and proved that it is NP-hard under two diffusion models, independent cascade (IC) and linear threshold (LT). They then proposed a greedy algorithm with an approximation factor of (1−1/e) to find a near-optimal seed set. Although the greedy algorithm is more accurate than classical degree-based approaches, it does not scale to large networks, and much research has tried to overcome this limitation. Leskovec et al. (2007) exploited the submodularity of the influence function to speed up IM. LDAG was proposed to solve IM under the LT model (Chen, Wang & Wang, 2010); it can compute the influence spread on DAGs in polynomial time, but the process of generating the DAGs is NP-hard. Previous studies have introduced various heuristics to reduce the network size and then built a solution on top of those heuristics. Although these algorithms improve significantly on the greedy algorithm under the IC and LT models, computing the influence spread over large networks remains costly.
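To make the scalability issue concrete, the following is a minimal sketch of greedy seed selection under the IC model with Monte Carlo estimation of the spread. It is an illustration under stated assumptions, not the implementation used in any cited work: the function names, the uniform activation probability `p`, and the number of simulation runs are all hypothetical choices.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """Run one independent-cascade simulation and return the number of active nodes.

    graph: dict mapping a node to a list of its out-neighbors.
    p: activation probability, assumed uniform over all edges.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    """Estimate the expected spread of a seed set by averaging several simulations."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_im(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for cand in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [cand], p, runs, rng)
            if spread > best_spread:
                best, best_spread = cand, spread
        chosen.append(best)
    return chosen
```

Every greedy round re-simulates the cascade for every remaining candidate, so the cost grows with the number of nodes, the budget, and the number of simulation runs. This is the overhead that the heuristics discussed above try to reduce.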
Recently, researchers have investigated networks that are interconnected via bridge nodes (Liu et al., 2012; Shen, Dinh, Zhang & Thai, 2012; Yagan, Qian, Zhang & Cochran, 2012). Nguyen et al. (2013) showed that influence can propagate both inside and across social networks. They integrate all the networks into one scheme that preserves the features of the users in the source networks, and then employ traditional IM solutions to assess the influence spread on multiple OSNs. Shen et al. (2012) also combine all the networks to measure the influence spread of network users. Zhan et al. (2016) first extract different information channels in the network and construct a multi-relational network from these channels. The seed sets are then chosen through measures that compute the influence of each node on the multi-relational network.
In recent years, deep learning techniques have had a great impact on applications such as speech recognition (Mohamed, Yu & Deng, 2010), image processing (Krizhevsky, Sutskever & Hinton, 2012), information retrieval (Hinton & Salakhutdinov, 2011) and social network analysis (Perozzi, Al-Rfou & Skiena, 2014). Network embedding uses deep learning as a representation learning method to encode the local and global features of a network into feature vectors (Grover & Leskovec, 2016; Keikha, Rahgozar & Asadpour, 2018; Perozzi, Al-Rfou & Skiena, 2014). In effect, network embedding is a dimension reduction technique that can preserve the structural features of the given network. The learned feature vectors can be applied to tasks such as clustering (Huang, Huang, Wang & Wang, 2014; Xie, Girshick & Farhadi, 2016) and link prediction (Grover & Leskovec, 2016).
In this paper, we propose a deep learning based algorithm named “DeepIM” that applies network embedding to the IM problem on interconnected networks. Influence maximization across interconnected networks is highly challenging because of the heterogeneous structural features, cross links and bridge nodes of the given networks. Furthermore, the IM problem on interconnected networks is more complex than traditional IM because the problem size increases with the number of network nodes. To the best of our knowledge, the proposed method is the first algorithm that applies network embedding to solve the IM problem.
We utilize the CARE algorithm to extract the global and local structural features of nodes on both networks (Keikha et al., 2018). We first generate a number of predefined customized paths for each node on both networks. These paths include the node's neighbors as its local structure and the community information of the node as its global structure. The customized paths are then used to learn the best structural feature vector of the network nodes with the Word2vec framework (Mikolov, Chen, Corrado & Dean, 2013a; Mikolov, Chen, Corrado & Dean, 2013b). Once the feature vectors are learned for each network node, we use them to measure the relevancy among users of the interconnected networks. Next, the most influential nodes are chosen from the nodes that are related to the largest number of users inside and outside of their networks. Thus, we are able to find the seed set from the feature vectors learned by network embedding. In contrast to previous research, the DeepIM algorithm considers all the structural features of nodes for influence spreading.
Extensive assessments of the proposed algorithm are performed on three datasets: DBLP networks (Tang et al., 2008), Twitter-Foursquare (Zhang, Kong & Yu, 2013) and NetHept (Kempe et al., 2003). Experimental evaluations indicate that the bridge nodes of the input networks have a great impact on maximizing the influence inside and between the networks. The empirical analysis verifies that the proposed method improves significantly on previous IM research.
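The path generation step can be sketched as follows. This is a hedged illustration only: the exact CARE procedure is not given in this excerpt, so the mixing parameter `alpha`, the function name `custom_paths`, and the rule "step to a neighbor with probability `alpha`, otherwise jump to a member of the same community" are assumptions made for the example. Community labels are assumed to be precomputed by some community detection method.

```python
import random


def custom_paths(adj, community_of, num_paths=2, path_len=5, alpha=0.5, seed=0):
    """Generate paths that mix local (neighbor) and global (community) context.

    adj: dict mapping a node to a list of its neighbors.
    community_of: dict mapping a node to its community label (assumed given).
    alpha: hypothetical probability of following an edge rather than
           jumping to another member of the current node's community.
    """
    rng = random.Random(seed)
    # Group nodes by community so community peers can be sampled quickly.
    members = {}
    for node, label in community_of.items():
        members.setdefault(label, []).append(node)

    paths = []
    for start in sorted(adj):
        for _ in range(num_paths):
            path = [start]
            cur = start
            while len(path) < path_len:
                nbrs = adj.get(cur, [])
                peers = [m for m in members.get(community_of.get(cur), []) if m != cur]
                if nbrs and (not peers or rng.random() < alpha):
                    cur = rng.choice(nbrs)   # local step along an edge
                elif peers:
                    cur = rng.choice(peers)  # global step inside the community
                else:
                    break                    # isolated node with no community peers
                path.append(cur)
            paths.append(path)
    return paths
```

Each returned path is a list of node identifiers, so the paths can be treated as sentences and the nodes as words when training a Word2vec-style model to obtain the feature vectors.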
To summarize, we make the following contributions:
- We present a novel algorithm for IM across interconnected networks that learns the best feature vector of nodes on different types of networks, such as weighted, directed and complex networks.
- To the best of our knowledge, the proposed method is the first algorithm that utilizes deep learning techniques to extract the best structural features of the network nodes for IM.
- We show the impact of bridge nodes on the diffusion of influence across OSNs.
- DeepIM finds the most influential nodes inside and between networks for each node based on their local and global structural properties.
- The proposed method handles network changes simply: the influence of new nodes is computed without repeating the influence spreading process for all the nodes.
- We empirically evaluate the proposed method on two interconnected network datasets and a single network. The experimental results indicate the scalability and efficiency of DeepIM compared with other IM approaches.
The rest of the paper is organized as follows. Section 2 presents a formal definition of IM on interconnected networks. Section 3 summarizes related work on IM methods and network embedding techniques. Section 4 explains the details of the DeepIM algorithm. Section 5 outlines the experimental results of the proposed method on different datasets. Finally, Section 6 presents the conclusion and future work.
Section snippets
Influence maximization on interconnected networks
The goal of the IM problem across interconnected networks is to select the best seed set from both networks, so that the maximum number of users in both networks is influenced by it. Suppose two network graphs G1 = (V1, E1) and G2 = (V2, E2) are given, in which each edge ei = (ui, vi, wi) ∈ Ei represents a collaboration between ui and vi with weight wi in network i. With a budget K, we aim to find a seed set S of size K from the users of both networks based on information
Related works
Kempe et al. (2003) proved that IM is an NP-hard optimization problem and solved it in a greedy framework, but their method is inefficient on large networks. In recent years, several IM approaches for a single network have been proposed that use different heuristics to improve the efficiency of the greedy framework. Li, Fan, Wang and Tan (2018) classified existing IM approaches by their heuristics into three categories: simulation based, proxy based and sketch
Influence maximization on interconnected networks based on deep learning
In this section, we describe the proposed algorithm for IM over interconnected networks. To find the best seed set on the networks, the first step obtains a structural feature vector for each node. Once the network model has been learned with deep learning techniques, we measure the relatedness of any two users on the given networks. Afterwards, the most relevant users for each node are extracted from both networks and a vector of relevant users is formed for each node.
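The relatedness and relevant-user steps might look like the sketch below. It assumes that feature vectors are already available for the nodes of both networks, that relatedness is measured by cosine similarity, and that a node counts as relevant when the similarity reaches a fixed `threshold`; ranking nodes by the number of relevant users is likewise a simplification for illustration. None of these choices is confirmed by this excerpt, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relevant_users(embeddings, threshold=0.8):
    """Map every node to the list of nodes whose similarity reaches the threshold.

    embeddings: dict mapping a node id (from either network) to its feature vector.
    threshold: assumed similarity cut-off; not taken from the paper.
    """
    nodes = sorted(embeddings)
    relevant = {n: [] for n in nodes}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if cosine(embeddings[u], embeddings[v]) >= threshold:
                relevant[u].append(v)
                relevant[v].append(u)
    return relevant


def select_seeds(embeddings, k, threshold=0.8):
    """Return the k nodes with the most relevant users (ties broken by node id)."""
    relevant = relevant_users(embeddings, threshold)
    ranked = sorted(relevant, key=lambda n: (-len(relevant[n]), n))
    return ranked[:k]
```

Because the comparison runs over the pooled vectors of both networks, a node that is close to users in each network, as a bridge node may be, tends to collect a longer list of relevant users in this sketch.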
Experimental results
In this section, we conduct several evaluations to measure the efficiency and performance of the DeepIM algorithm on different datasets, including interconnected networks and the NetHept network. We compare our results with a number of baseline IM algorithms, which are introduced in the following.
Conclusion
Recently, many people have tended to join multiple online social networks; such users are considered bridge users. In this paper, a novel influence maximization algorithm on interconnected networks is presented which leverages deep learning techniques. In the proposed algorithm, named DeepIM, all the structural properties of the networks are employed to maximize the influence. To learn the feature vector of nodes, we generate a number of custom paths for each user which contain neighbors of the
Declaration of Competing Interest
None.
CRediT authorship contribution statement
Mohammad Mehdi Keikha: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Project administration. Maseud Rahgozar: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Resources, Writing - original draft, Writing - review & editing, Supervision, Project administration. Masoud Asadpour: Conceptualization, Formal analysis, Writing - original
References (44)
- et al. The role of social networks in information diffusion.
- et al. Fast unfolding of communities in large networks. Journal of Statistical Mechanics (2008).
- et al. Maximizing social influence in nearly optimal time.
- et al. Limiting the spread of misinformation in social networks.
- et al. Scalable influence maximization for prevalent viral marketing in large-scale social networks.
- et al. Efficient influence maximization in social networks.
- et al. Scalable influence maximization in social networks under the linear threshold model.
- et al. Sketch-based influence maximization and computation.
- et al. Mining the network value of customers.
- A model of information diffusion in interconnected online social networks. ACM Transactions on the Web (2018).