Influence maximization across heterogeneous interconnected networks based on deep learning

https://doi.org/10.1016/j.eswa.2019.112905

Highlights

  • DeepIM is the first algorithm to employ deep learning techniques for the IM problem.

  • Both local and global structures are considered during network feature learning.

  • DeepIM scales to large networks because network exploration runs in parallel.

  • Bridge users play an important role in transferring information across networks.

  • In dynamic networks, node influence is computed without recalculating it for all nodes.

Abstract

With the rapid development of online social networks, many users now belong to more than one social network. Finding the most influential users is one of the central tasks of social network analysis. The influence maximization (IM) problem aims to select a small set of users who maximize the influence spread on the underlying network. Most previous research focuses on a single social network, whereas in the real world users join multiple social networks, so influence can spread across networks through their common users. In addition, existing approaches, including simulation-based, proxy-based and sketch-based methods, suffer from scalability, efficiency and feasibility issues because of the way they explore networks and compute influence diffusion. Moreover, previous algorithms employ several heuristics to capture network topology for IM, but these methods lose information during network exploration because of their pruning strategies.

In this paper, a new research direction is presented for studying the IM problem on interconnected networks. The proposed approach employs deep learning techniques to learn the feature vectors of network nodes while preserving both local and global structural information. To the best of our knowledge, network embedding has not yet been used to solve the IM problem. Our algorithm leverages deep learning for feature engineering to extract the information relevant to the IM problem on single and interconnected networks. Moreover, we prove that the proposed algorithm is monotone and submodular; thus, an optimal solution is guaranteed by the proposed approach. The experimental results on two interconnected networks, DBLP and Twitter-Foursquare, illustrate the efficiency of the proposed algorithm in comparison with state-of-the-art IM algorithms. We also conduct experiments on the NetHept dataset to evaluate the performance of the proposed approach on single networks.

Introduction

People usually prefer to gather information from their acquaintances by “word of mouth” rather than from other media such as TV (Bakshy, Rosenn, Marlow & Adamic, 2012). With the growing popularity of Online Social Networks (OSNs), a piece of information can spread quickly among people. Thus, OSNs can be considered a platform for viral marketing. Companies target a small number of users, known as the seed set, to recommend and advertise their new products to their friends in such a way that the maximum number of people adopt the products. The idea of spreading information through “word of mouth” on OSNs was first proposed as influence maximization (IM) by Domingos and Richardson (2001). IM has many applications, such as viral marketing (Domingos & Richardson, 2001), rumor control (Budak, Agrawal & El Abbadi, 2011) and recommendation (Ye, Liu & Lee, 2012).

In the traditional IM problem, a single network is explored to find the optimal seed set. In the real world, however, users usually join multiple social networks simultaneously, and information can spread across multiple OSNs via bridge users, who are members of different networks at the same time (Gaeta, 2018; Nguyen et al., 2013; Zhan et al., 2015; Zhang, Nguyen, Zhang & Thai, 2016). Hence, users can be influenced by users of other OSNs (Zhan, Zhang, Yu, Emery & Xie, 2016). The effect of external influence from users of other networks is therefore an important factor that should be considered in the IM problem. Unfortunately, most previous research on IM proposes a method for a single network and ignores the impact of external influence and of bridge users across multiple OSNs.

Kempe, Kleinberg and Tardos (2003) first formulated IM as an optimization problem and proved its NP-hardness under two diffusion models, the independent cascade (IC) model and the linear threshold (LT) model. They then proposed a greedy algorithm with an approximation factor of (1−1/e) to find the seed set. Although the greedy algorithm is more accurate than classical degree-based approaches, it suffers from scalability issues on large-scale networks. Much research has been done to overcome the scalability issue of the greedy algorithm. Leskovec et al. (2007) exploited the submodularity property of the influence function to speed up IM. LDAG was proposed to solve IM under the LT model (Chen, Wang & Wang, 2010). Although it computes influence spread on DAGs in polynomial time, the process of generating the DAGs is NP-hard. Previous work has applied different heuristics to reduce network size and then proposed a solution based on those heuristics. Although these algorithms make significant improvements over the greedy algorithm under the IC and LT models, computing influence spread over large networks is still costly.
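To make the greedy framework concrete, the following is a minimal sketch of greedy seed selection under the IC model, with the expected spread estimated by Monte Carlo simulation. It is an illustration under stated assumptions, not the implementation used by Kempe et al. or by this paper: the function names (`simulate_ic`, `expected_spread`, `greedy_im`), the adjacency-list graph format and the `runs` parameter are all hypothetical choices made here.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one independent cascade from `seeds` and return the activated nodes.

    `graph` maps a node to a list of (neighbour, activation_probability) pairs
    (an assumed format chosen for this sketch).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly activated node gets one chance to activate each
                # still-inactive neighbour, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, runs, rng):
    """Estimate the expected number of activated nodes by averaging `runs` cascades."""
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs


def greedy_im(graph, nodes, k, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(min(k, len(nodes))):
        best_node, best_spread = None, -1.0
        for v in nodes:
            if v in chosen:
                continue
            spread = expected_spread(graph, chosen + [v], runs, rng)
            if spread > best_spread:
                best_node, best_spread = v, spread
        chosen.append(best_node)
    return chosen


# Toy graph with deterministic edges (probability 1.0) so the result is stable.
toy_graph = {"a": [("b", 1.0), ("c", 1.0), ("d", 1.0)], "e": [("f", 1.0)]}
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_seeds = greedy_im(toy_graph, toy_nodes, k=2, runs=5)
```

Each greedy round re-estimates the spread of every remaining candidate, so the cost grows with the number of nodes, the number of simulation runs and the budget; this is the scalability issue described above.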

Recently, researchers have investigated networks interconnected via bridge nodes (Liu et al., 2012; Shen, Dinh, Zhang & Thai, 2012; Yagan, Qian, Zhang & Cochran, 2012). Nguyen et al. (2013) showed that influence can propagate both within and across social networks. They integrate all the networks into one scheme that preserves the features of the users in the source networks; traditional IM solutions are then employed to assess the influence spread on multiple OSNs. Shen et al. (2012) also combine all the networks to measure the influence spread of network users. Zhan et al. (2016) first extract different information channels in the network and construct a multi-relational network from these channels. Next, the seed sets are chosen using measures that compute the influence of each node on the multi-relational network.

In recent years, deep learning techniques have had a great impact on applications such as speech recognition (Mohamed, Yu & Deng, 2010), image processing (Krizhevsky, Sutskever & Hinton, 2012), information retrieval (Hinton & Salakhutdinov, 2011) and social network analysis (Perozzi, Al-Rfou & Skiena, 2014). Network embedding based on deep learning is a representation learning method that encodes the local and global features of the network into feature vectors (Grover & Leskovec, 2016; Keikha, Rahgozar & Asadpour, 2018; Perozzi, Al-Rfou & Skiena, 2014). Indeed, network embedding is a dimension reduction technique that can preserve all the structural features of the given network. The learned feature vectors can be applied to tasks such as clustering (Huang, Huang, Wang & Wang, 2014; Xie, Girshick & Farhadi, 2016) and link prediction (Grover & Leskovec, 2016).

In this paper, we propose a deep learning based algorithm named “DeepIM” for the IM problem on interconnected networks, built on network embedding. Influence maximization across interconnected networks is highly challenging because of the heterogeneous structural features, cross links and bridge nodes of the given networks. Furthermore, the IM problem on interconnected networks is more complex than traditional IM because the problem size grows with the number of network nodes. To the best of our knowledge, the proposed method is the first algorithm to apply network embedding to the IM problem.

We use the CARE algorithm to extract the global and local structural features of nodes on both networks (Keikha et al., 2018). We first generate a number of predefined customized paths for each node on both networks. These paths include both the node's neighbors, as local structure, and community information of the node, as global structure. The customized paths are then used to learn the best structural feature vector of each network node with the Word2vec framework (Mikolov, Chen, Corrado & Dean, 2013a, 2013b). Once the feature vectors are learned, we apply them to measure the extent of relevancy among users of the interconnected networks. Next, the most influential nodes are chosen from the nodes that are related to the largest number of users inside and outside their networks. Thus, we are able to find the seed set from the feature vectors learned by network embedding. In contrast to previous research, the DeepIM algorithm considers all the structural features of nodes for influence spreading.

Extensive assessments of the proposed algorithm are performed on three datasets: DBLP networks (Tang et al., 2008), Twitter-Foursquare (Zhang, Kong & Yu, 2013) and NetHept (Kempe et al., 2003). The experimental evaluations indicate that bridge nodes of the input networks have a great impact on maximizing influence within and between the networks. The empirical analysis verifies the significant improvements of the proposed method over previous research on IM.
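The path generation step can be pictured with a small sketch. It assumes, purely for illustration, that each step of a path moves to a direct neighbor with probability `alpha` and otherwise jumps to another member of the current node's community; the exact rule used by CARE is not given in this text, and the function name `community_aware_walks`, its parameters and the input formats are hypothetical. The resulting paths play the role of sentences that a skip-gram model such as Word2vec would consume.

```python
import random


def community_aware_walks(adj, community, walks_per_node=2, walk_length=5,
                          alpha=0.5, seed=0):
    """Generate paths that mix local (neighbour) and global (community) steps.

    `adj` maps a node to its neighbour list and `community` maps a node to a
    community label; both formats are assumptions made for this sketch.
    """
    rng = random.Random(seed)

    # Group nodes by community so a walk can jump to a community peer.
    members = {}
    for node, label in community.items():
        members.setdefault(label, []).append(node)

    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                current = walk[-1]
                neighbours = adj.get(current, [])
                peers = [n for n in members[community[current]] if n != current]
                if neighbours and (rng.random() < alpha or not peers):
                    walk.append(rng.choice(neighbours))  # local step
                elif peers:
                    walk.append(rng.choice(peers))       # community step
                else:
                    break  # isolated node with no community peers
            walks.append(walk)
    return walks


# Toy input: "d" has no edges but shares a community with "c".
toy_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
toy_community = {"a": 0, "b": 0, "c": 1, "d": 1}
toy_walks = community_aware_walks(toy_adj, toy_community)
```

In this toy input the isolated node "d" still appears in paths alongside "c" through the community step, which is the kind of global information a purely neighbor-based walk would miss.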

To summarize, we make the following contributions:

  • We present a novel algorithm for IM across interconnected networks that learns the best feature vector of nodes on different types of networks, such as weighted, directed and complex networks.

  • To the best of our knowledge, the proposed method is the first algorithm that uses deep learning techniques to extract the best structural features of the network nodes for IM.

  • We show the impact of bridge nodes on diffusing influence across OSNs.

  • For each node, DeepIM finds the most influential nodes inside and between networks based on their local and global structural properties.

  • The proposed method handles network changes simply: the influence of new nodes is computed without repeating the influence spreading process for all the nodes.

  • We empirically evaluate the proposed method on two interconnected network datasets and a single network. The experimental results indicate the scalability and efficiency of DeepIM in comparison with other IM approaches.

The rest of the paper is organized as follows. Section 2 presents a formal definition of IM on interconnected networks. In Section 3, we summarize related work on IM methods and network embedding techniques. We explain the details of the DeepIM algorithm in Section 4. Section 5 outlines the experimental results of the proposed method on different datasets. Finally, Section 6 presents the conclusion and future work.

Section snippets

Influence maximization on interconnected networks

The goal of the IM problem across interconnected networks is to select the best seed set from both networks, that is, the one that influences the maximum number of users in both networks. Suppose two network graphs G1 = (V1, E1) and G2 = (V2, E2) are given, in which each edge ei = (ui, vi, wi) ∈ Ei represents a collaboration between ui and vi with weight wi in network i. With a budget K, we are going to find a seed set S of size K from users of both networks based on information

Related works

Kempe et al. (2003) proved that IM is an NP-hard optimization problem and solved it in a greedy framework, but their proposed method is inefficient on large networks. In recent years, several IM approaches have been proposed for a single network, using different heuristics to improve the efficiency of the greedy framework. Li, Fan, Wang and Tan (2018) classified existing IM approaches based on their heuristics into three categories including simulation based, proxy based and sketch

Influence maximization on interconnected networks based on deep learning

In this section, we describe the proposed algorithm for IM over interconnected networks. To find the best seed set on the networks, the first step obtains a structural feature vector for each node. Once the network model is learned using deep learning techniques, we measure the extent of relatedness between any two users on the given networks. Afterwards, the most relevant users for each node are extracted from both networks, and a vector of relevant users is formed for each node.
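The steps in this snippet can be sketched as follows. Because the snippet does not state the relatedness measure or the final selection rule, the sketch assumes cosine similarity between feature vectors and a greedy coverage rule over the relevant-user lists; both are stand-ins chosen for illustration, and the names `cosine`, `relevant_users`, `select_seeds` and `top_m` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relevant_users(vectors, top_m):
    """For each node, list the top_m other nodes ranked by similarity (assumed measure)."""
    relevant = {}
    for node, vec in vectors.items():
        scored = sorted(
            ((cosine(vec, other_vec), other)
             for other, other_vec in vectors.items() if other != node),
            key=lambda pair: (-pair[0], pair[1]),  # high score first, ties by name
        )
        relevant[node] = [other for _, other in scored[:top_m]]
    return relevant


def select_seeds(relevant, k):
    """Greedily pick nodes whose relevant-user lists cover the most new users."""
    covered, seeds = set(), []
    for _ in range(min(k, len(relevant))):
        candidates = [n for n in sorted(relevant) if n not in seeds]
        best = max(candidates, key=lambda n: len(set(relevant[n]) - covered))
        seeds.append(best)
        covered |= set(relevant[best]) | {best}
    return seeds


# Toy feature vectors: a, b, c point one way and d, e another.
toy_vectors = {
    "a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.8, 0.2),
    "d": (0.0, 1.0), "e": (0.1, 0.9),
}
toy_relevant = relevant_users(toy_vectors, top_m=2)
toy_seeds = select_seeds(toy_relevant, k=2)
```

With these toy vectors the two seeds come from different groups of similar nodes, since a second seed from the first group would add no newly covered users.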

Experimental results

In this section, we conduct several evaluations to measure the efficiency and performance of the DeepIM algorithm on different datasets, including interconnected networks and the NetHept network. We compare our results with a number of baseline IM algorithms, which are introduced in the following.

Conclusion

Recently, many people tend to join multiple online social networks; such people are considered bridge users. In this paper, a novel influence maximization algorithm for interconnected networks is presented, which leverages deep learning techniques. In the proposed algorithm, named DeepIM, all the structural properties of the networks are employed to maximize the influence. To learn the feature vector of nodes, we generate a number of custom paths for each user which contain neighbors of the

Declaration of Competing Interest

None.

CRediT authorship contribution statement

Mohammad Mehdi Keikha: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Project administration. Maseud Rahgozar: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Resources, Writing - original draft, Writing - review & editing, Supervision, Project administration. Masoud Asadpour: Conceptualization, Formal analysis, Writing - original

Acknowledgements

All persons who have made substantial contributions to the work reported in the manuscript, but who do not meet the criteria for authorship, are named in the Acknowledgements and have given us their written permission to be named. If we have not included an Acknowledgements, then that indicates that we have not received substantial contributions from non-authors.

References (44)

  • E. Bakshy et al., The role of social networks in information diffusion

  • V.D. Blondel et al., Fast unfolding of communities in large networks, Journal of Statistical Mechanics (2008)

  • C. Borgs et al., Maximizing social influence in nearly optimal time

  • C. Budak et al., Limiting the spread of misinformation in social networks

  • W. Chen et al., Scalable influence maximization for prevalent viral marketing in large-scale social networks

  • W. Chen et al., Efficient influence maximization in social networks

  • W. Chen et al., Scalable influence maximization in social networks under the linear threshold model

  • E. Cohen et al., Sketch-based influence maximization and computation

  • P. Domingos et al., Mining the network value of customers

  • R. Gaeta, A model of information diffusion in interconnected online social networks, ACM Transactions on the Web (2018)

  • J. Goldenberg et al., Talk of the network: A complex systems look at the underlying process of word-of-mouth, Marketing Letters (2001)

  • A. Goyal et al., CELF++: Optimizing the greedy algorithm for influence maximization in social networks

  • A. Goyal et al., SIMPATH: An efficient algorithm for influence maximization under the linear threshold model

  • M. Granovetter, Threshold models of collective behavior, American Journal of Sociology (1978)

  • A. Grover et al., Node2Vec: Scalable feature learning for networks

  • G. Hinton et al., Discovering binary codes for documents by learning deep generative models, Topics in Cognitive Science (2011)

  • P. Huang et al., Deep embedding network for clustering

  • M.M. Keikha et al., Community aware random walk for network embedding, Knowledge-Based Systems (2018)

  • D. Kempe et al., Maximizing the spread of influence through a social network

  • M. Kimura et al., Tractable models for information diffusion in social networks, Knowledge Discovery in Databases (2006)

  • A. Krizhevsky et al., ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (2012)

  • J. Leskovec et al., Cost-effective outbreak detection in networks
