Elsevier

Knowledge-Based Systems

Volume 187, January 2020, 104833

A discrete shuffled frog-leaping algorithm to identify influential nodes for influence maximization in social networks

https://doi.org/10.1016/j.knosys.2019.07.004

Abstract

The influence maximization problem aims to select a subset of the k most influential nodes from a given network such that the spread of influence triggered by this seed set is maximized. Greedy-based algorithms spend a long time approximating the expected influence spread of a given node set accurately, and they do not scale well to large networks, especially when the propagation probability is large. Conventional heuristics based on network topology or confined diffusion paths tend to suffer from low solution accuracy or huge memory cost. In this paper, an effective discrete shuffled frog-leaping algorithm (DSFLA) is proposed to solve the influence maximization problem more efficiently. A novel encoding mechanism and discrete evolutionary rules are conceived for the virtual frog population based on the network topology. To facilitate global exploration, a novel local exploitation mechanism combining deterministic and random walk strategies is put forward to improve the suboptimal meme of each memeplex in the frog population. Experimental results on influence spread in six real-world networks, together with statistical tests, show that DSFLA is effective in selecting influential seed nodes for influence maximization and is superior to several state-of-the-art alternatives.

Introduction

Social networks have become powerful platforms for information diffusion and viral marketing by attracting billions of loyal users. An underlying factor behind these capabilities is social influence, which maps the interactions between individuals in the network and can be evaluated based on trust and reputation [1]. One typical application promoted by social networks is viral marketing [2], which exploits the important effect of 'word-of-mouth' embedded in the interpersonal influence relationships among consumers and can reshape consumers' attitudes and behaviors [3]. The influence maximization problem aims to select a subset of k influential seed nodes that maximizes the spread of influence in the network. The problem was first posed by Domingos and Richardson [4] from a network perspective, in which the most promising customers are identified to maximize the expected profit of a product promotion.

As emphasized in [5], [6], there are two challenges in tackling the influence maximization problem. The first is to estimate the influence spread of a given node set accurately, which has been proved to be #P-hard. The second is to provide effective and efficient algorithms for selecting a subset of influential nodes that maximizes the spread of influence in the network. Kempe et al. [7] first formulated influence maximization as a discrete optimization problem and proposed a greedy approach with a guaranteed approximation ratio of (1 − 1/e). However, experimental results [8], [9] showed that the greedy algorithm is time-consuming, especially in large-scale networks. This is because the algorithm has to run k rounds to select the targeted seed nodes. In each round, it carries out R Monte-Carlo simulations (R is typically on the order of 10,000) to approximate the marginal gain of each of the N nodes in the network, and each simulation inevitably traverses the M edges of the network. Consequently, the time complexity of the original greedy algorithm is O(kNMR).
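To make the O(kNMR) cost concrete, here is a minimal Python sketch of the simple greedy algorithm with Monte-Carlo estimation under the independent cascade (IC) model. It assumes a directed adjacency-list graph and a single uniform propagation probability p; the function names (`simulate_ic`, `estimate_spread`, `greedy_im`) are illustrative and are not taken from [7] or from this paper.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency list: node -> out-neighbours


def simulate_ic(graph: Graph, seeds: Set[int], p: float, rng: random.Random) -> int:
    """One Monte-Carlo run of the IC model with uniform probability p.

    Each newly activated node gets a single chance to activate each
    inactive out-neighbour; returns the final number of active nodes.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph: Graph, seeds: Set[int], p: float, runs: int,
                    rng: random.Random) -> float:
    """Average spread over `runs` simulations (R in the text)."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_im(graph: Graph, k: int, p: float, runs: int = 10_000,
              seed: int = 0) -> List[int]:
    """Simple greedy: k rounds, each estimating the marginal gain of every node."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen: List[int] = []
    for _ in range(k):
        base = estimate_spread(graph, set(chosen), p, runs, rng) if chosen else 0.0
        best_node, best_gain = None, float("-inf")
        for v in sorted(nodes - set(chosen)):  # sorted for deterministic ties
            gain = estimate_spread(graph, set(chosen) | {v}, p, runs, rng) - base
            if gain > best_gain:
                best_node, best_gain = v, gain
        chosen.append(best_node)
    return chosen
```

Each call to `estimate_spread` walks the graph R times, and `greedy_im` makes roughly N such calls in each of its k rounds, which is where the O(kNMR) bound comes from.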

Following up on this seminal work, novel influence estimators and node selection approaches have emerged to solve the influence maximization problem more efficiently. Chen [10] proposed an improved greedy algorithm that prunes the edges that hardly take part in influence spread. Jiang et al. [11] proposed an expected diffusion value estimator that evaluates the spread of influence within the one-hop area of given candidate nodes. However, it is less effective than the local influence estimator, which estimates the expected influence spread within the two-hop area of given candidate nodes [12]. Kimura et al. [13] assumed that influence spreads only along the shortest and second shortest paths, and proposed a shortest path-based influence maximization algorithm. Furthermore, by assuming that influence spreads along paths independently of each other, Kim et al. [14] proposed a parallel influence path-based algorithm to identify seed nodes faster. Cao et al. [15] systematically studied the influence maximization problem based on community detection. They transformed influence maximization into an optimal resource allocation problem and proposed an optimal dynamic programming algorithm to find the optimal seed allocation. As demonstrated in [16], community-based influence maximization algorithms are generally faster than traditional greedy algorithms, but their accuracy and scalability still need to be improved. Compared with the original greedy algorithm, these methods are more efficient because they reduce or avoid Monte-Carlo simulations. However, this efficiency is obtained at the expense of solution accuracy or memory cost.

Therefore, developing effective and efficient methods for influence maximization in large-scale networks remains an open research topic in social network analysis, and it is of great significance for promising applications in information spreading, such as innovation diffusion and viral marketing.
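The one-hop idea behind the expected diffusion value estimator of Jiang et al. [11] can be sketched as follows. This assumes the commonly cited form EDV(S) = k + Σ over v in N(S)\S of (1 − (1 − p)^τ(v)), where τ(v) counts the edges between v and the seed set S and p is a uniform propagation probability; the function name is hypothetical. Because only nodes one hop away from S are counted, influence that travels further is ignored, which is consistent with the two-hop local influence estimator being the more accurate of the two.

```python
from typing import Dict, Iterable, List


def expected_diffusion_value(graph: Dict[int, List[int]], seeds: Iterable[int],
                             p: float) -> float:
    """One-hop spread estimate: |S| plus the chance each neighbour of S is hit.

    Assumed form: EDV(S) = k + sum_{v in N(S)\\S} (1 - (1 - p) ** tau(v)),
    where tau(v) is the number of edges from seed nodes to v.
    """
    seed_set = set(seeds)
    tau: Dict[int, int] = {}
    for u in seed_set:
        for v in graph.get(u, []):
            if v not in seed_set:
                tau[v] = tau.get(v, 0) + 1
    # A neighbour linked to t seeds escapes all of them with probability (1 - p)^t.
    return len(seed_set) + sum(1.0 - (1.0 - p) ** t for t in tau.values())
```

The estimate costs only one pass over the seeds' adjacency lists, which is why such estimators can replace Monte-Carlo simulation inside a search loop.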

The effectiveness and robustness of swarm-intelligence-based meta-heuristic algorithms have been widely validated in complex optimization problems such as symbolic regression [17], feature selection in data mining and machine learning [18], sports training sessions [19] and influence maximization [20], [21]. In this paper, a discrete shuffled frog-leaping algorithm (DSFLA) is proposed based on network topology characteristics to identify influential nodes for influence maximization. The main contributions of this paper are as follows.

An encoding mechanism for virtual frog individuals and discrete evolutionary rules for the frog population are conceived based on the network topology structure. The framework of the discrete shuffled frog-leaping algorithm for the influence maximization problem is then presented.

To facilitate global exploration during the evolutionary process, a local exploitation mechanism combining deterministic and random walk strategies is put forward to improve the suboptimal meme of each memeplex in the frog population.

The orthogonal experimental design method is adopted to optimize the parameter settings of DSFLA. Experimental results and statistical tests on six real-world networks show that the proposed DSFLA is effective and efficient for influence maximization and scales to large-scale networks.
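The concrete DSFLA factors and levels are not listed in this excerpt, so the following is only a hypothetical illustration of orthogonal experimental design: an L9(3^4) orthogonal array screens four three-level factors in 9 runs instead of the 3^4 = 81 runs of a full factorial design. The factor names and level values in `FACTORS` are made up for the example and are not the paper's settings.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple


def l9_orthogonal_array() -> List[Tuple[int, int, int, int]]:
    """L9(3^4) array with levels 0..2, built from columns a, b, a+b, a+2b (mod 3)."""
    return [(a, b, (a + b) % 3, (a + 2 * b) % 3)
            for a, b in product(range(3), repeat=2)]


def is_orthogonal(array: List[Tuple[int, ...]]) -> bool:
    """Check that every pair of columns contains each level pair exactly once."""
    full = sorted(product(range(3), repeat=2))
    for i, j in combinations(range(len(array[0])), 2):
        if sorted((row[i], row[j]) for row in array) != full:
            return False
    return True


# Hypothetical DSFLA factors and levels, for illustration only.
FACTORS: Dict[str, list] = {
    "frogs": [50, 100, 200],
    "memeplexes": [2, 5, 10],
    "local_iterations": [5, 10, 20],
    "walk_prob": [0.1, 0.3, 0.5],
}


def experiment_plan() -> List[dict]:
    """Map each row of the array to one parameter configuration to run."""
    names = list(FACTORS)
    return [{name: FACTORS[name][level] for name, level in zip(names, row)}
            for row in l9_orthogonal_array()]
```

Each of the nine configurations would then be run several times, and the level with the best mean result for each factor chosen; that range-analysis step is omitted here.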

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the influence maximization problem, the independent cascade model and the effective influence estimator used in this paper. Section 4 presents the proposed discrete shuffled frog-leaping algorithm and the framework of DSFLA for influence maximization. Section 5 provides the performance validation of DSFLA and statistical tests. Section 6 concludes this paper and outlines future work.


Related works

Since the seminal work by Domingos and Richardson [4], the influence maximization problem has attracted considerable attention. In general, most existing influence maximization algorithms fall into three categories: greedy-based algorithms, reverse influence sampling algorithms and advanced heuristic algorithms.

Influence maximization problem

Definition 1

Let G = (V, E) be a network, where V is the node set and E is the edge set of the network. The influence maximization problem aims to select k (1 ≤ k < |V|) influential nodes as the seed set S such that the number of nodes influenced by S, denoted as the influence spread σ(S), is maximized under a given propagation model:

$$S^* = \arg\max_{S \subseteq V,\ |S| = k} \sigma(S)$$

where S is a candidate seed set, σ(S) is the expected number of influenced nodes that are triggered by S, and S* is the best seed set that
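For intuition about why σ(S) is hard to compute and why the argmax in Definition 1 is expensive, the sketch below evaluates σ(S) exactly under the IC model with a uniform probability p by enumerating all 2^|E| live-edge graphs (each edge is kept independently with probability p, and σ(S) is the expected number of nodes reachable from S), and then searches every k-subset of V. It is exponential in both |E| and k and is meant only for toy graphs; all names are illustrative.

```python
from itertools import combinations, product
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # directed edge (u, v)


def reachable(live: List[Edge], seeds: Set[int]) -> Set[int]:
    """Nodes reachable from the seeds using only live edges (iterative DFS)."""
    adj: Dict[int, List[int]] = {}
    for u, v in live:
        adj.setdefault(u, []).append(v)
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def exact_spread(edges: List[Edge], seeds: Set[int], p: float) -> float:
    """sigma(S): probability-weighted reach over all 2^|E| live-edge graphs."""
    total = 0.0
    for mask in product([False, True], repeat=len(edges)):
        live = [e for e, on in zip(edges, mask) if on]
        prob = p ** len(live) * (1 - p) ** (len(edges) - len(live))
        total += prob * len(reachable(live, seeds))
    return total


def best_seed_set(nodes: Iterable[int], edges: List[Edge], k: int,
                  p: float) -> Tuple[int, ...]:
    """S* = argmax over all k-subsets of V (brute force, toy graphs only)."""
    return max(combinations(sorted(nodes), k),
               key=lambda s: exact_spread(edges, set(s), p))
```

On the path 0 → 1 → 2 with p = 0.5, σ({0}) = 1 + 0.5 + 0.25 = 1.75, so {0} is the best single seed.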

Proposed method

As discussed above, the expected influence spread of given candidate nodes can be evaluated with the local influence estimator (LIE), so optimization algorithms can be used to maximize the fitness value of the LIE function. The shuffled frog-leaping algorithm [40] is an advanced meta-heuristic algorithm whose effectiveness on optimization problems has been validated in many studies [41], [42]. Inspired by the efficient evolutionary mechanism based on swarm intelligence, we try to make
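The encoding and update rules of DSFLA are not reproduced in this excerpt. As rough orientation only, the sketch below shows a generic shuffled frog-leaping skeleton (rank all frogs, deal them into memeplexes, improve each memeplex's worst frog toward the memeplex best, then the global best, otherwise reset it) applied to seed sets of k distinct nodes. The `leap` and `local_move` rules, the use of `local_move` as a third attempt, and all parameter defaults are hypothetical stand-ins, not the paper's DSFLA rules; `fitness` could be any spread estimator, such as an LIE-style function.

```python
import random
from typing import Callable, Dict, List, Set

Graph = Dict[int, List[int]]  # undirected adjacency list; every node is a key
Fitness = Callable[[Set[int]], float]


def leap(worst: List[int], guide: List[int], rng: random.Random) -> List[int]:
    """Hypothetical discrete leap: copy some of the guide's nodes that
    `worst` lacks into random positions of `worst` (nodes stay distinct)."""
    new = list(worst)
    missing = [v for v in guide if v not in new]
    n_copy = rng.randint(0, len(missing))
    positions = rng.sample(range(len(new)), n_copy)
    for pos, v in zip(positions, rng.sample(missing, n_copy)):
        new[pos] = v
    return new


def local_move(frog: List[int], graph: Graph, rng: random.Random,
               walk_prob: float) -> List[int]:
    """Hypothetical local exploitation: swap one seed for a neighbour chosen
    at random (random-walk step) or by highest degree (deterministic step)."""
    new = list(frog)
    pos = rng.randrange(len(new))
    candidates = [v for v in graph[new[pos]] if v not in new]
    if not candidates:
        return new
    if rng.random() < walk_prob:
        new[pos] = rng.choice(candidates)
    else:
        new[pos] = max(candidates, key=lambda v: len(graph[v]))
    return new


def sfla_seed_selection(graph: Graph, k: int, fitness: Fitness, frogs: int = 20,
                        memeplexes: int = 4, local_iters: int = 5,
                        generations: int = 30, walk_prob: float = 0.3,
                        seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    nodes = sorted(graph)
    score = lambda frog: fitness(set(frog))
    pop = [rng.sample(nodes, k) for _ in range(frogs)]
    elite = max(pop, key=score)
    for _ in range(generations):
        # Shuffling step: rank all frogs and deal frog i into memeplex i mod m.
        pop.sort(key=score, reverse=True)
        global_best = pop[0]
        groups = [pop[i::memeplexes] for i in range(memeplexes)]
        for group in groups:
            for _ in range(local_iters):
                group.sort(key=score, reverse=True)
                worst = group[-1]
                # Accept the first candidate that beats the worst frog.
                for cand in (leap(worst, group[0], rng),
                             leap(worst, global_best, rng),
                             local_move(worst, graph, rng, walk_prob)):
                    if score(cand) > score(worst):
                        group[-1] = cand
                        break
                else:
                    group[-1] = rng.sample(nodes, k)  # reset, as in standard SFLA
        pop = [frog for group in groups for frog in group]
        elite = max(pop + [elite], key=score)  # keep the best frog ever seen
    return elite
```

For a quick try, `fitness` can be as simple as counting the nodes in S and its neighbourhood; in practice it would be a spread estimator evaluated on the real network.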

Datasets and baseline algorithms

To validate the performance of the proposed DSFLA on influence maximization problem, experiments are carried out on six real-world social networks, as shown in Table 1.

AstroPh and CondMat [46] are two undirected collaboration networks that cover scientific collaborations between authors of papers submitted to the arXiv Astro Physics and Condensed Matter categories, respectively. Slashdot [47] is a technology-related news social network known for its specific user community, and it is treated as an undirected

Conclusions and future works

The shuffled frog-leaping algorithm, which combines deterministic and random search strategies, shows excellent performance on various complex optimization problems. In this paper, a discrete shuffled frog-leaping algorithm is proposed specifically to identify influential nodes for influence maximization. In the proposed framework, a discrete encoding mechanism and evolutionary rules are conceived based on network topology, and a local degree-based replacement strategy is presented to cooperate with

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant Nos. 21503101 and 61702240) and the CERNET Innovation Project, China (NGII20170422).

References (49)

  • C. Gui et al., Overlapping communities detection based on spectral analysis of line graphs, Physica A (2018).
  • J. Shang et al., IMPC: Influence maximization based on multi-neighbor potential in community networks, Physica A (2018).
  • J. Tang et al., Maximizing the spread of influence via the collective intelligence of discrete bat algorithm, Knowl.-Based Syst. (2018).
  • M.E.J. Newman, A measure of betweenness centrality based on random walks, Social Networks (2005).
  • M.A. Al-garadi et al., Identification of influential spreaders in online social networks using interaction weighted K-core decomposition method, Physica A (2017).
  • R. Ureña et al., A social network based approach for consensus achievement in multiperson decision making, Inf. Fusion (2019).
  • Y. Xue et al., Fuzzy Rough set algorithm with binary shuffled frog-leaping (BSFL-frsa): An innovative approach for identifying main drivers of carbon exchange in temperate deciduous forests, Ecol. Indicators (2017).
  • J. Luo et al., A new hybrid memetic multi-objective optimization algorithm for multi-objective optimization, Inform. Sci. (2018).
  • M. Mao et al., Grid-connected modular PV-converter system with shuffled frog leaping algorithm based DMPPT controller, Energy (2018).
  • J.J. Brown et al., Social ties and word-of-mouth referral behavior, J. Consum. Res. (1987).
  • P. Domingos, M. Richardson, Mining the network value of customers, in: ACM SIGKDD International Conference on Knowledge...
  • J.R. Lee et al., A query approach for influence maximization on specific users in social networks, IEEE Trans. Knowl. Data Eng. (2015).
  • D. Kempe, J. Kleinberg, Maximizing the spread of influence through a social network, in: ACM SIGKDD International...
  • J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. Vanbriesen, N. Glance, Cost-effective outbreak detection in...

    No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.07.004.
