Deprecation based greedy strategy for target set selection in large scale social networks

doi:10.1016/j.ins.2015.04.024

Information Sciences

Volume 316, 20 September 2015, Pages 107-122

https://doi.org/10.1016/j.ins.2015.04.024

Abstract

This paper addresses the problem of target set selection for large scale social networks. We describe a novel deprecation based greedy strategy to be applied over a pre-ordered set of nodes, where the ordering may be obtained with any heuristic influence function. The proposed algorithm runs iteratively, and each iteration has two stages: (i) Estimation, where the performance of each node is evaluated, and (ii) Marking, where the nodes to be deprecated in later iterations are marked. We prove theoretically that, for any monotonic and sub-modular influence function, the algorithm correctly identifies the nodes to be deprecated. We also show that the algorithm meets its ending criteria for any finite set of input nodes. The worst case performance of the algorithm, in terms of both time and performance, is also analyzed. Experimental results on seven un-weighted as well as weighted social networks show that the proposed strategy improves the ranking of the input seeds in terms of the total number of nodes influenced.

Introduction

Information diffusion over social networks in the form of “word-of-mouth” has been studied in several fields of research, including epidemiology [9], [34], sociology [2], [39] and economics [15], [16]. More recently, computer scientists have become interested in the field because of the emergence and extreme popularity of online social networks such as Twitter, Facebook and YouTube. Different research issues have been addressed in this direction [11], [18], [20], [21], [38], [41]. One of the important problems in this area is target set selection.

One variant of the target set selection problem is to select the top-k influential nodes such that they maximize the influence on the network. Other variants exist in the literature, such as those in [3], [29], which we do not cover in this study. Solutions to the target set selection problem have many applications. For example, they are useful in viral marketing through online social networks [10], [27], in identifying top stories in news networks, in finding the most influential blogs in the blogger network [26], in providing personalized recommendations [17], [40], in determining the impact of an article from the scientists’ citation network, and in spreading social awareness through social media.

Diffusion of information, in a nutshell, is the process by which an innovation or idea spreads over a network through communication among social entities [36]. It is the newness of the information that drives the cascade over the network. One of the simplest diffusion models available to computer science researchers is the independent cascade model [15], [20]. The model runs in discrete steps. In each step, an active (influenced) node tries to activate each of its inactive neighbors with a probability p, called the propagation probability or diffusion speed. Whether or not an attempt succeeds, the node never gets another chance to activate the same neighbor. The process, however, is highly stochastic, and Kempe et al. [20] showed that the optimization problem is NP-hard. They also provided a Greedy Hill Climbing algorithm, which gives a $(1 - \frac{1}{e} - \epsilon)$ approximation to the optimal solution. However, the algorithm is time consuming, especially for large scale networks. For example, it takes days to compute on a network of 30K nodes [6]. Various improvements to the computation time of the greedy algorithm are described in [3], [12], [24]. On the other hand, several heuristic algorithms [3], [4], [6] have been developed that run faster but provide sub-optimal results.
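The independent cascade model described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the adjacency-dictionary graph format, the uniform probability p on every edge, and the function names `simulate_ic` and `estimate_spread` are all hypothetical and are not taken from the paper. Nodes are processed in order of activation, which gives each active node exactly one attempt on each inactive neighbor.

```python
import random
from collections import deque

def simulate_ic(graph, seeds, p, rng):
    """Run one independent cascade and return the set of activated nodes.

    graph: dict mapping each node to a list of its neighbors (assumed format).
    seeds: initially active nodes.
    p: propagation probability, assumed identical for every edge.
    rng: a random.Random instance, so that runs are reproducible.
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v in graph.get(u, []):
            # u gets exactly one attempt on each currently inactive neighbor v.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active

def estimate_spread(graph, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs
```

Because a single cascade is random, the spread of a seed set is usually estimated by averaging many runs, as `estimate_spread` does; the need for many such runs per candidate node is one reason greedy selection becomes slow on large networks.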

This paper addresses the target set selection problem in the context of information diffusion on large scale social networks. We describe a deprecation based greedy strategy (DGS) for target set selection and apply it to a list of nodes pre-ordered by some fast heuristic influence score. We prove theoretically that the method correctly identifies the nodes to be deprecated and provides a guaranteed solution to the target set selection problem when the influence function is monotonic and sub-modular. The convergence of the proposed algorithm is proved analytically. Experiments on seven real life large scale social network data sets (both weighted and un-weighted) show that applying DGS over a heuristic algorithm produces a better solution to the target set selection problem.
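For reference, the two properties assumed of the influence function are usually stated as follows. These are the standard textbook definitions, written with the symbols $\sigma$ and $V$ that appear in the problem statement; the paper's own formal wording may differ. An influence function $\sigma$ is monotonic if adding nodes to a seed set never decreases the influence:

$\sigma(S) \le \sigma(T) \quad \text{for all } S \subseteq T \subseteq V.$

It is sub-modular if the marginal gain of adding a node $v$ shrinks as the seed set grows:

$\sigma(S \cup \{v\}) - \sigma(S) \;\ge\; \sigma(T \cup \{v\}) - \sigma(T) \quad \text{for all } S \subseteq T \subseteq V,\ v \in V \setminus T.$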

The paper is organized as follows: Section 2 describes the preliminary concepts of networks related to this study. The problem statement and related investigations are briefly explained in Section 3. Section 4 presents the Deprecation based Greedy Strategy (DGS) along with its proof of correctness, convergence and optimization guarantee. Experiments and results are reported in Section 5.

Section snippets

Preliminaries

We describe in this section some notations and definitions related to social networks.

Problem statement and related work

Consider an influence function $\sigma : 2^{V} \to \mathbb{N}$ for a social network $G(V, E)$. Given a set of initial active nodes $S \in 2^{V}$, $\sigma(S)$ returns the expected number of active nodes at the end of the information diffusion. In the top-k nodes selection problem, we are interested in finding k influential nodes that together produce the maximum influence in the network after diffusion of the information. So, this is a maximization problem defined as follows: $\begin{matrix} \underset{S}{\text{maximize}} & \sigma(S) \\ \text{subject to} & |S| = k, \; k > 0. \end{matrix}$
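To make the maximization problem concrete, the following is a minimal sketch of greedy hill climbing in the spirit of the algorithm attributed to Kempe et al. [20] in the introduction: at each step it adds the node with the largest marginal gain in $\sigma$. It is not the DGS algorithm of this paper. The function name `greedy_top_k`, the toy `reach` data, and the use of a deterministic set-coverage function as a stand-in for $\sigma$ are all assumptions made for illustration.

```python
def greedy_top_k(nodes, sigma, k):
    """Greedy hill climbing: repeatedly add the node with the largest marginal gain."""
    selected = []
    remaining = list(nodes)
    for _ in range(min(k, len(remaining))):
        base = sigma(selected)
        best = max(remaining, key=lambda v: sigma(selected + [v]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy stand-in for sigma (an assumption, not the paper's influence function):
# each node "influences" a fixed set of items, and sigma(S) is the union size.
reach = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1},
}

def coverage(seed_set):
    covered = set()
    for v in seed_set:
        covered |= reach[v]
    return len(covered)

seeds = greedy_top_k(reach, coverage, k=2)
print(seeds, coverage(seeds))
```

With a stochastic influence function, each call to `sigma` would itself be an average over many simulated cascades, which is why this greedy approach is costly on large networks.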

Let us now describe some

Deprecation based Greedy Strategy (DGS)

In this section, we describe a new deprecation based greedy strategy, the corresponding algorithm, and the proof of its correctness. We also address the convergence of the algorithm and its approximation guarantees.

Experiments and results

Experiments have been conducted over seven real life social networks to validate the aforesaid mathematical findings. We have considered the Independent Cascade Model of information diffusion and applied DGS over the lists of nodes pre-ordered by the high degree heuristic (HDH), the diffusion degree heuristic (DiDH) and the degree discount heuristic (DDH). The performance of DGS is evaluated in two ways. First, we measure the improvement made by DGS over the seeds selected by the corresponding heuristic methods.
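As an illustration of what a pre-ordered input list might look like, the sketch below ranks nodes by degree, which is the idea behind a high degree heuristic. The adjacency-dictionary format, the function name `high_degree_order`, and the tie-breaking by node label are assumptions for this example, not details taken from the paper.

```python
def high_degree_order(graph):
    """Return nodes sorted by decreasing degree (ties broken by node label)."""
    return sorted(graph, key=lambda v: (-len(graph[v]), v))

# Small undirected example graph, stored as an adjacency dictionary.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

ordered = high_degree_order(graph)
top_2 = ordered[:2]
print(ordered, top_2)
```

Taking the first k entries of such a list gives the seed set chosen by the heuristic alone; a refinement strategy would start from the same ordered list.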

Discussions and conclusion

In this paper, we proposed a new strategy, called deprecation based greedy strategy (DGS), and the corresponding algorithm for finding the top-k influential nodes of social networks from a list of nodes pre-ordered using heuristic measures. The algorithm searches for the nodes in the list that contribute less and marks them as deprecated. These deprecated nodes are then removed from any further consideration. We show analytically that when the influence function is monotonic and

Acknowledgments

The authors acknowledge the Department of Science and Technology, Govt. of India for funding the Center for Soft Computing Research at Indian Statistical Institute. S.K. Pal acknowledges the J.C. Bose National Fellowship and INAE Chair Professorship. The support provided by Prof. Debesh K. Das of the Department of Computer Science and Engineering, Jadavpur University is gratefully acknowledged.

References (41)

E. Berger, Dynamic monopolies of constant size, J. Comb. Theory, Ser. B (2001)

L. Freeman, Centrality in social networks conceptual clarification, Soc. Networks (1979)

J. Ha et al., An analysis on information diffusion through BlogCast in a blogosphere, Inform. Sci. (2015)

Y.-M. Li et al., Discovering influencers for marketing in the blogosphere, Inform. Sci. (2011)

S. Liu et al., Identifying effective influencers based on trust for electronic word-of-mouth marketing: a domain-aware approach, Inform. Sci. (2015)

T. Opsahl et al., Clustering in weighted networks, Soc. Networks (2009)

C. Wang et al., A global optimization algorithm for target set selection problems, Inform. Sci. (2014)

Z. Yu et al., Friend recommendation with content spread enhancement in social networks, Inform. Sci. (2015)

T. Zhu et al., Maximizing the spread of influence ranking in social networks, Inform. Sci. (2014)

O. Ben-Zwi et al., An exact almost optimal algorithm for target set selection in social networks

N. Chen, On the approximability of influence in social networks, SIAM J. Discrete Math. (2009)

W. Chen et al., Scalable influence maximization for prevalent viral marketing in large-scale social networks

W. Chen et al., Efficient influence maximization in social networks

W. Chen et al., Scalable influence maximization in social networks under the linear threshold model

P. Clifford et al., A model for spatial conflict, Biometrika (1973)

M. De Choudhury, H. Sundaram, A. John, D. Seligmann, A. Kelliher, “Birds of a Feather”: Does User Homophily Impact...

Z. Dezso et al., Halting viruses in scale-free networks, Phys. Rev. E (2002)

P. Domingos, Mining social networks for viral marketing, IEEE Intell. Syst. (2005)

P. Domingos et al., Mining the network value of customers

P.A. Estevez et al., Selecting the most influential nodes in social networks
