Combined fuzzy clustering and firefly algorithm for privacy preserving in social networks

doi:10.1016/j.eswa.2019.112968

Expert Systems with Applications

Volume 141, 1 March 2020, 112968

https://doi.org/10.1016/j.eswa.2019.112968

Highlights

• A combined fuzzy clustering and firefly algorithm (KFCFA) is presented.
• A constrained multi-objective function is introduced for privacy preserving in social networks.
• The proposed anonymity methodology can be performed at data level and graph level.
• Our methodology guarantees that the K-anonymity, L-diversity and T-closeness conditions are fulfilled.
• The method is simulated over four social networks: Facebook, Google+, Twitter and YouTube.

Abstract

In recent years, a rapidly growing volume of social network data has been made publicly available for understanding user behavior and for data mining. The main challenge in sharing social network databases is protecting publicly released data against the identification of individuals. The most common privacy-preserving technique is to anonymize the data by removing or changing some information, while the anonymized data should retain as much of the original information as possible. K-anonymity and its extensions (e.g., L-diversity and T-closeness) have been widely used for data anonymization. The main drawback of existing anonymity techniques is their lack of protection against attribute/link disclosure and similarity attacks. Moreover, they incur a large information loss in the released database. To overcome these drawbacks, this paper proposes a combined anonymizing algorithm based on K-member Fuzzy Clustering and Firefly Algorithm (KFCFA) that protects the anonymized database against identity disclosure, attribute disclosure, link disclosure, and similarity attacks while significantly reducing the information loss. In KFCFA, a modified K-member version of fuzzy c-means is first used to create balanced clusters with at least K members each. The firefly algorithm is then applied to further optimize the primary clusters and to anonymize the network graph and data. To this end, a constrained multi-objective function is introduced that simultaneously minimizes the clustering error rate and the generated information loss while satisfying the defined anonymity constraints. The proposed methodology can be applied to both network graph structures and micro data. Simulation results over four social network databases from Facebook, Google+, Twitter and YouTube demonstrate the efficiency of the proposed KFCFA algorithm in minimizing the information loss of the published data and graph while satisfying the K-anonymity, L-diversity and T-closeness conditions.

Introduction

With the advancement of technology, online social networks have attracted people from all around the world, who use them to socialize, interact with friends, seek information about their surroundings, express themselves to others, and meet social expectations (James, Warkentin & Collignon, 2015). Social networks contain huge amounts of information about their users. Although the stored information can help improve the quality of the services offered to users, it may also endanger their privacy, because it includes sensitive information about them. Social network users therefore want to keep their shared data private. The companies or data-owners also want to protect the privacy of individuals, but at the same time they want to analyze the social network data for their own benefit. The sharing of data by the data-owner with the data-miner can serve multiple business or scientific objectives (Kumar & Kumar, 2017). The major challenge in sharing social network data with the data-miner is maintaining the privacy of individual users while retaining the implicit knowledge embedded in the social network. Therefore, the social network data (graph and matrix) need to be anonymized before they are released to the data-miner (Zhou, Pei & Luk, 2008).

Generally, anonymizing techniques change some edges of the graph or some attributes of the data matrix in order to protect against a given set of attacks. The anonymized graph/matrix should retain as much of the information in the original graph/matrix as possible (Lin & Wei, 2009). Various clustering-based techniques built on K-anonymity (KA) (Sweeney, 2002) and its extensions, L-diversity (LD) (Machanavajjhala, Gehrke, Kifer & Venkitasubramaniam, 2006) and T-closeness (TC) (Li, Li & Venkatasubramanian, 2007), have been proposed for graph/matrix anonymization. The goal of K-anonymity is to partition all samples into clusters with at least K samples in each cluster. Under this technique, the probability of identifying a data sample in an identity attack is at most 1/K.

Regarding the privacy preserving in social networks data publishing, privacy threats can be classified into three main categories (Casas-Roma, Herrera-Joancomartí & Torra, 2017; Zhou et al., 2008):

• Identity disclosure occurs when the identity of a user is revealed. It includes sub-categories such as vertex existence, vertex properties and graph metrics.
• Attribute disclosure seeks not to identify a user, but to reveal the sensitive attributes of the user. In this case, the sensitive data vector associated with each user is compromised.
• Link disclosure occurs when a sensitive relationship (link) between two users is disclosed. Depending on network characteristics, it can be refined as link relationships, sensitive edge labels, etc.

Generally, identity disclosure and link disclosure are discussed for all kinds of networks. Attribute disclosure, however, can occur only in networks whose vertices are labelled with personal information. Moreover, link disclosure can be considered a special case of attribute disclosure, because the edges between a user and the other users can be seen as a vector of binary attributes for that user. It should be mentioned that identity disclosure also leads to attribute disclosure, because identity disclosure occurs when the corresponding user is identified within a dataset (Kiabod, Dehkordi & Barekatain, 2019).
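The view of links as binary attributes can be illustrated with a minimal sketch. The four users and three edges below are made up for illustration, and `link_vector` is a hypothetical helper, not part of the paper's method; it simply reads off a user's row of the adjacency matrix.

```python
# Hypothetical 4-user undirected graph; user names and edges are illustrative only.
users = ["u0", "u1", "u2", "u3"]
edges = {("u0", "u1"), ("u0", "u2"), ("u2", "u3")}


def link_vector(user, users, edges):
    """Return the user's adjacency-matrix row as a list of 0/1 attributes."""
    return [
        1 if (user, other) in edges or (other, user) in edges else 0
        for other in users
    ]


# One binary attribute per potential neighbour, for every user.
link_attributes = {u: link_vector(u, users, edges) for u in users}

for user, vector in link_attributes.items():
    print(user, vector)
```

Seen this way, disclosing that `u2` is linked to `u3` amounts to disclosing one entry of the binary attribute vector of `u2`, which is why techniques aimed at attribute disclosure can in principle be applied to links as well.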

As mentioned above, KA clusters all users into groups, each comprising at least K users. Each user therefore cannot be distinguished from at least K-1 other users in the anonymized dataset, and consequently the probability of a successful identity attack is at most 1/K. The larger K, the stronger the protection against identity attacks. Although KA protects users from identity attacks, it cannot protect them against attribute/link disclosure attacks. Low diversity in the values of sensitive attributes may allow attribute disclosure attacks. To defend against attribute attacks, each equivalence cluster should contain more than one distinct value for each sensitive attribute. To achieve this, LD makes random changes to the values of the sensitive attributes so that they are diversified over at least L levels (L ≤ K). The sensitive attributes of the users within each cluster then take at least L different values. The larger L, the stronger the protection against attribute/link attacks. Even after applying LD, users are not yet protected against similarity attacks. Clusters with the same diversity level can still provide different levels of privacy protection, because the values of the sensitive attributes may be semantically related and may differ in sensitivity. To address this issue, TC was introduced, which requires the distribution of the sensitive attributes within each cluster to be as close as possible to the global distribution (closer than T). The smaller T, the stronger the protection against similarity attacks. In summary, identity, attribute/link, and similarity attacks can be defended against by KA, LD, and TC, respectively. The main notations used in the paper are summarized in Table 1.
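The three conditions can be made concrete with a small sketch for a single categorical sensitive attribute. Several things here are assumptions and not taken from the paper: each cluster is represented only by the list of its members' sensitive values, the function names are hypothetical, the toy values are invented, and closeness is measured by total variation distance (which coincides with the Earth Mover's Distance when all categories are equally far apart) with "at most T" as the acceptance rule.

```python
from collections import Counter


def is_k_anonymous(clusters, k):
    """KA: every cluster has at least k members."""
    return all(len(cluster) >= k for cluster in clusters)


def is_l_diverse(clusters, l):
    """LD: every cluster holds at least l distinct sensitive values."""
    return all(len(set(cluster)) >= l for cluster in clusters)


def distribution(values, domain):
    """Relative frequency of each domain value within `values`."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]


def is_t_close(clusters, t):
    """TC (assumed metric): total variation distance to the global
    distribution is at most t for every cluster."""
    all_values = [v for cluster in clusters for v in cluster]
    domain = sorted(set(all_values))
    global_dist = distribution(all_values, domain)
    for cluster in clusters:
        local_dist = distribution(cluster, domain)
        distance = 0.5 * sum(abs(p - q) for p, q in zip(local_dist, global_dist))
        if distance > t:
            return False
    return True


# Toy data: two clusters, each listed by its members' sensitive values.
clusters = [["flu", "cold", "flu"], ["cold", "flu", "asthma"]]

print(is_k_anonymous(clusters, 3))  # both clusters have 3 members
print(is_l_diverse(clusters, 2))    # the first cluster has only 2 distinct values
print(is_t_close(clusters, 0.2))    # each cluster is 1/6 away from the global distribution
```

In this toy example both clusters sit at distance 1/6 from the global distribution, so the check passes for T = 0.2 and fails for T = 0.1, illustrating why a smaller T is the stricter requirement.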

In this paper, a hybrid anonymizing method based on K-member Fuzzy Clustering (KFC) and Firefly Algorithm (FA), named KFCFA, is presented for privacy preserving in social networks. Online social networks are exposed to many kinds of attacks (Yang, Zhu, Zhou & Xiang, 2016). In this paper, we focus on the attacks related to publishing social network data for data mining purposes, i.e., identity disclosure, attribute/link disclosure, and similarity attacks. The main aim of the proposed KFCFA methodology is to protect the published data against identity, attribute/link, and similarity attacks while minimizing the information loss and generating balanced clusters. The proposed KFCFA can be used for graph, matrix, and hybrid graph-matrix datasets. At matrix level, anonymization is performed by removing some features (feature selection) or by changing the feature values of different samples. At graph level, it is performed by adding or removing edges between different users. In the proposed methodology, a modified fuzzy c-means (FCM) is first performed to generate balanced clusters with at least K members in each cluster. Then, FA is used to further optimize the generated clusters and to anonymize the graph and matrix simultaneously. The main contributions of this paper can be summarized as follows:

• We propose a hybrid technique based on K-member fuzzy clustering and FA (KFCFA) as an efficient solution for balanced clustering and anonymizing in social networks.
• In KFCFA, K-member fuzzy clustering first creates balanced clusters. FA is then applied to further optimize the clustering and anonymizing problems together.
• We introduce a modification of FCM to generate balanced clusters with at least K members within each cluster. It improves the initial population of the FA, which starts from a set of near-optimal solutions that have the minimum clustering error and satisfy the KA condition.
• The objective function of the FA is a multi-objective function that minimizes the ratio of intra-cluster distances to inter-cluster distances, the average distortion rate within the graph and matrix, and the imbalance between clusters.
• KFCFA protects the anonymized dataset against identity, attribute disclosure, and similarity attacks by considering KA, LD and TC as three constraints.
• To validate the performance of KFCFA, it is compared with existing techniques over four popular social network datasets: Facebook, Google+, Twitter and YouTube.
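The three objectives and the constraint handling listed above can be sketched as a single penalized cost. This is only an illustration under stated assumptions, not the paper's formulation: the weighted-sum form, the `weights` and `penalty` values, the specific imbalance measure, and the function names are all hypothetical, and only the KA constraint is penalized here (LD and TC could be added as further penalty terms in the same way).

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def objective(clusters, original, anonymized, k,
              weights=(1.0, 1.0, 1.0), penalty=1e6):
    """Hypothetical weighted sum of three costs plus a KA penalty.

    Assumes at least two clusters whose centroids are distinct.
    """
    centers = [centroid(cluster) for cluster in clusters]

    # Cost 1: mean distance of points to their own centroid, divided by
    # the mean distance between pairs of centroids.
    intra = [math.dist(p, centers[i])
             for i, cluster in enumerate(clusters) for p in cluster]
    inter = [math.dist(centers[i], centers[j])
             for i in range(len(centers)) for j in range(i + 1, len(centers))]
    ratio = (sum(intra) / len(intra)) / (sum(inter) / len(inter))

    # Cost 2: fraction of matrix cells (attribute values or adjacency
    # entries) that differ between the original and anonymized versions.
    cells = [(a, b) for row_o, row_a in zip(original, anonymized)
             for a, b in zip(row_o, row_a)]
    distortion = sum(a != b for a, b in cells) / len(cells)

    # Cost 3: spread of cluster sizes relative to the number of samples.
    sizes = [len(cluster) for cluster in clusters]
    imbalance = (max(sizes) - min(sizes)) / sum(sizes)

    # Constraint: total number of members missing to reach k per cluster.
    shortfall = sum(max(0, k - size) for size in sizes)

    w1, w2, w3 = weights
    return w1 * ratio + w2 * distortion + w3 * imbalance + penalty * shortfall


# Toy data: two well-separated clusters and one changed matrix cell.
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
original = [[1, 0], [0, 1]]
anonymized = [[1, 0], [1, 1]]

print(objective(clusters, original, anonymized, k=2))
print(objective(clusters, original, anonymized, k=3))
```

With K = 2 the toy solution is feasible and its cost is the sum of the three terms; with K = 3 both clusters are one member short, so the penalty dominates. A metaheuristic such as FA would compare candidate solutions by this kind of value, which steers the search away from infeasible clusterings.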

The rest of the paper is organized as follows: In Section 2, the existing anonymizing techniques are discussed. Section 3 provides the preliminaries and definitions. Section 4 introduces the proposed KFCFA anonymizing methodology. In Section 5, the KFCFA method is simulated and compared with the existing techniques. Finally, discussion and concluding remarks are provided in Section 6.

Section snippets

Related works

Current anonymizing techniques are, with only minor exceptions, based on the clustering-based KA algorithm. Privacy preserving in these techniques is performed by partitioning the input samples into separate clusters, each containing at least K members. Generally, data anonymity approaches can be categorized, according to the type of dataset, into graph-based and matrix-based techniques (Kumar & Kumar, 2017). These methods are discussed in the following.

Basic definitions

Generally, features of a social network contain graph properties (edges between users) and matrix properties (personal attributes of users). Graph-based binary features indicate the presence or absence of relationships between users. Moreover, matrix-based features are divided into three categories (Rahimi et al., 2015):

• Direct Identifier Attributes: include identifier features such as name, surname and national code, which can simply identify the user. These features should be eliminated before

Proposed KFCFA anonymizing methodology

Finding the optimal K-anonymous solution with the minimum information loss has been shown to be an NP-hard problem (Kim & Chung, 2017). Different heuristic and metaheuristic algorithms have been applied to find near-optimal K-anonymous solutions (Byun et al., 2007; Honda et al., 2012; Kiabod et al., 2019; Rahimi et al., 2015; Sun et al., 2011). However, the anonymity constraints have not been efficiently considered, and more importantly, there is no strategy to achieve the minimum distortion

Simulation settings

The proposed anonymization algorithm (KFCFA) was implemented in MATLAB R2018b and run on a personal computer with a 2.6 GHz Core i7 processor and 16 GB of memory. To evaluate the proposed KFCFA, its results are compared with four clustering-based anonymity techniques: K-Anonymity (KA) (Sweeney, 2002), P-sensitive KA (PKA) (Sun et al., 2011), K-member fuzzy KA (KFKA) (Honda et al., 2012), and T-closeness L-diversity K-anonymity 3 Layers (TLK3L) (Rahimi et al., 2015). Proper

Conclusion

In this paper, we have proposed a combined anonymization methodology based on fuzzy clustering and the firefly algorithm (called KFCFA) for privacy preserving in social networks. In the proposed method, anonymization is performed simultaneously at data level and graph level. Anonymization at the data level is performed by deleting or modifying the attributes of different users, while at the graph level it is performed by adding or removing edges between different users. In the

CRediT authorship contribution statement

Rohulla Kosari Langari: Conceptualization, Data curation, Formal analysis, Writing - original draft, Writing - review & editing. Soheila Sardar: Conceptualization, Data curation, Formal analysis, Writing - review & editing. Seyed Abdollah Amin Mousavi: Data curation, Formal analysis, Writing - original draft. Reza Radfar: Conceptualization, Data curation, Formal analysis, Writing - review & editing.

Declaration of Competing Interest

None.

Acknowledgments

All persons who have made substantial contributions to the work reported in the manuscript (e.g., technical help, writing and editing assistance, general support), but who do not meet the criteria for authorship, are named in the Acknowledgements and have given us their written permission to be named. If we have not included an Acknowledgements, then that indicates that we have not received substantial contributions from non-authors.

References (30)

F. Bonchi et al., Identity obfuscation in graphs through the information theoretic lens, Information Sciences (2014)
T.L. James et al., A dual privacy decision model for online social networks, Information & Management (2015)
M. Kiabod et al., TSRAM: A time-saving k-degree anonymization method in social network, Expert Systems with Applications (2019)
S. Kumar et al., Upper approximation based privacy preserving in online social networks, Expert Systems with Applications (2017)
J.L. Lin et al., Genetic algorithm-based clustering approach for k-anonymization, Expert Systems with Applications (2009)
M. Shokouhifar et al., Optimized Sugeno fuzzy clustering algorithm for wireless sensor networks, Engineering Applications of Artificial Intelligence (2017)
X. Sun et al., Extended k-anonymity models against sensitive attribute disclosure, Computer Communications (2011)
R. Wei et al., Improving k-anonymity based privacy preservation for collaborative filtering, Computers & Electrical Engineering (2018)
M. Yang et al., Attacks and countermeasures in social network data publishing, ZTE Communications (2016)
Z.M. Zahedi et al., Swarm intelligence based fuzzy routing protocol for clustered wireless sensor networks, Expert Systems with Applications (2016)

J.W. Byun et al., Efficient k-anonymization using clustering techniques
J. Casas-Roma, Privacy-preserving on graphs using randomization and edge-relevance
J. Casas-Roma et al., A survey of graph-modification techniques for privacy-preserving on networks, Artificial Intelligence Review (2017)
S. Hartung et al., Improved upper and lower bound heuristics for degree anonymization in social networks
M. Hay et al., Anonymizing social networks (2007)

