Combined fuzzy clustering and firefly algorithm for privacy preserving in social networks
Introduction
With the advancement of technology, online social networks have attracted the attention of people all around the world, who use them to socialize, interact with friends, seek information about their surroundings, express themselves, and meet social expectations (James, Warkentin & Collignon, 2015). Social networks hold huge amounts of information about their users. Although this stored information can help improve the quality of service, it may also endanger user privacy, because it includes sensitive information. Social network users therefore want to keep their shared data private. The companies that own the data (data-owners) also want to protect individual privacy, but at the same time they want to analyze the social network data for their own benefit. A data-owner may share data with a data-miner for various business or scientific objectives (Kumar & Kumar, 2017). The major challenge in sharing social network data with the data-miner is maintaining the privacy of individual users while retaining the implicit knowledge embedded in the social network. Therefore, the social network data (graph and matrix) need to be anonymized before being released to the data-miner (Zhou, Pei & Luk, 2008).
Generally, anonymizing techniques change some edges of the graph or some attributes of the data matrix to protect against specified attacks. The anonymized graph/matrix should retain as much of the information in the original graph/matrix as possible (Lin & Wei, 2009). Various clustering-based techniques built on K-anonymity (KA) (Sweeney, 2002) and its extensions, L-diversity (LD) (Machanavajjhala, Gehrke, Kifer & Venkitasubramaniam, 2006) and T-closeness (TC) (Li, Li & Venkatasubramanian, 2007), have been proposed for graph/matrix anonymizing. The goal of K-anonymity is to partition all samples into clusters with at least K samples in each cluster. This limits the probability of a successful identity attack on any sample to at most 1/K.
In social network data publishing, privacy threats can be classified into three main categories (Casas-Roma, Herrera-Joancomartí & Torra, 2017; Zhou et al., 2008):
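The K-anonymity condition described above can be checked mechanically once a cluster assignment exists. The following is a minimal sketch, not the authors' implementation: the function names and the toy cluster labels are assumptions introduced only for illustration.

```python
from collections import Counter

def smallest_cluster_size(labels):
    """Return the size of the smallest cluster in a list of cluster labels."""
    return min(Counter(labels).values())

def is_k_anonymous(labels, k):
    """True when every cluster holds at least k samples."""
    return smallest_cluster_size(labels) >= k

# Hypothetical assignment of six users to two clusters (illustrative data only).
labels = ["c1", "c1", "c1", "c2", "c2", "c2"]

print(is_k_anonymous(labels, 3))  # True: both clusters have 3 members
print(is_k_anonymous(labels, 4))  # False: no cluster reaches 4 members

# Upper bound on the re-identification probability: 1 / (smallest cluster size).
print(1 / smallest_cluster_size(labels))
```

With K = 3, each user in this toy example hides among three candidates, so an identity attack succeeds with probability at most 1/3.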
- Identity disclosure occurs when the identity of a user is revealed. It includes sub-categories such as vertex existence, vertex properties and graph metrics.
- Attribute disclosure seeks not to identify a user, but to reveal the sensitive attributes of the user. In this case, the sensitive data vector associated with each user is compromised.
- Link disclosure occurs when a sensitive relationship (link) between two users is disclosed. Depending on network characteristics, it can be refined into link relationships, sensitive edge labels, etc.
Identity disclosure and link disclosure are relevant to all kinds of networks, whereas attribute disclosure can occur only in networks that contain vertex-labelled personal information. Moreover, link disclosure can be considered a special case of attribute disclosure, because a user's edges to other users can be seen as a vector of binary attributes for that user. It should be noted that identity disclosure also leads to attribute disclosure, because once a user is identified within a dataset, the attributes attached to that user are exposed as well (Kiabod, Dehkordi & Barekatain, 2019).
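To make the view of links as binary attributes concrete, the sketch below turns an undirected edge list into one 0/1 vector per user. The helper name, the user identifiers, and the edges are hypothetical and serve only as an illustration.

```python
def edge_vectors(users, edges):
    """Map each user to a 0/1 vector over `users`; a 1 marks an edge to that user."""
    index = {user: position for position, user in enumerate(users)}
    vectors = {user: [0] * len(users) for user in users}
    for a, b in edges:
        # The graph is treated as undirected, so both endpoints are marked.
        vectors[a][index[b]] = 1
        vectors[b][index[a]] = 1
    return vectors

# Hypothetical four-user network (illustrative data only).
users = ["u1", "u2", "u3", "u4"]
edges = [("u1", "u2"), ("u1", "u3"), ("u3", "u4")]

vecs = edge_vectors(users, edges)
print(vecs["u1"])  # [0, 1, 1, 0]: u1 is linked to u2 and u3
print(vecs["u4"])  # [0, 0, 1, 0]: u4 is linked to u3 only
```

Under this representation, disclosing one entry of a user's vector is the same as disclosing one link, which is why protections for sensitive attributes carry over to links.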
As mentioned above, KA clusters all users into groups of at least K users each. Each user is therefore indistinguishable from at least K-1 other users in the anonymized dataset, so the probability of a successful identity attack is at most 1/K. The larger K, the stronger the protection against identity attacks. Although KA protects users from identity attacks, it cannot protect them against attribute/link disclosure attacks, because low diversity in the values of sensitive attributes within a cluster may still reveal them. To defend against attribute attacks, the values of each sensitive attribute within a cluster must not all be identical. To this end, LD randomly changes the values of sensitive attributes so that they are diversified into at least L levels (L ≤ K). The sensitive attributes of the users within each cluster then take at least L different values. The larger L, the stronger the protection against attribute/link attacks. Even with LD applied, users are not yet protected against similarity attacks. Clusters with similar diversity levels can offer different levels of privacy protection, because the values of the sensitive attributes are semantically related and have different sensitivity levels. To address this issue, TC was introduced, which requires the distribution of the sensitive attributes within each cluster to be close to the global distribution (closer than a threshold T). The smaller T, the stronger the protection against similarity attacks. In summary, identity, attribute/link, and similarity attacks can be defended against by KA, LD, and TC, respectively. The main notations used in the paper are summarized in Table 1.
In this paper, a hybrid anonymizing method based on K-member Fuzzy Clustering (KFC) and the Firefly Algorithm (FA), named KFCFA, is presented for privacy preserving in social networks. Online social networks face many kinds of attacks (Yang, Zhu, Zhou & Xiang, 2016). We focus on the attacks related to publishing social network data for data mining purposes, i.e., identity disclosure, attribute/link disclosure, and similarity attacks. The main aim of the proposed KFCFA methodology is to protect the published data against these attacks while minimizing the information loss and generating balanced clusters. KFCFA can be used for graph, matrix, and hybrid graph-matrix datasets. At matrix level, anonymizing is performed by removing some features (feature selection) or changing the feature values of different samples. At graph level, it is done by adding or removing edges between users. In the proposed methodology, a modified fuzzy c-means (FCM) is first performed to generate balanced clusters with at least K members in each cluster. Then, FA is used to further optimize the generated clusters and to anonymize the graph and matrix simultaneously. The main contributions of this paper can be summarized as follows:
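The LD and TC conditions can be illustrated on a single categorical sensitive attribute. The sketch below is an assumption-laden example rather than the method of the paper: all function names and the toy data are hypothetical, and the gap between a cluster's distribution and the global one is measured with the total variation distance, which is one simple choice among several possible distance measures.

```python
from collections import Counter, defaultdict

def group_by_cluster(labels, sensitive):
    """Collect the sensitive values of each cluster."""
    groups = defaultdict(list)
    for label, value in zip(labels, sensitive):
        groups[label].append(value)
    return groups

def is_l_diverse(labels, sensitive, l):
    """True when every cluster contains at least l distinct sensitive values."""
    groups = group_by_cluster(labels, sensitive)
    return all(len(set(values)) >= l for values in groups.values())

def distribution(values):
    """Relative frequency of each value."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions (assumed measure)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def max_closeness_gap(labels, sensitive):
    """Largest distance between any cluster's distribution and the global one."""
    global_dist = distribution(sensitive)
    groups = group_by_cluster(labels, sensitive)
    return max(total_variation(distribution(v), global_dist) for v in groups.values())

def is_t_close(labels, sensitive, t):
    """True when every cluster's distribution is within t of the global distribution."""
    return max_closeness_gap(labels, sensitive) <= t

# Hypothetical data: two clusters of three users with one sensitive attribute.
labels = [0, 0, 0, 1, 1, 1]
sensitive = ["flu", "flu", "cancer", "flu", "cancer", "cancer"]

print(is_l_diverse(labels, sensitive, 2))  # True: each cluster has 2 distinct values
print(is_l_diverse(labels, sensitive, 3))  # False: no cluster has 3 distinct values
print(is_t_close(labels, sensitive, 0.2))  # True: the largest gap is about 0.167
print(is_t_close(labels, sensitive, 0.1))  # False
```

In this toy case the global distribution is 50/50 while each cluster is split 2/3 to 1/3, so the gap is 1/6 for both clusters; tightening T below that value rejects the clustering.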
- We propose a hybrid technique based on K-member fuzzy clustering and FA (KFCFA) as an efficient solution for balanced clustering and anonymizing in social networks.
- In KFCFA, K-member fuzzy clustering first creates balanced clusters. Then, FA is applied to further optimize the clustering and anonymizing problems together.
- We introduce a modification of FCM that generates balanced clusters with at least K members in each cluster. It improves the initial population of the FA, which starts from a set of near-optimal solutions that have the minimum clustering error and satisfy the KA condition.
- The objective function of the FA is a multi-objective function that minimizes the ratio of intra-cluster distances to inter-cluster distances, the average distortion rate within the graph and matrix, and the imbalance among clusters.
- KFCFA protects the anonymized dataset against identity, attribute disclosure, and similarity attacks by considering KA, LD and TC as three constraints.
- To validate the performance of KFCFA, it is compared with existing techniques on four popular social network datasets: Facebook, Google+, Twitter and YouTube.
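As background for the FA stage named in the contributions above, the following sketch shows one generation of the textbook firefly algorithm for minimization. It is not the KFCFA variant: the parameters `beta0`, `gamma`, and `alpha`, the function names, and the toy sphere objective are all assumptions, and the paper's multi-objective function and its KA, LD, and TC constraints are not modelled here.

```python
import math
import random

def firefly_step(population, objective, beta0=1.0, gamma=1.0, alpha=0.1, rng=random):
    """One generation of the textbook firefly algorithm (minimization).

    Each firefly moves toward every firefly that had a lower cost at the
    start of the generation. Attraction decays with squared distance, and
    `alpha` scales a small uniform random perturbation.
    """
    costs = [objective(x) for x in population]
    new_population = [list(x) for x in population]
    for i, xi in enumerate(new_population):
        for j, xj in enumerate(population):
            if costs[j] < costs[i]:
                r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
                beta = beta0 * math.exp(-gamma * r2)
                for d in range(len(xi)):
                    xi[d] += beta * (xj[d] - xi[d]) + alpha * (rng.random() - 0.5)
    # In this simplified sketch the current best firefly does not move.
    return new_population

def sphere(x):
    """Toy objective (sum of squares), standing in for a real fitness function."""
    return sum(v * v for v in x)

# Hypothetical run: five fireflies in two dimensions, seeded for repeatability.
rng = random.Random(0)
population = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(5)]
initial_best = min(sphere(x) for x in population)
for _ in range(20):
    population = firefly_step(population, sphere, rng=rng)
final_best = min(sphere(x) for x in population)
print(initial_best, final_best)
```

Because the best firefly of each generation stays in place in this sketch, the best cost never increases from one generation to the next. In a clustering setting, each firefly would encode a candidate solution such as a set of cluster centers, which is an interpretation assumed here rather than taken from the text.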
The rest of the paper is organized as follows: In Section 2, the existing anonymizing techniques are discussed. Section 3 provides the preliminaries and definitions. Section 4 introduces the proposed KFCFA anonymizing methodology. In Section 5, the KFCFA method is simulated and compared with the existing techniques. Finally, discussion and concluding remarks are provided in Section 6.
Related works
With only minor exceptions, current anonymizing techniques build on the clustering-based KA algorithm. These techniques preserve privacy by clustering the input samples into separate clusters, each containing at least K members. Generally, data anonymity approaches can be categorized, according to the type of dataset, into graph-based and matrix-based techniques (Kumar & Kumar, 2017). These methods are discussed in the following.
Basic definitions
Generally, the features of a social network comprise graph properties (edges between users) and matrix properties (personal attributes of users). Graph-based binary features indicate the presence or absence of relationships between users. Matrix-based features are divided into three categories (Rahimi et al., 2015):
- Direct Identifier Attributes: identifier features such as name, surname and national code, which can directly identify the user. These features should be eliminated before
Proposed KFCFA anonymizing methodology
Finding the optimal K-anonymous solution with the minimum information loss has been shown to be an NP-hard problem (Kim & Chung, 2017). Different heuristic and metaheuristic algorithms have been applied to find near-optimal K-anonymous solutions (Byun et al., 2007; Honda et al., 2012; Kiabod et al., 2019; Rahimi et al., 2015; Sun et al., 2011). However, the anonymity constraints have not been efficiently considered, and more importantly, there is no strategy to achieve the minimum distortion
Simulation settings
The proposed anonymization algorithm (KFCFA) was implemented in MATLAB R2018b on a personal computer with a 2.6 GHz Core i7 processor and 16 GB of memory. To evaluate the proposed KFCFA, its results are compared with four clustering-based anonymity techniques: K-Anonymity (KA) (Sweeney, 2002), P-sensitive KA (PKA) (Sun et al., 2011), K-member fuzzy KA (KFKA) (Honda et al., 2012), and T-closeness L-diversity K-anonymity 3 Layers (TLK3L) (Rahimi et al., 2015). Proper
Conclusion
In this paper, we have proposed a combined anonymization methodology based on fuzzy clustering and the firefly algorithm (called KFCFA) for privacy preserving in social networks. In the proposed method, anonymization is performed simultaneously at data level and graph level. At the data level, it is performed by deleting or modifying the attributes of different users, while at the graph level, it is done by adding or removing edges between users. In the
CRediT authorship contribution statement
Rohulla Kosari Langari: Conceptualization, Data curation, Formal analysis, Writing - original draft, Writing - review & editing. Soheila Sardar: Conceptualization, Data curation, Formal analysis, Writing - review & editing. Seyed Abdollah Amin Mousavi: Data curation, Formal analysis, Writing - original draft. Reza Radfar: Conceptualization, Data curation, Formal analysis, Writing - review & editing.
Declaration of Competing Interest
None.
Acknowledgments
All persons who have made substantial contributions to the work reported in the manuscript (e.g., technical help, writing and editing assistance, general support), but who do not meet the criteria for authorship, are named in the Acknowledgements and have given us their written permission to be named. If we have not included an Acknowledgements, then that indicates that we have not received substantial contributions from non-authors.
References (30)
- et al. Identity obfuscation in graphs through the information theoretic lens. Information Sciences (2014)
- et al. A dual privacy decision model for online social networks. Information & Management (2015)
- et al. TSRAM: A time-saving k-degree anonymization method in social network. Expert Systems with Applications (2019)
- et al. Upper approximation based privacy preserving in online social networks. Expert Systems with Applications (2017)
- et al. Genetic algorithm-based clustering approach for k-anonymization. Expert Systems with Applications (2009)
- et al. Optimized sugeno fuzzy clustering algorithm for wireless sensor networks. Engineering Applications of Artificial Intelligence (2017)
- et al. Extended k-anonymity models against sensitive attribute disclosure. Computer Communications (2011)
- et al. Improving k-anonymity based privacy preservation for collaborative filtering. Computers & Electrical Engineering (2018)
- et al. Attacks and countermeasures in social network data publishing. ZTE Communications (2016)
- et al. Swarm intelligence based fuzzy routing protocol for clustered wireless sensor networks. Expert Systems with Applications (2016)
- Efficient k-anonymization using clustering techniques
- Privacy-preserving on graphs using randomization and edge-relevance
- A survey of graph-modification techniques for privacy-preserving on networks. Artificial Intelligence Review
- Improved upper and lower bound heuristics for degree anonymization in social networks
- Anonymizing social networks