1 Introduction

Recommender Systems (RS) are nowadays considered a pivotal technical solution to assist users in their decision-making process. They are gaining momentum as the overwhelming volume of products, services, and multimedia content on the Web has made users’ choices more difficult. Among the various approaches, collaborative filtering (CF) has shown very high performance in real-world applications (e.g., Amazon [26]). Its key insight is that users prefer products experienced by similar users; from an algorithmic point of view, CF models therefore mainly rely on the exploitation of user-user and item-item similarities. Unfortunately, malicious users may alter these similarity values, since they are vulnerable to the insertion of fake profiles. The injection of such manipulated profiles, named shilling attack [20], aims to push or nuke the probability of items being recommended.

Recently, several works have proposed various types of attacks, classified into two categories [9]: low-knowledge and informed attack strategies. In the former, the malicious user (or adversary) has little system-specific knowledge [25, 28]. In the latter, the attacker has precise knowledge of the attacked recommendation model and of the data distribution [12, 25].

Interestingly, the astonishing spread of knowledge graphs (\(\mathcal {KG}\)) may suggest new knowledge-aware strategies to undermine the security of RS. In a Web mainly composed of unstructured information, \(\mathcal {KG}\) are the foundation of the Semantic Web. They are becoming increasingly important as they can represent data through a manageable and interoperable semantic structure. They are the pillars of well-known tools like IBM Watson [7], public decision-making systems [34], and advanced machine learning techniques [2, 4, 13]. Thanks to the Linked Open Data (LOD) initiative, we have witnessed the growth of a broad ecosystem of linked datasets known as the LOD-cloud. These \(\mathcal {KG}\) contain detailed information about several domains. If a malicious user were to attack a recommender operating in one of these domains, such semantic item descriptions would be an invaluable resource.

The main contribution of the present work is to study the possibility of leveraging semantically encoded information to improve the efficacy of an attack in favor of (or against) given target items. In particular, one of the features distinguishing this work from previous ones is that it exploits publicly available information obtained from \(\mathcal {KG}\) to generate more influential fake profiles able to undermine the performance of CF models. We name this attack strategy semantic-aware shilling attack (SAShA); it extends state-of-the-art shilling attack strategies such as Random, Love-Hate, and Average with the gathered semantic knowledge. It is noteworthy that the proposed extension relies solely on publicly available information and does not grant the attacker any additional knowledge about the attacked system.

In this work, we aim at addressing the following research questions:

RQ1: Can publicly available semantic information be exploited to develop more effective shilling attack strategies against CF models, where effectiveness is measured in terms of overall prediction shift and overall hit ratio?

RQ2: Which type of semantic information is the most impactful? Is the extraction of multi-hop semantic features from a knowledge graph more effective than the use of single-hop features?

To this end, we have carried out extensive experiments to evaluate the impact of the proposed SAShA against standard CF models using two real-world recommendation datasets (LibraryThing and Yahoo!Movies). Experimental results indicate that \(\mathcal {KG}\) information is a rich source of knowledge that can, worryingly, improve the effectiveness of attacks.

The remainder of the paper is organized as follows. In Sect. 2, we analyze the state of the art of CF models as well as shilling attacks. In Sect. 3, we describe the proposed approach (SAShA). Section 4 focuses on the experimental validation of the proposed attack scenarios and provides a discussion of the experimental results. Finally, in Sect. 5, we present conclusions and introduce open challenges.

2 Related Work

In this section, we review the related literature on recommender systems and the state of the art of attacks on collaborative recommendation models.

2.1 Recommender Systems (RSs)

Recommendation models can be broadly categorized into content-based filtering (CBF), collaborative filtering (CF), and hybrid approaches. On the one hand, CBF uses items’ content attributes (features), together with the target user’s own interactions, to create a user profile characterizing the nature of her interests. On the other hand, CF models generate recommendations by solely exploiting the similarity between the interaction patterns of users. Today, CF models are the mainstream of academic and industrial research due to their state-of-the-art recommendation quality, particularly when a sufficient amount of interaction data—either explicit (e.g., rating scores) or implicit (previous clicks, check-ins, etc.)—is available. The various CF models developed today can be classified into two main groups: memory-based and model-based. While memory-based models make recommendations exclusively based on similarities between users’ interactions (user-based CF [23, 32]) or items’ interactions (item-based CF [23, 33]), model-based approaches compute latent representations of items and users [24], whose linear interaction can explain an observed feedback. Model-based approaches can be implemented with different machine learning techniques; among them, matrix factorization (MF) models play a paramount role.

It should be noted that modern RS may exploit a variety of side information, such as metadata (tags, reviews) [29], social connections [6], image and audio signal features [14], and user-item contextual data [3], to build more in-domain (i.e., domain-dependent) or context-aware recommendation models. \(\mathcal {KG}\) are another rich source of information that has gained increasing popularity in the RS community for building knowledge-aware recommender systems (KARS). These models can be classified into: (i) path-based methods [19, 37], which use meta-paths to evaluate user-item similarities, and (ii) \(\mathcal {KG}\) embedding-based techniques, which leverage \(\mathcal {KG}\) embeddings to semantically regularize item latent representations [16, 21, 35]. More recently, \(\mathcal {KG}\) have also been used to support the reasoning and explainability of recommendations [5, 36].

For simplicity of presentation, in this work we do not consider shilling attacks against CF models that leverage such side information for the core recommendation task, and we leave them for future extensions. We do, however, make a fundamental assumption in all considered scenarios: the attacker can access \(\mathcal {KG}\), given their free availability, and use them to shape more in-domain attacks.

2.2 Shilling Attacks on Recommender System

Despite the widespread adoption of customer-oriented CF models by online services to increase traffic and promote sales, the reliance of these models on the so-called “word-of-mouth” (i.e., what other people like and dislike) makes them vulnerable to meticulously crafted profiles that aim to alter the distribution of ratings and misuse this dependency for a particular (malicious) purpose. Unfortunately, the motivations behind such shilling attacks are many, including personal gain, market penetration by rival companies [25], malicious intent, and even causing complete mischief on the underlying system [20].

In the literature, a standard way to classify shilling attacks is based on the intent and on the amount of knowledge required to perform them. According to the intent, attacks are generally classified as push attacks, which aim to increase the appeal of some targeted items, and nuke attacks, which conversely aim to lower the popularity of some targeted items. As for the knowledge level, they can be categorized into low-knowledge attacks and informed attack strategies. Low-knowledge attacks require little or no knowledge about the rating distribution [25, 28], while informed attacks assume adversaries with knowledge of the dataset rating distribution, which they use to generate effective fake profiles [25, 30].

A large body of research has been devoted to studying shilling attacks from multiple perspectives: altering the performance of CF models [12, 15, 25], implementing attack detection policies [8, 11, 38], and building recommendation models robust against attacks [28, 30]. A typical characteristic of the previous literature on shilling attack strategies is that they usually target the relations between users and items based on similarity scores estimated on their past feedback (e.g., ratings). However, these strategies do not consider the possibility of exploiting publicly available \(\mathcal {KG}\) to gain more information on the semantic similarities between the items in the RS catalogue. Indeed, considering that the catalogues of product and service providers are freely accessible to everyone, this work presents a novel attack strategy that exploits a freely accessible knowledge graph (DBpedia) to assess whether attacks based on semantic similarities between items are more effective than baseline versions that rely only on users’ rating scores.

3 Approach

In this section, we describe a novel method for integrating information obtained from a knowledge graph into the design of shilling attacks against targeted items in a CF system. We first introduce the characteristics of \(\mathcal {KG}\) in Sect. 3.1. Afterwards, in Sect. 3.2, we present the proposed semantic-aware extensions of a variety of popular shilling attacks, namely the Random, Love-Hate, and Average attacks.

3.1 Knowledge Graph: Identification of Content from \(\mathcal {KG}\)

A knowledge graph can be seen as a structured repository of knowledge, represented in the form of a graph, that can encode different types of information:

  • Factual. General statements such as Rika Dialina was born in Crete or Heraklion is the capital of Crete, in which we describe an entity by its attributes, which are in turn connected to other entities (or literal values);

  • Categorical. These statements bind an entity to a specific category (e.g., the categories associated with an article in Wikipedia). Categories are often part of a hierarchy, which lets us describe entities in a more generic or more specific way;

  • Ontological. We can classify entities in a more formal way using a hierarchical structure of classes. In contrast to categories, sub-classes and super-classes are connected through IS-A relations.

In a knowledge graph we can represent each entity through the triple structure \(\sigma \xrightarrow {\rho } \omega \), with a subject (\(\sigma \)), a relation (predicate) \(\rho \) and an object (\(\omega \)). Among the multiple ways to represent features coming from a knowledge graph, we have chosen to represent each distinct triple as a single feature [5]. Hence, given a set of items \(I = \{i_1,i_2,\ldots ,i_N \}\) in a collection and the corresponding triples \(\langle i, \rho , \omega \rangle \) in a knowledge graph, we can build the set of 1-hop features as \(1\text {-}HOP\text {-}F = \{\langle \rho , \omega \rangle \mid \langle i, \rho , \omega \rangle \in \mathcal {KG} \text { with } i\in I \}\).

In an analogous way we can identify 2-hop features. Indeed, we can continue exploring \(\mathcal {KG}\) by retrieving the triples \(\omega \xrightarrow {\rho ^{\prime }} \omega ^{\prime }\), where \(\omega \) is the object of a 1-hop triple and the subject of the new triple. Here, the second-hop relation (predicate) is denoted by \(\rho ^{\prime }\), while the new object is referred to as \(\omega ^{\prime }\). Hence, we define the overall feature set as \(2\text {-}HOP\text {-}F = \{\langle \rho , \omega , \rho ^{\prime }, \omega ^{\prime } \rangle \mid \langle i, \rho , \omega , \rho ^{\prime }, \omega ^{\prime } \rangle \in \mathcal {KG} \text { with } i\in I \}\). With respect to the previous classification of the different types of information in a knowledge graph, we consider a 2-hop feature as Factual if and only if both relations (\(\rho \) and \(\rho ^{\prime }\)) are Factual. The same holds for the other types of encoded information.
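
To make the construction of these feature sets concrete, the following minimal Python sketch builds \(1\text {-}HOP\text {-}F\) and \(2\text {-}HOP\text {-}F\) from a set of plain (subject, predicate, object) triples. The function names and the tuple-based triple encoding are illustrative assumptions, not the implementation used in the experiments.

    # Minimal sketch of 1-hop and 2-hop feature extraction from a set of KG triples.
    # Triples are assumed to be plain (subject, predicate, object) string tuples,
    # e.g., obtained from DBpedia via SPARQL.

    def one_hop_features(triples, items):
        """1-HOP-F = {(rho, omega) | (i, rho, omega) in KG, i in items}."""
        items = set(items)
        return {(p, o) for (s, p, o) in triples if s in items}

    def two_hop_features(triples, items):
        """2-HOP-F = {(rho, omega, rho', omega')} obtained by chaining two triples."""
        items = set(items)
        by_subject = {}                      # index triples by subject for the second hop
        for (s, p, o) in triples:
            by_subject.setdefault(s, []).append((p, o))
        features = set()
        for (s, p, o) in triples:
            if s in items:
                for (p2, o2) in by_subject.get(o, []):
                    features.add((p, o, p2, o2))
        return features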

3.2 Strategies for Attacking a Recommender System

A shilling attack against a recommendation model is based on a set of fake profiles meticulously created by the attacker and inserted into the system. The ultimate goal is to alter recommendations in favor of (push scenario) or against (nuke scenario) a single target item \(i_{t}\). In this work, we focus on the push attack scenario, but the same machinery can be reused for a nuke attack. The fake user profile (attack profile) follows the general structure proposed by Bhaumik [8] and shown in Fig. 1. It is a rating vector of dimensionality N, where N is the total number of items in the collection (\(N = |I_S| + |I_F| + |I_{\emptyset }| + |I_{T}|\)). The profile is subdivided into four non-overlapping segments:

Fig. 1. General form of a fake user profile.

  • \(I_{T}\): the target item, for which a rating score will be predicted by the recommendation model. This rating is typically set to the maximum or minimum possible score, depending on the attack goal (push or nuke).

  • \(I_{\emptyset }\): the unrated item set, i.e., the items that are left without any rating in the profile.

  • \(I_{F}\): the filler item set, i.e., the items whose rating scores are assigned according to the specific attack strategy.

  • \(I_{S}\): the selected item set. These items are chosen in the case of informed attack strategies, which exploit the attacker’s knowledge to maximize the attack impact, for instance by selecting the items with the highest number of ratings.

The way \(I_{S}\) and \(I_{F}\) are chosen depends on the attack strategy. The attack size is defined as the number of injected fake user profiles. Hereafter, \(\phi = |I_F|\) indicates the filler size, \(\alpha = |I_S|\) the selected item set size, and \(\chi = |I_\emptyset |\) the number of unrated items. In this paper, we focus our attention on the selection process of \(I_{F}\), since \(I_{S}\) is built by exploiting the attacker’s knowledge of the data distribution.
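
As an illustration of the profile layout of Fig. 1, the following sketch assembles a fake rating vector from the four segments. The helper name, the use of 0 as a placeholder for unrated entries, and the argument layout are assumptions introduced only for clarity.

    import numpy as np

    def build_fake_profile(n_items, target, selected, filler,
                           target_rating, selected_ratings, filler_ratings):
        """Assemble a fake rating vector from the segments I_T, I_S, I_F, and I_0."""
        profile = np.zeros(n_items)          # 0 stands for "unrated" (I_0)
        profile[target] = target_rating      # I_T: maximum rating in a push attack
        for item, r in zip(selected, selected_ratings):
            profile[item] = r                # I_S: used only by informed attacks
        for item, r in zip(filler, filler_ratings):
            profile[item] = r                # I_F: ratings set by the attack strategy
        return profile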

Semantic-Aware Shilling Attack Strategies (SAShA). While previous work on RS has investigated the impact of different standard attack models on CF systems, in this work we propose to strengthen state-of-the-art strategies by exploiting semantic similarities between items.

This attack strategy generates fraudulent profiles by exploiting \(\mathcal {KG}\) information to fill \(I_{F}\). The key idea is that we can compute the semantic similarity of the target item \(i_{t}\) with all the items in the catalog using \(\mathcal {KG}\)-derived features. Then, we use this information to select the filler items \(I_F\) of each profile.

The insight of our approach is that a similarity value based on semantic features leads to more natural and coherent fake profiles. Such profiles are harder to distinguish from real ones, and they more easily enter the neighborhoods of users and items. To compute the semantic similarity between items, in our experimental evaluation we exploit the widely adopted Cosine Vector Similarity [17].
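
The similarity computation can be sketched as follows, assuming each item is encoded as a binary vector over the extracted \(\mathcal {KG}\) features; the exact encoding and weighting adopted in the experiments may differ.

    import numpy as np
    from scipy.sparse import csr_matrix

    def item_feature_matrix(item_features, feature_index, n_items):
        """Binary item x feature matrix; item_features maps item id -> KG features."""
        rows, cols = [], []
        for item, feats in item_features.items():
            for f in feats:
                rows.append(item)
                cols.append(feature_index[f])
        data = np.ones(len(rows))
        return csr_matrix((data, (rows, cols)), shape=(n_items, len(feature_index)))

    def cosine_to_target(X, target):
        """Cosine similarity of every item to the target item."""
        norms = np.sqrt(X.multiply(X).sum(axis=1)).A.ravel() + 1e-12
        sims = (X @ X[target].T).toarray().ravel() / (norms * norms[target])
        return sims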

To test our semantic-aware attacks against recommender systems, we propose three original variants of low-knowledge and informed attack strategies: the random attack, the love-hate attack, and the average attack.

  • Semantic-aware Random Attack (SAShA-random) is an extension of the Random Attack [25]. The baseline version is a naive attack in which each fake profile is composed only of random items (\(\alpha = 0, \phi =\) profile-size); the items receiving fake ratings are sampled from the whole catalog using a uniform distribution. We modify this attack by changing the set from which the items are extracted (see the sketch after this list). In detail, we extract the items to fill \(I_{F}\) from a subset of the items that are most similar to \(i_{t}\). We compute the item-item Cosine Similarity using the semantic features introduced in Sect. 3.1. Then, we build a set of most-similar items, considering the first quartile of similarity values. Finally, we extract \(\phi \) items from this set with a uniform distribution.

  • Semantic-aware Love-Hate Attack (SAShA-love-hate) is a low-knowledge attack that extends the standard Love-Hate attack [28]. The baseline attack randomly extracts the filler items \(I_{F}\) from the catalog and associates all of them with the minimum possible rating value. The Love-Hate attack aims to reduce the average rating of all the items on the platform except the target item: even though the target item is not present in the fake profiles, its relative rank increases. We have re-interpreted the rationale behind the Love-Hate attack by taking into account the semantic description of the target item and its similarity with the other items in the catalogue. In this case, we extract the items to fill \(I_{F}\) from the 2nd, 3rd, and 4th quartiles of similarity values. As in the original variant, the rationale is to select the most dissimilar items.

  • Semantic-aware Average Attack (SAShA-average) is an informed attack that extends the AverageBots attack [28]. The baseline attack takes advantage of the mean and the variance of the ratings: it samples the rating of each filler item from a normal distribution built with that mean and variance. Analogously to SAShA-random, we extend the baseline by extracting the filler items from the subset of the most similar items, using as candidates the items in the first quartile of similarity with \(i_{t}\).
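
The filler-selection step shared by the three variants can be summarized by the following sketch. Here the “first quartile” is interpreted as the top 25% of catalog items ranked by similarity to the target, which is our reading of the description above; the rating-assignment details of each baseline are left out.

    import numpy as np

    def sasha_filler_items(sims, target, phi, variant, rng=None):
        """Pick phi filler items: most-similar quartile for SAShA-random/-average,
        remaining (most dissimilar) quartiles for SAShA-love-hate."""
        rng = rng or np.random.default_rng()
        ranked = [i for i in np.argsort(-sims) if i != target]   # descending similarity
        q = len(ranked) // 4
        if variant in ("random", "average"):
            pool = ranked[:q]                # first quartile: closest to the target
        elif variant == "love-hate":
            pool = ranked[q:]                # 2nd-4th quartiles: most dissimilar items
        else:
            raise ValueError(variant)
        return rng.choice(pool, size=phi, replace=False)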

4 Experimental Evaluation

This section is devoted to comparing the proposed approaches against the baseline attack strategies. We first introduce the experimental setup, where we present the two well-known recommendation datasets. Then, we describe the feature extraction and selection procedure adopted to form semantic-aware shilling attacks. Finally, we detail the three canonical CF models we have analyzed. We have carried out extensive experiments intended to answer the research questions in Sect. 1. In particular, we aim to assess: (i) whether freely available semantic knowledge can help to generate stronger shilling attacks; (ii) whether different types of \(\mathcal {KG}\) features have a different influence on SAShA effectiveness; (iii) which CF-RS is the most robust against SAShA attacks.

4.1 Experimental Setting

Datasets. In the experiments, we have exploited two well-known datasets with explicit feedback to simulate the behavior of a recommendation engine: LibraryThing [18] and Yahoo!Movies. The first dataset is derived from the social cataloging web application LibraryThing and contains ratings ranging from 1 to 10. To speed up the experiments, we have randomly sampled, with a uniform distribution, 25% of the original items in the dataset. Moreover, in order to avoid cold-start situations (which are usually not of interest in attacks against recommender systems), we removed users with fewer than five interactions. The second dataset contains movie ratings collected on Yahoo!Movies up to November 2003. It contains ratings ranging from 1 to 5, and mappings to the MovieLens and EachMovie datasets. For both datasets, we have used the item-feature sets \(1\text {-}HOP\text {-}F\) and \(2\text {-}HOP\text {-}F\) extracted from DBpedia by exploiting the mappings publicly available at https://github.com/sisinflab/LinkedDatasets. Dataset statistics are shown in Table 1.

Table 1. Dataset statistics.

Feature Extraction. We have extracted the semantic information to build SAShA by exploiting the publicly available item-entity mapping to DBpedia. We did not consider noisy features containing the following predicates: owl:sameAs, dbo:thumbnail, foaf:depiction, prov:wasDerivedFrom, foaf:isPrimaryTopicOf, as suggested in [5].

Feature Selection. To analyze the impact of different feature types, we have performed experiments considering categorical (CS), ontological (OS), and factual (FS) features. We have chosen to explore these classes of features since they are commonly adopted in the community [5]. For the selection of single-hop (1H) features, the adopted policies are:

  • CS-1H, we select the features containing the property dcterms:subject;

  • OS-1H, we consider the features including the property rdf:type;

  • FS-1H, we pick all the features except the ontological and categorical ones.

For the selection of double-hop (2H) features, the applied policies are:

  • CS-2H, we select the features with properties equal to either dcterms:subject or skos:broader;

  • OS-2H, we consider the features including the properties rdf:type, rdf-schema:subClassOf or owl:equivalentClass;

  • FS-2H, we pick the features that are not in the previous two categories.

It is worth noting that we have not put any categorical or ontological predicate into the noisy list. If some domain-specific categorical/ontological features do not match the respective lists, we consider them as factual features.
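
Under these policies, the classification of an extracted feature can be sketched as follows. Predicate URIs are abbreviated with their usual prefixes, and requiring that all predicates of a 2-hop feature match the list (rather than at least one) mirrors the definition given in Sect. 3.1 for factual features; both are assumptions about the exact implementation.

    NOISY = {"owl:sameAs", "dbo:thumbnail", "foaf:depiction",
             "prov:wasDerivedFrom", "foaf:isPrimaryTopicOf"}
    CATEGORICAL = {"dcterms:subject", "skos:broader"}        # skos:broader appears only at 2 hops
    ONTOLOGICAL = {"rdf:type", "rdfs:subClassOf", "owl:equivalentClass"}

    def classify(feature):
        """feature is (rho, omega) for 1H or (rho, omega, rho', omega') for 2H."""
        predicates = feature[0::2]                           # every other element is a predicate
        if any(p in NOISY for p in predicates):
            return "noisy"
        if all(p in CATEGORICAL for p in predicates):
            return "CS"
        if all(p in ONTOLOGICAL for p in predicates):
            return "OS"
        return "FS"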

Feature Filtering. Following the aforementioned directions, we have extracted 1H and 2H features for LibraryThing and Yahoo!Movies. Due to the extent of the catalogs, we obtained millions of features. Consequently, we removed irrelevant features following the filtering technique proposed in [18, 31]. In detail, we dropped all the features with more than a threshold t = 99.74% of missing values, as well as those with more than t distinct values. The statistics of the resulting datasets are reported in Table 2.

Table 2. Selected features in the different settings, for both single and double hops.
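
A possible implementation of the filtering rule, assuming the item-feature data is arranged as a table with one row per item and one column per feature, is sketched below; the actual data layout used in the experiments may differ.

    import pandas as pd

    def filter_features(df: pd.DataFrame, t: float = 0.9974) -> pd.DataFrame:
        """Drop features with more than t missing values or more than t distinct values."""
        missing_ratio = df.isna().mean()                     # fraction of missing entries per column
        distinct_ratio = df.nunique(dropna=True) / len(df)   # fraction of distinct values per column
        keep = (missing_ratio <= t) & (distinct_ratio <= t)
        return df.loc[:, keep]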

Recommender Models. We have conducted experiments considering all the attacks described in Sect. 3.2 on the following baseline Collaborative Filtering Recommender Systems:

  • User-kNN [23, 32] predicts the score of unknown user-item pairs (\(\hat{r}_{u i}\)) by considering the feedback of the users in the neighborhood. We have tested SAShA using the formulation of [23], which considers the user and item rating biases (a code sketch of this prediction rule is given after the list). Let u be a user in the set of users U and i an item in the set of items I; we estimate the rating given by u to i with the following equation:

    $$\begin{aligned} \hat{r}_{u i}= b_{ui} +\frac{\sum _{v \in U_{i}^{k}(u)} \delta (u, v) \cdot (r_{v i}-b_{v i})}{\sum _{v \in U_{i}^{k}(u)} \delta (u, v)} \end{aligned}$$
    (1)

    where \(\delta \) is the similarity metric measuring the closeness between users, and \(U_{i}^{k}(u)\) is the set of the k users most similar to u that have rated item i. We define \(b_{ui}\) as \(\mu +b_{u}+b_{i}\), where \(\mu \), \(b_{u}\), and \(b_{i}\) are the overall average rating and the observed biases of user u and item i, respectively. Following the directions suggested in [10], we use the Pearson correlation as \(\delta \) and a number of neighbors k equal to 40.

  • Item-kNN [23, 33] estimates the user-item rating score (\(\hat{r}_{u i}\)) using the recorded feedback given by u to the k items j in the neighborhood of item i. Equation 2 defines the rating prediction formula of Item-kNN.

    $$\begin{aligned} \hat{r}_{u i}=b_{u i}+\frac{\sum _{j \in I_{u}^{k}(i)} \delta (i, j) \cdot (r_{u j}-b_{u j})}{\sum _{j \in I_{u}^{k}(i)} \delta (i, j)} \end{aligned}$$
    (2)

    In Eq. 2, the set of the k items in the neighborhood of i that have been rated by u is denoted as \(I_{u}^{k}(i)\). The similarity function \(\delta \) and the number of neighbors k are set as in User-kNN.

  • Matrix Factorization (MF) [24] is a latent factor model for the item recommendation task that learns user-item preferences by factorizing the sparse user-item feedback matrix. The learned user and item representations, fitted on previously recorded interactions, are exploited to predict \(\hat{r}_{u i}\) as follows:

    $$\begin{aligned} \hat{r}_{u i}= b_{ui} + \mathbf{q} _{i}^{T} \mathbf{p} _{u} \end{aligned}$$
    (3)

    In Eq. 3, \(\mathbf{q} _{i} \in \mathbb {R}^f\) and \(\mathbf{p} _{u} \in \mathbb {R}^f \) are the latent vectors for item i and user u learned by the model. We set the number of latent factors f to 100, as suggested in [22].
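
As a concrete reference for the prediction rules above, the following NumPy sketch implements the biased User-kNN estimate of Eq. 1 with Pearson correlation and k = 40; the handling of missing ratings and the bias estimation are simplifying assumptions.

    import numpy as np

    def pearson(x, y):
        """Pearson correlation over co-rated entries of two rating vectors (NaN = unrated)."""
        mask = ~np.isnan(x) & ~np.isnan(y)
        if mask.sum() < 2:
            return 0.0
        xm, ym = x[mask] - x[mask].mean(), y[mask] - y[mask].mean()
        denom = np.sqrt((xm ** 2).sum() * (ym ** 2).sum())
        return float(xm @ ym / denom) if denom > 0 else 0.0

    def predict_user_knn(R, u, i, k=40):
        """Biased User-kNN prediction of Eq. 1 on a dense user-item matrix R."""
        mu = np.nanmean(R)
        b_user = np.nanmean(R, axis=1) - mu                  # observed user biases
        b_item = np.nanmean(R, axis=0) - mu                  # observed item biases
        rated_i = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, i])]
        top = sorted(((pearson(R[u], R[v]), v) for v in rated_i), reverse=True)[:k]
        num = sum(s * (R[v, i] - (mu + b_user[v] + b_item[i])) for s, v in top)
        den = sum(s for s, _ in top)
        baseline = mu + b_user[u] + b_item[i]
        return baseline + num / den if den != 0 else baseline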

Evaluation Metrics. We have evaluated our attack strategies by adopting the Overall Prediction Shift and the Overall Hit-Ratio@k. Let \(I_{T}\) be the set of attacked items, and \(U_{T}\) the set of users that have not rated the items in \(I_{T}\). We define the Overall Prediction Shift (PS) [1] as the average variation of the predicted score for the target items:

$$\begin{aligned} PS(I_{T},U_{T})=\frac{\sum _{i \in I_{T},u \in U_{T}} (\hat{r}_{ui} - r_{ui}) }{|I_{T}| \times |U_{T}|} \end{aligned}$$
(4)

where \(\hat{r}_{ui}\) is the rating predicted for user u on item i after the shilling attack, and \(r_{ui}\) is the prediction before the attack. We define the Overall Hit-Ratio@k (HR@k) [1] as the average of hr@k over all the attacked items. Equation 5 defines HR@k as:

$$\begin{aligned} HR@k(I_{T},U_{T})=\frac{\sum _{i \in I_{T}} hr@k(i,U_{T})}{|I_{T}|} \end{aligned}$$
(5)

where \(hr@k(i,U_{T})\) counts the occurrences of the attacked item i in the top-k recommendation lists of the users in \(U_T\).
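
Both metrics can be computed with a few lines of code, as in the sketch below; normalizing the per-item hit count by \(|U_T|\) is an assumption on the exact definition of hr@k.

    import numpy as np

    def prediction_shift(pred_after, pred_before, target_items, users):
        """Overall PS (Eq. 4): mean change of the predicted score of the target items."""
        shifts = [pred_after[u, i] - pred_before[u, i]
                  for i in target_items for u in users]
        return float(np.mean(shifts))

    def overall_hit_ratio(top_k_lists, target_items, users):
        """Overall HR@k (Eq. 5): average per-item hit ratio over the attacked items."""
        per_item = []
        for i in target_items:
            hits = sum(1 for u in users if i in top_k_lists[u])
            per_item.append(hits / len(users))               # assumes hr@k is normalized by |U_T|
        return float(np.mean(per_item))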

Evaluation Protocol. Inspired by the evaluation proposed in [25, 27], we have performed a total of 126 experiments. For each dataset, we have generated the recommendations for all users using the selected CF models (i.e., User-kNN, Item-kNN, and MF). Then, we have added the fake profiles generated according to the baseline attack strategies and re-computed the recommendation lists. We have evaluated the effectiveness of each attack by measuring the above-mentioned metrics on both the initial and the new recommendation lists. After this step, we have performed a series of SAShA attacks as described in Sect. 3. In detail, we have considered different feature types (i.e., categorical, ontological, and factual) extracted at 1 or 2 hops. Finally, we have evaluated HR@k and PS for each SAShA variant and compared it against the baselines. It is worth noting that, in our experiments, each attack is a push attack: the attacker’s purpose is to increase the probability that the target item is recommended. Moreover, adopting the evaluation protocol proposed in [15, 28], we have performed the attacks with different amounts of added fake user profiles: 1%, 2.5%, and 5% of the total number of users. We have tested the attacks considering 50 randomly sampled target items.
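
For completeness, the overall protocol can be summarized by the following high-level sketch. All object names and method signatures (model.fit, ratings.add_profiles, etc.) are placeholders invented for illustration, and attacking each target item in a separate run is an assumption about the orchestration.

    ATTACK_SIZES = [0.01, 0.025, 0.05]       # fake profiles as a fraction of |U|

    def run_experiment(ratings, model, attack_fn, target_items, k=10):
        clean = model.fit(ratings)                           # recommendations before the attack
        results = {}
        for size in ATTACK_SIZES:
            n_fake = int(size * ratings.n_users)
            ps_values, hr_values = [], []
            for i_t in target_items:                         # each target attacked in its own run
                fakes = [attack_fn(ratings, i_t) for _ in range(n_fake)]
                attacked = model.fit(ratings.add_profiles(fakes))
                users = ratings.unrated_users([i_t])
                ps_values.append(prediction_shift(attacked.scores, clean.scores, [i_t], users))
                hr_values.append(overall_hit_ratio(attacked.top_k(k), [i_t], users))
            results[size] = {"PS": sum(ps_values) / len(ps_values),
                             "HR@k": sum(hr_values) / len(hr_values)}
        return results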

4.2 Results and Discussion

The discussion of the results is organized according to the research questions stated in Sect. 1. First, we describe the influence of semantic knowledge on attack strategies. Then, we compare the impact of the different types of semantic information.

Analysis of the Effectiveness of Semantic Knowledge on Shilling Attacks. The first research question aims to check whether Linked Open Data, as a new source of knowledge, can serve as a ‘weapon’ for attackers against CF-RS. Table 3 reports the HR@10 results for each attack. For both the baseline and the semantic-aware variants, we highlight the best results in bold.

Table 3. Experimental results for SAShA at single and double hops.

Starting from the analysis of the low-knowledge random attack, the experiments show that the semantic-aware attacks are remarkably effective. For instance, the semantic attack with ontological information at a single hop (SAShA-OS-1H) outperforms the baseline independently of the attacked model. To support these insights, we can observe the PS resulting from random attacks. Figure 2a shows that every variant of SAShA has a higher prediction shift than the baseline on Yahoo!Movies. In Fig. 2b, we can notice that the semantic strategy is the most effective one for each model. As an example, the PS of Rnd-SAShA-OS-1H increases by up to 6.82% over the corresponding baseline when attacking User-kNN on the Yahoo!Movies dataset. The full results are available online.

Fig. 2. (a) Prediction Shift on Yahoo!Movies for random attacks at single hop. (b) Prediction Shift on LibraryThing for random attacks at single hop.

In Table 3, we observe that the injection of semantic information into the love-hate attack is not particularly effective. This may be due to the specific attack strategy: since its rationale is to decrease the overall mean rating of all items except the target one, exploiting similarity does not strengthen the approach.

For the informed attack (i.e., the average attack), results show that semantic information can be a useful source of knowledge. For instance, Avg-SAShA-OS-2H improves the performance on Item-kNN by 10.2% compared to the baseline.

It is noteworthy that, in the movie domain, the semantic variant of the random attack, Rnd-SAShA-CS-2H, reaches a performance comparable with the baseline average attack. This observation shows that even an attacker who cannot access system knowledge can perform powerful attacks by exploiting publicly available (semantic) knowledge bases.

Analysis of the Impact of Different Semantic Information Types and of Multi-hop Features. In the previous analysis, we focused on the effectiveness of the SAShA strategy irrespective of the different types of semantic properties (Sect. 4.1). Table 3 shows that the attacks exploiting ontological information are generally the most effective ones when single-hop features are considered. We motivate this finding with the ontological relation between the fake profiles and the target item: exploiting ontological relations, we can compute similarities without the “noisy” factual features. A possible interpretation is that a strong ontological similarity is evident for humans, but for an autonomous agent it can be “hidden” by the presence of other features. Moreover, exploiting items’ categorization is particularly effective for attacking CF-RS, since CF approaches recommend items based on similarities.

Table 3 also shows the results for double-hop features. In this case, the previous findings are mostly confirmed, except for the random attacks on Yahoo!Movies.

Finally, we focus on the differences between the impact of single-hop and double-hop features. Experimental results show that the variants that consider the second hop do not substantially increase the effectiveness of the attacks. In some cases, such as on LibraryThing, we even observe a worsening of performance. For instance, the performance of the random SAShA with ontological features at double hops decreases by 13.1% compared to the same configuration at a single hop (when attacking Item-kNN).

5 Conclusion and Open Challenges

In this work, we have proposed SAShA, a semantic-aware method for attacking collaborative filtering (CF) recommendation models, in which we explore the impact of publicly available knowledge graph data on the generation of fake profiles. We have evaluated SAShA on two real-world datasets by extending three baseline shilling attacks with different semantic types of features. In detail, we have extended the random, love-hate, and average attacks by considering Ontological, Categorical, and Factual \(\mathcal {KG}\) features extracted from DBpedia. The experimental evaluation has shown that SAShA outperforms the baseline attacks. Our extensive set of experiments shows that semantic information is a powerful tool to implement effective attacks even when attackers do not have any knowledge of the system under attack. Additionally, we have found that Ontological features are the most effective ones, while multi-hop features do not guarantee a significant improvement. We plan to further extend the experimental evaluation of SAShA with different sources of knowledge, such as Wikidata. Moreover, we intend to explore the efficacy of semantic information with other state-of-the-art attacks (e.g., deep learning-based techniques), with a focus on possible applications of semantic-based attacks against social networks. Finally, we plan to investigate the possibility of developing defensive algorithms that take advantage of semantic knowledge.