
1 Introduction

Social networks have become an increasingly important medium for communication and interaction between users, all the more so when companies and political parties take advantage of this medium to interact and transmit thousands (even millions) of messages in very short periods of time. The main issue arises when the sender of a message is not an "ordinary" person but an automated and/or false account. So-called "social" bots (on Twitter) are accounts controlled by software that can generate content (tweets) and establish interactions (RTs, likes, follows) algorithmically, with little or no human intervention. These entities can be used in different ways: on the one hand, for the dissemination of news and publications or the coordination of volunteers for activities; on the other hand, to emulate human behavior maliciously, in order to inflate the political support a candidate or party appears to receive [20]. Such bots can also contaminate the discussion on the network by granting false credibility to their messages and influencing other users [1, 10].

During the investigation carried out by Deep PUCV on a predictive model of electoral results [18], based on the communicative interaction of users in social networks and applying computational intelligence and Big Data techniques, it was discovered that some messages were repeated among different users at the same time stamp, which raised the suspicion of bots and/or cyborgs. From this arose the need to detect false accounts (bots) related to the candidates running for the Chilean presidency, in order to identify them automatically during the course of the campaign period. To develop this research, we based our work on [22], which provides a machine learning methodology for identifying characteristics that reveal whether or not an account is a bot. In this paper we first show the results of a manual analysis of bot detection, and then propose to detect these accounts automatically by means of a heterogeneous representation of the accounts and machine learning classification models.

The structure of the paper is as follows. In the next section we present related work and an analysis of social media activity during the 2017 election year in Chile. In the following section we present the dataset and the proposed methodology. In Section 4 we show the results of the experiments carried out with several machine learning classification models. In the last section we present concluding remarks and delineate future work.

2 Related Work and Forensic Analysis of Social Media Events in Chile 2017

In recent years the computing community has been developing complex and advanced techniques to detect social bots accurately. According to [14], the approaches can be classified into three classes: (1) bot detection systems based on the social network topology, (2) systems based on feature-based machine learning methods, and (3) systems based on crowdsourcing of user post and profile analysis.

  • Structure-Based (Social Network-Based) Bot Detection

    Sybil accounts are multiple accounts controlled by a single malicious user [8], and structure-based detection techniques focus on detecting them. These accounts are used to infiltrate social networks, steal private data, and disseminate misinformation and malware, which is why Sybil attacks are a fundamental threat to social networks [9, 11, 16]. For instance, it was reported in 2015 that around 170 million fake Facebook accounts were Sybil accounts [17]. While this type of account can be created intentionally for benign purposes such as preserving anonymity, they are mainly considered malicious. Knowing how Sybil accounts spread through the network is crucial to identifying them, especially for this type of detection technique.

  • Machine Learning-Based Bot Detection

    The more sophisticated social bots become (e.g., through Artificial Intelligence (AI)), the more risk they pose, and detecting them has become a difficult challenge. The rise of AI has increased not only the sophistication of bots but also that of the techniques to detect them. The main idea behind these techniques is to find key features of social bots and the patterns that differentiate bots from humans. Chu et al. [5] carried out a study to profile humans, bots, and cyborgs, characterizing the differences among them in terms of tweet content, tweeting behaviour, and account properties such as the external URL ratio. Lee et al. [19] present a study of social honeypots for profiling and filtering content polluters in social media based on their profile features.

  • CrowdSourcing-Based Bot Detection

    Wang et al. [23] proposed a new approach that applies human effort (crowdsourcing) to the detection of bots. Their insight is that careful users can detect even slight inconsistencies in account profiles and posts. They propose a two-layered system consisting of a filtering layer and a crowdsourcing layer: prior automation techniques, such as community detection and network-based feature selection, together with user reports, are used in the filtering layer to obtain suspicious profiles, and crowdsourcing is then applied for the final decision on classifying accounts as either legitimate or bot.

During the three Chilean elections that took place in 2017 (primary, first and second presidential rounds), a total of 12 candidates ran for the presidency; for the purposes of the present research, we worked with the data of the 8 candidates participating in the first presidential round. 2017 was a year of intense electoral activity and, consequently, a period of high use of both traditional media and social networks among Chilean users and presidential candidates. For this reason, the traditional media events (television and radio interviews/debates) that occurred during the elections were used to analyze the activity on Twitter, the social network most used in political campaign contexts by candidates to publicize their opinions and electoral preferences [4, 13, 15]. In this way it was possible to analyze and detect the activity around the candidates running for the presidency.

As a first step toward an automated bot detection approach, a detailed analysis was carried out around the two debates held on September 14 and 28, respectively, in order to obtain possible indicators of suspicious activity on social networks. Analyzing the debate of September 14, we observed suspicious activity related to one candidate, who obtained a very high peak of participation during the hours of the debate, as shown in Fig. 1.

Fig. 1. Messages issued for each candidate on Twitter through September 14
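As an illustration, the following is a minimal sketch (not the study's actual pipeline) of the hourly aggregation behind activity curves such as Fig. 1, assuming a pandas DataFrame with hypothetical `created_at` and `candidate` columns:

```python
# Minimal sketch of the hourly aggregation behind curves like Fig. 1,
# assuming a DataFrame with hypothetical columns `created_at`
# (datetime) and `candidate` (which candidate a message refers to).
import pandas as pd

def hourly_counts(tweets: pd.DataFrame) -> pd.DataFrame:
    """Messages per candidate per hour; rows are hours, columns candidates."""
    tweets = tweets.assign(hour=tweets["created_at"].dt.floor("h"))
    return (tweets.groupby(["candidate", "hour"])
                  .size()
                  .unstack(level="candidate", fill_value=0))

# counts = hourly_counts(tweets)
# counts.plot()  # reproduces activity curves like Figs. 1 and 2
```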

One of the first indicators of possible suspicious activity among the accounts related to this candidate was an unusually high volume of mentions and retweets compared to the other candidates, with the exception of Sebastián Piñera, who consistently generated more activity. In the moments before a debate, mentions of the candidates are expected to increase, but the drastic increase in mentions of Ominami, who reached his peak at 22:00 with 4172 messages (between 21:00 and 22:00), was a sign of abnormality. Moreover, the next day another peak of messages occurred between 19:00 and 20:00 (the appearance of candidates Kast and Ominami on CNN [6]), with a total of 2710 messages, showing the same decline in messages over the later hours, as shown in Fig. 2.

Fig. 2. Messages issued for each candidate on Twitter through September 15

Given the above, we reviewed the volume of original tweets versus retweets (RT) for each of the candidates on the established dates and times. Ominami turned out to be one of the candidates with the lowest proportion of original tweets vs. RTs, with ratios of 1/6.23 and 1/6.76 for September 14 and 15 respectively, in contrast to the other candidates: Beatriz Sánchez with 1/2.02 and 1/2.53; José Antonio Kast with 1/3.35 and 1/3.77; Sebastián Piñera with 1/1.95 and 1/2.70; Alejandro Guillier with 1/4 and 1/2.08; and finally Carolina Goic with 1/2.46 and 1/2.58, respectively, for the aforementioned dates, as seen in Fig. 3.

Fig. 3. Detail of original messages versus RTs for both dates
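A minimal sketch of how such original-vs-RT ratios can be computed is shown below; the `candidate` and `is_retweet` fields are assumptions about the tweet records, not the study's actual schema:

```python
# Minimal sketch of the original-tweet vs. retweet ratio per candidate
# shown in Fig. 3. Field names are assumed.
from collections import Counter

def rt_ratio(tweets):
    originals, rts = Counter(), Counter()
    for t in tweets:
        (rts if t["is_retweet"] else originals)[t["candidate"]] += 1
    # a value x means a 1/x original-to-RT ratio
    # (e.g. 6.23 for Ominami on September 14)
    return {c: rts[c] / n for c, n in originals.items() if n > 0}
```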

Another analysis consists in monitoring the applications used to post the messages, which can indicate whether there is a certain level of automation in the generation of messages and the simulation of behaviors. Here it was detected that the third most used application for Ominami was TweetDeck [21], whose main distinguishing feature is the management of multiple accounts at the same time, making it possible to operate them simultaneously and coordinate their actions. Figures 4, 5 and 6 show the composition of the applications used for each candidate at the dates and times described above, including the proportion of messages posted through TweetDeck versus other applications.

Fig. 4. Origins of the messages for the candidates Alejandro Guillier, Marco Enríquez-Ominami, José Antonio Kast and Sebastián Piñera during the indicated days

Fig. 5. Origins of the messages for the candidates Carolina Goic, Marco Enríquez-Ominami, Beatriz Sánchez and Sebastián Piñera during the indicated days

Fig. 6. Origins of the messages for the candidates Carolina Goic, Marco Enríquez-Ominami, Beatriz Sánchez and Sebastián Piñera during the indicated days

While most of the messages came from Android, followed by iPhone, it should be noted that on the day of the debate (September 14) Ominami shows a similar proportion between TweetDeck and iPhone. We also found situations in which suspicious behavior is evident across different accounts, where two accounts retweet the same tweets, in the same order and at similar times; we found this behaviour in various accounts. A sketch of this check is shown below.
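The following is a minimal sketch of that check, flagging account pairs that retweet the same tweets in the same order within a small time window; the field names and the thresholds (`min_common`, `max_gap`) are illustrative assumptions:

```python
# Minimal sketch: flag account pairs that retweet the same tweets in
# the same order at similar times. Assumes retweet records with
# hypothetical keys `user`, `rt_of` (id of the retweeted tweet) and
# `ts` (epoch seconds).
from collections import defaultdict
from itertools import combinations

def suspicious_pairs(retweets, min_common=5, max_gap=300):
    seq = defaultdict(list)                 # user -> [(ts, rt_of), ...]
    for r in sorted(retweets, key=lambda r: r["ts"]):
        seq[r["user"]].append((r["ts"], r["rt_of"]))
    pairs = []
    for a, b in combinations(seq, 2):       # O(n^2) pairs: a sketch only
        ids_a = [i for _, i in seq[a]]
        b_ids = set(i for _, i in seq[b])
        common = [i for i in ids_a if i in b_ids]
        if len(common) < min_common:
            continue
        # same order and close in time for every shared retweet
        pos_b = {i: k for k, (_, i) in enumerate(seq[b])}
        ordered = all(pos_b[x] < pos_b[y] for x, y in zip(common, common[1:]))
        t_a = {i: t for t, i in seq[a]}
        t_b = {i: t for t, i in seq[b]}
        close = all(abs(t_a[i] - t_b[i]) <= max_gap for i in common)
        if ordered and close:
            pairs.append((a, b))
    return pairs
```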

On the dates discussed above, the presence of bots and/or cyborgs was so evident that several users noticed the situation. Even though Ominami had not yet appeared in the interview, a great number of positive tweets about him were already being generated (Fig. 7).

Fig. 7. User noticing strange behavior on Twitter: "Tell the MEO bots that he has not intervened yet"

One of the main reasons we suspect that Ominami used bots to generate these behaviors lies mainly in the use of the TweetDeck application. Although it allows the automation of certain tasks across several accounts, it still requires a human user to perform these actions. So instead of labeling all these accounts as bots, we will call them cyborgs, which, unlike bots, are operated by a human using computing tools and are therefore no longer 100% autonomous. The Appendix shows the graphs for the event of September 28, where a similar behavior occurs for Ominami.

2.1 Data Collection and Analysis

For the present study, 9,367,127 tweets were collected from 372,665 users, following three search criteria (a sketch of the matching logic follows the list):

  (a) Mention of a candidate's account.

  (b) Mention of the name of the candidate.

  (c) Mention of a hashtag related to an event of the candidates.
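As an illustration of the matching logic, the sketch below checks a tweet's text against the three criteria; the handles, names and hashtags are placeholders, not the actual tracking lists used for the collection:

```python
# Minimal sketch of the three collection criteria. The sets below are
# illustrative placeholders, not the study's actual tracking lists.
import re

CANDIDATE_ACCOUNTS = {"@sebastianpinera", "@marcoporchile"}   # criterion (a)
CANDIDATE_NAMES = {"piñera", "ominami"}                       # criterion (b)
EVENT_HASHTAGS = {"#debateanatel"}                            # criterion (c)

def matches_criteria(text: str) -> bool:
    tokens = set(re.findall(r"[@#]?\w+", text.lower()))
    return bool(tokens & (CANDIDATE_ACCOUNTS | EVENT_HASHTAGS)
                or tokens & CANDIDATE_NAMES)

# matches_criteria("Gran debate #DebateAnatel con @marcoporchile")  # True
```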

After the collection and storage procedure, we performed the tweet classification stage, carried out manually by 6 experts who tagged a total of 640,224 tweets into three sentiment categories (positive, neutral and negative). It was during this stage of manual classification that we discovered that several tweets were repeated frequently between different users at the same time. The messages were identical, but they were not retweets. Consequently, the suspicion of possible false accounts arose, which led to creating a new tag, marking accounts as bots or not bots based on their messages, account names, etc. In this manual tagging process we collected a total of 2472 bot accounts, which were used for the validation stage.

Regarding the tweets repeated between different users, 4091 such tweets were found, from 3072 different users.
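A minimal sketch of the duplicate check that surfaces such cases, grouping non-retweet messages by identical text and timestamp, is shown below (field names are assumed):

```python
# Minimal sketch of the duplicate check: identical, non-retweet
# messages posted by different users at the same time. Field names
# (`user`, `text`, `ts`, `is_retweet`) are assumed; `ts` is taken to
# be rounded to a common resolution (e.g. the minute).
from collections import defaultdict

def repeated_tweets(tweets):
    groups = defaultdict(set)
    for t in tweets:
        if not t["is_retweet"]:
            groups[(t["text"], t["ts"])].add(t["user"])
    # keep only messages posted verbatim by two or more distinct users
    return {key: users for key, users in groups.items() if len(users) > 1}
```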

3 Automated Detection

For the automated detection of bots, we based our procedure on [22], in which the authors proposed the use of supervised machine learning techniques for the automatic classification of bot accounts.

Regarding the characteristics of a user, data and metadata were extracted from the Twitter users, namely the number of followers and followees, publications (related to the primaries), the account creation date, the number of tweets generated, the number of favorites, and others.

In order to train a model for the detection of bots, we used labels obtained with the well-known Botometer application [3], together with samples of Twitter accounts that were detected manually.
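For reference, the sketch below shows how accounts can be labeled with the botometer-python client; the credentials are placeholders, and the key-parameter name and the layout of the returned score dictionary vary across Botometer versions, so the fields shown are assumptions:

```python
# Sketch of labeling training accounts with the botometer-python
# client for the Botometer service [3]. Credentials are placeholders;
# the key-parameter name (rapidapi_key vs. the older mashape_key) and
# the score-dictionary layout are version-dependent assumptions.
import botometer

twitter_auth = {
    "consumer_key": "...",
    "consumer_secret": "...",
    "access_token": "...",
    "access_token_secret": "...",
}
bom = botometer.Botometer(wait_on_ratelimit=True,
                          rapidapi_key="...",
                          **twitter_auth)

def label(screen_name, threshold=0.5):
    result = bom.check_account(screen_name)
    score = result["scores"]["universal"]  # assumed field; version-dependent
    return "bot" if score >= threshold else "not_bot"
```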

In conjunction with the above, friendship relations and the flow of information among users were characterized, showing behaviors of a different nature for humans and bots. According to Varol [22]:

  • On average, human beings interact with more human accounts than bot accounts.

  • The reciprocity of friendship ties is greater among humans.

  • Some bots target more or less random users, while others choose their targets based on their intentions.

3.1 Description of Extracted Characteristics

For the feature extraction process, the features can be organized into six groups:

  • User-based features: features corresponding to user characteristics, which have made it possible to classify users and the patterns they exhibit. Among them are the number of friends and followers, the profile description and configuration, and the number of tweets produced by the user, among others. (20 features)

  • Friend features: on Twitter, interconnection is actively encouraged. Users are linked by follower-friend relationships, content travels from person to person through retweets, and tweets can be directed to specific users through mentions. We therefore consider four types of links: retweeting, mentioning, being retweeted, and being mentioned. For each group separately, features are extracted about language use, local time, popularity, etc. Note that, due to Twitter API limits, we do not use follower/following information beyond these aggregated statistics. (9 features)

  • Network features: relevant information for characterizing different types of communication can be obtained from the network structure; such features have proven useful, for instance, in detecting political astroturfing. For this work, three different networks are considered: retweet, mention, and hashtag networks. (7 features)

  • Temporal features: several temporal characteristics related to user activity are measured, including the average rates of tweet production over several time periods and the distributions of time intervals between events. (3 features)

  • Content and language features: in this work the quality of tweets in terms of informal or deceptive language is not analyzed. Instead, certain statistics related to the length and entropy of the tweet body are extracted, also identifying the different part-of-speech categories (verbs, predicates, adjectives, adverbs, etc.). (4 features)

  • Sentiment features: sentiment analysis captures the emotions a user transmits when publishing a tweet, making it possible to gauge the mood of a conversation and, in this case, a user's intention to support a particular candidate. (18 features)

In total, 61 features were extracted per user; a sketch of a few of them is shown below. For the details of these features please refer to [22].
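As an illustration, the sketch below computes a handful of features of the kinds listed above (two user-based, one temporal, one content feature); the input field names are assumptions, and the full 61-feature definitions are those of [22]:

```python
# Sketch of a few example features from the groups above (assumed
# field names; the full 61 feature definitions are in [22]).
import math
from collections import Counter

def text_entropy(text):
    """Character entropy of a tweet body (content feature)."""
    counts, n = Counter(text), len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def extract_features(user, tweets):
    times = sorted(t["ts"] for t in tweets)
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return {
        "followers": user["followers_count"],        # user-based
        "friends": user["friends_count"],            # user-based
        "mean_interevent_s": sum(gaps) / len(gaps),  # temporal
        "mean_entropy": (sum(text_entropy(t["text"]) for t in tweets)
                         / max(len(tweets), 1)),     # content
    }
```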

4 Experimental Results

Different machine learning classification algorithms were used to compare results: Random Forest, AdaBoost (with Gaussian Naive Bayes as weak learner), Decision Trees and Support Vector Machines. Each algorithm was trained with 2241 users from a dataset obtained through Botometer [3], of which 731 correspond to bots and 1510 to non-bots. For testing, 1078 users were used, of which half were classified manually as bots and the other half as non-bots. The results obtained during the training and testing stages are shown in Tables 1 and 2.

The parameters for the Random Forest classifier were the following: number of estimators (5, 10, 50, 100, 200, 500, 1000), splitting criterion (gini and entropy), maximum number of features (1–6) and maximum depth (1, 5, 10, 15, 20). The AdaBoost classifier was trained with different numbers of weak learners (1, 5, 10, 50, 100, 200, 500, 1000, 2000, 3000), with Gaussian Naive Bayes as the weak classifier. For the Decision Tree classifier the parameters were the following: number of estimators (1, 5, 10, 50, 100, 200, 500, 1000, 2000, 3000), splitting criterion (best and random), maximum depth (1, 5, 10, 15, 20), maximum number of features (1–6) and minimum fraction of samples to split (0.0001, 0.001, 0.1, 0.2 and 0.5). For the Support Vector Machines we used the following parameters: kernel (linear, polynomial, radial basis function, sigmoid) and degree (1, 3, 5, 10, 50, 100, 200, 500, 1000, 1500, 2000, 3000).
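A minimal sketch of this model comparison using scikit-learn's grid search with the parameter grids listed above is shown below; the cross-validation setup and scoring metric are assumptions not specified in the text:

```python
# Minimal sketch of the model comparison with scikit-learn, using the
# parameter grids listed above. X is the 61-feature matrix and y the
# bot / not-bot labels; cv=10 and accuracy scoring are assumptions.
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

models = {
    "random_forest": (RandomForestClassifier(), {
        "n_estimators": [5, 10, 50, 100, 200, 500, 1000],
        "criterion": ["gini", "entropy"],
        "max_features": list(range(1, 7)),
        "max_depth": [1, 5, 10, 15, 20],
    }),
    # `estimator` is the parameter name in recent scikit-learn
    # (formerly `base_estimator`).
    "adaboost": (AdaBoostClassifier(estimator=GaussianNB()), {
        "n_estimators": [1, 5, 10, 50, 100, 200, 500, 1000, 2000, 3000],
    }),
    # A single decision tree has no estimator-count parameter, so only
    # the remaining grid values from the text apply here.
    "decision_tree": (DecisionTreeClassifier(), {
        "splitter": ["best", "random"],
        "max_depth": [1, 5, 10, 15, 20],
        "max_features": list(range(1, 7)),
        "min_samples_split": [0.0001, 0.001, 0.1, 0.2, 0.5],
    }),
    # `degree` only affects the polynomial kernel.
    "svm": (SVC(), {
        "kernel": ["linear", "poly", "rbf", "sigmoid"],
        "degree": [1, 3, 5, 10, 50, 100, 200, 500, 1000, 1500, 2000, 3000],
    }),
}

def fit_best(X_train, y_train):
    """Grid-search each model family and return its best estimator."""
    best = {}
    for name, (clf, grid) in models.items():
        search = GridSearchCV(clf, grid, cv=10, scoring="accuracy")
        search.fit(X_train, y_train)
        best[name] = search.best_estimator_
    return best
```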

Table 1. Results obtained during the training stage

In Table 1 we can observe that all classifiers obtained similar results in the training stage. The experiments were carried out over 10 experimental runs, and a grid search was used to find the best combination of parameters; the results shown are the average values over the runs. Dimensionality reduction techniques were also applied in an attempt to improve the results, but the best results were obtained with the original features.
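For reference, a sketch of such a dimensionality-reduction comparison is shown below; PCA and the component counts are assumptions, as the text does not name the specific technique used:

```python
# Sketch of the dimensionality-reduction comparison mentioned above.
# PCA and the component counts are assumptions. With 61 original
# features, n_components=61 amounts to no reduction.
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([("pca", PCA()), ("clf", SVC())])
grid = {
    "pca__n_components": [10, 20, 30, 61],
    "clf__kernel": ["linear", "rbf"],
}
search = GridSearchCV(pipe, grid, cv=10, scoring="accuracy")
# search.fit(X_train, y_train)
# In the experiments reported here, the unreduced 61-feature
# representation performed best.
```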

Table 2. Results obtained during the prediction stage

In Table 2 we can see the results of the testing stage. The best results on all performance measures were obtained with the Support Vector Machine model.
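A minimal sketch of computing the reported performance measures on the manually labeled test set (assuming binary labels with 1 = bot, 0 = not bot):

```python
# Minimal sketch of the performance measures on the 1078 manually
# labeled test accounts, assuming binary labels (1 = bot, 0 = not bot).
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def evaluate(model, X_test, y_test):
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
```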

5 Conclusions and Future Work

The main objective of this research is the detection of bots, accounts controlled by hybrid or automated methods that create content and interact with other accounts. In this work, different bot identification methods were presented, together with a machine learning-based method for automated detection. To carry it out, different characteristics of Twitter users were extracted through the API provided by the social network, complemented with a public dataset of bots already identified on Twitter. From these data the different models were trained and their performance evaluated, obtaining an average training accuracy of 0.83. Although the testing results are not yet satisfactory (at best 0.58 accuracy), we will continue working to improve them and achieve greater precision through a more complex graph representation of the user network and its features. In this way it will be possible to identify new features that allow classifying a user as bot or not bot, and to add them to the already defined models.