Machine Learning Methods for Detecting and Monitoring Extremist Information on the Internet

Mashechkin, I. V.; Petrovskiy, M. I.; Tsarev, D. V.; Chikunov, M. N.

doi:10.1134/S0361768819030058

Published: 11 June 2019

Programming and Computer Software, Volume 45, pages 99–115 (2019)

Abstract

In this paper, we apply machine learning methods to the problem of countering terrorism and extremism by using information from the Internet. This problem involves retrieving electronic messages, documents, and web resources that potentially contain information of a terrorist or extremist nature; identifying the structure of the user groups and online communities that disseminate this information; monitoring and modeling information flows in these communities; and assessing threats and predicting risks based on the monitoring results. We propose original language-independent algorithms for pattern-based information retrieval, thematic (topic) modeling, and prediction of message flow characteristics. We also propose algorithms that assess and predict the potential risk posed by members of online communities by using data on the structure of relations in these communities. This makes it possible to detect potentially dangerous users even without full access to the content they distribute, e.g., when it is shared through private channels and chat rooms.
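
The idea of scoring users from the structure of relations alone, without reading their content, can be illustrated with a small sketch. The abstract does not specify the authors' algorithm, so the code below is an assumption: it uses personalized PageRank with restarts at a set of already flagged accounts, a standard graph technique swapped in for illustration. The function name `propagate_risk`, the parameters `damping` and `iterations`, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def propagate_risk(edges, seeds, damping=0.85, iterations=50):
    """Score users by structural proximity to known flagged accounts.

    Illustrative personalized PageRank, NOT the paper's method.
    edges: iterable of (user, user) pairs, treated as undirected relations.
    seeds: users already flagged; the random walk restarts at them.
    """
    seeds = set(seeds)
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    users = sorted(set(neighbors) | seeds)

    # Restart distribution: uniform over the seed users, zero elsewhere.
    restart = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in users}
    score = dict(restart)

    for _ in range(iterations):
        nxt = {u: (1.0 - damping) * restart[u] for u in users}
        for u in users:
            if neighbors[u]:
                # Split this user's score evenly among its relations.
                share = damping * score[u] / len(neighbors[u])
                for v in neighbors[u]:
                    nxt[v] += share
            else:
                # Isolated user: hand its score back to the seeds.
                for s in seeds:
                    nxt[s] += damping * score[u] / len(seeds)
        score = nxt
    return score


# Toy chain of relations a-b-c-d-e, with only "a" flagged (hypothetical data).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
scores = propagate_risk(edges, seeds={"a"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy chain, users closer to the flagged account receive higher scores even though no message content is inspected. A real system would presumably combine such structural scores with content-based features wherever content is available.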

ACKNOWLEDGMENTS

This work was supported by the Russian Foundation for Basic Research, project no. 16-29-09555 ofi_m.

Author information

Authors and Affiliations

Faculty of Computational Mathematics and Cybernetics, Moscow State University, 119899, Moscow, Russia
I. V. Mashechkin, M. I. Petrovskiy, D. V. Tsarev & M. N. Chikunov

Corresponding authors

Correspondence to I. V. Mashechkin, M. I. Petrovskiy, D. V. Tsarev or M. N. Chikunov.

Additional information

Translated by Yu. Kornienko

About this article

Cite this article

Mashechkin, I.V., Petrovskiy, M.I., Tsarev, D.V. et al. Machine Learning Methods for Detecting and Monitoring Extremist Information on the Internet. Program Comput Soft 45, 99–115 (2019). https://doi.org/10.1134/S0361768819030058

Received: 15 January 2019
Revised: 15 January 2019
Accepted: 15 January 2019
Published: 11 June 2019
Issue Date: May 2019
DOI: https://doi.org/10.1134/S0361768819030058