Machine Learning Methods for Detecting and Monitoring Extremist Information on the Internet

Programming and Computer Software

Abstract

In this paper, we apply machine learning methods to counter terrorism and extremism using information from the Internet. The problem involves four tasks: retrieving electronic messages, documents, and web resources that may contain terrorist or extremist information; identifying the structure of the user groups and online communities that disseminate this information; monitoring and modeling information flows in these communities; and assessing threats and predicting risks from the monitoring results. We propose original language-independent algorithms for pattern-based information retrieval, topic modeling, and prediction of message flow characteristics. We also propose algorithms that assess and predict the potential risk posed by members of online communities from data on the structure of relations in these communities. This makes it possible to detect potentially dangerous users even without full access to the content they distribute, e.g., through private channels and chat rooms.
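
The idea of scoring users from the structure of relations alone, without reading their content, can be illustrated with a minimal sketch. The abstract does not specify the authors' algorithm, so the code below is not their method: it swaps in a generic personalized PageRank, and the function name `propagate_risk`, the toy graph, and all parameter values are assumptions made for illustration only.

```python
def propagate_risk(edges, seeds, damping=0.85, iterations=50):
    """Spread risk from known seed accounts over an undirected relation graph.

    Illustrative assumption: a generic personalized PageRank, not the
    algorithm of the paper. Returns a dict mapping node -> score.
    """
    if not seeds:
        raise ValueError("at least one seed node is required")

    # Build symmetric adjacency sets from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    for s in seeds:
        neighbors.setdefault(s, set())
    nodes = sorted(neighbors)

    # Restart distribution: all probability mass sits on the seed accounts.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if neighbors[n]:
                # Split this node's score equally among its neighbors.
                share = damping * score[n] / len(neighbors[n])
                for m in neighbors[n]:
                    nxt[m] += share
            else:
                # Isolated node: hand its mass back to the restart distribution.
                for m in nodes:
                    nxt[m] += damping * score[n] * restart[m]
        score = nxt
    return score


# Hypothetical toy graph: a chain a-b-c-d and a separate pair e-f.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
scores = propagate_risk(edges, seeds={"a"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, accounts closer to the seed receive higher scores than distant ones, and accounts with no path to any seed receive none. A real system would need directed and weighted relations, validation against labeled data, and human review of any flagged account.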

ACKNOWLEDGMENTS

This work was supported by the Russian Foundation for Basic Research, project no. 16-29-09555 ofi_m.

Author information

Corresponding authors

Correspondence to I. V. Mashechkin, M. I. Petrovskiy, D. V. Tsarev or M. N. Chikunov.

Additional information

Translated by Yu. Kornienko

About this article

Cite this article

Mashechkin, I.V., Petrovskiy, M.I., Tsarev, D.V. et al. Machine Learning Methods for Detecting and Monitoring Extremist Information on the Internet. Program Comput Soft 45, 99–115 (2019). https://doi.org/10.1134/S0361768819030058

  • DOI: https://doi.org/10.1134/S0361768819030058
