A text-mining based cyber-risk assessment and mitigation framework for critical analysis of online hacker forums

https://doi.org/10.1016/j.dss.2021.113651

Highlights

  • Novel text-mining based cyber-risk assessment and mitigation framework.

  • Identify hacker expertise using explicit and implicit features on online forums.

  • Expert hackers demonstrate leadership in online forums.

  • Compute financial impact for every {hacker expertise, attack-type} combination.

  • Prioritize hacker mitigation strategies.

Abstract

Online hacker communities are meeting spots for aspiring and seasoned cybercriminals, where they engage in technical discussions and share exploits and relevant hacking tools to be used in launching cyber-attacks on business organizations. Sometimes, the affected organizations can detect these attacks in advance, with the help of cyber-threat intelligence derived from the explicit and implicit features of hacker communication in these forums. Herein, we propose a novel text-mining based cyber-risk assessment and mitigation framework, which performs the following critical tasks. (i) Cyber-risk assessment: identify hacker expertise (i.e., newbie, beginner, intermediate, and advanced) from explicit and implicit features using various classification algorithms. Among these features, cybersecurity keywords, sharing of attachments, and sentiments emerged as significant. Further, we found that expert hackers demonstrate leadership in the online forums that eventually serve as communities of practice. Consequently, novice hackers gradually develop their cyber-attack skills through prolonged observations, interactions, and external influences in this social learning process. (ii) Cyber-risk mitigation: compute the financial impact for every {hacker expertise, attack-type} combination, and then rank these combinations on a {likelihood, impact} decision-matrix to prioritize mitigation strategies in affected organizations. Through these novel recommendations, our framework can guide managers to decide on appropriate cybersecurity controls using an {expected loss, probability, attack-type, hacker expertise} metric against financial losses due to cyber-attacks.

Introduction

Cybercriminals have cost the global economy billions of dollars in losses across various organizations in recent years. For instance, in December 2020, attackers conducted a large-scale breach across the users of Orion, a network monitoring product by SolarWinds. Organizations affected in the attack included top U.S. federal agencies such as the Department of Justice, U.S. Treasury, and Homeland Security, Fortune 500 companies such as Microsoft, Intel, Cisco and their own clients, as well as cybersecurity firm FireEye and many more.1 Threat actors used malware code-named Sunburst, which they surreptitiously introduced into the organizational networks as early as September 2019 and which went undetected for over a year.2 In another incident during January 2021, state-sponsored threat groups actively exploited four zero-day vulnerabilities in the Microsoft Exchange server, and deployed backdoors to launch widespread attacks. Some of the most targeted industries in this attack were government and military (23%), followed by manufacturing (15%), and banking and financial services (14%).3 These incidents suggest an ever-growing trend in which the likelihood of cyber-attacks has been increasing in recent years, and such attacks continue to have a significant negative economic impact on organizations.4 According to a recent World Economic Forum Report,5 organizations need to use “active defence” to survive in the age of advanced cyber-threats. Therefore, cyber-attacks require proactive intervention from governmental, non-governmental, and business organizations alike.

Globally, hacker communities, also known as “dark forums”, have become one-stop sites for cyber-criminals who exchange malicious technical knowledge, hacking tools and exploits before conducting cyber-attacks. These online “dark forums” are novel and promising sources of cyber-threat intelligence that firms and technology professionals can proactively scan to avert future cyber-attacks [6,46]. These forums belong to the “Dark Web”, whose platforms are of four primary types: dark forums, internet chat boards, darknet markets, and carding shops. This study focuses on dark forums as virtual communities-of-practice where groups of people share a mutual concern, i.e., the pursuit of malicious technical knowledge [36,46]. Armstrong and Hagel [2] define virtual communities as computer-mediated platforms that emphasize member-generated content and the mutual cognitive integration of that content. To elaborate, we present the message exchange mechanism for the Hackhound forum6 (Table 1), where hackers seek to expand their knowledge through continual interaction [20]. A beginner hacker is interested in acquiring knowledge: e.g., David87965 participated in Books from offensive sub-forum to gather technical information and Members' security to inquire about a possible breach.

In contrast, senior hackers are more interested in sharing knowledge. For instance, Hacker4Life shared a malicious worm in Black Worm Generator sub-forum. The Community of Practice Theory [51] supports these behavioural traits through social learning, where an individual's knowledge acquisition depends mainly on peers and the mutual interactions among them [13]. Such knowledge-sharing behaviour can serve as an explicit predictor of hacker expertise: beginners posted messages at a rate of 520/159 ≈ 3.270 per hacker, versus 665/374 ≈ 1.778 per hacker for newbies.
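The per-hacker posting rates quoted above can be recomputed with a short sketch. The message and hacker counts are taken from the paragraph; the dictionary layout and the function name `posts_per_hacker` are illustrative assumptions, not part of the study's code.

```python
# Counts quoted in the text: (messages posted, number of hackers) per level.
forum_activity = {
    "beginner": (520, 159),
    "newbie": (665, 374),
}


def posts_per_hacker(messages: int, hackers: int) -> float:
    """Average number of messages posted by one hacker at a given level."""
    return messages / hackers


# An explicit (participation-based) feature: posting rate per expertise level.
rates = {
    level: posts_per_hacker(messages, hackers)
    for level, (messages, hackers) in forum_activity.items()
}

for level, rate in rates.items():
    print(f"{level}: {rate:.3f} messages per hacker")
```

A higher rate for beginners than for newbies is the kind of explicit, participation-based signal the paragraph describes.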

Analysts can also examine message-exchange mechanisms in these dark forums to reveal implicit predictors of hacker expertise, such as the number of cyber-security keywords published by each hacker. For instance, the “intermediate” hackers Hacker4Life and BlackArray posted “dark worm” and “improved security scanner,” respectively, which were technically richer than what the “beginner” hackers googlefloober and David87965 wrote. Therefore, cyber-security analysts can detect hacker expertise using such meaningful combinations of explicit and implicit message-exchange features. Subsequently, firms can design mitigation strategies based on the hacker's level of knowledge (i.e., expertise) and the type of attack they can inflict, thereby preventing future attacks. Such proactive IT risk management techniques are known as cyber-threat intelligence [5,6,34,45]. For instance, sensitive financial7 and personal information8 leaked from consumers is often available for sale on darknet forums. Credit-monitoring firms can proactively investigate these forums, extract similar information and possibly prevent large-scale data breaches in future. With this in mind, scholars and practitioners acknowledge that a deeper understanding of hackers' explicit and implicit message-exchange behaviour is required to determine their expertise. Such an understanding is needed to minimize the efficacy and extent of similar attacks in the future. Therefore, building on these gaps and the objectives of organization-level cyber-threat intelligence, we pose three research questions that are highly relevant for firms and cyber-security researchers:

  • RQ1: What are the determinants (both explicit and implicit) of the expertise of hackers {such as newbie, beginner, intermediate, or advanced} in dark forums?

  • RQ2: What is the likelihood of getting attacked by a hacker even after a successful cyber-threat intelligence analysis?

  • RQ3: What will be a firm's cyber-risk mitigation strategy {expected loss, probability} when faced with different types of attacks from these hackers for {attack-type, hacker expertise}?

We seek answers to these questions by proposing a two-stage framework, as shown in Fig. 1. In the first stage, we built a cyber-risk assessment module, using hacker expertise as an input that we measured by (i) explicit, quantitative features (obtained by examining the participation behaviour of the hackers within the dark forum), and (ii) implicit, qualitative features (obtained by analyzing the content of the hackers' communication). This module provides the probability of correctly classifying hackers into the various expertise levels: newbie, beginner, intermediate, and advanced. In the second stage, we built a cyber-risk mitigation module, where we (i) applied this probability to compute the expected losses arising from the major attack-types that these hackers could launch if they went undetected, (ii) built a risk-impact matrix using the {expected loss, probability, attack-type, hacker expertise} tuple, and (iii) proposed cyber-risk mitigation strategies using the risk-impact matrix. This study found that firms are most vulnerable to phishing attacks that compromise personal and financial information [3,15], followed by virus attacks launched by mid-level hackers (beginners and intermediates). Based on these findings, this study proposed actionable risk mitigation strategies.
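The second stage can be illustrated with a minimal sketch that ranks {attack-type, hacker expertise} pairs by expected loss and places each on a likelihood/impact grid. All probabilities, loss figures and cut-offs below are hypothetical placeholders, and the simple product `probability * loss` is an assumption made for illustration; the paper's exact loss model is not reproduced in this excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    attack_type: str
    expertise: str
    probability: float  # hypothetical likelihood of an undetected attack
    loss: float         # hypothetical financial impact if the attack succeeds

    @property
    def expected_loss(self) -> float:
        # Assumed form: expected loss = likelihood x impact.
        return self.probability * self.loss


def quadrant(s: Scenario, p_cut: float, loss_cut: float) -> str:
    """Place a scenario in one cell of a 2x2 likelihood/impact matrix."""
    likelihood = "high" if s.probability >= p_cut else "low"
    impact = "high" if s.loss >= loss_cut else "low"
    return f"{likelihood} likelihood / {impact} impact"


# Hypothetical inputs, chosen only to make the example concrete.
scenarios = [
    Scenario("virus", "beginner", 0.20, 200_000.0),
    Scenario("phishing", "newbie", 0.05, 100_000.0),
    Scenario("phishing", "intermediate", 0.30, 500_000.0),
]

# Highest expected loss first: these would be mitigated with top priority.
ranked = sorted(scenarios, key=lambda s: s.expected_loss, reverse=True)

for s in ranked:
    cell = quadrant(s, p_cut=0.25, loss_cut=250_000.0)
    print(f"{s.attack_type:9s} {s.expertise:13s} {s.expected_loss:>10.0f}  {cell}")
```

Sorting by expected loss gives a priority order, while the quadrant label shows whether a scenario ranks high because it is likely, because it is costly, or both.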

The remainder of this paper is organized as follows. Section 2 presents an overview of existing studies and theoretical premises on hacker forums and the “dark side of information technology”. Section 3 explores the data and describes the methodology. Section 4 presents the modelling techniques. Section 5 presents the empirical results. Finally, Section 6 discusses the research findings, the implications of this study, and concluding remarks.

Section snippets

Background work on Hacker forums and dark-side of information technology

Scholars have examined hacker forums as a part of the literature on the “dark side of information technology” [16,18]. These forums allow hackers to exchange messages, malicious codes and other technical assets [4,22,45,46]. Often, these discussions help in breaching the computer networks of organizations and cause financial losses [7,18,40]. Primarily, there are four categories of dark web platforms that researchers have examined to extract first-hand cyber-threat intelligence [6], namely: (i)

Variables used to build the cyber-risk assessment module

We collected hacker-forum data for the Hackhound forum, available through the AZSecure portal of the Artificial Intelligence Lab at the University of Arizona. Before pre-processing, the dataset consisted of 4242 forum posts by 834 unique hackers from October 2012 to September 2015, on a diverse set of hacking topics collected in 2015. First, we categorized the determinants of a hacker's expertise in a Darknet forum into explicit and implicit features. These were further subdivided into (i) forum-usage (

Cyber-risk assessment: probability computation for detecting expert hackers

We applied classification algorithms to offer a baseline performance evaluation for our multi-class hacker taxonomy problem. Due to the ordinal nature of the operationalized variables, we used M1a: k-Nearest Neighbor (k−NN), M1b: CART (Classification and Regression Tree) [12], M1c: Ensemble Boosted Tree [12], M1d: Multinomial Logit, and M1e: Hierarchical Logit in MATLAB. The generalized form of detection probability (or classification accuracy) is: p(Y = j| X1 = α1, X2 = α2, …, X14 = α14) where X1,
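As a rough illustration of how a multinomial logit turns feature values into class probabilities p(Y = j | X), the sketch below applies a softmax to linear scores. The three features, the weights and the intercepts are invented for the example (the study uses fourteen features and fits its models in MATLAB); only the four expertise labels come from the text.

```python
import math

LEVELS = ["newbie", "beginner", "intermediate", "advanced"]


def class_probabilities(x, weights, intercepts):
    """Softmax over linear scores: one probability per expertise level."""
    scores = [
        b + sum(w_i * x_i for w_i, x_i in zip(w, x))
        for w, b in zip(weights, intercepts)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients; "newbie" is the reference class (all zeros).
weights = [
    [0.0, 0.0, 0.0],
    [0.2, 0.1, 0.0],
    [0.5, 0.3, 0.2],
    [0.4, 0.6, 0.1],
]
intercepts = [0.0, -0.2, -0.8, -1.5]

# Hypothetical feature vector for one hacker (e.g., scaled activity measures).
x = [2.0, 1.0, 0.5]

probs = class_probabilities(x, weights, intercepts)
predicted = LEVELS[probs.index(max(probs))]

for level, p in zip(LEVELS, probs):
    print(f"p(Y = {level} | x) = {p:.3f}")
print("predicted level:", predicted)
```

The probability assigned to the predicted level is the kind of quantity that a downstream expected-loss calculation could consume.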

Results from the cyber-risk assessment module

Table 8 compares the model-building results from the five classification algorithms: the hierarchical logit classifier (M1e) performs best at 84.852% overall accuracy, followed by the multinomial logit (M1d) at 83.025% and the boosted tree algorithm (M1c) at 81.146%. The CART-based decision tree (M1b) achieves an overall accuracy of 72.288%, and the k-nearest neighbor algorithm (M1a) 71.453%. We compared our results with prior studies and found that none of them

Discussion of research findings from results

We discuss the research findings based on the results of the Hierarchical Logit Classifier presented in Table 10 and Eqs. (3), (4), (5). Among forum-usage features, we find that days spent in the forum is insignificant in determining hacker expertise across all levels of expertise (β = −0.081; β = −0.008; β = −0.047). Our findings are consistent with Benjamin et al. [5,6] and Chen et al. [13], who analyzed knowledge contribution behaviour in an online knowledge exchange community. However,

Acknowledgements

The authors express their sincerest thanks to the anonymous referees for their constructive suggestions. The authors also thank the Editor-in-Chief of this journal, Professor James R. Marsden, for his overall inputs and support throughout the revision process.


References (53)

  • S.J. Kim et al. The paradox of (dis) trust in sponsorship disclosure: the characteristics and effects of sponsored online consumer reviews. Decis. Support. Syst. (2019)

  • H.C. Lin et al. What motivates health information exchange in social media? The roles of the social cognitive theory and perceived interactivity. Inf. Manag. (2018)

  • A. Mukhopadhyay et al. Cyber-risk decision models: to insure IT or not? Decis. Support. Syst. (2013)

  • I. Park et al. Disentangling the effects of efficacy-facilitating informational support on health resilience in online health communities based on phrase-level text analysis. Inf. Manag. (2020)

  • M. Salehan et al. Predicting the performance of online consumer reviews: a sentiment mining approach to big data analytics. Decis. Support. Syst. (2016)

  • A. Vishwanath et al. Cyber hygiene: the concept, its measure, and its initial tests. Decis. Support. Syst. (2020)

  • J. Wu et al. How to increase customer repeated bookings in the short-term room rental market? A large-scale granular data investigation. Decis. Support. Syst. (2021)

  • K. Xie et al. Value co-creation between firms and customers: the role of big data-based cooperative assets. Inf. Manag. (2016)

  • X. Yang et al. Modeling relationships between retail prices and consumer reviews: a machine discovery approach and comprehensive evaluations. Decis. Support. Syst. (2021)

  • H. Akman et al. Co-creating value in online innovation communities. Eur. J. Mark. (2019)

  • A. Armstrong et al. Net Gain: Expanding Markets through Virtual Communities (1997)

  • V. Benjamin et al. Securing cyberspace: identifying key actors in hacker communities

  • V. Benjamin et al. Examining Hacker participation length in cybercriminal internet-relay-chat communities. J. Manag. Inf. Syst. (2016)

  • V. Benjamin et al. DICE-E: a framework for conducting Darknet identification, collection, evaluation with ethics. MIS Q. (2019)

  • B. Biswas et al. G-RAM framework for software risk assessment and mitigation strategies in organizations. J. Enterp. Inf. Manag. (2018)

  • B. Biswas et al. “Leadership in action: how top hackers behave” a big-data approach with text-mining and sentiment analysis


    Baidyanath Biswas is an Assistant Professor of MIS and Analytics Group at the International Management Institute Kolkata, India. His research has appeared in Decision Support Systems, Electronic Markets, Computers in Industrial Engineering, and the Journal of Enterprise Information Management. Baidyanath is also associated with top peer-reviewed international conferences, namely, HICSS and ICIS. He has a rich industry-experience of nine years working as a mainframe and DB2 database analyst at Infosys and IBM. Currently, Baidyanath serves as the Associate Editor of the Global Business Review journal.

    Arunabha Mukhopadhyay is a Professor of Information Technology & Systems Area at Indian Institute of Management Lucknow (IIM Lucknow). He has obtained his Ph.D. and Post Graduate Diploma in Business Management (PGDBM) from the Indian Institute of Management Calcutta (IIM Calcutta), in the area of Management Information Systems. He was awarded the Infosys scholarship during his Ph.D. He has published in various refereed journals and conferences including Decision Support Systems (DSS), Information Systems Frontiers (ISF), Journal of Global Information Technology Management (JGITM), JIPS, International Journal of Information Systems and Change Management (IJISCM), Decision, IIMB Review, Hawaii International Conference on System Sciences (HICSS), Americas Conference on Information Systems (AMCIS), Pre-International Conference On Information Systems (ICIS) workshops, etc. He is the recipient of the Best Teacher in Information Technology Management in 2013 and 2011, by Star-DNA group B-School Award and 19th Dewang Mehta Business School Award, in India, respectively.

    Sudip Bhattacharjee is a Professor in the School of Business, University of Connecticut. He currently serves as Senior Research Fellow, US Census Bureau. Previously, he served as Chief, Center for Big Data Research and Applications, US Census Bureau. He is a Visiting Faculty at EMLYON Business School, France, and Indian School of Business. He was a Visiting Professor at GE Global Research Center, USA. He has previously served as the Assistant Dept. Head of Operations and Information Management, and as the Executive Director of MBA Programs, both in the School of Business, University of Connecticut. His research interests include data driven IT and operations management and policy, information systems economics, energy informatics, digital goods and markets, and closed loop supply chains. His research has appeared in premier journals such as Management Science, INFORMS Journal on Computing, Journal of Business, Journal of Law and Economics, ACM Transactions, Journal of Management Information Systems, IEEE Transactions, and other leading peer-reviewed publications. He serves or has served as Associate Editor for Information Systems Research (for 5 years), Special Issue Editor for ACM Transactions on Management Information Systems, guest AE for MIS Quarterly and Decision Sciences Journal.

    Ajay Kumar is an Assistant Professor at the AIM Research Center on Artificial Intelligence in Value Creation, EMLYON Business School in France. His research and teaching interests are in data and text mining, decision support systems, business intelligence and enterprise modelling. He has been a Postdoctoral Fellow at Massachusetts Institute of Technology and Harvard Business School. He has published several research papers in reputed journals, including International Journal of Production Economics, Industrial Marketing Management, Telematics & Informatics, Technological Forecasting & Social Change, Annals of Operations Research, International Journal of Production Research, etc.

    Dursun Delen is the holder of Spears and Patterson Endowed Chairs in Business Analytics, Director of Research for the Center for Health Systems Innovation, and Regents Professor of Management Science and Information Systems in the Spears School of Business at Oklahoma State University. He authored/co-authored 100+ journal and 40+ peer-reviewed conference proceeding articles. His research has appeared in major journals including Decision Sciences, Journal of Production Operations Management, Decision Support Systems, Communications of the ACM, Computers and Operations Research, Computers in Industry, Artificial Intelligence in Medicine, International Journal of Medical Informatics, Expert Systems, among others. He has recently published ten books/textbooks in the broad area of Business Intelligence and Business Analytics. He is often invited to national and international conferences and symposiums for keynote addresses, and companies and government agencies for consultancy/education projects on Analytics related topics. He is currently serving as the editor-in-chief, senior editor, associate editor, and editorial board member of more than a dozen academic journals.
