1 Introduction

Pandemic events (also known as “Black Swan” events) can have drastic effects on an organisation. Covid-19 is the most recent Black Swan event, and it has affected organisations on a global scale (Elluru et al. 2019; von Winterfeldt 1988; Kilpatrick and Barter 2020). Its detrimental effects range from putting organisations out of business to forcing significant cost-reduction activities on businesses that need to survive and remain competitive (Akeem 2017). These effects have been more severe for small and medium-sized enterprises (SMEs) (Beglaryan and Shakhmuradyan 2020), as many organisations are forced to operate with reduced resources; for example, business procedures and workforce practices have been dramatically altered by social distancing regulations (Kilpatrick and Barter 2020; Sheng et al. 2020). Unfortunately, although SMEs represent more than 95% of all businesses across the globe, they are the hardest hit sector and are more vulnerable to pandemics than conglomerates (Thukral 2021). The reasons include smaller cash reserves to substitute for affected revenue streams and a heavy reliance on supply chains (Beglaryan and Shakhmuradyan 2020). As a result, organisations are actively identifying key areas where capital investments should be focused (Magazine 2020), and are revisiting their day-to-day operations and supply chain activities, in order to progress out of the Covid-19 pandemic and be better prepared for the next Black Swan event, whether another global disease, a trade war or a natural disaster. Unsurprisingly, one of the key areas that organisations are exploiting to address uncertainties is digital transformation (Kilpatrick and Barter 2020; Magazine 2020; Akter and Wamba 2019), in line with the era of Industry 4.0. This accords with a top-priority policy of the Organisation for Economic Co-operation and Development (OECD) to promote digitisation in SMEs (OECD 2021).

Over the past year, the emergence of Covid-19 has created a real need for organisations to invest in digital transformation and to meet the challenges of a pandemic by leaning towards data-driven operational decisions (Magazine 2020; Sheng et al. 2020; Akter and Wamba 2019). One of the main digital transformation initiatives is big data analytics, which uses real-time analytics to better understand both the positive and negative effects of an event and to prepare proactively (Kilpatrick and Barter 2020; Sheng et al. 2020). In this respect, Artificial Intelligence (AI) has progressed to tackle the uncertainties that probable pandemics pose to an organisation’s business model. This is achieved through the implementation of AI-driven systems and tools that allow an organisation’s data to be analysed for decision support purposes (Sheng et al. 2020). The importance of AI is reinforced by the new normal of Covid-19, where the collection of digital data and footprints is becoming more prominent through digital devices and altered workforces (Sheng et al. 2020). Predictive analytics has also been deployed for condition monitoring and fault prevention (Tranter 2020).

The use of AI for decision support has been popular in areas such as healthcare during Covid-19 due to the lack of resources, where front-line workers are empowered with human–machine teaming platforms (Debnath et al. 2020; Ndiaye et al. 2020). In an era of pandemic where innovations are becoming a necessity, the design of human-centric systems is now more important than ever (Macdonald et al. 2020). This is because new business paradigm shifts brought about by initiatives such as digital transformation must incorporate the perspectives of humans (or users). In this respect, human-centricity ensures user acceptance of AI-based innovative tools, allowing humans to present their opinions to intelligent systems during the reasoning and decision-making processes to address the needs of the business (Griffith et al. 2019). Ultimately, human-centric innovations introduced into business operations lead to successful deployment for addressing uncertainties during Black Swan events, and enable SMEs to see through extended times of business hardship.

The concept of human-centricity is further reinforced as Industry 4.0 demands future workforces that consist of both human and non-human (e.g. intelligent machine) members (Marnewick and Marnewick 2020; Shehadeh et al. 2017). Non-human inputs will be required as part of decision-making processes, while human inputs serve as an integral part of knowledge transfer whereby domain experts are engaged to utilise their domain expertise to make business decisions and actions (Marnewick and Marnewick 2020; Kagermann and Helbig 2013). With Industry 4.0 bringing an abundance of Internet-of-Things (IoT) devices (Ndiaye et al. 2020) that enable applications such as predictive maintenance, business environments now revolve heavily around autonomous and complex systems that empower both digital capabilities and domain experts (Shehadeh et al. 2017). One key consequence of complex systems is that collaboration between humans and robotic systems is becoming critical. The human-AI/robotic cooperation structure and its effectiveness remain a potentially critical gap in research to enable coherent human-AI/robotic co-existence in Industry 4.0 (Shehadeh et al. 2017). Given the importance of human-in-the-loop requirements, we propose an integrated decision support framework incorporating both tacit knowledge from domain experts and the reasoning and inference capabilities of AI models to achieve business competitiveness.

From the perspective of predictive maintenance, which is an area drastically revolutionized by Industry 4.0 (Marnewick and Marnewick 2020; Kagermann and Helbig 2013), effective strategies that permit predictive planning become not only a competitive advantage but also critical for success in pandemic environments (Kilpatrick and Barter 2020; Williams and Holland 2020). An accurate prediction of the future enables proactive planning that is instrumental in areas such as supply chain risk management and downtime reduction (Kilpatrick and Barter 2020; Tranter 2020). However, AI-empowered decision support systems in predictive maintenance demand a high level of trust for user acceptance, in order to promote utilisation and generate throughput and autonomy (Chen et al. 2019). As stated in Nahavandi (2017), autonomy can be broadly divided into human-in-the-loop (HITL) and human-on-the-loop (HOTL) strategies. HITL covers systems/machines that require human commands in their operations, while HOTL encompasses systems/machines that execute tasks independently but under the supervision of humans, who can intervene when anomalies occur. In this respect, transitioning from HITL to HOTL requires sustainable trust between humans and machines, in order to fully realise the benefits of autonomous systems. Additionally, Nahavandi (2019) postulated that autonomous systems would become an integral part of daily human activities, which is inevitable as evidenced by the penetration of industrial robots into production floors. Correspondingly, human faith in autonomous robots would be rooted in predictability, dependability, and extended experience with the robots. To ensure AI-based decision support systems obtain the necessary traction to yield business benefits, a key factor in achieving human-centricity is to effectively foster human–machine trust, especially under pandemic environments where uncertainty is prevalent. Unfortunately, not all data samples collected from the real world are usable and useful for AI deployment. Specifically, in predictive maintenance, good asset management practices intrinsically lead to imbalanced data, which is likely to produce poorly performing models with a low level of human–machine trust, hence compromising business outputs (Chen et al. 2019).

The recognised challenges of transitioning to predictive maintenance within asset management under pandemic environments are the key motivation for this research. The aims of this paper are to derive an effective decision support framework with supporting machine learning applications, and to evaluate its effectiveness in real-world settings under pandemic environments. The research provides theoretical and practical outcomes that can be investigated further in theoretical research or implemented within an SME’s transition to more advanced maintenance strategies in asset management. The occurrence of Covid-19 has further highlighted the value of SMEs being better prepared for pandemic environments, which is a key motivation of this project.

In this paper, we propose an AI-based decision support framework and demonstrate its applicability to predictive maintenance in asset management under pandemic environments. The framework is based on a human-centric approach to ensure user acceptance in the asset management domain, where domain experts govern business actions. It exploits an enhanced trust-based Behaviour Knowledge Space (T-BKS) ensemble model to tackle issues related to imbalanced data, which is a common challenge in asset maintenance. As an extension of the standard BKS model (Huang and Suen 1993), T-BKS is compared with its baseline method and a recent state-of-the-art ensemble method (Chen et al. 2019) on benchmark data sets. Additionally, a real-world case study from a medium-sized enterprise is utilised. This case study incorporates domain expert knowledge pertaining to the outputs of the decision support system in predictive maintenance. Under pandemic environments, AI-powered predictive maintenance tools with human-in-the-loop strategies are critical to survival (Williams and Holland 2020; Tranter 2020). Human–machine interaction is critical to maintaining models and enabling output within a predictive maintenance framework (Samatas et al. 2021), and the proposed framework fosters this notion. The results demonstrate that AI-based human-centric tools are highly applicable to addressing uncertainties associated with pandemic preparedness, facilitating a paradigm shift through digital innovation for the day-to-day operations of an organisation under a dynamic, challenging environment.

2 Literature review

The importance of condition monitoring and predictive maintenance of complex systems for businesses to operate efficiently and effectively in pandemic environments has been widely recognised. As a result, new approaches to predictive maintenance have been developed. In this section, we review the literature on methods for condition monitoring and predictive maintenance. In general, condition monitoring and fault diagnostics exploit data extracted from machines, processes or systems for failure detection or prediction purposes (Tiddens 2018). However, transforming highly unstructured data into human-interpretable results requires an effective methodology, particularly in the era of Industry 4.0 where voluminous data can easily be acquired through digital devices. Since Industry 4.0 is associated with the mega trend of big data and AI, numerous research studies have explored different data-based AI models to develop advanced diagnostic and prognostic capability for condition monitoring and predictive maintenance in radically changing environments (Bengtsson 2004; Tiddens 2018; Vachtsevanos and Wang 2001; Lebold et al. 2003).

Bengtsson (2004) examined feature classification within a condition-based maintenance system. A case library containing previously classified measurements and features was designed. All newly classified features (data measurements) were automatically added into the case library for reasoning by users when receiving a new case. The major advantage of this approach was the ability to capture the tacit knowledge of domain experts for incorporation into the condition monitoring system.

Vachtsevanos and Wang (2001) developed a condition monitoring framework that focused on diagnostic and prognostic tasks. The framework stressed the necessity of human intervention (similar to Bengtsson 2004) through a domain expert. In diagnostic applications, the domain expert identified component failures whereas in prognostic application, a case (work order) on the component through data analysis was developed. A ‘predictor’ module was proposed to utilise a dynamic wavelet neural network for feature classification. The domain expert was involved with the initial development of diagnostic and prognostic procedures as well as cost-benefit analysis, but the human-in-the-loop mechanism was absent, compromising acceptance of the outcome by users.

Traini et al. (2019) proposed an Industry 4.0 predictive maintenance framework for condition monitoring of equipment degradation and wear in milling operations. Specifically, the key objectives of the proposed framework, e.g., preventing unexpected breakdowns, optimising processes and improving human–machine interaction, were explained. The framework included a data pre-processing step prior to feature engineering and multi-modelling procedures for utilising machine learning in prognostic tasks. It did not include a human–machine cooperation procedure, although one of its major highlighted outputs was enhanced human–machine interaction. The effectiveness of the proposed framework was evaluated using a real milling data set. The results indicated good levels of modelling performance through numerous regression and classification techniques, as well as improved human–machine collaboration within a production environment. However, the input and feedback of knowledge workers to the proposed framework were not presented.

Kiangala and Wang (2020) proposed an effective predictive maintenance framework for condition monitoring of conveyor motors. The framework consisted of a convolutional neural network for image classification. Detailed algorithmic and machine learning information was provided. Adjustment of the model over numerous iterations was conducted automatically through weight updates corresponding to classification results. The results indicated modelling performance of up to 100% accuracy, whereby the outputs could prolong conveyor motor life and avoid breakdowns. Nevertheless, as a purely data-driven approach, the framework did not have a human–machine interaction element.

Bouabdallaoui et al. (2021) proposed a machine learning-based framework for predictive maintenance of building facilities. The proposed framework architecture consisted of multiple chronological procedures including data collection and pre-processing, model development, model deployment, and feedback and model improvement. A key component of the framework was the feedback and model improvement procedure, which used users’ feedback to re-train the machine learning models. Since the feedback procedure was not systematic, expertise was required to re-train the models. The authors evaluated the predictive maintenance framework on a real-world case study, focusing on the implementation process. Data collection challenges and asset uniqueness associated with different buildings were discussed. A long time frame for implementing an effective predictive maintenance strategy was required before a return on investment was realised. As noted, this framework incorporated a human–machine feedback loop from front-line users to maintain models throughout implementation.

Lebold et al. (2003) proposed an open system framework for condition monitoring and diagnostics, known as the Open System Architecture for Condition Based Maintenance (OSA-CBM). It contained several chronological layers as part of its composition, including data acquisition, data manipulation, condition monitor, health assessment, prognostics, decision support and presentation (Bengtsson 2004; Lebold et al. 2003). Li et al. (2017) proposed a systematic framework focusing on data mining methods for fault diagnostics and predictive maintenance. It served as an extension of the OSA-CBM model to include the latest technologies, such as cloud computing and machine-to-machine communication, for Industry 4.0. Multi-modal sensory data were pipelined into a data warehouse, which was followed by diagnostics and prognostics through data mining and decision support models. The study also stressed the need for a maintenance strategy to form part of the framework as a ‘maintenance implementation module’. The results indicated the advantage of such an approach coupled with a neural network for adaptability and generalization to tackle different failure modes. Conversely, the major challenge identified was the framework’s inability to provide explanatory capability to ensure user acceptance.

For a holistic view of maintenance frameworks, the related studies on predictive maintenance, together with their common and distinguishing key points, are presented in Table 1.

Table 1 A comparison of Internet-of-Things (IoT) and decision support frameworks in predictive maintenance

In summary, a human-centric AI-based mechanism clearly remains a research gap in the design and development of intelligent predictive maintenance and decision support frameworks. The categories of AI-based prognostics, condition monitoring and diagnostics, data pre-processing and asset data extraction are well established. Although these well-established categories are critical for deploying a predictive maintenance strategy, the domain expert interface that enables front-line throughput through collaboration is quickly becoming a core focus (Shehadeh et al. 2017). This highlights the need to further include and investigate human-AI cooperation in complex systems within Industry 4.0.

2.1 Imbalanced data

The issue of imbalanced data has been recognised as one of the top ten machine learning challenges (Yang and Wu 2006). The phenomenon of imbalanced data occurs when there are far more data samples associated with one particular class than with the others. Learning from imbalanced data sets makes high accuracy considerably harder to achieve. This is because the skewed class distribution leads to under-represented information associated with the minority class(es) (He and Garcia 2009). Examples of imbalanced data problems include sentiment analysis (Xu et al. 2015), natural language processing and text mining (Li et al. 2010), software fault detection (Malhotra 2015), medical diagnosis including cancer identification (Krawczyk et al. 2016), credit risk and loan defaults (Birla et al. 2016), and fault diagnostics in condition-based maintenance (Lee et al. 2016). Evidently, the wide range of industries where imbalanced data challenges are applicable further emphasises the importance of addressing this issue in the context of decision support to ensure better preparedness for uncertainties during Black Swan events.

In the realm of predictive maintenance, failure events realistically cover only a small fraction of the overall operation of the processes/systems. From a data perspective, this causes a highly imbalanced data set. Hence, it is essential to recognise the consequences of both false positives (predicting a failure that does not occur) and false negatives (missing an incoming catastrophic failure). In an ideal machine learning setting, the training and test data sets are appropriately split. Researchers have put effort into identifying effective methods to tackle imbalanced data problems. Three levels of approaches, each incorporating different methods, are recognised as established (Sun et al. 2011): the data-level approach, the algorithm-level approach and cost-sensitive learning. Within the context of predictive maintenance, data-level approaches are the most practical, as algorithm-level and cost-sensitive learning require contextual fault information about the condition-monitored equipment, which is not always obvious or available (Wagner et al. 2016).

A widely recognised data-level approach to imbalanced data is data sampling. There are two general sampling methods: removing samples from the majority class containing non-failure data (under-sampling), or over-sampling the minority class to compensate for the limited training data (Sun et al. 2011; Cernuda 2019). A list of sampling methods (over- or under-sampling) (Cernuda 2019; Wagner et al. 2016; Sun et al. 2011; Maheshwari et al. 2011) is presented in Table 2.

Table 2 Sampling methodologies
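
As an illustration of the two general sampling strategies, the following is a minimal sketch of random under-sampling and random over-sampling for a binary data set. It is not one of the specific methods listed in Table 2; the function name `random_resample`, the class labels "N" (non-failure) and "F" (failure) and the toy data are assumptions made for this example, and the minority class is assumed to be non-empty and smaller than the majority class.

```python
import random
from collections import Counter

def random_resample(samples, labels, minority, mode, seed=0):
    """Balance a binary data set by random under- or over-sampling.

    mode="under": randomly drop majority samples until both classes are equal.
    mode="over":  randomly duplicate minority samples until both classes are equal.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if mode == "under":
        # Keep only as many majority samples as there are minority samples.
        majority_idx = rng.sample(majority_idx, len(minority_idx))
    elif mode == "over":
        # Draw extra minority samples with replacement.
        extra = len(majority_idx) - len(minority_idx)
        minority_idx = minority_idx + rng.choices(minority_idx, k=extra)
    else:
        raise ValueError("mode must be 'under' or 'over'")
    keep = majority_idx + minority_idx
    rng.shuffle(keep)
    return [samples[i] for i in keep], [labels[i] for i in keep]

# Toy data (hypothetical): 8 non-failure readings and 2 failure readings.
X = list(range(10))
y = ["N"] * 8 + ["F"] * 2
_, y_under = random_resample(X, y, minority="F", mode="under")
_, y_over = random_resample(X, y, minority="F", mode="over")
print(Counter(y_under), Counter(y_over))
```

Under-sampling discards information from the majority class, whereas over-sampling by duplication adds no new information about failures, which is one reason more elaborate sampling methods exist.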

The algorithm-level approaches incorporate classifier models to boost learning by introducing appropriate biases (regularisation) for the minority classes. This is dependent on the model chosen and whether the data set consists of one-class or multi-class problems. For the one-class case, the popular learning models include neural networks and support vector machines (SVMs), whereas for the multi-class case, many established algorithms can be utilized (Sun et al. 2011). Boosting is a common method of the algorithm-level approach (Wagner et al. 2016; Carbery et al. 2018).

The role of cost-sensitive learning is to minimise a cost function to learn from incorrectly classified data. Such a learning approach is able to incorporate context to compensate for imbalanced data. This can be viewed as assigning a penalty to incorrectly classified data. In theory, this technique employs a cost matrix during the model building phase, where the lowest-cost model is selected. Sun et al. (2011) explained three main categories of cost-sensitive learning for imbalanced data: updating the weights with cost items, optimising the learning algorithms to include a misclassification cost to be minimised, and utilising probability (e.g. Bayesian theory) to classify data into the ‘lowest risk class’. The use of cost-sensitive learning with an appropriate cost function in Spiegel et al. (2018) proved to be effective in combining sets of information to create a business-orientated predictive maintenance strategy.
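
To make the ‘lowest risk class’ idea concrete, the sketch below assigns a sample to the class with the minimum expected misclassification cost, given class posterior probabilities and a cost matrix. It is an illustrative example only, not the procedure of Sun et al. (2011) or Spiegel et al. (2018); the function name `lowest_risk_class`, the cost values and the posterior probabilities are hypothetical.

```python
def lowest_risk_class(posteriors, cost):
    """Return the class with the minimum expected misclassification cost.

    posteriors: {true_class: P(true_class | x)}
    cost:       {(predicted_class, true_class): penalty}
    """
    classes = list(posteriors)

    def risk(predicted):
        # Expected cost of predicting `predicted`, averaged over the true class.
        return sum(cost[(predicted, true)] * posteriors[true] for true in classes)

    return min(classes, key=risk)

# Hypothetical cost matrix: missing a failure (predict N, truth F) is assumed
# to be 20 times as costly as a false alarm (predict F, truth N).
cost = {("N", "N"): 0, ("F", "F"): 0, ("F", "N"): 1, ("N", "F"): 20}
zero_one = {("N", "N"): 0, ("F", "F"): 0, ("F", "N"): 1, ("N", "F"): 1}

# Hypothetical posteriors for one sample: failure is unlikely but not negligible.
posteriors = {"N": 0.9, "F": 0.1}

decision_cost_sensitive = lowest_risk_class(posteriors, cost)
decision_plain = lowest_risk_class(posteriors, zero_one)
print(decision_cost_sensitive, decision_plain)
```

With equal penalties the sample is assigned to the majority class N, whereas the asymmetric cost matrix shifts the decision to F because the expected cost of a missed failure (20 × 0.1) exceeds that of a false alarm (1 × 0.9).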

3 AI-based human-centric decision support framework

3.1 Proposed framework

Under dynamic business environments, it is essential that organisations remain adaptive to changes. This requirement underscores the importance of adopting a decision support framework for predictive maintenance that promotes adaptivity to reduce uncertainties. Within an asset, changes occur over time due to various internal and external factors, e.g. wear and tear of different parts, influence of weather conditions, and accidents. These challenges further aggravate uncertainty in dynamic environments. As well as being adaptive, it is crucial that such a decision support framework establishes a strong human–machine teaming component to ensure user acceptance within real-world environments, allowing business actions and informed decisions to be made promptly. Figure 1 depicts the proposed AI-based human-centric decision support framework for predictive maintenance.

Fig. 1 AI-based human-centric decision support framework

The proposed framework is divided into the following key interconnected components:

  1. Assets: In essence, an asset under monitoring needs to be an IoT-enabled entity with sensory data extraction capabilities, from which data samples are collected.

  2. Asset Knowledge: Asset knowledge embedded within this decision support framework promotes the compilation and adoption of tacit knowledge from domain experts. This component is continually updated to provide the latest knowledge and guidelines to the users (both experts and non-experts), so that they can maintain and gain knowledge in their respective fields as well as adapt to business changes. Within this framework, the tacit knowledge solicited from domain experts is transformed into well-defined business rules. These business rules are complemented by intrinsic patterns mined from data samples using AI models, leading to a knowledge base that aims to bridge human–machine trust in using the decision support framework. The knowledge base includes statistical features that indicate, from the data samples captured, whether the asset under scrutiny is performing normally or is failing. Bridging human–machine trust enables business actions to be undertaken. Another relevant implication is a precautionary solution for highly unpredictable fault diagnostics, which can be validated by business rules, machine learning inference, and/or domain experts.

  3. Machine learning for imbalanced data: The issue of imbalanced data is profound in predictive maintenance. It makes learning difficult, as machine learning models tend to learn the representation of the majority class better, i.e. more non-failure data are available for training. As such, we propose a new trust-based BKS (T-BKS) ensemble model to tackle issues related to imbalanced data with machine learning models. The human-feedback loop of the proposed framework enables re-assessment of machine learning predictions when conflicts occur between machine predictions and human knowledge pertaining to the asset under monitoring.

  4. Predictive maintenance (under pandemic environments): Central to this AI-based decision support framework is that feedback from domain experts is taken into consideration within the machine learning process. When a discrepancy between the business rule and the machine learning prediction occurs, domain experts are involved in reviewing and validating the outcome. The validated outcome is then used to update the machine learning models. As a result, under the scenario of class imbalance, the machine learning models are able to yield predictions that are formulated in line with the knowledge of domain experts. This leads to building a trust-based relationship between humans and machines. As predictive maintenance under pandemic environments contributes to the bottom line of a business, a high level of trust is required.

The core element of the AI-based framework is the domain expert, around whom a human–machine teaming cycle is initiated to foster trust from users towards the predictions of machine learning models. Indeed, the framework design hinges on human-centricity, which is inspired by the notion that business processes require domain expert intervention to resolve complex operational issues efficiently and effectively under pandemic environments. Human-centricity involves the amalgamation of asset knowledge and machine learning for imbalanced data analytics. This includes a comparison step where the statistical features of asset knowledge and the machine learning features are contrasted by the domain expert. A trust score can be developed, if required, to assist with the statistical feature comparison. Suitable similarity measures include Euclidean distance and cosine similarity.
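
The two similarity measures mentioned above can be sketched as follows. This is an illustrative example only: the feature vectors `expert_profile` and `model_profile` and their values are hypothetical, and how such a similarity would be turned into a trust score is left open here.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors (0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical statistical features (e.g. a mean and a spread of a sensor signal):
# one vector from the asset knowledge base, one derived from the model's view.
expert_profile = [3.0, 4.0]
model_profile = [6.0, 8.0]

distance = euclidean_distance(expert_profile, model_profile)
similarity = cosine_similarity(expert_profile, model_profile)
print(distance, similarity)
```

In this example the two vectors point in the same direction but differ in magnitude, so the cosine similarity is at its maximum while the Euclidean distance is non-zero. The choice between the two therefore depends on whether the scale of the features should matter in the comparison.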

3.2 Trust-based behaviour knowledge space ensemble model

Introduced in Huang and Suen (1993), the BKS is a decision combination method that creates a reference table (akin to a pivot table) to combine the predictions from a pool of classifiers. Specifically, it consists of a K-dimensional space corresponding to K classifiers, where each dimension corresponds to the prediction of one classifier. Improvements on the modelling procedure of the BKS have been proposed (Zhang et al. 2001; Yang and Zhang 2006), while the applications of the BKS to image processing have been demonstrated (Souvannavong and Huet 2006; Monson and Kumar 2017).

The BKS method involves two stages: knowledge modelling and operation (Huang and Suen 1993). In the first stage, the K-dimensional space is constructed. Given M target classes, each dimension is discretized into (M + 1) grids corresponding to the prediction of a classifier, with the (M + 1)th grid being an “I don’t know” prediction. The BKS forms a pivot table where the combination of predictions from K classifiers points to a BKS cell that records the number of samples coming from each target class (Huang and Suen 1993).

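The second stage involves the BKS operation, where a combined prediction pertaining to the output class with respect to the current input sample is made. This combined prediction indicates the best representative class. For a cell BKS(\(e(1),e(2),\ldots ,e(K)\)), let \(T_{e(1), e(2) \ldots e(K)}\) be the total number of samples recorded in the cell, \(n_{e(1), e(2) \ldots e(K)}(m)\) the number of those samples belonging to class m, and \(R_{e(1), e(2) \ldots e(K)}\) the best representative class, i.e. the class with the largest count in the cell. With a threshold \(\gamma _{1}\), the decision rule for the combined prediction E(x) is: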

$$\begin{aligned} E(x) ={\left\{ \begin{array}{ll} R_{e(1), e(2), \ldots , e(K)}, &{}\quad \text {when } T_{e(1), e(2), \ldots , e(K)} > 0 \text { and } \frac{n_{e(1), e(2), \ldots , e(K)}(R_{e(1), e(2), \ldots , e(K)})}{T_{e(1), e(2), \ldots , e(K)}}\ge \gamma _{1} \\ M + 1, &{}\quad \text {otherwise} \end{array}\right. } \end{aligned}$$
(1)
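
The two BKS stages can be sketched in a few lines. This is a minimal illustration of the knowledge-modelling and operation stages under the decision rule in Eq. (1), not the original implementation of Huang and Suen (1993); the function names `build_bks` and `bks_decide`, the `"unknown"` label standing in for the (M + 1)th prediction, and the toy predictions are assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_bks(predictions, labels):
    """Knowledge modelling: map each decision combination to per-class counts."""
    table = defaultdict(Counter)
    for combo, label in zip(predictions, labels):
        table[tuple(combo)][label] += 1
    return table

def bks_decide(table, combo, threshold, reject="unknown"):
    """Operation: apply the decision rule of Eq. (1) to one decision combination."""
    counts = table.get(tuple(combo))
    if not counts:                            # T = 0: combination never observed
        return reject
    total = sum(counts.values())              # T
    best, n_best = counts.most_common(1)[0]   # R and n(R)
    return best if n_best / total >= threshold else reject

# Toy training data (hypothetical): predictions of K = 2 classifiers and true labels.
preds = [("N", "N"), ("N", "N"), ("N", "N"), ("N", "F"), ("N", "F"), ("F", "F")]
truth = ["N", "N", "F", "F", "N", "F"]
table = build_bks(preds, truth)

print(bks_decide(table, ("N", "N"), threshold=0.6))  # 2 of 3 samples are N
print(bks_decide(table, ("N", "F"), threshold=0.6))  # 1 of 2: below threshold
print(bks_decide(table, ("F", "N"), threshold=0.6))  # unseen combination
```

Because each cell only stores class counts, a cell dominated by majority-class samples always yields the majority class, which is the behaviour that becomes problematic under class imbalance.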

The proposed T-BKS ensemble model extends the standard BKS method by incorporating a trust element for each individual classifier to reflect its level of confidence in yielding a correct prediction. The T-BKS ensemble model is an algorithm-level approach that better treats the biases involved in training on highly imbalanced data. This differs from the standard BKS method, which relies purely on counting the data samples belonging to different target classes and recording them in each cell, and is therefore likely to be skewed towards the majority class under class-imbalanced scenarios. To overcome this limitation, T-BKS derives a performance measure for each individual classifier and exploits the records in the BKS cells before reaching a final prediction of the output class.

The design of T-BKS for a binary classification problem involves two stages: (1) derive the BKS for K classifiers through the standard BKS procedure (Huang and Suen 1993) and record the sensitivity and specificity of each classifier using a validation data set (size determined by ratio \(\omega \)); (2) compute the ‘trust’ variable for each decision combination \(C_{n}\) (\(C_{n} = (e(1),e(2),\ldots , e(K))\)) of the K classifiers. During evaluation, the final classification of T-BKS is the class with the larger ‘trust’ score for the decision combination \(C_{n}\) produced by a given test data sample.
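
A minimal sketch of these two stages is given below. It anticipates the trust scores of Eqs. (2) and (3): a weighted sum of the BKS cell fraction for class m and, for each classifier, its specificity if it voted N or its sensitivity if it voted F. Several parts are assumptions made for illustration and are not taken from the paper: sensitivity and specificity use the standard confusion-matrix definitions TP/(TP + FN) and TN/(TN + FP); the weights are supplied by a hypothetical helper `agreement_weights`, which gives every term an equal weight but lets a classifier contribute only to the score of the class it voted for; and all numerical values are invented.

```python
def rates(y_true, y_pred, positive="F"):
    """Stage 1: sensitivity and specificity of one classifier on validation data.
    Assumes both classes occur in y_true (standard definitions, an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def trust_score(m, cell_counts, combo, sens, spec, weights):
    """Stage 2: weighted sum of the cell fraction for class m and one rate per
    classifier (specificity for an N vote, sensitivity for an F vote)."""
    total = cell_counts["N"] + cell_counts["F"]
    score = weights[0] * (cell_counts[m] / total if total else 0.0)
    for i, vote in enumerate(combo):
        rate = spec[i] if vote == "N" else sens[i]
        score += weights[i + 1] * rate
    return score

def agreement_weights(m, combo):
    """Hypothetical weighting (NOT from the paper): equal weights 1/(K+1), with a
    classifier's weight set to zero for the class it did not vote for."""
    w = 1.0 / (len(combo) + 1)
    return [w] + [w if vote == m else 0.0 for vote in combo]

def tbks_decide(cell_counts, combo, sens, spec):
    """Return the class with the larger trust score, plus both scores."""
    scores = {m: trust_score(m, cell_counts, combo, sens, spec,
                             agreement_weights(m, combo)) for m in ("N", "F")}
    return max(scores, key=scores.get), scores

# Stage 1 on toy validation data for a single classifier.
sens_0, spec_0 = rates(["F", "F", "N", "N", "N", "N"], ["F", "N", "N", "N", "N", "F"])

# Stage 2 for one BKS cell: 8 majority and 2 minority samples fell into the cell
# reached by the votes (N, F, F) of K = 3 classifiers (all values hypothetical).
sens, spec = [0.5, 0.9, 0.8], [0.95, 0.7, 0.9]
decision, scores = tbks_decide({"N": 8, "F": 2}, ("N", "F", "F"), sens, spec)
print(sens_0, spec_0, decision, scores)
```

In this toy cell, counting alone would return the majority class N (8 of 10 samples), whereas the two classifiers that voted F and have high sensitivity raise the trust score of F above that of N. Other weighting schemes would change this outcome, so the example should be read only as an illustration of how cell counts and classifier performance are combined.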

In the first stage, the sensitivity and specificity metrics of each of the K classifiers are established through a validation procedure. They can be computed using the confusion matrix, as in Eqns. (8) and (9) respectively. This is conducted in parallel with the knowledge-space development. By the end of stage one, a knowledge space of a given data set for a pool of K classifiers (denoted as E), along with the sensitivity and specificity metrics of each classifier, is formed.

In the second stage, the ‘trust’ measure for each decision combination \(C_{n}\) is computed. For a binary classification problem, it is obtained as follows. Let \(C_{n}\) denote a decision combination of K classifiers such that \([e_{1}(x) = j_{1}, e_{2}(x) = j_{2} , \ldots , e_{K}(x) = j_{K} ]\) and \(n \in [1,K!] \). Then, for a given BKS cell, let \(\sum BKS_{m}(C_{n})\) = \(n_{e(1),e(2),\ldots ,e(K)}(m)\) (notation in Huang and Suen (1993)) be the cell count pertaining to a given classification m from \(C_{n}\). For an imbalanced binary classification problem, let N denote the majority class and F the minority class, i.e. there are two possible decision values \(m \in \{N,F\}\). Then T-BKS can be expressed as a pair \((\text {T-BKS}_{m} \in \{\text {T-BKS}_{N},\text {T-BKS}_{F}\})\), computed prior to assigning E(x) = \(R_{e(1),e(2),\ldots ,e(K)}\) = j from the classifier pool E (Huang and Suen 1993).

$$\begin{aligned} \text {T-BKS}_{\text {N}}= & {} w_{1}\frac{\sum \text {BKS}_{N}(C_{n})}{\sum \text {BKS}_{N}(C_{n}) + \sum \text {BKS}_{F}(C_{n})} \nonumber \\&\quad +\,w_{2}P_{e_{1}}(T^{+|-}|D^{+|-}) + w_{3}P_{e_{2}}(T^{+|-}|D^{+|-}) \nonumber \\&\quad +\,\cdots \cdots + w_{K}P_{e_{K-1}}(T^{+|-}|D^{+|-}) \nonumber \\&\quad +\,w_{K+1}P_{e_{K}}(T^{+|-}|D^{+|-}) \end{aligned}$$
(2)
$$\begin{aligned} \text {T-BKS}_{\text {F}}= & {} w_{1}\frac{\sum \text {BKS}_{F}(C_{n})}{\sum \text {BKS}_{N}(C_{n}) + \sum \text {BKS}_{F}(C_{n})} \nonumber \\&\quad +\,w_{2}P_{e_{1}}(T^{+|-}|D^{+|-}) + w_{3}P_{e_{2}}(T^{+|-}|D^{+|-}) \nonumber \\&\quad +\,\cdots \cdots + w_{K}P_{e_{K-1}}(T^{+|-}|D^{+|-}) \nonumber \\&\quad +\,w_{K+1}P_{e_{K}}(T^{+|-}|D^{+|-}) \end{aligned}$$
(3)

where \(P_{e_{i}}(T^{+}|D^{+})\) represents the sensitivity and \(P_{e_{i}}(T^{-}|D^{-})\) represents the specificity rate associated with a given classifier \(e_{i}\). Depending on the prediction of \(e_{i}\) (\(j_{i} = N\) or \(j_{i} = F\)) within a decision combination, the ‘importance’ of each classifier is either its sensitivity or its specificity metric. As an example, if classifier e(1) predicts the majority classification for a test data sample, then only \(P_{e_{1}}(T^{-}|D^{-})\) is used with \(w_{2}\) in calculating both trust measures before a final classification is reached. The relevant equations are as follows:

$$\begin{aligned}&P_{e_{i}}(T^{+}|D^{+}) = \frac{\text {TP}_{e_{i}}}{\text {TP}_{e_{i}} + \text {FN}_{e_{i}}} \end{aligned}$$
(4)
$$\begin{aligned}&P_{e_{i}}(T^{-}|D^{-}) = \frac{\text {TN}_{e_{i}}}{\text {TN}_{e_{i}} + \text {FP}_{e_{i}}} \end{aligned}$$
(5)

where \(w_{z}\) represents an associated weight assigned by prediction \(j_{i}\) of \(e_{i}\) and variable \(\text {T-BKS}_{m}\):

$$\begin{aligned} w_{z=2,3,\ldots ,K+1}= & {} {\left\{ \begin{array}{ll} 0 \; \; \; e_{i}(x) = j_{i} \; \; \text {s.t} \; \; j_{i} \ne (m \; \epsilon \; \text {T-BKS}_{m})\\ \frac{1}{y} \; \; \text {s.t.} \; \; y = K + 1 - \sum [e_{i}(x) = j_{i} \; \; \; \; \; \; \; \; \ne (m \; \; \epsilon \; \; \text {T-BKS}_{m})] \\ \end{array}\right. } \nonumber \\&\forall \; C_{n} \end{aligned}$$
(6)

The weights are assigned to satisfy \(\sum ^{K+1}_{z}(w_{z}) = 1\), which normalises the weights with respect to a given \(\text {T-BKS}_{m}\). As \(w_{1}\) is not associated with a classifier \(e_{i}\), \(w_{1}\) cannot be 0. Hence, the standard BKS method is always involved in calculating \(\text {T-BKS}_{m}\).

The best classification of T-BKS can be obtained by selecting the higher of \(\text {T-BKS}_{N}\) and \(\text {T-BKS}_{F}\):

$$\begin{aligned} E(x) = R_{e(1),e(2),\ldots ,e(K)}= & {} {\left\{ \begin{array}{ll} \text {Maj} \; \; \; \text {T-BKS}_{N} > \text {T-BKS}_{F}\\ \text {Min} \; \; \; \text {T-BKS}_{F} \ge \text {T-BKS}_{N}\\ \end{array}\right. } \nonumber \\&\forall \; C_{n} \end{aligned}$$
(7)
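
As a minimal sketch of Eqns. (2), (3), (6) and (7), the Python fragment below computes the two trust scores for one decision combination and selects the final class. It assumes that \(w_{1}\) takes the same value 1/y as the other non-zero weights (so that the weights sum to 1), that a decision combination absent from the knowledge space contributes a BKS ratio of 0, and that minority class F is the positive class. The names build_bks, trust_scores and classify are hypothetical helpers introduced for illustration only.

```python
from collections import Counter


def build_bks(decisions, labels):
    """Stage 1: count true classes ("N" or "F") per decision combination."""
    table = {}
    for combo, label in zip(decisions, labels):
        table.setdefault(tuple(combo), Counter())[label] += 1
    return table


def trust_scores(combo, bks, sens, spec):
    """Stage 2: return {"N": T-BKS_N, "F": T-BKS_F} for one combination."""
    combo = tuple(combo)
    cell = bks.get(combo, Counter())
    total = cell["N"] + cell["F"]
    scores = {}
    for m in ("N", "F"):
        # First term of Eqns. (2)/(3): share of class m in the BKS cell.
        # Assumption: an unseen combination contributes 0.
        bks_ratio = cell[m] / total if total else 0.0
        # Classifiers that predicted m keep a non-zero weight (Eqn. 6).
        agreeing = [i for i, j in enumerate(combo) if j == m]
        # y = K + 1 - (number of classifiers that did not predict m).
        y = len(combo) + 1 - (len(combo) - len(agreeing))
        # Specificity backs an "N" prediction, sensitivity an "F" one.
        rates = spec if m == "N" else sens
        # Assumption: w_1 equals 1/y, like every other non-zero weight.
        scores[m] = (bks_ratio + sum(rates[i] for i in agreeing)) / y
    return scores


def classify(combo, bks, sens, spec):
    """Eqn. (7): majority only if its trust score is strictly larger."""
    t = trust_scores(combo, bks, sens, spec)
    return "N" if t["N"] > t["F"] else "F"
```

For instance, with three classifiers, a cell holding 6 majority and 4 minority validation samples for the combination (N, F, N), sensitivities (0.8, 0.7, 0.9) and specificities (0.9, 0.6, 0.8), the sketch gives a majority trust of about 0.77 against a minority trust of 0.55, so the sample is assigned to the majority class.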

4 Results and discussion

4.1 Experimental study

To validate the effectiveness of the proposed T-BKS ensemble model, an evaluation study with benchmark and real-world data is conducted. A methodology to encompass an effective knowledge space method in imbalanced data classification environments is developed. The results are compared with those of standard BKS, majority voting (MV) and a recent state-of-the-art ensemble method proposed for imbalanced data classification. Specifically, the comparison covers:

  (i) T-BKS ensemble versus MV: The aim is to determine the comparative performance of T-BKS against the well-known and basic ensemble method of MV.

  (ii) T-BKS ensemble versus standard ensemble BKS: The aim is to determine the comparative performance of T-BKS against standard BKS, from which T-BKS is derived.

  (iii) T-BKS ensemble versus the DBE-DCR (Distance-based Balancing Ensemble with Distance-based Combination Rule) method, a recent state-of-the-art ensemble method proposed in Chen et al. (2019): The aim is to evaluate the effectiveness of T-BKS against that of a recent ensemble method with excellent performance under class imbalance.

A number of imbalanced data sets from the KEEL repository (Alcala-Fdez et al. 2011) are used for evaluation, as detailed in Table 3. To ensure a fair comparison, the method in Chen et al. (2019) is adopted in the experimental study. A 10-fold cross-validation procedure is used. Additionally, for each fold, the Synthetic Minority Oversampling Technique (SMOTE) is used to synthetically add more samples to the minority class (Chawla et al. 2002). Note that SMOTE is applied to the training and validation samples, but not the test samples, in order to preserve the actual data characteristics when evaluating the efficacy of BKS and T-BKS in real environments. For the K classifiers, we select the one-class support vector machine (OC-SVM) as the underlying algorithm, with the radial basis function (RBF) as the kernel (Cristianini and Shawe-Taylor 2000). To obtain a robust comparison outcome, a statistical test is used to determine the significance of the performance differences between T-BKS and the other ensemble methods. Specifically, the pairwise sign test is adopted (Dixon and Mood 1946). For statistical significance at the 95% confidence level (i.e. \(\alpha = 0.05\)) in our evaluation, a minimum of 12 wins out of 15 is required.
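
The threshold of 12 wins out of 15 can be checked directly from the binomial distribution that underlies the sign test. The sketch below is illustrative only: it assumes ties are discarded and that wins and losses are equally likely under the null hypothesis, and the function name min_wins_for_significance is hypothetical.

```python
from math import comb


def min_wins_for_significance(n, alpha=0.05, two_sided=True):
    """Smallest number of wins out of n paired comparisons whose
    sign-test p-value falls below alpha (None if unreachable)."""
    for wins in range(n + 1):
        # Upper-tail probability P(X >= wins) for X ~ Binomial(n, 0.5).
        tail = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
        p_value = min(1.0, 2 * tail) if two_sided else tail
        if p_value < alpha:
            return wins
    return None
```

For n = 15, eleven wins give a one-sided tail probability of about 0.059, whereas twelve wins give about 0.018 (about 0.035 two-sided), so the minimum is 12 under either a one-sided or a two-sided reading.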

Table 3 Details of imbalanced datasets

Accuracy (Acc), sensitivity (Sens) and specificity (Spec) are used for performance comparison studies (i) and (ii), while the AUC (area under the receiver operating characteristic (ROC) curve) is used in all three performance comparison studies. The results are calculated by taking the average over 10 iterations of 10-fold cross-validation (i.e. an aggregate of 100 outputs). From the confusion matrix (Table 4), the Sens, Spec and Acc rates can be obtained as follows:

$$\begin{aligned} \text {Sensitivity}= & {} \text {TPR} = \frac{TP}{TP + FN} \end{aligned}$$
(8)
$$\begin{aligned} \text {Specificity}= & {} \text {TNR} = \frac{TN}{TN + FP} \end{aligned}$$
(9)
$$\begin{aligned} \text {Accuracy}= & {} \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
(10)
Table 4 Confusion matrix
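
As a small worked example of Eqns. (8) to (10), the sketch below derives the three rates from confusion-matrix counts. The function name confusion_metrics and the counts in the usage note are illustrative assumptions, and the sketch assumes that both classes are present so that no denominator is zero.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)                 # Eqn. (8): true positive rate
    specificity = tn / (tn + fp)                 # Eqn. (9): true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eqn. (10)
    return sensitivity, specificity, accuracy
```

With 10 minority and 90 majority samples, counts of TP = 8, FN = 2, FP = 10 and TN = 80 give a sensitivity of 0.80, a specificity of about 0.89 and an accuracy of 0.88. Accuracy is dominated by the majority class, which is why sensitivity and specificity are reported separately.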

Table 5 depicts the performance results of T-BKS and the baseline methods. Referring to AUC, it is evident that MV is a weak performer across all imbalance ratios. At lower imbalance ratios, the results achieved by all methods depict less fluctuation. To obtain a quantitative result of the benchmark comparison studies across all 4 metrics, the statistical test results are presented in Table 6.

Table 5 Detailed performance results for baseline comparison
Table 6 Pairwise sign test for baseline benchmark (significance at \(\alpha = 0.05\))

Comparing T-BKS with majority voting, the former yields significantly better AUC and Sens results, based on the sign test. This indicates the effectiveness of the rules introduced in T-BKS in combining the predictions with respect to the data samples from the “minority” category, contributing to better sensitivity and, subsequently, AUC results. Comparing T-BKS with BKS, the former yields significantly better Acc and Spec outcomes, based on the sign test. This indicates the effectiveness of the rules introduced in T-BKS for combining the predictions with respect to the data samples from the “majority” category, contributing to better specificity and, subsequently, accuracy results.

Table 7 depicts the AUC results of T-BKS and those of DBE-DCR from Chen et al. (2019), where only AUC results are reported. The SVM sub-classifier results from Chen et al. (2019) are used for comparison purposes. Both data processing methods of DBE-DCR have been used for comparison, namely data splitting by clustering (C\DBE-DCR) and data splitting at random (S\DBE-DCR). The results have been computed by following the iterative and cross-validation procedure of Chen et al. (2019), in order to ensure fairness in comparison. Based on the AUC scores and the statistical test outputs presented in Table 8, T-BKS demonstrates performance competitive with that of the DBE-DCR model, which has been designed specifically to tackle imbalanced data problems.

Table 7 Detailed AUC results for state-of-the-art ensemble benchmark
Table 8 Pairwise sign test for state-of-the-art ensemble benchmark (significance at \(\alpha = 0.05\))

Overall, by taking into consideration the respective Sens and Spec results of individual members in the decision combination heuristic rules formulated in T-BKS, an effective ensemble model to undertake imbalanced classification problems is introduced. This is evidenced by the comparison on the 15 benchmark imbalanced data sets: according to the statistical sign test outcomes, T-BKS outperforms MV in AUC and sensitivity, T-BKS outperforms BKS in accuracy and specificity, and T-BKS achieves results comparable with those of DBE-DCR (Chen et al. 2019), which is a recent state-of-the-art ensemble model.

The verification of T-BKS demonstrates the proposed ensemble model’s capability to tackle imbalanced data, which implies that it is fit for purpose within predictive maintenance in asset management. As mentioned in Sect. 1, imbalanced data issues challenge front-line throughput, which is critical within a Human-AI cooperation system. As a result, T-BKS is utilised within the real-world case study presented in Sect. 4.2.

4.2 Real-world case study

To verify the effectiveness of the AI-based human-centric decision support framework in predictive maintenance under pandemic environments, a real-world case study is presented. This case study involves a critical component in an asset from Company.X. Each component contains data acquisition and IoT capabilities. Additionally, a volunteer domain expert, who has over 5 years of experience in the maintenance and engineering of the asset, is included in the research. These components are numerous across all assets and are deemed critical to day-to-day operations. As a result, these components have been recognised as highly important, but they carry uncertainties with respect to their time-to-failure predictions, contributing to adverse effects on business operations. To combat this, we undertake this study by deploying the framework to achieve predictive maintenance under pandemic environments. All non-asset related information (i.e. operational data) is intrinsically utilised through the domain expert, whose knowledge is closely aligned with business objectives and current contexts.

To deploy T-BKS, we extract statistical features from data collected from the assets. The training data set has been obtained from historical failure data verified by business rules pertaining to the asset knowledge. A normal operation data set has been obtained from non-failed assets over the same time period as the failure data set. Statistical features, which include the minimum value, maximum value, median, average, kurtosis, variance, skewness and area under the Receiver Operating Characteristic (ROC) curve of the data samples, have been computed. In consultation with the domain expert, these statistical features have been extracted from certain frequency signals from the asset under monitoring. The results of machine learning for imbalanced data of the real-world case study are presented in Table 9.
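
The feature extraction step described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the case study: population (biased) moments are used, kurtosis is reported in its non-excess form, a constant signal is given zero skewness and kurtosis, and the function name statistical_features is hypothetical. The area-under-ROC feature is omitted because its computation is not specified in the text.

```python
from statistics import mean, median, pvariance


def statistical_features(signal):
    """Summarise one window of sensor readings as a feature dictionary."""
    n = len(signal)
    mu = mean(signal)
    var = pvariance(signal, mu)  # population variance (assumption)
    if var == 0:
        skewness = kurtosis = 0.0  # convention chosen for constant signals
    else:
        # Standardised third and fourth central moments.
        skewness = sum((x - mu) ** 3 for x in signal) / n / var ** 1.5
        kurtosis = sum((x - mu) ** 4 for x in signal) / n / var ** 2
    return {
        "min": min(signal),
        "max": max(signal),
        "median": median(signal),
        "mean": mu,
        "variance": var,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```

Each window of readings then becomes one fixed-length feature vector that can be passed to the classifiers in the ensemble.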

Table 9 Detailed performance results for all methods

Referring to the results in Table 9, a similar performance is achieved by MV and BKS. From the average of five-fold cross-validation, T-BKS outperforms the baseline benchmark methods in AUC and Sens. This is achieved through T-BKS’s methodology of incorporating the Sens and Spec of the classifiers as trust metrics prior to assigning the final predicted class. These high performances indicate that the proposed T-BKS ensemble method and the modelling approach are effective, and confirm the practicality of machine learning for imbalanced data in real-world scenarios.

The main objective of the AI-based human-centric framework is to combine domain expert knowledge and machine learning to achieve predictive maintenance. Predictions from machine learning models can be contrasted with known business rules to ensure that predictions are in line with the domain expert’s tacit knowledge. As a result, predictive analytics enabled through this framework permits the company to plan and conduct better maintenance services, which is critical in pandemic environments. The proposed framework has been deployed into operations within the company, where T-BKS has proven effective.

Given a discrepancy between the business rule and the machine learning prediction, the decision support framework presents both pieces of contrasting information to the domain expert for assessment. The domain expert is provided with the statistical features associated with the machine learning prediction and the triggered business rule before making a final decision. If the machine learning prediction agrees with the domain expert’s knowledge, the respective business rule is altered. If the opposite occurs, the trust score of the machine learning model is penalised through the updated sensitivity/specificity rates. This human-in-the-loop approach to addressing discrepancies aims to enable better user acceptance so that the efficiency of day-to-day operations can be maximised.
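
One possible reading of the penalty step is sketched below. It assumes that each classifier keeps running confusion-matrix counts, that a rejected fault (F) prediction is recorded as an additional false positive and a rejected normal (N) prediction as an additional false negative, and that the rates are then recomputed. The text does not specify the exact update rule, so the functions penalise and rates and the dictionary keys are hypothetical.

```python
def rates(counts):
    """Return (sensitivity, specificity) from running confusion counts."""
    sens = counts["tp"] / (counts["tp"] + counts["fn"])
    spec = counts["tn"] / (counts["tn"] + counts["fp"])
    return sens, spec


def penalise(counts, predicted):
    """Return updated counts after the expert rejects a prediction.

    A rejected "F" prediction is logged as a false positive, which lowers
    specificity; a rejected "N" prediction is logged as a false negative,
    which lowers sensitivity. The input dictionary is left unchanged.
    """
    updated = dict(counts)
    if predicted == "F":
        updated["fp"] += 1
    else:
        updated["fn"] += 1
    return updated
```

Under these assumptions, a classifier with counts TP = 8, FN = 2, FP = 10 and TN = 80 whose fault prediction is rejected sees its specificity fall from 80/90 to 80/91, which in turn lowers the contribution of that rate to later trust scores.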

The predictions from T-BKS on the asset component over a six-month period are compiled, along with the evaluations of the domain expert, to validate the effectiveness of the proposed framework, as shown in Table 10. In total, 11 predictions for the component have been collected. The evaluation results are also presented, including the business action taken, the business benefit, and a similarity score (calculated using cosine similarity) that assists the domain expert in comparing statistical features.
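
For reference, cosine similarity between two feature vectors can be computed as in the sketch below. Which two vectors are compared in the case study is an assumption here (for example, the statistical features of a new prediction against those of a previously verified failure), and returning 0.0 for a zero vector is a convention chosen for this illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for an all-zero vector (assumption)
    return dot / (norm_a * norm_b)
```

Vectors that point in the same direction score 1 regardless of magnitude, while orthogonal vectors score 0, so features measured on very different scales may need normalising before comparison.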

Table 10 Summary of AI-based human-centric decision support with T-BKS results

Across the 11 predictions, it can be seen in Table 10 that the domain expert returned a “Yes” evaluation for 9 out of 11. Expressed as a confidence score, this is approximately 82% over half a year, indicating that the similarity measure is important. A higher similarity indicator increases the probability of agreement from the domain expert, and vice versa. As the similarity scores are compiled based on the domain expert’s knowledge and business rules, this leads to better user acceptance for most new predictions. Furthermore, the human evaluation results indicate that the procedure of extracting the domain expert’s knowledge into business rules is effective in encapsulating a human-centric approach to addressing imbalanced data in predictive maintenance. As mentioned previously, the issue of imbalanced data is a bottleneck in bridging human–machine trust, preventing timely business actions from being undertaken in events of uncertainty under pandemic environments.

Referring to the AUC performance metric (Table 9), the T-BKS model is reinforced by a domain expert confidence of approximately 82% (9 out of 11) through statistical feature comparison assisted by a similarity score. Although such metrics (AUC and confidence) are not exact, the retrieved confidence is considered a good outcome for a small evaluation of 11 new predictions. It is also noted that no cases of missed faults have been identified over the six-month period. Additionally, the business action of ‘Closely Monitor’ (ID #5) drastically decreases the uncertainty of the asset failing during operations, owing to the T-BKS prediction that the asset performance is anomalous, and removes the laborious effort and manual time needed to identify specific assets for close monitoring. For ID #11, the asset demonstrated characteristics similar to both a previously verified failure scenario and those of false positives, resulting in rejection by the domain expert. By incorporating the domain expert’s insightful evaluation and the model performance metrics, an informed decision can be reached. Using this two-tier mechanism encompassing human knowledge and machine intelligence, the acceptance of a prediction, or the occurrence of a large discrepancy, can initiate a review of either the business rules or the machine learning model.

5 Managerial implications

The importance of AI has become prominent under pandemic environments, particularly Covid-19. Day-to-day activities of SMEs are being reviewed to ensure lean operations for initiatives such as cost savings, as revenue streams are jeopardised. Within the realm of predictive maintenance in asset management, supply chain and maintenance activities have become more reliant on predictive capabilities to ensure cost-effectiveness and success in operations. The uncertainties of asset behaviours and time-to-failure further contribute to the uncertainties in pandemic environments. To combat this, we have proposed an AI-based human-centric framework for decision support to better prepare for effective and efficient business operations during Black Swan events. To validate the effectiveness of the framework under real-world scenarios, we elucidate the business benefits achieved for a collaborating medium-sized enterprise.

As depicted in Table 10, the business benefits include downtime cost avoidance and decreased asset uncertainty. Correspondingly, there are many associated downstream benefits. Firstly, downtime cost avoidance is enabled through predictive maintenance, which prevents the asset from suffering an underlying failure during critical day-to-day operations and avoids any knock-on costs to cover the downtime under pandemic environments. In the context of the collaborating company, the associated downtime cost avoidance category includes customer, performance and safety improvements. The cost benefits of the AI-based human-centric decision support framework have been shown to reach 100% cost avoidance for the individual asset under scrutiny. Additionally, better scheduling and supply chain planning are achieved, which is an important factor for tackling uncertainty under pandemic environments.

Similar to downtime cost avoidance, the benefits of decreased asset uncertainty encompass elements of customers, performance, scheduling and planning, as well as safety. As an example, for the scenario of a ‘Closely Monitor’ business action (e.g., ID #5 in Table 10), a greater level of asset health visibility has been achieved through the proposed framework. The associated action is important for navigating through uncertainties in pandemic environments, since the ability to predict accurately into the future becomes critical to success. A quantitative business benefits realisation is presented in Table 11. The metrics are presented as percentages of the maximum potential cost avoidance per incident, which provide an indication of the budgeted and unbudgeted cost avoidance from reduced downtime and decreased asset uncertainty associated with the assets.

Table 11 Approximate quantitative cost avoidance (%) of decision support framework and T-BKS Results

As organisations are forced to operate leaner to meet financial demands and to ensure stability, many business activities are conducted with reduced human resource support, especially within SMEs, where a smaller cash reserve leads to stricter cost-control procedures. Operating with limited resources as the norm, while simultaneously meeting uncompromising asset management demands from clients, is no longer an unrealistic expectation. Positively, it has been demonstrated in this study that the proposed decision support system enables effective condition monitoring to be scaled, achieving certainty of asset health without increasing overhead demand within the collaborating company. This is recognised by the company as a significant advantage of using an AI-based decision support tool to maximise the benefits of predictive maintenance, which is deemed a maintenance strategy fit for navigating the realm of pandemics and Black Swan events.

Based on the results depicted in Table 9, the realisable business benefits discussed in Sect. 5, and Table 11, this real-world case study has shown promising results in mitigating the impacts of pandemic environments. It is evident that the proposed AI-based human-centric decision support system is practical in providing collaborative effects in asset management, where imbalanced data challenge the transition into predictive maintenance, a maintenance strategy capable of reducing the adverse impacts of pandemic events such as Covid-19 for asset management organisations. Overall, from the managerial standpoint, the justifications for AI investment are more tangible as businesses begin to transition into digital avenues within Industry 4.0 settings. In particular, the implementation of AI systems is best undertaken during non-pandemic periods, when more resources are available to be devoted to digital innovations, in preparation for unforeseen situations.

6 Conclusion and future direction

The impact of pandemic environments on predictive maintenance has prompted organisations to seek investments in digital transformation initiatives. Compared with large corporations, the effects of pandemic environments on SMEs are more adverse due to fewer available resources and greater reliance on supply chains (Beglaryan and Shakhmuradyan 2020). The main contribution of this research is the proposal of an AI-based human-centric decision support framework enabling predictive maintenance in asset management under pandemic environments for SMEs. On the one hand, the effectiveness of data-based AI tools is compromised due to imbalanced data issues in predictive maintenance. On the other hand, the lack of human–machine trust from the perspective of users poses another challenge. As a result, we have designed a T-BKS ensemble model coupled with a human-centric approach to decision support in predictive maintenance in dynamic situations under pandemic environments. Our real-world case study demonstrates that the proposed AI-based decision support framework has a significant advantage owing to its human-in-the-loop feature, which embeds the domain expert’s tacit knowledge. The advantage of digital transformation has been validated with a collaborating company through business benefits in various aspects of predictive maintenance, such as downtime cost reduction, effective and efficient scheduling and planning, and better preservation and utilisation of knowledge workers in situations where reduced workforces are common under pandemic environments, especially within SMEs, where cost-cutting is inevitable to survive the challenges of pandemic periods. To encourage wider adoption, further research into AI-based human-centric frameworks within different industries that involve domain experts would be highly beneficial. Additionally, the current framework assumes interaction with only one human domain expert. Incorporating multiple domain experts could be an interesting direction for further research. This includes conflicts of knowledge among the domain experts themselves and between the experts and the models; such findings could improve overall trust development, yielding more practicality within real-world scenarios under pandemic environments.