An optimal fault detection threshold for early detection using Kullback–Leibler Divergence for unknown distribution data
Introduction
Fault detection plays a key role in meeting the high demands of today's technological systems for performance, productivity and security. The required sensitivity of a fault detection method depends on the application's main goals. When productivity is the main goal, only weak fault detection sensitivity is required: only large defects need to be detected. However, when security is the main goal, undetected faults, even of very small severity, may result in catastrophically growing failures. Therefore, there is a need for fault detection and diagnosis (FDD) methods with a high sensitivity to small faults but insensitivity to environmental perturbations (noise, temperature, etc.) and to input changes [1].
In the literature, a fault is defined as “a non-allowed and unpredictable deviation of at least one characteristic property or variable of the system” [2]. For industrial process monitoring, when safety is the main priority, it is crucial to be able to detect very slight faults (namely incipient modifications) at their earliest stage. Indeed, early detection may provide invaluable warning of emerging problems, and appropriate actions may then avoid serious process upsets. However, the accurate detection of incipient faults is a challenge, as it requires distinguishing the fault itself from nuisance parameters such as noise or unpredictable environmental changes. There exist many sources of noise in industrial processes, depending on the application. Examples include vibrations, electric power fluctuations, stray radiation from nearby electrical equipment, static electricity, turbulence in the flow of gases or liquids, and background radiation from natural radioactive elements [3]. In fact, in real processes, any kind of slight disturbance can be considered a nuisance parameter, designated here by the general term “noise”. This noise can affect the performance of a fault detection method in terms of false alarm probability (reliability) and missed detection probability (sensitivity).
Moreover, faults in industrial processes may manifest in different forms on the measured signals. For example, some faults change the statistical properties of the signal (mean, variance, skewness, and kurtosis), others change its spectral properties, and others manifest as noise added to the signals. Therefore, fault detection methods should be able to cope with all these types of fault signature.
Various methods of fault detection and isolation have been proposed in different industrial contexts. They are generally classified as model-based and data-driven methods. In model-based methods, fault detection relies on comparing the system's measured variables with estimates obtained from a mathematical model of the process. These methods include statistical hypothesis testing applied to the residuals (e.g., Bayesian, likelihood, and minimax approaches) [4], [5], observer-based approaches, interval approaches, and parity-space approaches [6], [7].
The model-based approach is efficient when an accurate model is available. However, poor fault detection performance is obtained in the presence of model uncertainties and modelling errors, and the tuning of observers can be tricky.
In contrast to the model-based approaches, where a priori knowledge of the process is needed [6], [7], data-based methods require a sufficient amount of historical process data to describe the process behaviour using well-chosen descriptive features [8]. These approaches include the latent variable methods, e.g., partial least squares (PLS) regression, principal component analysis (PCA), canonical variate analysis (CVA), and independent component analysis (ICA), as well as neural networks, fuzzy systems and pattern recognition methods [9].
In this paper, a data-driven approach is considered, using several descriptive features in the Principal Component Analysis (PCA) framework combined with multivariate statistical techniques to develop an efficient fault detection and diagnosis method.
PCA-based monitoring methods can easily handle the high-dimensional, noisy and highly correlated data generated by industrial processes, and provide superior performance compared to univariate methods [10]. In addition, these process monitoring methods are attractive for practical industrial processes because they only require a good historical data set of healthy operation, which is easily obtainable for computer-controlled industrial processes. PCA can be used to reduce the m-dimensional space of process variables to a lower l-dimensional subspace, termed the principal subspace, while retaining maximum information in the new space. The remaining information lies in the (m−l)-dimensional subspace named the residual subspace [10], [11], [12], [13], [14].
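The decomposition into principal and residual subspaces described above can be sketched as follows. This is a minimal illustration on synthetic data; the function name `pca_subspaces` and the choice of l are ours, not the paper's.

```python
import numpy as np

def pca_subspaces(X, l):
    """Split an N x m data matrix into principal and residual subspaces.

    X : healthy training data (rows are samples);
    l : number of retained principal components (illustrative choice).
    """
    Xc = X - X.mean(axis=0)                # centre each variable
    S = np.cov(Xc, rowvar=False)           # sample covariance matrix (m x m)
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending variance
    P = eigvecs[:, order[:l]]              # principal loadings (m x l)
    P_res = eigvecs[:, order[l:]]          # residual loadings (m x (m-l))
    T = Xc @ P                             # scores in the principal subspace
    return P, P_res, T

# synthetic correlated data standing in for healthy process measurements
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))
P, P_res, T = pca_subspaces(X, l=2)
print(P.shape, P_res.shape, T.shape)   # (5, 2) (5, 3) (500, 2)
```

The two loading matrices are orthonormal blocks of the same eigenvector basis, so any sample splits exactly into its principal-subspace projection plus its residual.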
PCA-based monitoring methods and their extensions have been successfully applied in a wide range of applications and industries, such as in chemical processes, air quality, water treatment, aerospace, agriculture, automotive, electronics, energy, manufacturing, medical devices, and many others [15].
The most common procedure for process monitoring with PCA consists in using certain metrics (known as detection indices) to identify faults. Several detection indices have been used with this multivariate technique, including Hotelling's T2 statistic [16] and the squared prediction error (SPE) [17]. The T2 measures the variations of the principal components at different time samples, while the SPE measures the variations of the residuals. As mentioned in [18], the performances of the T2 and SPE in terms of false alarm and missed detection probabilities are not fully satisfactory. Also, the T2 and SPE are sensitive to modelling errors [19]. Moreover, the control limits of T2 and SPE are based on the assumption that the latent variables follow a multivariate Gaussian distribution. Therefore, when the latent variables are non-Gaussian, using T2 and SPE may be misleading [20].
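The two classical indices can be sketched for a single centred sample as below. This is an illustrative implementation under the standard definitions (T2 on the retained scores scaled by their eigenvalues, SPE as the residual energy); the variable names are assumptions, not the paper's notation.

```python
import numpy as np

def t2_spe(x, P, lam):
    """Hotelling's T^2 and squared prediction error (SPE) for one
    centred sample x, given the l principal loadings P (m x l) and
    their eigenvalues lam (variances of the scores)."""
    t = P.T @ x                        # scores of the sample
    T2 = float(np.sum(t**2 / lam))     # scaled variation in the principal subspace
    r = x - P @ t                      # residual: part not explained by the model
    SPE = float(r @ r)                 # squared prediction error
    return T2, SPE

# illustrative usage on synthetic healthy data
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4))
Xc = X - X.mean(axis=0)
lam_all, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
idx = np.argsort(lam_all)[::-1]
P, lam = V[:, idx[:2]], lam_all[idx[:2]]   # keep l = 2 components
T2, SPE = t2_spe(Xc[0], P, lam)
print(T2 >= 0 and SPE >= 0)                # True: both indices are non-negative
```

In monitoring practice each index is compared against a control limit derived under the Gaussian assumption, which is precisely the assumption questioned in the text above.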
Therefore, to overcome the performance shortcomings of the T2 and the SPE, an approach was proposed in [15] in which the generalised likelihood ratio (GLR) test is used with PCA. The GLR test is a hypothesis testing method that has been successfully used in model-based fault detection and has proven superior to the T2 and SPE [15].
To distinguish between the two hypotheses (faulty and healthy), each of which has known parameters, the use of the likelihood ratio test is justified by the Neyman–Pearson lemma [21], which proves that such a test achieves the best performance among all competitors.
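In its standard form, the test of the lemma compares the likelihood ratio of an observation x to a threshold:

```latex
\Lambda(x) \;=\; \frac{p(x \mid H_1)}{p(x \mid H_0)}
\;\underset{H_0}{\overset{H_1}{\gtrless}}\; \tau ,
```

where the threshold \tau is chosen to meet a prescribed false alarm probability; the lemma states that, for that false alarm level, no other test has a smaller missed detection probability.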
However, the GLR is a parametric hypothesis test, so it needs a priori knowledge of the signal distribution type (Gaussian, gamma, etc.) and of the parameters affected by the fault (mean, variance, kurtosis, etc.). If this information is not available, or if the signals have an unusual distribution, the GLR cannot be applied fruitfully. In addition, even if these conditions are fulfilled, the GLR is not optimal when the change amplitude is unknown.
In the literature, the most popular non-parametric hypothesis test is the Wilcoxon Rank Sum test (WRST). However, we will show in this paper that this method is not suitable for industrial processes because of its high sensitivity to the presence of noise and to the random nature of the signals.
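For reference, the WRST can be sketched with its large-sample normal approximation. This is a minimal illustration (no tie correction, suitable for continuous-valued signals), not the implementation evaluated in the paper.

```python
import numpy as np
from math import erf, sqrt

def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test using the large-sample normal
    approximation -- a minimal sketch for continuous-valued signals."""
    n1, n2 = len(x), len(y)
    combined = np.concatenate([x, y])
    ranks = np.empty(n1 + n2)
    ranks[combined.argsort()] = np.arange(1, n1 + n2 + 1)
    W = ranks[:n1].sum()                           # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2.0                  # mean of W under H0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # std of W under H0
    z = (W - mu) / sigma
    # two-sided p-value from the standard normal CDF
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return z, p

# two windows with the same mean but different noise levels
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, 200)
noisy_healthy = rng.normal(0.0, 1.5, 200)   # stronger noise, no fault
z, p = wilcoxon_rank_sum(healthy, noisy_healthy)
print(0.0 <= p <= 1.0)                      # True
```

Note that the WRST is a test on location (ranks), which is one intuition behind the sensitivity issues discussed above: noise-level changes perturb the ranks without any fault being present.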
In this paper, we develop an approach that can be applied without any a priori information on the distribution type, on the parameters affected by the fault, or on the fault amplitude (a non-parametric method). This method is based on the Kullback–Leibler Divergence (KLD) in the PCA framework. The Kullback–Leibler Divergence [22], derived from information theory, has been shown to be an alternative to the T2 and SPE criteria for the detection of incipient faults [23], [24]. This measure has also been used for abnormality detection and pattern recognition in different areas. Compared to the T2 and SPE, it has been shown that the monitoring strategy with KLD using PCA is conceptually more straightforward and also more sensitive for the detection of incipient faults [23], [25].
Moreover, since the KLD measures the dissimilarity between the probability density functions (pdfs) of healthy and faulty data [26], [27], there is no need for a priori information about the fault's type, because all faults, whatever their type, change the probability density function of the measured signals. The KLD may therefore be suitable and efficient for fault detection and diagnosis of any type of signal and fault. Conceptually, the KLD should be null if the system is healthy and deviates from zero upon fault occurrence or an environmental change (noise presence) affecting the monitored data. The performance of the proposed incipient fault detection method is shown to depend on the process environment (noise level) but also on internal parameters known as hyperparameters [28], such as the KLD detection threshold. We propose here to develop a theoretical model of the performance, characterised by the false alarm probability and the missed detection probability. Afterwards, we go through an optimisation process to minimise the Bayes risk. For this purpose, a deterministic optimisation algorithm is used, and the optimised detection threshold is obtained according to the fault severity and the noise level.
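The idea of comparing the pdfs of a reference window and a test window can be sketched under a univariate Gaussian assumption, for which the KLD has a closed form. This is only an illustrative sketch: the paper's method applies the divergence to PCA latent scores, and the function name is ours.

```python
import numpy as np

def kld_gaussian(x_ref, x_test):
    """KL divergence KL(p_ref || p_test) assuming each window is
    univariate Gaussian -- closed form from the sample moments."""
    m0, v0 = x_ref.mean(), x_ref.var()
    m1, v1 = x_test.mean(), x_test.var()
    return 0.5 * (np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, 5000)         # reference (healthy) window
print(kld_gaussian(healthy, healthy))        # 0.0: identical pdfs
faulty = rng.normal(0.0, 1.2, 5000)          # slight variance change (incipient fault)
print(kld_gaussian(healthy, faulty) > 0.0)   # True: divergence deviates from zero
```

Any change in the pdf, whether in mean, variance or shape, moves the divergence away from zero, which is the property exploited for fault detection regardless of the fault type.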
This paper is organised as follows. Section 2 is devoted to the description and validation of the fault detection method and its comparison to the other aforementioned methods. Section 3 presents the fault diagnosis performances in terms of false alarm and missed detection probabilities. The optimisation of the performances closes this section. Section 4 concludes the paper.
Section snippets
Notation
Let us introduce the following notations:

- $X = [x_1\ x_2\ \cdots\ x_m] \in \mathbb{R}^{N \times m}$ is the original data matrix, where $x_j$ is a column vector of $N$ measurements taken for the $j$th variable.
- $\bar{X}$ is the centred matrix: each column of $X$ is subtracted from its mean value.
- $S = \frac{1}{N-1}\bar{X}^{\mathsf T}\bar{X}$ is the sample data covariance matrix.
- $S = P \Lambda P^{\mathsf T}$, such that $P$ is the loading eigenvectors matrix.
- $T$ is the scores matrix given by $T = \bar{X} P$.
- $l$ is the number of retained principal components, i.e., the dimension of the principal subspace.
Performance modelling
A key issue in fault detection is to assess the significance of the observed deviation (fault) with respect to random noise or deterministic uncertainties (also called nuisance parameters) [1]. A main challenge for statistical methods is their ability to handle noise and uncertainties, to reject nuisance parameters, and to decide between the two hypotheses H0 (no fault, a = 0) and H1 (there exists a fault, a ≠ 0).
The performance of the hypothesis test is characterised by the False Alarm Probability (PFA) and the Missed Detection Probability (PMD).
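Given a detection statistic and a threshold, these two probabilities can be estimated empirically by Monte Carlo simulation. The sketch below uses hypothetical Gaussian distributions for the statistic under each hypothesis; it only illustrates the threshold trade-off that the paper's optimisation addresses.

```python
import numpy as np

def pfa_pmd(scores_h0, scores_h1, threshold):
    """Empirical false-alarm and missed-detection probabilities for a
    detection statistic compared against a fixed threshold."""
    p_fa = float(np.mean(scores_h0 > threshold))   # alarm although no fault
    p_md = float(np.mean(scores_h1 <= threshold))  # fault present but no alarm
    return p_fa, p_md

# hypothetical distributions of the statistic under H0 and H1
rng = np.random.default_rng(0)
h0 = rng.normal(0.0, 1.0, 100_000)   # healthy: statistic fluctuates around 0
h1 = rng.normal(1.5, 1.0, 100_000)   # faulty: statistic shifted by the fault
for tau in (0.5, 1.0, 1.5):
    print(tau, pfa_pmd(h0, h1, tau))
```

Raising the threshold monotonically lowers PFA and raises PMD; the optimal threshold balances the two, e.g. by minimising the Bayes risk as done in the paper.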
Conclusion
In this paper, we have studied the incipient fault detection capability of the Kullback–Leibler Divergence for unknown distribution data. First, we have made a brief comparison with the generalised likelihood ratio (GLR) test. The results have shown that even with a Gaussian distribution and a variance change (the optimal case for the GLR), the KLD remains efficient, albeit with a lower detection capability.
However, with unknown distribution data, where GLR test is no longer applicable, the KLD
Acknowledgments
This research was partially supported by the iCODE institute, research project of the Idex Paris-Saclay.
References (38)
- et al., Optimal statistical fault detection with nuisance parameters, Automatica (2005)
- et al., A review of process fault detection and diagnosis part I: quantitative model-based methods, Comput. Chem. Eng. (2003)
- et al., A review of process fault detection and diagnosis part II: qualitative models and search strategies, Comput. Chem. Eng. (2003)
- et al., A review of process fault detection and diagnosis part III: process history based methods, Comput. Chem. Eng. (2003)
- et al., Statistical process control of multivariate processes, Control Eng. Pract. (1995)
- et al., Statistical fault detection using PCA-based GLR hypothesis testing, J. Loss Prev. Process Ind. (2013)
- et al., An improved PCA scheme for sensor FDI: application to an air quality monitoring network, J. Process Control (2006)
- et al., Incipient fault detection and diagnosis based on Kullback–Leibler Divergence using principal component analysis: part I, Signal Process. (2014)
- et al., Automatic dimensionality selection from the scree plot via the use of profile likelihood, Comput. Stat. Data Anal. (2006)
- Distance measures for signal processing and pattern recognition, Signal Process. (1989)
- Introduction to the DAMADICS actuator FDI benchmark study, Control Eng. Pract.
- Signal Processing Noise, Electrical Engineering and Applied Signal Processing Series
- Mathematical Statistics
- Testing Statistical Hypotheses
- Fault Detection and Diagnosis in Industrial Systems
- Principal Component Analysis
- A selective kernel PCA algorithm for anomaly detection in hyperspectral imagery, IEEE ICASSP
- Application of classification methods in fault detection and diagnosis of inverter fed induction machine drive: a trend towards reliability, Eur. Phys. J. Appl. Phys.