Signal Processing

Volume 120, March 2016, Pages 266-279

An optimal fault detection threshold for early detection using Kullback–Leibler Divergence for unknown distribution data

https://doi.org/10.1016/j.sigpro.2015.09.008

Highlights

  • We propose an incipient fault detection method that does not need any a priori information on the signal distribution or the changed parameters.

  • We show that the performance of the technique is highly dependent on the setting of a detection threshold and the environment noise level.

  • We develop an analytical model of the fault detection performance (False Alarm Probability and Missed Detection Probability).

  • Based on the aforementioned model, an optimisation procedure is applied to optimally set the fault detection threshold depending on the noise and the fault severity.

  • Compared to the usual settings, the proposed approach is validated through simulation results and experimental data.

Abstract

Incipient fault detection in industrial processes with an unknown distribution of the measurement signals and unknown changed parameters is an important problem which has received much attention over the last decades. However, most detection methods (online and offline) need a priori knowledge of the signal distribution, the changed parameters, and the change amplitude (likelihood ratio test, CUSUM, etc.). In this paper, an incipient fault detection method that does not need any a priori knowledge of the signal distribution or the changed parameters is proposed. This method is based on the analysis of the Kullback–Leibler Divergence (KLD) of probability distribution functions. However, the performance of the technique is highly dependent on the setting of a detection threshold and on the environment noise level, described through the Signal to Noise Ratio (SNR) and the Fault to Noise Ratio (FNR). In this paper, we develop an analytical model of the fault detection performance (False Alarm Probability and Missed Detection Probability). Thanks to this model, an optimisation procedure is applied to optimally set the fault detection threshold depending on the SNR and the fault severity. Compared to the usual settings, through simulation results and experimental data, the optimised threshold leads to higher efficiency for incipient fault detection in noisy environments.

Introduction

Fault detection plays a key role in meeting the high demands of today's technological systems for performance, productivity and security. The sensitivity required of fault detection methods depends on the application's main goals. When productivity is the main goal, the required fault detection sensitivity is low: only large defects need to be detected. However, when security is the main goal, undetected faults, even of very small severity, may result in catastrophically growing failures. Therefore, there is a need for fault detection and diagnosis (FDD) methods with a high sensitivity to small faults but insensitive to environmental perturbations (noise, temperature, etc.) and to input changes [1].

In the literature, a fault is defined as “a non-allowed and unpredictable deviation of at least one characteristic property or variable of the system” [2]. For industrial process monitoring, when safety is the main priority, it is crucial to be able to detect very slight faults (namely incipient modifications) at their earliest stage. Indeed, early detection may provide invaluable warning of emerging problems, and appropriate actions may help avoid serious process upsets. However, the accurate detection of incipient faults is a challenge, as it requires distinguishing the fault itself from nuisance parameters such as noise or unpredictable environmental changes. There exist many sources of noise in industrial processes, depending on the application. Examples include vibrations, electric power fluctuations, stray radiation from nearby electrical equipment, static electricity, turbulence in the flow of gases or liquids, background radiation from natural radioactive elements, etc. [3]. In fact, in real application processes, every kind of slight disturbance can be considered as a nuisance parameter, designated here by the general term “noise”. This noise can affect the fault detection method's performance in terms of false alarm probability (reliability) and missed detection probability (sensitivity).

Moreover, faults in industrial processes may manifest in different forms on the measured signals. For example, some faults change the statistical properties of the signal (mean, variance, skewness, and kurtosis), others change the spectral properties, and others manifest as noise added to the signals. Therefore, fault detection methods should be able to cope with all these types of fault signatures.

Various methods of fault detection and isolation have been proposed in different industrial contexts. They are generally classified as model-based and data-driven methods. In the model-based methods, fault detection is based on the comparison of the system's measured variables with the estimated ones obtained from a mathematical model of the process. These methods include the statistical hypothesis testing approach applied to the residuals (e.g., Bayesian, likelihood, and minimax) [4], [5], the observer-based approach, the interval approach, and the parity-space approach [6], [7].

The model-based approach is efficient when an accurate model is available. However, poor fault detection performance is obtained in the presence of model uncertainties, modelling errors and, for instance, tricky tuning of the observers.

In contrast to the model-based approaches, where a priori knowledge of the process is needed [6], [7], data-based methods require the availability of a sufficient amount of historical process data to perfectly describe the process behaviour using well-chosen descriptive features [8]. These approaches include the latent variable methods, e.g., partial least squares (PLS) regression, principal component analysis (PCA), canonical variate analysis (CVA), and independent component analysis (ICA), as well as neural networks, fuzzy systems and pattern recognition methods [9].

In this paper, a data-driven approach is considered, using several descriptive features in the Principal Component Analysis (PCA) framework combined with multivariate statistical techniques, to develop an efficient fault detection and diagnosis method.

PCA-based monitoring methods can easily handle the high-dimensional, noisy and highly correlated data generated by industrial processes, and provide superior performance compared to univariate methods [10]. In addition, these process monitoring methods are attractive for practical industrial processes because they only require a good historical data set of healthy operation, which is easily obtainable for computer-controlled industrial processes. PCA can be used to reduce the m-dimensional space of process variables to a lower l-dimensional subspace, termed the principal subspace, while keeping a maximum of information in the new space. The remaining information lies in the (m − l)-dimensional subspace named the residual subspace [10], [11], [12], [13], [14].
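To make this decomposition concrete, the following minimal sketch (illustrative only, not the implementation used in this paper) centres a healthy data set, performs the eigendecomposition of its covariance matrix, and splits the data into a principal and a residual part. The synthetic data, the 95% variance rule for choosing l, and the variable names are assumptions made for the example; the names mirror the notation introduced in the Notation section below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic healthy data: N samples of m correlated process variables
# (purely illustrative; any healthy historical data set could be used)
N, m = 500, 6
latent = rng.standard_normal((N, 2))
mixing = rng.standard_normal((2, m))
X = latent @ mixing + 0.1 * rng.standard_normal((N, m))

# Centre each column (Xbar) and form the sample covariance matrix S
Xbar = X - X.mean(axis=0)
S = (Xbar.T @ Xbar) / (N - 1)

# Eigendecomposition of S: the columns of P are the loading eigenvectors,
# sorted by decreasing eigenvalue (explained variance)
eigvals, P = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], P[:, order]

# Keep the l components explaining ~95% of the variance (principal subspace)
l = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.95) + 1)

T = Xbar @ P                     # scores in the full m-dimensional space
X_hat = T[:, :l] @ P[:, :l].T    # projection onto the principal subspace
E = Xbar - X_hat                 # residual part, in the (m - l)-dimensional subspace

print(f"retained l = {l} of m = {m} components")
```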

PCA-based monitoring methods and their extensions have been successfully applied in a wide range of applications and industries, such as in chemical processes, air quality, water treatment, aerospace, agriculture, automotive, electronics, energy, manufacturing, medical devices, and many others [15].

The most common procedure for process monitoring with PCA consists in using some metrics (known as detection indices) to identify faults. Several detection indices have been used with this multivariate technique, including Hotelling's T2 statistic [16] and the squared prediction error SPE [17]. The T2 measures the variations of the principal components at different time samples, while the SPE measures the variations of the residuals. As mentioned in [18], the performances of the T2 and SPE in terms of false alarm and missed detection probability are not satisfactory. Also, the T2 and SPE are sensitive to modelling errors [19]. Moreover, the control limits of T2 and SPE are based on the assumption that the latent variables follow a multivariate Gaussian distribution. Therefore, when the latent variables are not Gaussian distributed, using T2 and SPE may be misleading [20].
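As a brief illustration of these two standard indices (a sketch assuming the PCA model of the previous example, not code from this paper), T2 weights the l retained scores by the inverse of their eigenvalues, while SPE is the squared norm of the reconstruction residual:

```python
import numpy as np

def t2_index(x, mean, P, eigvals, l):
    """Hotelling's T2 of one sample x, computed in the l-dimensional principal subspace."""
    t = (x - mean) @ P[:, :l]             # scores of the sample on the retained loadings
    return float(np.sum(t ** 2 / eigvals[:l]))

def spe_index(x, mean, P, l):
    """Squared prediction error (Q statistic) of one sample x."""
    xc = x - mean
    x_hat = (xc @ P[:, :l]) @ P[:, :l].T  # reconstruction from the l retained components
    return float(np.sum((xc - x_hat) ** 2))

# A fault is declared when an index exceeds its control limit, e.g.
# t2_index(x_new, X.mean(axis=0), P, eigvals, l) > t2_limit
```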

Therefore, to overcome the performance shortcomings of the T2 and the SPE, an approach was proposed in [15] in which the generalised likelihood ratio (GLR) test is used with the PCA. The GLR test is a hypothesis testing method that has been successfully used in model-based fault detection and was shown to be superior to T2 and SPE [15].

To distinguish between the two hypotheses (faulty and healthy), each of which has known parameters, the use of the likelihood ratio test can be justified by the Neyman–Pearson lemma [21], which proves that such a test has the best performance among all competitors.

However, the GLR is a parametric hypothesis test method, so it needs a priori knowledge of the signal distribution type (Gaussian, gamma, etc.) and of the parameters affected by the fault (mean, variance, kurtosis, etc.). If this information is not available, or if the signals have an unusual distribution, the GLR cannot be applied fruitfully. In addition, even if these conditions are fulfilled, if the change amplitude is unknown, the GLR is not optimal.
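To make these parametric assumptions explicit, here is a minimal textbook-style sketch (not taken from this paper) of a GLR test for a mean change in i.i.d. Gaussian data with known variance; both the distribution family and the affected parameter must be known in advance, which is exactly the prior knowledge the proposed KLD approach avoids:

```python
import numpy as np
from scipy.stats import chi2

def glr_mean_change(x, mu0, sigma):
    """GLR statistic for H0: mean = mu0 vs H1: mean != mu0,
    for i.i.d. Gaussian samples with known standard deviation sigma.
    2*log(likelihood ratio) = N * (x_mean - mu0)**2 / sigma**2,
    which is chi2(1)-distributed under H0."""
    x = np.asarray(x, dtype=float)
    return len(x) * (x.mean() - mu0) ** 2 / sigma ** 2

# Threshold set from the chi2(1) law for a 1% false alarm probability
threshold = chi2.ppf(0.99, df=1)
rng = np.random.default_rng(1)
g = glr_mean_change(rng.normal(0.2, 1.0, size=200), mu0=0.0, sigma=1.0)
print(g, g > threshold)   # a fault is declared when the statistic exceeds the threshold
```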

In the literature, the most popular non-parametric hypothesis test method is the Wilcoxon Rank Sum Test (WRST). However, we will show in this paper that this method cannot be usefully applied in industrial processes because of its high sensitivity to the presence of noise and to the random nature of the signals.
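For reference, the WRST compares the ranks of a test window with those of a healthy reference window without assuming any particular distribution; a minimal usage sketch with scipy (purely illustrative, with arbitrary window sizes and rejection level) is:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(2)
reference = rng.standard_normal(500)        # healthy reference window
test = rng.standard_normal(500) + 0.05      # test window with a tiny mean shift

stat, p_value = ranksums(reference, test)
fault_detected = p_value < 0.01             # reject H0 at the 1% level
print(stat, p_value, fault_detected)
```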

In this paper, we develop an approach that can be applied without any a priori information on the distribution type, on the parameters affected by the fault, or on the fault amplitude (a non-parametric method). This method is based on the Kullback–Leibler Divergence (KLD) in the PCA framework. The Kullback–Leibler Divergence [22], derived from information theory, has been shown to be an alternative to the T2 and SPE criteria for the detection of incipient faults [23], [24]. This measure has also been used for abnormality detection and pattern recognition in different areas. Compared to T2 and SPE, it has been shown that the monitoring strategy with KLD using PCA is conceptually more straightforward and also more sensitive for the detection of incipient faults [23], [25].

Moreover, since the KLD is used to measure the dissimilarity between the probability density functions (pdfs) of healthy and faulty data [26], [27], there is no need for a priori information on the fault's type, because all faults, whatever their type, change the probability density function of the measured signals. The KLD might thus be suitable and efficient for fault detection and diagnosis of any type of signal and fault. Conceptually, the KLD should be null if the system is healthy and deviates from zero when a fault occurs or an environmental change (noise presence) affects the monitored data. The performance of the proposed incipient fault detection method is shown to depend on the process environment (noise level) but also on internal parameters named hyperparameters [28], such as the KLD detection threshold. We propose here to develop a theoretical model of the performance, characterised by the false alarm probability and the missed detection probability. Afterwards, we go through an optimisation process to minimise the Bayes risk. For that, a deterministic optimisation algorithm is used, and the optimised detection threshold is obtained according to the fault severity and the noise level.
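To fix ideas, the following is a minimal sketch of one possible KLD estimator between a healthy reference window and a test window, based on normalised histograms and symmetrised for convenience; the estimator, bin count and data are assumptions made for illustration and are not necessarily the choices made in this paper.

```python
import numpy as np

def kld_hist(ref, test, bins=50, eps=1e-12):
    """Symmetrised Kullback-Leibler divergence between two samples,
    estimated from normalised histograms built on a common support."""
    lo, hi = min(ref.min(), test.min()), max(ref.max(), test.max())
    p, _ = np.histogram(ref, bins=bins, range=(lo, hi))
    q, _ = np.histogram(test, bins=bins, range=(lo, hi))
    p = p / p.sum() + eps                  # small eps avoids log(0) and division by zero
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

rng = np.random.default_rng(3)
healthy = rng.standard_normal(2000)
faulty = 1.1 * rng.standard_normal(2000) + 0.05   # slight variance and mean change

print(kld_hist(healthy, healthy[:1000]))  # close to zero in the healthy case
print(kld_hist(healthy, faulty))          # deviates from zero when a fault is present
```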

This paper is organised as follows. Section 2 is devoted to the description and validation of the fault detection method and its comparison to the other aforementioned methods. Section 3 presents the fault diagnosis performances in terms of false alarm and missed detection probabilities. The optimisation of the performances closes this section. Section 4 concludes the paper.

Section snippets

Notation

Let us introduce the following notations:

X [N×m], with X = (x_1, …, x_j, …, x_m) = (x_ij)_{i,j}, is the original data matrix, where x_j = [x_1j … x_Nj]^T is the column vector of the N measurements taken for the jth variable.

X̄ [N×m], where X̄ = (x̄_1, …, x̄_j, …, x̄_m), is the centered data matrix; the mean value of each column of X is subtracted from that column.

S is the sample data covariance matrix.

P [m×m], with P = (p_1, …, p_l, …, p_m), is the matrix of loading eigenvectors.

T [N×m], where T = (t_1, …, t_l, …, t_m), is the scores matrix, given by T = X̄P.

l is the number of principal components retained in the principal subspace.

Performance modelling

A key issue in fault detection is to state the significance of the observed deviation (fault) with respect to random noises or deterministic uncertainties (also called nuisance parameters) [1]. A main challenge of the statistical methods is their ability to handle noises and uncertainties, to reject nuisance parameters, and to decide between two hypotheses: H0 (no fault, a = 0) and H1 (there exists a fault, a ≠ 0).

The performance of the hypothesis test is characterised by the False Alarm Probability (PFA) and the Missed Detection Probability (PMD).
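By way of illustration (a Monte Carlo sketch under an assumed Gaussian score model, not the analytical performance model derived in this paper), PFA and PMD can be estimated as functions of the detection threshold, and the threshold can then be chosen to minimise a Bayes risk, which is the spirit of the optimisation carried out in Section 3:

```python
import numpy as np

rng = np.random.default_rng(4)

def kld_gauss(mu0, var0, mu1, var1):
    """Closed-form KLD between two univariate Gaussians, KL(N(mu0,var0) || N(mu1,var1))."""
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def kld_runs(n_runs, window, mu, sigma):
    """KLD of the healthy reference N(0, 1) against windows drawn from N(mu, sigma^2)."""
    vals = np.empty(n_runs)
    for i in range(n_runs):
        w = rng.normal(mu, sigma, size=window)
        vals[i] = kld_gauss(0.0, 1.0, w.mean(), w.var())
    return vals

window = 200
kld_h0 = kld_runs(2000, window, mu=0.0, sigma=1.00)   # healthy windows
kld_h1 = kld_runs(2000, window, mu=0.0, sigma=1.05)   # incipient variance fault

thresholds = np.linspace(0.0, kld_h1.max(), 400)
pfa = np.array([(kld_h0 > th).mean() for th in thresholds])   # false alarm probability
pmd = np.array([(kld_h1 <= th).mean() for th in thresholds])  # missed detection probability

# Bayes risk with equal priors and unit costs; the optimal threshold minimises it
risk = 0.5 * (pfa + pmd)
i_opt = int(np.argmin(risk))
print(f"optimal threshold ~ {thresholds[i_opt]:.4f}, "
      f"PFA = {pfa[i_opt]:.3f}, PMD = {pmd[i_opt]:.3f}")
```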

Conclusion

In this paper, we have studied the incipient fault detection capability of the Kullback–Leibler Divergence for unknown distribution data. First, we made a brief comparison with the generalised likelihood ratio (GLR) test. The results have shown that, even with Gaussian-distributed data and a variance change (the optimal case for the GLR), the KLD remains efficient, albeit with a lower detection capability.

However, with unknown distribution data, where the GLR test is no longer applicable, the KLD remains applicable and efficient.

Acknowledgments

This research was partially supported by the iCODE institute, research project of the Idex Paris-Saclay.

References (38)

  • M. Bartys et al., Introduction to the DAMADICS actuator FDI benchmark study, Control Eng. Pract. (2006)
  • R. Isermann, Fault-Diagnosis Systems, Springer Science and Business Media, Berlin, Germany,...
  • V. Tuzlukov, Signal Processing Noise, Electrical Engineering and Applied Signal Processing Series (2010)
  • A. Borovkov, Mathematical Statistics (1998)
  • E. Lehmann, Testing Statistical Hypotheses (1996)
  • L. Chiang et al., Fault Detection and Diagnosis in Industrial Systems (2001)
  • I. Jolliffe, Principal Component Analysis (2002)
  • Y. Gu et al., A selective kernel PCA algorithm for anomaly detection in hyperspectral imagery, IEEE ICASSP (2006)
  • C. Delpha et al., Application of classification methods in fault detection and diagnosis of inverter fed induction machine drive: a trend towards reliability, Eur. Phys. J. Appl. Phys. (2008)