
1 Introduction

Freezers are durable consumer goods that are manufactured in mass volumes. To assure quality during the manufacturing process, a certain number of freezer units are randomly selected from each production batch and tested for various types of defects, including cosmetic and functional defects. In general, a human expert conducts the tests, interprets the data and results, and reaches a decision. Detecting defective units by means of human experts is error-prone and time-consuming. It is therefore highly desirable to automate the process so that product quality is forecast by means of anomaly detection, preferably using machine learning and data-driven methods.

Our ultimate aim is to design a system that automates the detection of defective units during the cooling tests of freezer units manufactured in high volumes in a factory of one of the leading home appliance manufacturers, Arçelik (Beko). In the design of such a system, extra attention must be paid to sensitivity (the accuracy of detecting defective units), since missing a defective freezer unit might lead to an entirely defective batch being delivered to the market. False alarms (false positives), on the other hand, only lead to an extra manual test, which is a small drawback compared to a miss (false negative). We start by analyzing the data of the test units sampled from batches of freezer units. The data are then embedded into two dimensions to visualize their distribution; such a visualization may reveal particular structures and outliers in the data. Clustering is then applied to see whether the data can be grouped into two classes. As off-line approaches, state-of-the-art classifiers, including a one-class classifier, are employed. Finally, a deep learning method for time-series analysis combined with a classifier is applied as an on-line approach.

2 Related Work

Anomaly detection is a major forecasting method used for assessing product quality in real-world applications. Several methods have been proposed to detect anomalies in data [1, 3]. Traditionally, statistical methods such as the cumulative sum (CUSUM) and the exponentially weighted moving average (EWMA) were employed [2]. There also exist methods based on Support Vector Machines (SVM) [7, 11]. When anomaly detection is framed as outlier detection, the solution may come from a one-class support vector machine as well [13]. Technology companies such as Twitter and Netflix have also proposed their own solutions to this problem [5, 6]. With its reemergence, the Long Short-Term Memory network (LSTM) [4] became the most popular method for time-series modeling and forecasting. Various methods exist to incorporate LSTMs; both stacked LSTMs and an LSTM-based encoder-decoder have been described for detecting anomalies in time-series data [9, 10]. However, a set of rules must be defined in order to decide whether the predicted points are indeed anomalies; Shipmon et al. compared different rules for anomaly detection [12].

3 Data

The results of every freezer unit tested between 2016 and 2018 at the Arçelik (Beko) Refrigerator Plant are available as data. During the test of a randomly selected unit, two sensors measure the temperature inside the freezer unit and another sensor measures the ambient temperature. A fourth sensor measures the power consumed by the compressor, which acts both as a motor and as a pump moving the refrigerant through the system. These measurements are recorded every minute. Since the two sensors inside the freezer unit report the same temperature, we make use of only one of them, and the ambient temperature stays constant during the test. Therefore, only one temperature sensor's data is considered in the rest of this study.

Fig. 1. Sensor reading data of a non-defective freezer unit during cooling test. (Color figure online)

The expected behavior of a non-defective freezer unit is given in Fig. 1 and can be described as follows. The temperature starts at the ambient temperature and drops to around \(-20\,^{\circ }\)C while power is consumed steadily. After reaching the target temperature, the freezer unit starts its cycling phase. During this phase, the compressor stops, so the power sensor starts reading zero; simultaneously, the temperature rises by a few degrees. The compressor then turns on again, consuming power and cooling the freezer unit. In Fig. 1, the data drawn in red correspond to the temperature inside the freezer unit, while the consumed power is shown in purple. A freezer unit is labeled as defective if it is unable to reach its target temperature within a few hours. The test goes on until the human expert decides whether the unit is defective or not, which means that the test duration varies depending on the experience of the expert. Even though most tests are concluded within 120 min, some tests last more than 220 min. In particular, defective units are tested for several hours so that the cause of failure becomes clear. In addition, the freezer unit model also affects the test time, since different models may need different lengths of time to reach their target temperatures. Similarly, some events disrupt the test process: during the test, a unit might be subject to further examination or might be tested under extreme conditions. As a result, we have units with potentially anomalous measurements that the human expert nevertheless labels as non-defective. If a unit is labeled as defective, metadata are recorded, including the model, product identification number, batch number, test date, and error code. Furthermore, defective units are grouped according to their type of defect and labeled with the respective error codes.

Fig. 2. Plot of temperature sensor measurements of a sub-set of the original dataset for 150 min of the test. (Color figure online)

Figure 2 presents the plot of temperature sensor measurements of a subset of the original dataset. Data drawn in red represent the non-defective freezer units, whereas data drawn in blue represent the defective freezer units.

Fig. 3. Embedding of temperature sensor data from 150-dimensional space onto 2-dimensional space using t-SNE. (Color figure online)

4 Initial Analyses of Data

4.1 Embedding the Data in 2D Space

The 150 min of temperature sensor data can be considered as a 150-dimensional feature vector. We embedded these 150-dimensional feature vectors into a lower-dimensional space using t-Distributed Stochastic Neighbor Embedding (t-SNE) [8] with its perplexity parameter set to 50. The embedding from the 150-dimensional space into 2-dimensional space is shown in Fig. 3; data points in red correspond to defective freezer units, while data points in black correspond to non-defective freezer units. Most of the data points corresponding to defective freezer units are grouped in a cluster, while the data points corresponding to non-defective units are spread out. It is interesting to individually analyze each defective-labeled freezer unit that is embedded among or near non-defective units. About 95\(\%\) of the defective freezer units can be separated from non-defective units by a classifier based on a Support Vector Machine with a linear kernel.
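As an illustration, the embedding step can be sketched as follows in Python with scikit-learn. The array names X (temperature curves, one 150-dimensional row per unit) and y (binary labels, 1 for defective) are our own conventions; only the perplexity value of 50 comes from the setting described above.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def embed_and_plot(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Embed 150-dimensional temperature curves into 2-D with t-SNE."""
    X_2d = TSNE(n_components=2, perplexity=50, random_state=0).fit_transform(X)
    # Defective units in red, non-defective units in black, as in Fig. 3.
    plt.scatter(X_2d[y == 0, 0], X_2d[y == 0, 1], c="black", s=8, label="non-defective")
    plt.scatter(X_2d[y == 1, 0], X_2d[y == 1, 1], c="red", s=8, label="defective")
    plt.legend()
    plt.show()
    return X_2d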

Fig. 4. Result of k-means clustering based on temperature sensor measurement values for 150 min of test.

4.2 Clustering the Data

We wanted to see whether the data could be clustered into two groups, properly working and defective freezer units, with respect to the temperature sensor measurement values. The 150-dimensional feature vectors of a subset of the freezer units in the dataset are used. The result of applying the k-means clustering algorithm with \(k=2\) is shown in Fig. 4. Even though the data look neatly clustered, the resulting cluster labels do not match the original labels shown in Fig. 2.
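A minimal sketch of this clustering experiment, assuming the same feature array X and label vector y as in the embedding sketch, is given below; comparing the resulting cluster labels with the expert labels, for example via the adjusted Rand index, quantifies the mismatch noted above.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def cluster_into_two(X: np.ndarray) -> np.ndarray:
    """Cluster the 150-dimensional temperature curves into two groups."""
    return KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Example: agreement between cluster labels and expert labels y.
# print(adjusted_rand_score(y, cluster_into_two(X)))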

4.3 Applying Classifiers

We applied state-of-the-art classification algorithms to the original dataset in order to obtain a baseline for further improvements. Since the dataset is imbalanced, we under-sampled the data corresponding to non-defective freezer units. The classifiers were evaluated with 10-fold cross-validation, and Table 1 shows the average accuracy and sensitivity values for these tests.
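The exact set of classifiers and the under-sampling scheme are not spelled out here, so the following sketch assumes random under-sampling of the non-defective class and a few common scikit-learn classifiers; the 10-fold cross-validation and the accuracy/sensitivity metrics mirror the evaluation described above.

import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

def undersample(X, y, seed=0):
    """Randomly drop non-defective samples so that both classes are balanced."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)                      # defective units
    neg = rng.choice(np.flatnonzero(y == 0), size=pos.size, replace=False)
    idx = np.concatenate([pos, neg])
    return X[idx], y[idx]

def evaluate(X, y):
    """Report mean accuracy and sensitivity (recall) over 10-fold CV."""
    Xb, yb = undersample(X, y)
    models = {
        "linear SVM": SVC(kernel="linear"),
        "random forest": RandomForestClassifier(n_estimators=100),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
    }
    for name, model in models.items():
        scores = cross_validate(model, Xb, yb, cv=10, scoring=("accuracy", "recall"))
        print(name, scores["test_accuracy"].mean(), scores["test_recall"].mean())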

4.4 One-Class Classification

One-class classification is a very common method employed in outlier detection. We used a One-Class Support Vector Machine in order to define a hyper-plane enclosing all data points corresponding to non-defective units. Any data point outside this hyper-plane is considered an anomaly (i.e., defective). The results of this method depend greatly on the strictness of the hyper-plane. When we select a hyper-plane that contains all of the data points corresponding to non-defective units, only \(70\%\) of the data points corresponding to defective units remain outside the hyper-plane, whereas the other \(30\%\) are inside. This gives an accuracy of 85\(\%\) and a sensitivity of 70\(\%\). By changing the hyper-plane, these scores change; however, the best scores reached with this method are 85\(\%\) accuracy and 80\(\%\) sensitivity.
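A minimal sketch of the one-class approach with scikit-learn's OneClassSVM is shown below, assuming X_ok holds only the curves of non-defective units; the nu value, which controls how strictly the boundary fits the training data, is illustrative and not the value used in the experiments above.

import numpy as np
from sklearn.svm import OneClassSVM

def fit_one_class(X_ok: np.ndarray, nu: float = 0.01) -> OneClassSVM:
    """Fit a one-class SVM on non-defective units only.

    A small nu lets the boundary enclose nearly all training points (as in
    the 'contains all non-defective units' setting above); larger values
    tighten it and raise sensitivity at the cost of more false alarms."""
    return OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(X_ok)

def predict_defective(model: OneClassSVM, X_test: np.ndarray) -> np.ndarray:
    # OneClassSVM returns -1 for outliers; map outliers to 1 (defective).
    return (model.predict(X_test) == -1).astype(int)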

Table 1. Average accuracy and sensitivity values for different classifiers.

4.5 Applying Long Short-Term Memory Network Model

In order to make use of the time-series property of the data, we consider applying Long Short-Term Memory (LSTM) network models. The behavior of non-defective freezer units under the cooling test is modeled by training an LSTM network model only with data items corresponding to non-defective freezer units. A fixed-length sliding window over a data item is taken as input, first to train the model and eventually to predict the value at the subsequent time step. We then calculate the error between the predicted value and the real measurement at that time step. We hypothesize that this error is higher for data items corresponding to defective freezer units than for those corresponding to non-defective units.
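As an illustration, such a model can be set up in Keras as sketched below; the single 64-unit LSTM layer and the training settings are indicative assumptions rather than the exact configuration used in our experiments. Here series is a (test length, 2) array of per-minute temperature and power readings of one non-defective unit, and the 40 min window corresponds to the best-performing setting reported later.

import numpy as np
import tensorflow as tf

def make_windows(series: np.ndarray, window: int = 40):
    """Slice a (T, 2) series into (window, 2) inputs and next-step targets."""
    X = np.stack([series[t:t + window] for t in range(len(series) - window)])
    y = series[window:]
    return X, y

def build_model(window: int = 40, n_features: int = 2) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, n_features)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(n_features),   # next-step temperature and power
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Training uses only non-defective units:
# X_train, y_train = make_windows(normal_series, window=40)
# model = build_model(); model.fit(X_train, y_train, epochs=20, batch_size=32)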

Fig. 5. Plot of the data of the two sensors (temperature sensor and power sensor) of a defective unit along with the prediction values of the trained LSTM network model. (Color figure online)

Fig. 6. Plot of the data of the two sensors (temperature sensor and power sensor) of a non-defective unit along with the prediction values of the trained LSTM network model.

Figure 5 gives the plot of the data of the two sensors (temperature sensor and power sensor) for a defective unit along with the predictions of the trained LSTM network model. The actual temperature values are shown in red, the temperature predictions in blue, the actual consumed power values in cyan, and the power predictions in black. Power values are multiplied by 10 so that the details remain visible. Figure 6 gives the plot of the data of the two sensors of a non-defective unit. It can be observed that the prediction values and the real values are very close for the non-defective unit, whereas in the defective case the error between the predicted and actual values is high.

A threshold value for the error must be set in order to decide whether the data item corresponding to a freezer unit is defective or not. There is again a trade-off between the accuracy and the sensitivity score. We tried several window sizes, from 1 min up to 90 min, as well as different test durations from 70 min up to 150 min and different threshold values. The best result is achieved with a 40 min window length: 91\(\%\) accuracy and 88\(\%\) sensitivity. Figure 7 shows the effect of the threshold level on the accuracy. By lowering the threshold, sensitivity can be increased; however, this also increases the number of false positives. An ideal threshold should be determined according to the needs of the manufacturer.
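The decision rule itself reduces to comparing the mean prediction error of a unit against a threshold, as in the sketch below; it reuses make_windows and a trained model from the previous sketch, and the threshold is the free parameter that traces out the ROC curve in Fig. 7.

import numpy as np

def prediction_error(model, series: np.ndarray, window: int = 40) -> float:
    """Mean squared one-step-ahead prediction error over the whole test."""
    X, y = make_windows(series, window)
    y_hat = model.predict(X, verbose=0)
    return float(np.mean((y_hat - y) ** 2))

def is_defective(model, series: np.ndarray, threshold: float, window: int = 40) -> bool:
    # Lowering the threshold raises sensitivity but also the false-positive rate.
    return prediction_error(model, series, window) > threshold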

Fig. 7. ROC curve of the best model.

Reducing the test time is one of the objectives of this study. We picked the two best-performing window sizes and tested them with shorter test durations. Table 2 shows the accuracy and sensitivity values for these settings. As expected, as the test duration is reduced, the accuracy scores drop as well.

Table 2. Effect of test duration on the accuracy and sensitivity.

Aside from the increase in accuracy scores, the LSTM network model also has the advantage of being an on-line algorithm: the model can be run simultaneously with the product test.

5 Conclusion

Anomaly detection is an essential method for forecasting product quality in manufacturing plants. Quality can be assured through tests performed on sample units randomly chosen from a batch of manufactured units. With the aim of building an automated test system in Arçelik’s high-volume freezer factory, our ultimate goal is to detect defective units among the sampled units during the cooling test, as early as possible in terms of test time and as accurately as possible. For this purpose, we analyzed the cooling test data of the units sampled from the batches of manufactured freezer units. The first steps of the analysis consisted of embedding and clustering the data. Traditional classification algorithms were then applied and their performances assessed. Finally, a deep learning method for time-series analysis combined with a classifier was applied. Our results show the feasibility of such an automated system; however, the classifier models described in this study should be further studied and customized before being deployed under factory conditions.

An automated test system can initially be deployed to assist the human expert performing the test; in this way, more data can be collected. Ultimately, the automated test system should perform continual learning so that it learns and adapts itself in real time. Since various models of freezer units are produced in the factory, building individual, customized test systems per freezer model may yield better results. Finally, since problems also occur at the customer side once the products are operational in the field, it would be very beneficial to find possible links between the problems observed in the field and the results of the quality tests performed in the factory.