Abstract

A great number of fruits are grown around the world, each of which has various types. The type of a fruit is largely determined by external appearance features such as color, length, diameter, and shape. Determining the variety of a fruit from its external appearance may require expertise, which is time-consuming and demands great effort. The aim of this study is to classify seven types of date fruit, namely, Barhee, Deglet Nour, Sukkary, Rotab Mozafati, Ruthana, Safawi, and Sagai, by using three different machine learning methods. For this purpose, 898 images of the seven date fruit types were obtained via a computer vision system (CVS). Through image processing techniques, a total of 34 features, including morphological, shape, and color features, were extracted from these images. First, models were developed by using the logistic regression (LR) and artificial neural network (ANN) methods. The performance results achieved with these methods are 91.0% and 92.2%, respectively. Then, with the stacking model created by combining these models, the performance result was increased to 92.8%. It has been concluded that machine learning methods can be applied successfully for the classification of date fruit types.

1. Introduction

Date fruit (Phoenix dactylifera), which has about 200 types and more than 2500 species worldwide, is an edible and nutritive fruit [1–3]. Date fruit can be classified by means of image analysis and pattern recognition techniques. Differences in view, distance, and lighting exposure are obstacles to reliable classification. For a successful classification, interclass similarities and differences should be handled cautiously. Therefore, studies on fruit recognition and classification have been based on the visual features extracted from images.

In short, the ability to easily determine changes in the surface area and color values of agricultural products with image analysis techniques facilitates classification studies [4]. In the literature, there are numerous automatic classification and sorting systems based on image processing for various products, such as citrus, apple, date fruit, strawberry, mango, lemon, tomato, and pulses [5–10]. Morphological features are frequently used in the classification of fruits and vegetables [11, 12]. In another study carried out with seven different date fruit types, the k-nearest neighbor (cityblock), k-nearest neighbor (Euclidean), discriminant analysis, and neural network classification methods were tested on 15 different visual features prepared from the image data. The highest accuracy rates achieved range between 89% and 99% [13]. In another study, the local binary pattern (LBP) and Weber local descriptor (WLD) methods were used to extract the details of a date fruit’s texture pattern, and a method based on the Fisher discriminant ratio (FDR) was applied to select the more important features obtained with these two methods. The data obtained through these methods were classified using the multiclass support vector machine (SVM) [14]. To determine the ripening stages of date fruits, images were segmented with the Otsu method and the resulting data were classified with the SVM method, achieving an accuracy of 92.5% on 160 images [15]. In another study, in which 6 features extracted from date fruit images are used, it is stated that the SVM classifier, ANN, random forest (RF), and decision trees (DT) give better results than the other classification approaches. Features obtained from date fruit images were also classified with two different neural network models, namely back propagation and radial basis function (RBF) networks, using the multilayer perceptron (MLP) method, and success rates of 87.5% and 91.1% were achieved, respectively [16].

In a study using barley grains, features were extracted using the Spatial Pyramid Partition Ensemble computer vision method and classified using machine learning methods. A success rate of 75% was achieved with the kNN model and 100% with the J48 model [17].

Date fruit, which has many varieties throughout the world, is used in the production of food, medical, and cosmetic products [18]. Expert opinion is needed to distinguish date fruit varieties because they differ in nutritional value, consumption time, price, and quality [13]. Computer vision systems are used to quickly distinguish the quality, size, and type of date varieties without the need for an expert [19, 20]. In this way, the process from the production stage to the consumption stage can be shortened [21]. With agricultural technologies of this type, it is possible to increase productivity in the agricultural industry [22].

When the literature is examined, it is seen that various machine learning methods have been tried for the classification of date fruit. However, there are not many studies in which hybrid methods such as the stacking model are used. Within the scope of this study, in addition to the models based on LR and ANN, a stacking model that works on the combined results of these two models has been tried. By using these models, date fruits were classified with a total of 34 morphological, shape, and color features extracted from them. While determining these features, other studies in the field were also considered, and it was investigated whether classification success is affected by the number of features extracted.

In the second section, information on image acquisition, image processing, feature extraction, and performance analysis is given. In the third section, the LR, ANN, and stacking methods are explained. In the fourth section, the results of the study are presented. Last, in the fifth section, the study is evaluated and suggestions are made.

2. Materials and Methods

In this section, the acquisition of date fruit images, feature extraction, and the logistic regression (LR), artificial neural network (ANN), and stacking methods used for the classification process are explained. The performance metrics used for the classification results are also given. Figure 1 shows the general process steps of the study.

Accordingly, the steps performed are image acquisition, image processing, and feature extraction. Afterwards, classification was carried out with the LR, ANN, and stacking methods, and the performance of each was analyzed.

2.1. Image Acquisition

In this study, the classification process was performed for 7 different date fruit types, namely, Barhee, Deglet Nour, Sukkary, Rotab Mozafati, Ruthana, Safawi, and Sagai. The numbers of samples used were 65 for Barhee, 98 for Deglet Nour, 204 for Sukkary, 72 for Rotab Mozafati, 166 for Ruthana, 199 for Safawi, and 94 for Sagai. Date fruit is grown in countries neighboring Turkey, so the varieties are easy to access, and there are markets where many types of dates are sold. In the study, ready-to-eat date fruits obtained from these markets were used. A CVS similar to the system shown in Figure 2 was set up for image acquisition, and the captured images of the date fruits were transferred to the computer environment. The camera used in the setup is placed on a closed box with an LED light system. The future robot S100 series smart camera used to capture images has a resolution of 1280 × 1024. This camera, which has a CMOS-type 1.3 MP sensor, captures images at 90 FPS so that images can be taken quickly on the installed conveyor system.

The images of date fruits were obtained via a mechanism that does not receive external light, so that preprocessing can be completed quickly. During image processing, the background and shadows should be cleaned according to the band speed and ambient light conditions [23]. By automatically cleaning the shadows and background color that emerged during the illumination, clear date fruit images were obtained. Green is used as the background color in order to easily distinguish the date fruit from the background. The RGB values of the green color were determined as R = 106, G = 210, and B = 175. Using these color values, the image was color filtered, and the date fruit image was obtained without a background [24]. This process is shown in Figure 3. In the study, classification processes were performed with features extracted from 898 images of 7 different date fruit types.
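The color filtering step described above can be sketched as a simple color-key mask. This is only an illustration under stated assumptions: the function name `remove_background`, the per-channel `tolerance` of 40, and the choice to blank background pixels to black are hypothetical and are not taken from the study; only the reference background color comes from the text.

```python
# Reference background color reported in the text (R, G, B).
BACKGROUND_RGB = (106, 210, 175)


def remove_background(image, tolerance=40):
    """Return (mask, foreground) for an image given as rows of (R, G, B) tuples.

    A pixel counts as background when every channel lies within
    `tolerance` of BACKGROUND_RGB. The tolerance is an assumed value.
    """
    mask = []
    foreground = []
    for row in image:
        mask_row = []
        fg_row = []
        for pixel in row:
            is_background = all(
                abs(channel - ref) <= tolerance
                for channel, ref in zip(pixel, BACKGROUND_RGB)
            )
            # Mask is 1 for fruit pixels and 0 for background pixels.
            mask_row.append(0 if is_background else 1)
            # Background pixels are blanked to black; fruit pixels are kept.
            fg_row.append((0, 0, 0) if is_background else pixel)
        mask.append(mask_row)
        foreground.append(fg_row)
    return mask, foreground
```

The binary mask returned here is the kind of input that later shape and morphological measurements would operate on, while the foreground image retains the color information.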

2.2. Image Processing and Extraction of Morphological Features

The obtained images were converted into grayscale and binary images for feature extraction. The operations were essentially based on thresholding and pixel information. At the end of the image processing, each date fruit was examined separately, and a set of features was extracted from it. The Otsu method, one of the commonly used image thresholding techniques, was used within the scope of the study [25]. This method determines a threshold value that separates the pixels of an image into two or more groups. Generally working on gray level images, the method only checks how many times each gray level occurs in the image. Therefore, the gray level distribution of the image is calculated first, and all subsequent processes are performed on this distribution.

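The progress of the method can be briefly summarized as follows:
(1) The histogram and the probability of each intensity level are calculated
(2) The initial values $\omega_i(0)$ and $\mu_i(0)$ are set
(3) All possible thresholds $t = 1, \ldots,$ maximum intensity are stepped through
(a) $\omega_i$ and $\mu_i$ are updated
(b) $\sigma_b^2(t)$ is calculated
(4) The desired threshold corresponds to the maximum value of $\sigma_b^2(t)$
Here, $\omega_i$ and $\mu_i$ denote the class probabilities and class means, and $\sigma_b^2(t)$ denotes the between-class variance at threshold $t$.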
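The steps above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard two-class Otsu procedure, not the code used in the study; the function name `otsu_threshold` and the convention that pixels below the threshold form the first class are assumptions.

```python
def otsu_threshold(gray_pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    `gray_pixels` is a flat iterable of integer gray levels in [0, levels).
    Pixels with value < t form class 0; the rest form class 1.
    """
    # Step 1: histogram and probability of each intensity level.
    histogram = [0] * levels
    for value in gray_pixels:
        histogram[value] += 1
    total = sum(histogram)
    probabilities = [count / total for count in histogram]
    global_mean = sum(i * p for i, p in enumerate(probabilities))

    # Step 2: initial class probability and cumulative mean.
    weight0 = 0.0
    cum_mean = 0.0
    best_t, best_variance = 0, -1.0

    # Step 3: step through all possible thresholds.
    for t in range(1, levels):
        weight0 += probabilities[t - 1]
        cum_mean += (t - 1) * probabilities[t - 1]
        weight1 = 1.0 - weight0
        if weight0 <= 0.0 or weight1 <= 0.0:
            continue  # one class is empty, so the variance is undefined
        mean0 = cum_mean / weight0
        mean1 = (global_mean - cum_mean) / weight1
        variance = weight0 * weight1 * (mean0 - mean1) ** 2
        # Step 4: keep the threshold with the largest between-class variance.
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t
```

Because the method works only on the histogram, its cost depends on the number of gray levels rather than on the image size once the histogram has been built.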

A total of 34 features, comprising 12 morphological [26], 4 shape [27], and 18 color [28] features, were extracted [29]. The features used in the study are given in Table 1.

2.3. Date Fruit Features Dataset

The date fruit types selected for examination in the study are Barhee from the Palestinian region, Deglet Nour from the Algeria region, Sukkary, Safawi, Sagai, and Ruthana from the Riyadh and Medina regions of Saudi Arabia, and Rotab Mozafati from the Iran region. These date fruits are the most common and frequently grown types in their respective regions. Table 2 provides the general characteristics of the date fruit types used in the study, while Table 3 provides the statistical averages of the features obtained from the date fruits.

2.4. Performance Analysis

In numerous fields of science, classification techniques have been applied to many problems. There are several ways to evaluate classification algorithms, and the appropriate metrics must be chosen and interpreted correctly [30, 31]. The confusion matrix summarizes the correct and incorrect classifications in the form of a table [32].

True positive (TP) refers to correctly classified positive samples, true negative (TN) refers to correctly classified negative samples, false positive (FP) refers to negative samples incorrectly classified as positive, and false negative (FN) refers to positive samples incorrectly classified as negative. Since the research addresses a seven-class classification problem, a seven-class confusion matrix was used. The multiclass confusion matrix used is given in Table 4.

Besides classification accuracy, other metrics are available for a detailed performance analysis of the models [33]. F-score, precision, recall, and specificity are the other metrics utilized to measure the models’ performance. The calculation of the performance metrics with a seven-class confusion matrix is given in Table 5.
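As an illustration of how such metrics can be derived from a multiclass confusion matrix, the sketch below treats each class in turn as "positive" and all others as "negative" (a one-vs-rest reading). The matrix orientation (rows are true classes, columns are predicted classes) and the function names are assumptions; the exact formulas and any averaging used in the study are those of its Table 5, which may differ.

```python
def per_class_metrics(confusion):
    """Compute one-vs-rest metrics from a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class k, predicted as other
        fp = sum(confusion[i][k] for i in range(n)) - tp  # other, predicted as class k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append({"precision": precision, "recall": recall,
                        "specificity": specificity, "f1": f1})
    return metrics


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total
```

The same functions apply unchanged to a seven-class matrix; a three-class matrix is enough to check the arithmetic by hand.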

The AUC (area under the curve)–ROC (receiver operating characteristic curve), another method used to check or visualize the performance of multiclass classification problems, is one of the most important evaluation criteria for any classification model [34]. It measures the performance of a classification problem at various threshold settings. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a certain decision threshold. A test with perfect separation has a ROC curve that passes through the upper left corner (100% sensitivity and 100% specificity). Therefore, the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the test [35].
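The relationship between decision thresholds, ROC points, and the AUC can be sketched for a single class treated as positive against the rest. The function names and the trapezoidal integration are illustrative assumptions; the source does not state how its curves were computed.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    `scores` are predicted scores for the positive class and `labels`
    are 1 for positive samples and 0 for negative samples.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step marks more samples as positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a list of (fpr, tpr) points using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For a seven-class problem, one such curve per date fruit type would be obtained by using that type's predicted score and labeling its samples as 1 and all others as 0.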

3. Classification Models

In order to evaluate the proposed classification methods, a dataset was created with the features extracted from date fruits. LR and ANN models and the stacking model created by combining these two models were used in classification processes.

3.1. Logistic Regression Analysis

The logistic regression method is often used to analyze the probability of an outcome occurring by using the relationship between two or more variables. The result is obtained by fitting the log-odds of the outcome to a linear model of the explanatory variables [36]. Logistic regression analysis is calculated as

$$\ln\left(\frac{P(Y = 1)}{1 - P(Y = 1)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n,$$

where $Y$ is the binary variable, $X_1, \ldots, X_n$ are the “n” explanatory variables, and $\beta_0, \beta_1, \ldots, \beta_n$ are the regression coefficients to be estimated based on the data. The logistic regression model is an analysis method that is frequently used in many fields from physical sciences to social sciences, such as engineering [37], biomedicine [38], social sciences [39], and agriculture [40]. Consequently, logistic regression analysis is used to analyze a dataset with one or more independent variables that determine a result [41, 42].
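To make the link between the linear model and a probability concrete, the sketch below evaluates the standard binary logistic model for given coefficients. The function names are hypothetical, and the coefficient values in any call are made up for illustration. How the binary model was extended to seven classes in the study (for example, one-vs-rest or a multinomial formulation) is not stated in the text and is not assumed here.

```python
import math


def linear_predictor(features, coefficients, intercept):
    """Log-odds of the positive outcome: intercept + sum of coefficient * feature."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def logistic_probability(features, coefficients, intercept):
    """Probability of the positive outcome, obtained by inverting the log-odds."""
    z = linear_predictor(features, coefficients, intercept)
    return 1.0 / (1.0 + math.exp(-z))
```

Taking the logarithm of p / (1 - p) for the returned probability recovers the linear predictor, which is the sense in which the log-odds are fitted to a linear model.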

3.2. Artificial Neural Network

An artificial neural network (ANN) is a set of algorithms that tries to recognize the main associations in a dataset by mimicking the way the human brain works [43, 44]. In computer science, on the other hand, the ANN model is a basic mathematical model that describes a function f: X → Y, in which nonlinear relationships between the input variables on the X side and the output variables on the Y side can be determined [45]. It is used in many fields such as engineering [46], medicine [47], agriculture [48], and social sciences [49]. The “neuron” in a neural network is a mathematical function that collects and classifies information according to a particular architecture. The network has a strong resemblance to statistical methods such as curve fitting and regression analysis [50].
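The idea of a neuron as a function that collects weighted inputs can be sketched as the forward pass of a small feed-forward network. The architecture shown here (one hidden layer, ReLU activation, softmax output) and all function names are assumptions for illustration only; the network configuration used in the study is not described in this section.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the input weights of neuron j."""
    return [sum(w * x for w, x in zip(neuron_w, inputs)) + b
            for neuron_w, b in zip(weights, biases)]


def relu(values):
    """Nonlinear activation: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn output scores into class probabilities that sum to 1."""
    shift = max(values)  # subtracting the maximum improves numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Map an input feature vector to class probabilities."""
    hidden = relu(dense(inputs, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))
```

In a setting like the one described, the input would be a 34-element feature vector and the output layer would have seven neurons, one per date fruit type; the weights would be learned from the training data rather than set by hand.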

3.3. Stacking

Stacking is a machine learning approach in which two or more models are trained to solve the same problem and then combined to achieve better results. Its main aim is to obtain more accurate and/or more robust models by combining weaker models correctly. It has been observed that classification performance improves when the estimations of multiple different classifiers are combined into a single robust estimation [51, 52]. In this study, the stacking model was created by combining the ANN and LR methods and the results obtained with these models. The created model is shown in Figure 4.
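The general mechanics of stacking can be sketched as follows: each base model produces scores for every sample, and a second-level model is trained on those scores. To keep the example self-contained, two toy learners (a nearest-centroid scorer and a nearest-sample scorer) stand in for LR and ANN, and a nearest-centroid scorer is used as the meta-learner. These stand-ins, the function names, and the use of out-of-fold scores are assumptions; the source does not specify how its base outputs were combined.

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fit_centroid(X, y, classes):
    """Toy learner: score = negative squared distance to each class centroid."""
    centroids = []
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids.append([sum(col) / len(rows) for col in zip(*rows)])
    return lambda x: [-sq_dist(x, m) for m in centroids]


def fit_nearest(X, y, classes):
    """Toy learner: score = negative squared distance to the nearest sample of each class."""
    return lambda x: [
        -min(sq_dist(x, row) for row, label in zip(X, y) if label == c)
        for c in classes
    ]


def out_of_fold(fit, X, y, classes, k):
    """Score every sample with a model that did not see it during training."""
    scores = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        model = fit([X[i] for i in train], [y[i] for i in train], classes)
        for i in range(fold, len(X), k):
            scores[i] = model(X[i])
    return scores


def fit_stacking(base_fits, meta_fit, X, y, k=3):
    """Assumes every class appears in every training fold."""
    classes = sorted(set(y))
    # Level 0: out-of-fold scores of each base model become meta-features.
    columns = [out_of_fold(fit, X, y, classes, k) for fit in base_fits]
    meta_X = [sum((col[i] for col in columns), []) for i in range(len(X))]
    # Level 1: the meta-learner is trained on the concatenated base outputs.
    meta_model = meta_fit(meta_X, y, classes)
    # Base models are refitted on all data for use at prediction time.
    base_models = [fit(X, y, classes) for fit in base_fits]

    def predict(x):
        meta_x = sum((model(x) for model in base_models), [])
        meta_scores = meta_model(meta_x)
        return classes[meta_scores.index(max(meta_scores))]

    return predict
```

Training the meta-learner on out-of-fold scores rather than on scores from models that have already seen the same samples is a common design choice, because it keeps the second level from simply trusting whichever base model overfits most.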

4. Experimental Results

In order to classify the date fruits used in the study, the features of 898 preprocessed date fruit images were extracted. In total, 34 features were extracted for each date fruit in 3 main groups, which are morphological, shape, and color features. A dataset was created with these extracted features. For the classification process, ANN and LR models were created and performance metrics were obtained. In addition, the performance metrics of the stacking model created by combining these two models were compared with those of both models. The cross-validation method, which splits the dataset into parts, was used to train and evaluate the classification models [53]. With this method, the dataset is split into a certain number of subsets, which serve in turn as training and testing data. As shown in Figure 5, the number of folds (k) was set to 10.
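The splitting scheme can be sketched as a generator of train and test index lists. This is a plain, unshuffled k-fold split written for illustration; the function name is hypothetical, and whether the study shuffled or stratified the samples before splitting is not stated in the text.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # The first n_samples % k folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

With 898 samples and k = 10, this produces eight test folds of 90 samples and two of 89, and every sample is used for testing exactly once. In practice the sample order would normally be shuffled first, since a dataset sorted by class would otherwise yield folds dominated by a single class.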

The performance results were evaluated with the ROC curve and confusion matrix. Table 6 provides the AUC, accuracy, F1 score, precision, recall, and specificity values, which are the performance metrics of the classification, respectively. According to the table, the ANN and LR methods both achieved success over 90%. The stacking model, a combination of these two models, obtained the highest accuracy of 92.8%, which is higher than that of either model applied separately. The confusion matrix values are given in Table 7 for all models.

According to the confusion matrix, Safawi is the date fruit type with the highest classification success for all models, while Barhee is the type with the lowest classification success.

In machine learning, the AUC-ROC curve is also utilized for evaluating the performance of a classification model. In the AUC-ROC curves (Figure 6) obtained for each date fruit in the study, the x-axis shows the FP ratio and the y-axis shows the TP ratio. When the AUC-ROC curves are examined, it is seen that the success performance is high for the Safawi date fruit. The closer the curve is to the upper left corner, and thus the closer the AUC is to 1, the better the success performance.

5. Conclusions

In this study, a system has been put forth with the aim of classifying date fruits automatically, without the need for time-consuming and complex physical measurements. Compared with previous studies, more features were extracted from each date fruit, and the basic machine learning methods LR and ANN were tried in classification. Furthermore, better results were obtained through the stacking method created by combining these two methods. In classification studies, high performance can be achieved not only with common machine learning methods but also with new stacking methods created by combining two or more of these methods. More successful results may be obtained by developing end-to-end classification models that apply deep learning to the images of date fruit. Building on this study, a date fruit classification program can be offered to users as mobile phone software. With such a smartphone application, it is thought that consumers can learn the type of any date fruit sold over the counter by classifying it. By extracting more features in classification studies, it is thought that success rates can be increased in the classification of not only date fruits but also other vegetables, fruits, legumes, or any other object.

Data Availability

The dataset used in the study can be accessed from the link https://muratkoklu.com/Date_Fruit_Datasets.xlsx.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This project was supported by the Scientific Research Coordinator of Selcuk University.