Abstract

Deep learning (DL) has achieved breakthrough successes in various tasks, owing to its layer-by-layer information processing and sufficient model complexity. However, DL suffers from both redundant model complexity and low interpretability, which stem mainly from its oversimplified McCulloch–Pitts neuron unit. A widely recognized, biologically plausible dendritic neuron model (DNM) has demonstrated its effectiveness in alleviating these issues, but it can only solve binary classification tasks, which significantly limits its applicability. In this study, a novel extended network based on the dendritic structure is proposed, enabling it to solve multiclass classification problems. In addition, an efficient error-backpropagation learning algorithm is derived for it for the first time. Extensive experiments on ten datasets, including a real-world quality of web service application, demonstrate the effectiveness and superiority of the proposed method in comparison with nine other state-of-the-art classifiers. The experimental results suggest that the proposed learning algorithm is competent and reliable in terms of classification performance and stability and has a notable advantage on small-scale imbalanced data. Additionally, the ways in which the network structure is constrained by data scale are examined.

1. Introduction

In recent years, deep learning (DL) has dominated the research field of artificial intelligence (AI) and achieved dramatic successes in speech recognition, protein structure prediction, drug discovery, image and video processing, and other areas [1]. At present, mainstream deep learning models are mostly constructed from neural networks, i.e., multiple layers of parameterized McCulloch–Pitts neurons inspired by the biological neuron [2]. Neural networks, as black boxes, are not only extensively studied in the field of artificial intelligence but also widely applied in the information technology industry [3, 4]. The appearance of deep neural networks has pushed the development of neural networks to a new peak. However, given the numerous difficulties and problems they face, including the lack of a theoretical foundation for explanation [5, 6], fairness [7, 8], and causal discovery [9, 10], progress on neural networks risks stagnating. As interpretability theory attracts increasing attention, new discoveries and more valuable research directions are urgently needed to keep scientific and technological progress from proceeding blindly.

From the perspective of understanding the mechanics of artificial neural networks, various methods and contributions have been introduced [11, 12]. Applying Monte Carlo simulation to quantify variable importance, Olden et al. justified the connection-weight method for interpreting neural networks [13]. From a statistical standpoint, Sarle et al. set out the relations between neural networks and generalized linear models, maximum redundancy analysis, projection pursuit, cluster analysis, and other statistical models, and also translated neural network terminology into statistical terms [14]. Similarly, certain relations between artificial neural networks and statistical methods were proposed in [15].

Despite the abovementioned studies, the bottleneck of neural networks has not been addressed. Originating from the simulation of biological neurons, the artificial neural network takes artificial neurons as nodes to construct a complete conduction structure. As a basic information processing unit, the neuron is formed by dendrites, a cell body (soma), an axon, and synapses, as shown in Figure 1(a). Specifically, the dendrites receive information from the outside world, the cell body processes the information, and the axon and synapses transmit the signal and pass it on to other neurons, respectively [16]. The modeling of biological neurons can be traced back to the 1940s, when McCulloch and Pitts jointly published the first abstract neuron model, the McCulloch–Pitts (MP) model [17], as illustrated in Figure 1(b). Then, in 1949, the Hebb rule [18] was proposed based on the theory of the variability of synaptic connections within the human brain. The idea of adjusting weights was thus introduced into machine learning, laying the foundation for later learning algorithms.

Inspired by the structure of biological neurons and the MP model, Todo et al. proposed the dendritic neuron model (DNM) [19]. Different from traditional neural network models, DNM is designed around the conduction process within a single neuron and is supported by biological theory in simulating the biological neuron. Furthermore, DNM compensates for some defects of the homologous perceptron model, such as the inability to solve the XOR problem. It has also stimulated novel study of the human brain [20].

As a classifier, shown in Figure 1(c), DNM has been applied to various classification problems. For example, Sha et al. classified the breast cancer dataset, and Jiang et al. detected liver disorders [21] to assist disease diagnosis. Beyond medical applications, DNM-based methods have also been applied to the financial field. To improve classification performance, metaheuristic algorithms were introduced to train the hyperparameters of DNM [22, 23]. Luo et al. used a decision tree to initialize the model and achieve better effectiveness [24]. To solve generalized large-scale classification problems, Jia et al. suggested a reconciliation method for DNM using a particle antagonism mechanism, and Ji et al. proposed a DNM-based multiobjective evolutionary algorithm [25]. In terms of feature selection, Song et al. addressed the high-dimensional challenge [26], and Gao et al. also showed the expansibility and flexibility of DNM for diverse applications [27]. By utilizing multiplication, which is useful for information processing within a single neuron, the computation in synapses is described using sigmoid functions. It is advantageous to establish the morphology of a neuron by determining the values of the synaptic parameters, since the output of the synapses can effectively represent signals. Nevertheless, a single neuron is limited in some application scenarios. In [28], the binary classification results of several DNMs were combined to undertake multiclass tasks and thus handle multiclassification datasets.

By adopting the quality of service (QoS) as the evaluation dataset, this study implements the multiclassification of web service selection. QoS refers to a network's use of a range of underlying technologies to provide better service capabilities for designated network traffic [29]. As a network guarantee mechanism and technology, QoS is employed to deal with network delay [30], congestion [31], and other problems. As a general illustration, network bandwidth is a typical metric used to characterize QoS. Without a service quality standard, the network treats all services and applications equally, resulting in a disordered situation, as shown in Figure 2(a), where the colored areas stand for different web services and applications. In other words, when a network device lacks QoS capability, the network environment is threatened and bottlenecks arise [32]. As shown in Figure 2(b), prioritization from the perspective of QoS provides a more orderly, efficient, and stable network environment. QoS comprises a set of nonfunctional attributes, such as reliability and response time, that measure the characteristics of web services and allow different services to be classified and ranked effectively.

Web services are service-oriented software modules running on the network and built on distributed programs. Because web services employ general Internet standards, such as HTTP and XML (a subset of the Standard Generalized Markup Language) [33], people can access data on the web from various terminal devices in various places. The web services described in this article differ from common network applications; they generally refer to application modules, such as network protocols and methods, that underlie network applications. With the development of the Internet, many candidate services implement the same task; most of them share the same functionality but differ in nonfunctional characteristics. As a result, these services are divided into different service quality levels. Functional overlap is inevitable given the wide range of web services on the network, and QoS-based web service selection is considered an effective solution [34]. As network technology and operation concepts develop rapidly, web services are becoming the latest technology and development trend for constructing distributed, modular, and service-oriented applications.

Based on DNM, this article proposes a multiple dendritic neural network (MDNN) composed of multiple single neurons to achieve QoS-based multiclass web service selection. To accommodate the multiclassification mechanism, the DNM structure introduced in Figure 3 is reconstructed. To accelerate gradient descent and improve multiclassification accuracy, an error backpropagation algorithm and an adaptive-moment-estimation optimization are derived for the model for the first time. Experiments are carried out on the Quality of Web Service dataset and nine UCI multiclassification datasets [35]. In the comparison between MDNN and nine state-of-the-art classifiers, the superiority of the proposed method is demonstrated.

The main contributions are as follows: 1) a novel neural network composed of multiple single dendritic neurons is developed for multiclassification tasks; 2) the potential and application scenarios of the dendritic neural network are explored; 3) a new approach for QoS-driven web service selection is proposed.

The remainder of this article is organized as follows: Section 2 presents the structure of the multiple dendritic neural network. Section 3 elaborates on the learning process of the proposed method and expounds on the optimization strategies. The comparison with other algorithms and the experimental results are presented in Section 4. Finally, Section 5 concludes the paper and outlines future work.

2. The Dendritic Neuron Network-Based Multiclassifier Approach

The proposed multiclassifier is constructed from multiple single neurons. The general architecture is shown in Figure 4. For each neuron m, the input of the model is preprocessed by a nonlinear sigmoid filter. To differentiate neurons, the synaptic function carries the subscript m and is defined as follows:

Y_{i,j,m} = \frac{1}{1 + e^{-(w_{i,j,m}\, x_i - \theta_{i,j,m})}}, \quad i = 1, \ldots, I,\; j = 1, \ldots, J,\; m = 1, \ldots, M,   (1)

where I is the number of attributes of the sample, J is the number of nodes within the hidden layer, and M is the number of classes of the output. In addition, the weights w_{i,j,m} and thresholds \theta_{i,j,m} denote the neural network parameters in the training stage and are randomly initialized within (0, 0.01) and at 0, respectively.

In contrast to the perceptron model, a quadrature (multiplication) method is adopted for the hidden layer, which not only rules out inhibited neuronal excitation but also enhances activated neuronal excitation. The dendritic and membrane layers are described as follows:

Z_{j,m} = \prod_{i=1}^{I} Y_{i,j,m},   (2)

V_{m} = \sum_{j=1}^{J} Z_{j,m}.   (3)

Eq. (2) requires all synaptic inputs on a dendritic branch to be activated, which is equivalent to a logical AND. Eq. (3) is equivalent to a logical OR: inhibited excitations from the former layer are suppressed, while the remaining activated ones are accumulated. As a result, the multiclassification structure composed of multiple neurons is formed.

Apart from the dendritic mechanism, MDNN utilizes the normalized exponential function to produce the final results. For consistency of representation, the conventional form of the normalized exponential function is followed. Note that the output of the multiple neurons is computed from all information of the previous layer rather than being directly conveyed, expressed as follows:

O_{m} = \frac{e^{V_m}}{\sum_{n=1}^{M} e^{V_n}},   (4)

where O_m is the predicted probability of each class. The normalized exponential function converts the output values of the upper layer, V_m, into a probability distribution within the range [0, 1], with the probability values over the neurons summing to 1. The formula first maps the results through an exponential function, ensuring nonnegative probabilities, and then normalizes them so that they sum to 1.

Since the prediction result follows the rules of a probability distribution, the cross-entropy function is adopted as the loss function, as a proper substitute for the mean square error, which is defined as follows:

E = -\sum_{m=1}^{M} T_{m} \ln O_{m},   (5)

where E measures the discrepancy between the probability distribution predicted by the model and the actual classification, and T_m is the actual classification label (one-hot encoded).
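To make the layer-by-layer computation concrete, the following is a minimal NumPy sketch of the forward pass and loss defined by Equations (1)–(5); the array shapes, variable names, and toy dimensions are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def mdnn_forward(x, W, Theta):
    """Forward pass of MDNN for one sample.

    x     : (I,)       input attributes (normalized)
    W     : (I, J, M)  synaptic weights w_{i,j,m}
    Theta : (I, J, M)  synaptic thresholds theta_{i,j,m}
    Returns the class probabilities O (M,) and the intermediate layers.
    """
    # Synaptic layer (Eq. (1)): sigmoid filter per attribute, branch, and neuron.
    Y = 1.0 / (1.0 + np.exp(-(W * x[:, None, None] - Theta)))
    # Dendritic layer (Eq. (2)): product over attributes, a soft logical AND.
    Z = Y.prod(axis=0)                    # shape (J, M)
    # Membrane layer (Eq. (3)): sum over dendritic branches, a soft logical OR.
    V = Z.sum(axis=0)                     # shape (M,)
    # Normalized exponential (Eq. (4)): class probabilities.
    expV = np.exp(V - V.max())            # subtract the max for numerical stability
    O = expV / expV.sum()
    return O, (Y, Z, V)

def cross_entropy(O, T):
    """Cross-entropy loss (Eq. (5)); T is a one-hot label vector."""
    return -np.sum(T * np.log(O + 1e-12))

# Toy usage: I = 4 attributes, J = 3 dendritic branches, M = 3 classes.
rng = np.random.default_rng(0)
I, J, M = 4, 3, 3
W = rng.uniform(0.0, 0.01, size=(I, J, M))   # weights initialized within (0, 0.01)
Theta = np.zeros((I, J, M))                  # thresholds initialized at 0
x, T = rng.random(I), np.eye(M)[1]
O, _ = mdnn_forward(x, W, Theta)
print(O, cross_entropy(O, T))
```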

3. Learning Mechanism and Optimization Strategies

Existing learning algorithms cannot be directly applied, since MDNN is a new dendritic neuron model containing multiplication operators in its calculation. Accordingly, in this section, we derive, for the first time, the learning algorithms for the proposed MDNN: one is traditional error backpropagation, and the other is an Adam-like learning algorithm.

3.1. Backpropagation

In the course of learning the samples, the model is improved by stochastic gradient descent over the parameters w and \theta, which is described as follows:

w_{i,j,m}(t) = w_{i,j,m}(t-1) - \eta\,\frac{\partial E}{\partial w_{i,j,m}},

\theta_{i,j,m}(t) = \theta_{i,j,m}(t-1) - \eta\,\frac{\partial E}{\partial \theta_{i,j,m}},

where the learning rate \eta is a positive constant, and t and t-1 denote the current iteration and the previous iteration in the training stage, respectively.

The error of the proposed MDNN is calculated by the cross-entropy function. Based on the calculated error, the error backpropagation algorithm is introduced as the learning scheme. In backpropagation, all of the samples or a batch of samples are involved. To aid intuition, the relation among the layers is shown in Figure 5.

The gradients \partial E / \partial w_{i,j,m} and \partial E / \partial \theta_{i,j,m} are expressed in partial differential form as follows:

\frac{\partial E}{\partial w_{i,j,m}} = \frac{\partial E}{\partial V_{m}}\,\frac{\partial V_{m}}{\partial Z_{j,m}}\,\frac{\partial Z_{j,m}}{\partial Y_{i,j,m}}\,\frac{\partial Y_{i,j,m}}{\partial w_{i,j,m}},

\frac{\partial E}{\partial \theta_{i,j,m}} = \frac{\partial E}{\partial V_{m}}\,\frac{\partial V_{m}}{\partial Z_{j,m}}\,\frac{\partial Z_{j,m}}{\partial Y_{i,j,m}}\,\frac{\partial Y_{i,j,m}}{\partial \theta_{i,j,m}}.

Since the model is trained in batches, the gradients obtained for gradient descent are finally calculated as follows:

\frac{\partial E}{\partial w_{i,j,m}} = \frac{1}{B}\sum_{b=1}^{B}\frac{\partial E_{b}}{\partial w_{i,j,m}}, \qquad \frac{\partial E}{\partial \theta_{i,j,m}} = \frac{1}{B}\sum_{b=1}^{B}\frac{\partial E_{b}}{\partial \theta_{i,j,m}},

where B denotes the size of the input batch within the current iteration and E_b is the error of the b-th sample.

Following the chain rule, the derivation procedures and results of backpropagation are presented below. First, the partial derivative of the error is calculated. Owing to the form of the normalized exponential function, \partial E / \partial O_m and \partial O_m / \partial V_n are computed jointly rather than separately.

In forward propagation for multiclassification, the output index and the membrane index do not correspond one to one, because the normalized exponential couples every output with every membrane potential. Thus, the derivative of the error is discussed in Cases (1) and (2). To avoid confusion, the subscripts of O and V are redefined as m and n, respectively, and their relation is simplified as follows:

\frac{\partial E}{\partial V_{n}} = \sum_{m=1}^{M}\frac{\partial E}{\partial O_{m}}\,\frac{\partial O_{m}}{\partial V_{n}}, \qquad \frac{\partial E}{\partial O_{m}} = -\frac{T_{m}}{O_{m}}.

On the basis of Equations (5) and (4), when m = n, there is Case (1):

\frac{\partial O_{m}}{\partial V_{n}} = \frac{e^{V_{m}}\sum_{l=1}^{M}e^{V_{l}} - e^{V_{m}}e^{V_{n}}}{\left(\sum_{l=1}^{M}e^{V_{l}}\right)^{2}} = O_{m}\,(1 - O_{n}).

When m \neq n, there is Case (2):

\frac{\partial O_{m}}{\partial V_{n}} = \frac{-\,e^{V_{m}}e^{V_{n}}}{\left(\sum_{l=1}^{M}e^{V_{l}}\right)^{2}} = -\,O_{m} O_{n}.

We incorporate Cases (1) and (2) into the following formula:

\frac{\partial E}{\partial V_{n}} = \sum_{m=1}^{M}\frac{\partial E}{\partial O_{m}}\,\frac{\partial O_{m}}{\partial V_{n}} = -\frac{T_{n}}{O_{n}}\,O_{n}\,(1 - O_{n}) + \sum_{m\neq n}\left(-\frac{T_{m}}{O_{m}}\right)\left(-\,O_{m}O_{n}\right) = -T_{n}\,(1 - O_{n}) + O_{n}\sum_{m\neq n}T_{m}.

Thus, since the labels satisfy \sum_{m=1}^{M} T_{m} = 1, \partial E / \partial V_{n} is expressed as follows:

\frac{\partial E}{\partial V_{n}} = O_{n} - T_{n}.

For the remaining layers of MDNN, the derivatives follow from Equations (3) and (2) as follows:

\frac{\partial V_{m}}{\partial Z_{j,m}} = 1, \qquad \frac{\partial Z_{j,m}}{\partial Y_{i,j,m}} = \prod_{l \neq i} Y_{l,j,m}.

Taking the derivative of Equation (1), i.e., of the sigmoid function, \partial Y_{i,j,m}/\partial w_{i,j,m} and \partial Y_{i,j,m}/\partial \theta_{i,j,m} are obtained as follows:

\frac{\partial Y_{i,j,m}}{\partial w_{i,j,m}} = x_{i}\,Y_{i,j,m}\,(1 - Y_{i,j,m}), \qquad \frac{\partial Y_{i,j,m}}{\partial \theta_{i,j,m}} = -\,Y_{i,j,m}\,(1 - Y_{i,j,m}).
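The derivatives above can be assembled into a per-sample gradient computation. The following sketch reuses the mdnn_forward function from the earlier example; it is an illustrative reading of the derivation rather than the authors' code, and a mini-batch gradient would simply average these per-sample gradients as described above.

```python
import numpy as np

def mdnn_gradients(x, T, W, Theta):
    """Per-sample gradients of the cross-entropy loss, following the chain rule above.
    Relies on mdnn_forward from the previous sketch."""
    O, (Y, Z, V) = mdnn_forward(x, W, Theta)
    # Softmax and cross-entropy combine (Cases (1) and (2)): dE/dV_n = O_n - T_n.
    dV = O - T                                     # shape (M,)
    # Membrane sum: dV_m / dZ_{j,m} = 1, so the error passes through unchanged.
    # Dendritic product: dZ_{j,m} / dY_{i,j,m} = prod_{l != i} Y_{l,j,m} = Z_{j,m} / Y_{i,j,m}
    # (sigmoid outputs are strictly positive, so the division is well defined).
    dY = dV[None, None, :] * Z[None, :, :] / Y     # shape (I, J, M)
    # Sigmoid synapse: dY/dw = x_i * Y * (1 - Y) and dY/dtheta = -Y * (1 - Y).
    sig = Y * (1.0 - Y)
    dW = dY * sig * x[:, None, None]
    dTheta = -dY * sig
    return dW, dTheta
```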

3.2. Adam-Like Optimization

To improve the convergence and classification ability of the proposed model, an Adam-like learning algorithm for MDNN, inspired by the well-known adaptive moment estimation (Adam) [36], is also introduced to accelerate the gradient descent without divergence. The way the weights are updated in each iteration is optional: either the traditional scheme described in Section 3.1 or the Adam-like scheme can be selected according to the user's setting.

As an extended optimization strategy of stochastic gradient descent (SGD) [37], momentum [38, 39] is introduced to reduce oscillation and accelerate the gradient descent. The fundamental idea of gradient descent with momentum is to update the weights by the exponentially weighted average of the gradients as follows:

v_{dw} = \beta_{1}\,v_{dw} + (1 - \beta_{1})\,\frac{\partial E}{\partial w_{i,j,m}}, \qquad v_{d\theta} = \beta_{1}\,v_{d\theta} + (1 - \beta_{1})\,\frac{\partial E}{\partial \theta_{i,j,m}},

where \beta_{1} is a positive constant that smooths out the gradient descent process. Intuitively, \partial E/\partial w and \partial E/\partial \theta are interpreted as the acceleration in physics, v_{dw} and v_{d\theta} are regarded as the velocity, and \beta_{1} is seen as the friction: the gradients accelerate the descent and build up the velocities v_{dw} and v_{d\theta}, while the friction prevents the acceleration from growing unboundedly.

In this case, the updates of the parameters w and \theta are modified as follows:

w_{i,j,m} = w_{i,j,m} - \eta\,v_{dw}, \qquad \theta_{i,j,m} = \theta_{i,j,m} - \eta\,v_{d\theta}.

Serving as another crucial part of Adam, root mean square prop (RMSprop) [40] further accelerates the gradient descent, with \beta_{2} a positive constant analogous to \beta_{1}. The second-moment estimate s_{dw} is calculated as follows:

s_{dw} = \beta_{2}\,s_{dw} + (1 - \beta_{2})\left(\frac{\partial E}{\partial w_{i,j,m}}\right)^{2}.

Similarly, s_{d\theta} is obtained by

s_{d\theta} = \beta_{2}\,s_{d\theta} + (1 - \beta_{2})\left(\frac{\partial E}{\partial \theta_{i,j,m}}\right)^{2}.

In order to avoid the bias of the exponentially weighted averages in the initial learning stage, the first- and second-moment estimates above are corrected to obtain more accurate results as follows:

\hat{v}_{dw} = \frac{v_{dw}}{1 - \beta_{1}^{t}}, \quad \hat{v}_{d\theta} = \frac{v_{d\theta}}{1 - \beta_{1}^{t}}, \quad \hat{s}_{dw} = \frac{s_{dw}}{1 - \beta_{2}^{t}}, \quad \hat{s}_{d\theta} = \frac{s_{d\theta}}{1 - \beta_{2}^{t}},

where t denotes the current iteration.

To accelerate the gradient descent, Adam combines RMSprop with momentum. Thus, based on the bias-corrected moment estimates, the parameter updating equations of the Adam-like algorithm are expressed as follows:

w_{i,j,m} = w_{i,j,m} - \eta\,\frac{\hat{v}_{dw}}{\sqrt{\hat{s}_{dw}} + \epsilon}, \qquad \theta_{i,j,m} = \theta_{i,j,m} - \eta\,\frac{\hat{v}_{d\theta}}{\sqrt{\hat{s}_{d\theta}} + \epsilon},

where \epsilon is an infinitesimal constant that prevents numerical overflow in the division.
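The complete Adam-like update can be summarized by the following sketch; the default hyperparameter values (lr, beta1, beta2, eps) are common Adam defaults and are assumptions rather than the paper's settings.

```python
import numpy as np

def adam_like_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-like update of an MDNN parameter tensor (w or theta).

    `state` holds the first-moment estimate v, the second-moment estimate s,
    and the iteration counter t.
    """
    v, s, t = state
    t += 1
    v = beta1 * v + (1.0 - beta1) * grad          # momentum (first moment)
    s = beta2 * s + (1.0 - beta2) * grad ** 2     # RMSprop (second moment)
    v_hat = v / (1.0 - beta1 ** t)                # bias correction of the moments
    s_hat = s / (1.0 - beta2 ** t)
    param = param - lr * v_hat / (np.sqrt(s_hat) + eps)
    return param, (v, s, t)

# Usage with the gradients dW from the backpropagation sketch:
# state_W = (np.zeros_like(W), np.zeros_like(W), 0)
# W, state_W = adam_like_step(W, dW, state_W)
```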

4. Experimental Evaluation

4.1. Experimental Setup

The quality of web service (QWS) dataset [41, 42] is a real-world dataset based on the quality of service. Several versions of QWS are available; the experiments in this study use the original version, which consists of 364 web services. Each service's quality is described by a total of 10 nonfunctional attribute indexes. The QWS dataset divides the web services into 4 levels, from highest to lowest: platinum, gold, silver, and bronze.

To avoid overfitting while improving the accuracy of the results, data preprocessing strategies are adopted in the experiments. The raw data are normalized by the rule of standardization:

x' = \frac{x - \mu}{\sigma},

where \mu and \sigma denote the mean and standard deviation of each attribute, respectively.

Moreover, the normalized data are randomly divided into three parts: 70 percent for training, 15 percent for testing, and the remaining data for validation, in order to reduce unnecessary time consumption.
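For reference, the preprocessing described above can be sketched as follows; the helper names and the exact split mechanics are illustrative assumptions, not the authors' code.

```python
import numpy as np

def standardize(X):
    """Z-score standardization of each attribute column."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / (sigma + 1e-12)

def split_70_15_15(X, y, seed=0):
    """Random 70/15/15 split into training, testing, and validation subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train, n_test = int(0.70 * len(X)), int(0.15 * len(X))
    train, test, val = np.split(idx, [n_train, n_train + n_test])
    return (X[train], y[train]), (X[test], y[test]), (X[val], y[val])
```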

4.2. Optimal Parameter Settings

All of the hyperparameters are listed in Table 1, which presents their descriptions and values. Sigmoid, tanh, Rectified Linear Unit (ReLU), and Leaky ReLU are available as activation functions, from which a suitable one can be freely chosen. For large datasets, the samples are divided into mini-batches and shuffled for gradient descent during the training stage. If the batch size is equal to the number of samples in an iteration, full-batch gradient descent is executed.

However, it is tricky to determine the epoch size. To enable the model to reach optimal performance, a self-adaptive appending of training epochs is arranged in the training stage according to the convergence of the validation process. Beginning with the default configuration, the number of training epochs then adaptively increases depending on whether the gradient descent is approaching stagnation. The initial value is set to 100 to avoid the time consumption caused by an excessively large fixed epoch count. As with general neural networks, the experimental results are highly affected by the combination of parameters. Therefore, four key parameters are tuned by an orthogonal experiment with 4 factors and 3 levels. The specific design is listed in Table 2. The optimal parameters of MDNN for QWS are shown in Table 3; the other parameters comply with the settings in Table 1.
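One plausible reading of the self-adaptive appending strategy is sketched below; `model.train_one_epoch`, the patience threshold, and the append size are hypothetical choices, since the text does not specify them.

```python
def train_with_adaptive_epochs(model, data, init_epochs=100, patience=10,
                               append=50, max_epochs=1000):
    """Keep appending epochs while the validation loss is still improving."""
    budget, best, since_best, epoch = init_epochs, float("inf"), 0, 0
    while epoch < min(budget, max_epochs):
        val_loss = model.train_one_epoch(data)   # hypothetical model interface
        epoch += 1
        if val_loss < best - 1e-6:
            best, since_best = val_loss, 0        # still descending
        else:
            since_best += 1                       # approaching stagnation
        if since_best >= patience:
            break                                 # converged: stop training
        if epoch == budget:
            budget += append                      # not stagnated: append epochs
    return model
```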

4.3. Performance and Discussion

This section is divided into two subsections: experimental results of QWS analyzed by a variety of evaluation indicators are elaborated in Section 4.3.1, and Section 4.3.2 compares the proposed model with other multiclassification methods to demonstrate the prominent superiority of the proposed model.

4.3.1. Experimental Results

For a comprehensive performance evaluation of MDNN, the following statistical indicators are used: precision, recall, F1 score, accuracy, and area under the curve (AUC) [43]. It is worth noting that precision here refers to the classification precision. For both intuitive evaluation and a fair basis for the following comparison, the macroaverage, i.e., the arithmetic average of the performance indicators over all categories rather than over instances, is adopted to summarize the classification results.
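For clarity, the macroaverage can be computed as in the following sketch, which averages the per-class precision, recall, and F1 score arithmetically over the classes; it is a generic illustration rather than the evaluation code used in the experiments.

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Macro-averaged precision, recall, and F1: per-class scores averaged over classes."""
    ps, rs, fs = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if (tp + fp) else 0.0
        r = tp / (tp + fn) if (tp + fn) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        ps.append(p); rs.append(r); fs.append(f)
    return np.mean(ps), np.mean(rs), np.mean(fs)
```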

Table 4 shows the classification results of MDNN on the 4-class QWS dataset. The average values and optimum values represent the classification performance of MDNN. Although the dataset is unbalanced, as the class distribution of QWS in Table 5 shows, the stability and generalization ability of MDNN are effectively validated.

The five statistical indicators suggest that the classification achieved by MDNN for each class is effective, stable, and reliable. Their mean values indicate that MDNN has good classification performance. The gradient descent optimization strategy effectively reduces the errors. Moreover, MDNN accelerates the gradient descent so that it maintains a continuous downward trend, thereby guaranteeing the generalization and robustness of MDNN.

4.3.2. Comparison of Methods

To further verify the efficiency of MDNN, nine classifiers are compared with MDNN on nine multiclassification datasets from the UCI machine learning repository plus the QWS dataset. The information on the datasets and the parameter settings of MDNN are listed in Table 5. The nine classifiers are BP, SVM, KNN, CART, naïve Bayes, LDA, QDA, J48, and random forest [44]. The ten datasets are Iris, Wine, Vehicle, Balance scale, CMC, Seed, Vowel, Thyroid, Robot navigation, and QWS.

Experimental results are shown in Table 6, where the best result for each dataset among all compared methods is highlighted in bold. According to the five statistical indicators, MDNN achieves the largest number of best values in comparison with the other classifiers. On the Iris, Wine, Seed, and Thyroid datasets, MDNN gives the best performance, with perfect outcomes of 100 percent correctness. MDNN also performs well on unbalanced data such as Vowel, Thyroid, and QWS. As a result, MDNN's classification performance is more consistent than that of the other methods. Nevertheless, MDNN appears to have a minor disadvantage on the larger datasets, such as Balance scale, Vehicle, CMC, and Robot navigation, which seems to stem from the constraints of the network structure. In the comparison between MDNN and the other classifiers, the superiority and effectiveness of the multiple dendritic neuron structure are verified.

The receiver operating characteristic (ROC) curves of ten multiclassification methods show the correct classification coverage of each class of the QWS dataset in Figure 6. It can be found that MDNN not only has a consistent performance on each class of QWS but also outperforms the other classifiers, thus indicating the effectiveness and stability of MDNN for the QWS classification and unbalanced multiclassification applications. Besides, experiments also demonstrate the efficiency and superiority of MDNN in terms of classification performance and stability.

4.4. Morphology and Logical Circle Realization

For the display of a data sample, the shuffle operation set in the pretraining period is at our disposal. According to the initialization of w and \theta in Section 2, the synaptic outputs are all around 0.5 before training. Through the training stage, the weights w and thresholds \theta gradually stabilize. As shown in Figure 7, the resulting synaptic changes accomplish the pruning of the redundant network structure.

It can be easily observed that neuron 1, neuron 2, and neuron 4 are fully inhibited in accordance with the rules of Equations (2) and (3). Consequently, the structures of these neurons are ruled out. To specify the states of the dendrites, a total of four scenarios are listed as follows: 1) a direct (excitatory) connection, whose output follows the input; 2) an inverse (inhibitory) connection, whose output opposes the input; 3) a constant-1 connection, whose output stays close to 1 regardless of the input; and 4) a constant-0 connection, whose output stays close to 0 regardless of the input.

For the remaining neuron, the residual dendritic morphology is shown in Figure 8(a), where dashed lines indicate pruning. As mentioned in Section 2, inherent logic relations exist within the dendritic structure. Finally, since constant-1 connections have no substantive impact on the corresponding attributes, the connections among dendrites are equivalent to a logical OR. Thus, the hardware realization is obtained as illustrated in Figure 8(b), where the multiplexer acts as a 1 : 2 numerical comparator. In addition to showing the extendibility of MDNN, this pruning also indicates that MDNN can avoid overfitting by increasing the sparsity of the dendritic matrix.
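As an illustration of how the trained synaptic parameters induce the morphology above, the following heuristic classifies a synapse by evaluating its sigmoid output over the normalized input range; the tolerance and the classification rule are assumptions for demonstration, not the paper's exact criterion.

```python
import numpy as np

def synapse_state(w, theta, tol=0.1):
    """Classify one trained synapse by its response over normalized inputs in [0, 1]."""
    xs = np.linspace(0.0, 1.0, 101)
    ys = 1.0 / (1.0 + np.exp(-(w * xs - theta)))
    if ys.min() > 1.0 - tol:
        return "constant-1"    # no impact on the dendrite; candidate for pruning
    if ys.max() < tol:
        return "constant-0"    # forces the whole dendritic branch toward 0
    return "direct" if ys[-1] > ys[0] else "inverse"
```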

5. Conclusion and Future Directions

This article puts forward a novel extended network of dendritic neurons, namely, the multiple dendritic neural network (MDNN). The architecture of MDNN is completely different from that of previous DL models based on MP neurons. By deriving its new learning algorithms, MDNN is able, for the first time, to resolve multiclassification problems, in contrast to previous single dendritic neuron models. Besides, we propose an approach to improving the interpretability of artificial neural networks with the theoretical support of neuroscience. Experiments are mainly carried out on a QoS-related application. In the comparison between MDNN and other classifiers, the superior performance of the proposed model is shown, and MDNN is also highly advantageous on small-scale unbalanced data. On the other hand, the performance and efficiency of the proposed network are limited by data scale. In follow-up work, this deficiency will be addressed to improve the generalization ability [45, 46] and to study the model's capabilities and limitations. Meanwhile, applicable domains for MDNN will be explored in the following aspects: 1) expanding research on computer-related data mining to solve practical engineering problems, such as the quality of service of mobile networks [47] and security bug reports [48, 49]; 2) applying the model to other forms of data structures, e.g., semantic data [50]; 3) focusing on unbalanced data [51] and adequately simplifying the network structure [52] in practice.

Data Availability

The classification dataset could be downloaded freely at https://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Qianyi Peng was involved in the investigation, methodology, software, visualization, validation, and writing the original draft. Shangce Gao was responsible for conceptualization, methodology, supervision, validation, and writing, review, and editing. Yirui Wang was involved in the conceptualization, software, methodology, writing, review, and editing, as well as supervision. Junyan Yi, Gang Yang, and Yuki Todo were responsible for conceptualization, methodology, and writing, review, and editing.

Acknowledgments

This research was partially supported by JSPS KAKENHI Grant Numbers JP22H03643 and JP19K22891, JST SPRING Grant Number JPMJSP2145, and JST (the establishment of university fellowships towards the creation of science technology innovation) Grant Number JPMJFS2115.