Article

Neural Computing Enhanced Parameter Estimation for Multi-Input and Multi-Output Total Non-Linear Dynamic Models

1 School of Mathematical Sciences, Ocean University of China, Qingdao 266000, China
2 Robotics and Internet-of-Things Lab (RIOTU), Prince Sultan University, Riyadh 11586, Saudi Arabia
3 Faculty of Computers and Artificial Intelligence, Benha University, 13511 Benha, Egypt
4 Department of Engineering Design and Mathematics, University of the West of England, Frenchay Campus, Coldharbour Lane, Bristol BS16 1QY, UK
* Author to whom correspondence should be addressed.
Entropy 2020, 22(5), 510; https://doi.org/10.3390/e22050510
Submission received: 28 March 2020 / Revised: 24 April 2020 / Accepted: 26 April 2020 / Published: 30 April 2020

Abstract: In this paper, a gradient descent algorithm is proposed for the parameter estimation of multi-input and multi-output (MIMO) total non-linear dynamic models. Firstly, the MIMO total non-linear model is mapped to a non-completely connected feedforward neural network, that is, the parameters of the total non-linear model are mapped to the connection weights of the neural network. Then, based on the minimization of network error, a weight-updating algorithm, that is, an estimation algorithm for the model parameters, is proposed, together with the convergence conditions of a non-completely connected feedforward network. To further determine the variables of the model set, a method of model structure detection is proposed for selecting a group of important items from the whole variable candidate set. In order to verify the usefulness of the parameter identification process, we provide a virtual bench test example for the numerical analysis and user-friendly instructions for potential applications.

1. Introduction

Because a total non-linear model can provide a very concise representation of complex non-linear systems and has good extrapolation characteristics, it has attracted attention in both academic research and applications. The total non-linear model is an extension of the polynomial non-linear auto-regressive moving average with exogenous input (NARMAX) model and can be defined as the ratio of two polynomial expressions [1,2,3]. The introduction of denominator polynomials makes the model non-linear in both parameters and regression terms. Therefore, compared with the polynomial model, the model identification and the controller design of the total non-linear model are much more challenging [4,5]. In view of the difficulty of parameter estimation for a total non-linear model, simple and effective algorithms and machine learning should be considered for extracting information from measurement data.

1.1. Literature Survey

At present, a variety of model structure detection techniques and parameter estimation algorithms have been developed for non-linear models, including the orthogonal model structure detection and parameter estimation program [6], the generalized least squares estimator [7,8], the prediction error estimator [9,10], the Kalman filter estimator [11,12], the genetic algorithm estimator [12,13], and the artificial neural network estimator [14,15,16,17]. However, most of these algorithms are parameter estimators for polynomial non-linear models. Zhu and Billings carried out extensive research on the parameter identification of total non-linear models [7,8] and, in 2003, put forward a parameter estimation method for the total non-linear model based on a back-propagation (BP) algorithm. They discussed the advantages of BP computation in identifying the classical model, providing a strong combination of classical and neural network methods and a powerful tool for analyzing a large class of systems.
In [18], a back-propagation estimation formula based on neuro-computing was presented for estimating the parameters of a total non-linear model, where a set of solutions was derived for the problems of parameter initialization, learning rate selection, stopping criteria, model structure detection, and the convergence of the back-propagation estimator (BPE). However, Reference [18] only proposed a parameter estimation method, and the corresponding case studies, for single-input and single-output (SISO) systems. Expanding on [18], this paper presents solutions for the parameter estimation of a total non-linear multi-input and multi-output (MIMO) model. Owing to the complexity of a MIMO system, its model parameters are more difficult to estimate, yet MIMO models are more general in academic research and applications. For example, a MIMO system has many more parameters than a SISO system, and the number of parameters to be estimated at each step is multiplied, which increases the difficulty of estimation. Moreover, because of the coupling between the subsystems, the parameter values of each subsystem affect one another, so the algorithms estimating these parameters are not independent but interactive and complex. Because the structures of different MIMO systems differ, the completely connected neural network structure adopted in [18] is not suitable for estimating the parameters of MIMO systems. When a MIMO system is mapped to a neural network, the network structure is often asymmetric, or non-completely connected (the neurons in the hidden layer are not connected to all the neurons in the input layer). That is to say, the network is not a common completely connected feedforward neural network, and the general BP algorithm cannot be applied directly to the parameter estimation; the learning algorithm for the parameters must therefore be properly derived. Due to the asymmetry of the network, its convergence also faces challenges, so it is necessary to analyze the convergence of the network and give specific conditions under which it converges. Furthermore, identifying a MIMO system amounts to identifying the parameters of several SISO subsystems, and a MIMO system can have multiple inputs, so in the simulation experiments, the parameters of the system should be estimated under different combinations of inputs, and the performance of the network estimator should be verified. Therefore, the parameter identification of a MIMO system is much more challenging.

1.2. Motivation and Contributions

The authors of [19] presented a thorough analysis that included two kernel components: the SISO model and its parameter estimation algorithms for polynomial non-linear models, such as prediction-error and back-propagation computation. Since then, rational model identification has moved in diversified directions, such as more theoretical consideration of a non-linear least squares algorithm [4], maximum likelihood estimation [3], and a bias-compensated recursive least squares algorithm [2]. It has been noted that MIMO rational model identification has seldom attracted research attention, probably due to the complexity of the algorithm formulation and the coupling effect. However, MIMO rational model identification should be on the research agenda now because of recent applications and increasing computing facilities.
The total non-linear system model, a relatively new term, is an alternative name for the NARMAX rational model, as defined in a survey paper on rational model identification [19]. The name emphasizes the non-linearity in both the parameters and the control inputs, and the model has been taken as a challenging structure for designing non-linear dynamic control systems [1]. As an expansion of polynomial models, the rational model requires more consideration in its mathematics, structure detection, and parameter estimation in the field of system identification [2,3]. Therefore, the main contribution of this study is the use of neural computing algorithms for MIMO model parameter estimation. The new study is a complement to the classical NARMAX approaches.
The rest of the paper is organized as follows. The total non-linear model is described in Section 2. Section 3 presents the gradient descent calculation for parameter estimation. Model structure detection is discussed in Section 4, and a convergence analysis of the algorithm is presented in Section 5. Simulation results and discussions are given in Section 6. Finally, Section 7 concludes the paper and outlines future work.

2. Total Non-Linear Model

In mathematics, the dynamic total non-linear model of a MIMO system with error can be defined as
$$y_i(t) = \hat{y}_i(t) + e_i(t) = \frac{a_i(t)}{b_i(t)} + e_i(t) = \frac{a_i\left(u_1, u_2, \ldots, u_J,\, y_1, y_2, \ldots, y_I,\, e_1, e_2, \ldots, e_I\right)}{b_i\left(u_1, u_2, \ldots, u_J,\, y_1, y_2, \ldots, y_I,\, e_1, e_2, \ldots, e_I\right)} + e_i(t), \quad i = 1, 2, \ldots, I \tag{1}$$
$$a_i(t) = \sum_{k=1}^{N} p_k^n(t)\,\theta_k^n, \qquad b_i(t) = \sum_{k=1}^{D} p_k^d(t)\,\theta_k^d, \quad i = 1, 2, \ldots, I \tag{2}$$
where $y(t) = [y_1(t), y_2(t), \ldots, y_I(t)] \in \mathbb{R}^I$ and $\hat{y}(t) = [\hat{y}_1(t), \hat{y}_2(t), \ldots, \hat{y}_I(t)] \in \mathbb{R}^I$ are the measured output and the model output, respectively; $u(t) = [u_1(t), u_2(t), \ldots, u_J(t)] \in \mathbb{R}^J$ is the input; $e(t) = [e_1(t), e_2(t), \ldots, e_I(t)] \in \mathbb{R}^I$ is the model error; and $t = 1, 2, \ldots, T$ is the sampling time index. The numerator $a_i(t) \in \mathbb{R}$ and the denominator $b_i(t) \in \mathbb{R}$ are represented by polynomials whose regression terms $p_k^n(t)$ and $p_k^d(t)$ are products of past inputs, outputs, and errors, such as $u_1(t-1)\,y_2(t-3)$, $u_1(t-1)\,e_2(t-2)$, and $y_2^3(t-1)$. $\theta^n = [\theta_1^n, \theta_2^n, \ldots, \theta_N^n] \in \mathbb{R}^N$ and $\theta^d = [\theta_1^d, \theta_2^d, \ldots, \theta_D^d] \in \mathbb{R}^D$ are the parameter sets of $a_i(t)$ and $b_i(t)$, respectively.
The task of parameter estimation is to extract the relevant parameter values from the measured input and output data for a given model structure. To form a regression expression for parameter estimation, multiplying both sides of Formula (1) by $b_i(t)$ and rearranging gives
$$y_i(t)\,b_i(t) - a_i(t) = b_i(t)\,e_i(t) \tag{3}$$
To apply the neuro-computing approach to parameter estimation, the total non-linear model is mapped to a non-completely connected feedforward neural network, as shown in Figure 1.
We define the network with an input layer, a hidden layer, and an output layer, where:
(i) The input layer consists of the regression terms $p_k^n(t)$ $(k = 1, \ldots, N)$ and $p_k^d(t)$ $(k = 1, \ldots, D)$; a neuron in the hidden layer is not connected to all the neurons in the input layer, that is, the network is a non-completely connected feedforward neural network.
(ii) The action function of the neurons in the hidden layer is linear, and the output of a hidden layer neuron is $a_i(t)$ or $b_i(t)$.
(iii) The action function of the output layer neurons is linear, and the output of the $i$th output layer neuron is $b_i(t)\,e_i(t)$.
(iv) The connection weights between the input layer neurons and the hidden layer neurons are the parameters $\theta_k^n$ and $\theta_k^d$ of the model.
(v) The connection weights between the hidden layer neurons and the $i$th output layer neuron are 1 and the measured output $y_i(t)$.
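To make the mapping concrete, the following minimal Python sketch (our illustration; the function and variable names are ours, not from the paper) performs one forward pass of this network for output channel $i$, combining the two hidden outputs into $f_i(t) = y_i(t)\,b_i(t) - a_i(t) = b_i(t)\,e_i(t)$ in line with Formula (3):

```python
import numpy as np

def forward(p_n, p_d, theta_n, theta_d, y_meas):
    """One forward pass of the non-completely connected network for channel i.

    p_n, p_d         : regression terms p_k^n(t), p_k^d(t) (input layer values)
    theta_n, theta_d : model parameters (input-to-hidden connection weights)
    y_meas           : measured output y_i(t), acting as a fixed weight on b_i(t)
    """
    a = np.dot(p_n, theta_n)   # hidden neuron: numerator a_i(t), Formula (2)
    b = np.dot(p_d, theta_d)   # hidden neuron: denominator b_i(t), Formula (2)
    f = y_meas * b - a         # output neuron: f_i(t) = b_i(t) e_i(t), Formula (3)
    return a, b, f, f / b      # also return the recovered error e_i(t)
```

Note that no division appears in the neuron action functions themselves; the error $e_i(t)$ is only recovered afterwards, which matches the design consideration discussed below.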
Leung and Haykin proposed a rational function neural network [20] but did not define a generalized total non-linear model structure or consider the relevant errors. Therefore, their parameter estimation algorithm could not provide an unbiased estimate from noise-corrupted data; it was essentially a special implementation of the methods of Zhu and Billings [7,8] for the noise-free case. The method proposed in this paper is a further development of the method in Zhu [18]. The characteristics of the total non-linear model (1) are as follows:
(i) By setting $i = 1$, the SISO model of Zhu [18] becomes a special case of the model in Formula (1).
(ii) The model is non-linear in the parameters and regression terms, which is caused by the denominator polynomial.
(iii) When the denominator $b_i(t)$ of the model is close to 0, the output deviation becomes large. Considering this point, division operations were avoided in the action functions of the neurons when the neural network model was built.
(iv) The structure of the neural network corresponding to the total non-linear model is a non-completely connected, or partially connected, feedforward neural network. Therefore, the convergence of the network becomes a key challenge, which is one of the main difficulties addressed in this paper.
(v) The model has a wide range of application prospects and has gradually been adopted in many non-linear system modeling and control applications. Some non-linear models, such as the exponential model $e^x$, which describes the change of a dynamic rate constant with temperature, cannot be used directly. The exponential model can first be transformed into a rational model, $e^x \approx \frac{1 + \frac{x}{2} + \frac{x^2}{12}}{1 - \frac{x}{2} + \frac{x^2}{12}}$, and then system identification can be implemented [19,21,22].
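As a quick check of the transformation mentioned in point (v), the sketch below (ours, not from the paper) compares $e^x$ with this second-order rational (Padé) approximation on $[-1, 1]$:

```python
import numpy as np

def exp_rational(x):
    """Second-order rational (Pade) approximation of e^x, used to recast an
    exponential model as a total non-linear (rational) one."""
    return (1.0 + x / 2.0 + x**2 / 12.0) / (1.0 - x / 2.0 + x**2 / 12.0)

x = np.linspace(-1.0, 1.0, 201)
print(np.abs(exp_rational(x) - np.exp(x)).max())  # about 4e-3, at the endpoints
```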

3. Gradient Descent Calculation of Parameter Estimation

For the convenience of the following derivations, set the output of neuron $i$ in the output layer of the neural network as $f_i(t)$:
$$f_i(t) = b_i(t)\,e_i(t) \tag{4}$$
Define the error measure function of one iteration of the network as:
$$E(t) = \frac{1}{2}\left(y_i(t) - \hat{y}_i(t)\right)^{2} = \frac{1}{2}\,e_i^{2}(t) \tag{5}$$
The Lyapunov method is often used to analyze the stability of a neural network [23]; similarly, the network parameters are estimated by minimizing the network error based on the Lyapunov method. It should be noted that when the total non-linear model is represented in the neural network structure of Figure 1, the parameter estimation of the model can be described as the training of neural network weight by minimizing the error E ( t ) in Formula (5).
In order to train the weights of the network, the learning algorithm based on gradient descent is given by Formulas (6) and (7):
$$\Delta\theta_k^n = -\eta_n \frac{\partial E}{\partial \theta_k^n} = -\eta_n\, e_i(t)\, \frac{\partial e_i(t)}{\partial \theta_k^n} \tag{6}$$
$$\Delta\theta_k^d = -\eta_d \frac{\partial E}{\partial \theta_k^d} = -\eta_d\, e_i(t)\, \frac{\partial e_i(t)}{\partial \theta_k^d} \tag{7}$$
where $\eta_n$ and $\eta_d$ are learning rates.
Differentiating Formula (4) with respect to $\theta_k^n$ on both sides and noting, from Formula (3), that $f_i(t) = y_i(t)\,b_i(t) - a_i(t)$ and that $b_i(t)$ does not depend on $\theta_k^n$, Formula (8) is obtained:
$$\frac{\partial f_i(t)}{\partial \theta_k^n} = \frac{\partial b_i(t)}{\partial \theta_k^n}\,e_i(t) + b_i(t)\,\frac{\partial e_i(t)}{\partial \theta_k^n} \;\Longrightarrow\; \frac{\partial e_i(t)}{\partial \theta_k^n} = \frac{1}{b_i(t)}\left(\frac{\partial f_i(t)}{\partial \theta_k^n} - \frac{\partial b_i(t)}{\partial \theta_k^n}\,e_i(t)\right) = \frac{1}{b_i(t)}\,\frac{\partial\left(y_i(t)\,b_i(t) - a_i(t)\right)}{\partial \theta_k^n} = -\frac{1}{b_i(t)}\,\frac{\partial a_i(t)}{\partial \theta_k^n} = -\frac{p_k^n(t)}{b_i(t)} \tag{8}$$
Substituting Formula (8) into Formula (6) gives Formula (9), from which Formula (10) follows:
$$\Delta\theta_k^n = -\eta_n\, e_i(t)\, \frac{\partial e_i(t)}{\partial \theta_k^n} = \eta_n\, e_i(t)\, \frac{p_k^n(t)}{b_i(t)} \tag{9}$$
$$\theta_k^n(t+1) = \theta_k^n(t) + \Delta\theta_k^n = \theta_k^n(t) + \eta_n\, e_i(t)\, \frac{p_k^n(t)}{b_i(t)} \tag{10}$$
Differentiating Formula (4) with respect to $\theta_k^d$ on both sides, Formula (11) is obtained:
$$\frac{\partial f_i(t)}{\partial \theta_k^d} = \frac{\partial b_i(t)}{\partial \theta_k^d}\,e_i(t) + b_i(t)\,\frac{\partial e_i(t)}{\partial \theta_k^d} \;\Longrightarrow\; \frac{\partial e_i(t)}{\partial \theta_k^d} = \frac{1}{b_i(t)}\left(\frac{\partial\left(y_i(t)\,b_i(t) - a_i(t)\right)}{\partial \theta_k^d} - p_k^d(t)\,e_i(t)\right) = \frac{1}{b_i(t)}\left(y_i(t)\,p_k^d(t) - p_k^d(t)\,e_i(t)\right) = \frac{1}{b_i(t)}\left(y_i(t) - e_i(t)\right)p_k^d(t) = \frac{1}{b_i(t)}\,\frac{a_i(t)}{b_i(t)}\,p_k^d(t) = \frac{a_i(t)}{b_i^{2}(t)}\,p_k^d(t) \tag{11}$$
Substituting Formula (11) into Formula (7) gives Formula (12), from which Formula (13) follows:
$$\Delta\theta_k^d = -\eta_d\, e_i(t)\, \frac{\partial e_i(t)}{\partial \theta_k^d} = -\eta_d\, e_i(t)\, \frac{a_i(t)}{b_i^{2}(t)}\, p_k^d(t) \tag{12}$$
$$\theta_k^d(t+1) = \theta_k^d(t) + \Delta\theta_k^d = \theta_k^d(t) - \eta_d\, e_i(t)\, \frac{a_i(t)}{b_i^{2}(t)}\, p_k^d(t) \tag{13}$$
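Since the update rules hinge on the derivatives in Formulas (8) and (11), a small finite-difference check (our illustration, with arbitrary test values) can confirm them numerically:

```python
import numpy as np

# Finite-difference verification of the analytic derivatives in Formulas (8)
# and (11); all numerical values here are arbitrary test data.
p_n = np.array([0.4, -1.2, 0.9])   # numerator regression terms p_k^n(t)
p_d = np.array([1.0, 0.5])         # denominator regression terms p_k^d(t)
th_n = np.array([0.5, 0.8, 1.0])   # numerator parameters theta_k^n
th_d = np.array([1.0, 0.3])        # denominator parameters theta_k^d
y = 0.7                            # measured output y_i(t)

def err(tn, td):
    return y - (p_n @ tn) / (p_d @ td)   # e(t) = y(t) - a(t)/b(t)

a, b, eps = p_n @ th_n, p_d @ th_d, 1e-7
for k in range(3):   # de/dtheta_k^n should equal -p_k^n/b, Formula (8)
    d = np.zeros(3); d[k] = eps
    num = (err(th_n + d, th_d) - err(th_n, th_d)) / eps
    assert np.isclose(num, -p_n[k] / b, atol=1e-5)
for k in range(2):   # de/dtheta_k^d should equal a*p_k^d/b**2, Formula (11)
    d = np.zeros(2); d[k] = eps
    num = (err(th_n, th_d + d) - err(th_n, th_d)) / eps
    assert np.isclose(num, a * p_d[k] / b**2, atol=1e-5)
print("Formulas (8) and (11) confirmed numerically")
```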
The gradient descent algorithm for parameter estimation of a total non-linear model is summarized in Algorithm 1.
Algorithm 1. Gradient Descent Algorithm
1: Initialization: the weights of the neural network (the parameters of the total non-linear model) are set to small random numbers drawn from a uniform distribution with zero mean and small variance. Set the maximum number of iterations $T$, the minimum error $\varepsilon$, and the maximum number of samples $P$.
2: Generate the training sample set $\{X, Y\}$ of the neural network according to Formula (1), where $X = \{X_1, X_2, \ldots, X_I\}$, $Y = \{Y_1, Y_2, \ldots, Y_I\}$, $X_i = \{p_1^n(t), p_2^n(t), \ldots, p_N^n(t), p_1^d(t), p_2^d(t), \ldots, p_D^d(t)\}$, and $Y_i = \{y_i(t)\}$.
3: Input a training sample $p$ to the neural network.
4: Calculate the values $a_i(t)$, $b_i(t)$, $e_i(t)$, and $f_i(t)$ of the neurons in the hidden layer and the output layer according to Formulas (2), (3), and (4).
5: Adjust the weights of the neural network according to Formulas (10) and (13).
6: Calculate the error $E(t)$ according to Formula (5) and the total error according to Formula (14):
$$E = \sum_{t} E(t) \tag{14}$$
7: $p = p + 1$.
8: If $p > P$, then $t = t + 1$; otherwise, run step 3.
9: If $E < \varepsilon$ or $t > T$, stop training; otherwise, run step 3.
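A compact Python reading of Algorithm 1 for one output channel is sketched below; the data interface and the constant-denominator initialization are our assumptions, while the per-sample updates follow Formulas (10) and (13):

```python
import numpy as np

def estimate_parameters(P_n, P_d, y, eta_n=0.1, eta_d=0.1,
                        max_iters=50, eps=1e-6, seed=0):
    """Gradient-descent estimator (Algorithm 1) for one output channel.

    P_n : (P, N) array of numerator regression terms p_k^n(t)
    P_d : (P, D) array of denominator regression terms p_k^d(t)
    y   : (P,)  array of measured outputs y_i(t)
    """
    rng = np.random.default_rng(seed)
    # Step 1: small zero-mean random initial weights
    th_n = 0.01 * rng.standard_normal(P_n.shape[1])
    th_d = 0.01 * rng.standard_normal(P_d.shape[1])
    th_d[0] = 1.0  # assumes p_1^d(t) = 1 (constant denominator term), as in
                   # the bench-test model, so the denominator starts nonzero
    for _ in range(max_iters):                  # steps 8-9: iteration loop
        E = 0.0
        for p_n, p_d, y_t in zip(P_n, P_d, y):  # steps 3-7: pass over samples
            a, b = p_n @ th_n, p_d @ th_d       # Formula (2)
            e = y_t - a / b                     # model error e_i(t)
            th_n += eta_n * e * p_n / b         # Formula (10)
            th_d -= eta_d * e * a / b**2 * p_d  # Formula (13)
            E += 0.5 * e**2                     # Formulas (5) and (14)
        if E < eps:                             # step 9: stop criterion
            break
    return th_n, th_d
```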

4. Model Structure Detection

Model structure detection selects important items from a rather large model set (usually called the whole item set) and determines the sub-model consisting of those items [18]. Because of the powerful self-learning and associative memory functions of an artificial neural network [24], it is the first-choice tool for identifying the model structure. When identifying systems with unknown structures, it is important to avoid losing the important items in the final model. For the structure detection of a total non-linear model, the connection weight estimation in the neural network, that is, the parameter estimation of the total non-linear model, can be used to select the significant terms.
To separate the important and unimportant items in the whole model item set, a knock-out algorithm is adopted. First, remove the items whose removal does not increase the network error; then knock out the items with smaller weights according to the required significance; finally, test the error of the non-linear model composed of the remaining items. The specific procedure is summarized in Algorithm 2.
Algorithm 2. Knock-Out Algorithm
1: Using the network structure shown in Figure 1, all the items contained in the whole item set are taken as the inputs of the network.
2: The algorithm in Section 3 is used to train the network, and the network error $E_1$ is obtained.
3: A new network structure is obtained by randomly removing one network input. The algorithm in Section 3 is used to train the new network, and the network error $E_2$ is obtained. If $E_2 \le E_1$, then set $E_1 = E_2$; otherwise, the removal is invalid (the input is restored).
4: Another input is selected, and step 3 is executed again until all the input items have been tried once.
5: The $N$ connection weights between the input layer and the hidden layer are sorted in descending order of magnitude. The first $n$ weights are selected so that the significance reaches 95%, that is, so that Formulas (15) and (16) are met, and the network input items corresponding to the first $n$ weights are retained:
$$\frac{\sum_{i=1}^{n} |w_i|}{\sum_{i=1}^{N} |w_i|} \ge 0.95 \tag{15}$$
$$\frac{\sum_{i=1}^{n-1} |w_i|}{\sum_{i=1}^{N} |w_i|} < 0.95 \tag{16}$$
In the above process, the neural network is used not only to estimate the parameters of the model but also to detect the structure of the model and analyze the significance of the regression terms.
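The weight-significance selection in step 5 amounts to keeping the smallest set of largest-magnitude weights whose share of the total absolute weight reaches 95%, that is, Formulas (15) and (16). A minimal sketch (ours) is:

```python
import numpy as np

def significant_terms(w, level=0.95):
    """Return indices of the first n weights (sorted by |w| descending)
    whose cumulative share of sum(|w|) reaches `level` -- Formulas (15)-(16)."""
    order = np.argsort(-np.abs(w))              # sort by magnitude, descending
    share = np.cumsum(np.abs(w[order])) / np.abs(w).sum()
    n = int(np.searchsorted(share, level) + 1)  # smallest n with share >= level
    return order[:n]

w = np.array([0.9, -0.05, 0.5, 0.02, -0.3])
print(significant_terms(w))  # keeps the indices of 0.9, 0.5, and -0.3
```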

5. Convergence Analysis of the Algorithm

Convergence proof:
Assume that one connection weight of the neural network shown in Figure 1 is changed, and that this weight can take any value. When the weight $\theta_k^n$ corresponds to a regression term parameter in the numerator of the total non-linear model, the resulting network error changes as follows (the subscript $i$ is dropped for the convenience of the proof):
$$y(t) = \frac{a(t)}{b(t)} + e(t) \tag{17}$$
Substituting Formula (2) into Formula (17) gives Formula (18):
$$y(t) - \frac{\sum_{k=1}^{N} p_k^n(t)\,\theta_k^n}{b(t)} = e(t) \tag{18}$$
When $\theta_k^n$ is updated, Formula (18) becomes Formula (19):
$$y(t) - \frac{\sum_{m=1,\, m \neq k}^{N} p_m^n(t)\,\theta_m^n + p_k^n(t)\left(\theta_k^n + \Delta\theta_k^n\right)}{b(t)} = \tilde{e}(t) \tag{19}$$
where $\tilde{e}(t)$ is the new error of the neural network after the weight has been updated. Subtracting Formula (18) from Formula (19) and using Formula (9) gives Formulas (20) and (21):
$$\tilde{e}(t) - e(t) = -\frac{p_k^n(t)\,\Delta\theta_k^n}{b(t)} = -\eta_n \left(\frac{p_k^n(t)}{b(t)}\right)^{2} e(t) \;\Longrightarrow\; \tilde{e}(t) = \left(1 - \eta_n \left(\frac{p_k^n(t)}{b(t)}\right)^{2}\right) e(t) \tag{20}$$
$$\tilde{e}^{2}(t) = \left(1 - \eta_n \left(\frac{p_k^n(t)}{b(t)}\right)^{2}\right)^{2} e^{2}(t) \tag{21}$$
In order to ensure $\tilde{e}^{2}(t) \le e^{2}(t)$, it is required that $-1 \le 1 - \eta_n \left(p_k^n(t)/b(t)\right)^{2} \le 1$, namely:
$$\begin{cases} \eta_n \left(\dfrac{p_k^n(t)}{b(t)}\right)^{2} \le 2 \\[1mm] \eta_n \left(\dfrac{p_k^n(t)}{b(t)}\right)^{2} \ge 0 \end{cases} \tag{22}$$
Solving Formula (22) gives:
$$0 \le \eta_n \le \frac{2\,b^{2}(t)}{\left(p_k^n(t)\right)^{2}} \tag{23}$$
When the changed weight $\theta_k^d$ corresponds to a regression parameter in the denominator of the total non-linear model, the resulting network error change is as follows:
$$y(t) - \frac{a(t)}{\sum_{k=1}^{D} p_k^d(t)\,\theta_k^d} = e(t) \tag{24}$$
$$y(t) - \frac{a(t)}{\sum_{m=1,\, m \neq k}^{D} p_m^d(t)\,\theta_m^d + p_k^d(t)\left(\theta_k^d + \Delta\theta_k^d\right)} = \tilde{e}(t) \tag{25}$$
Let $\tilde{b}(t)$ denote the new denominator of the neural network after the weight has been updated. Subtracting Formula (24) from Formula (25) and using Formula (12) gives:
$$\tilde{e}(t) - e(t) = \frac{\left(\tilde{b}(t) - b(t)\right) a(t)}{\tilde{b}(t)\, b(t)} = \frac{p_k^d(t)\,\Delta\theta_k^d\, a(t)}{\tilde{b}(t)\, b(t)} = -\eta_d\, e(t)\, \frac{a^{2}(t)}{\tilde{b}(t)\, b^{3}(t)} \left(p_k^d(t)\right)^{2} \tag{26}$$
$$\tilde{e}^{2}(t) = \left(1 - \eta_d\, \frac{a^{2}(t)}{\tilde{b}(t)\, b^{3}(t)} \left(p_k^d(t)\right)^{2}\right)^{2} e^{2}(t) \tag{27}$$
In order to satisfy $\tilde{e}^{2}(t) \le e^{2}(t)$, it is required that $-1 \le 1 - \eta_d\, \dfrac{a^{2}(t)}{\tilde{b}(t)\, b^{3}(t)} \left(p_k^d(t)\right)^{2} \le 1$, that is:
$$\begin{cases} \eta_d\, \dfrac{a^{2}(t)}{\tilde{b}(t)\, b^{3}(t)} \left(p_k^d(t)\right)^{2} \le 2 \\[1mm] \eta_d\, \dfrac{a^{2}(t)}{\tilde{b}(t)\, b^{3}(t)} \left(p_k^d(t)\right)^{2} \ge 0 \end{cases} \tag{28}$$
An overly large learning rate makes the training of the network ineffective; accordingly, taking $0 \le \eta_d \le 1$ keeps the update small enough that $\tilde{b}(t)\, b(t) > 0$, and thus:
$$0 \le \eta_d \le \frac{2\,\tilde{b}(t)\, b^{3}(t)}{a^{2}(t) \left(p_k^d(t)\right)^{2}} \tag{29}$$
To sum up, the network is convergent when the following conditions are met:
$$1. \quad 0 \le \eta_n \le \frac{2\,b^{2}(t)}{\left(p_k^n(t)\right)^{2}}$$
$$2. \quad 0 \le \eta_d \le \frac{2\,\tilde{b}(t)\, b^{3}(t)}{a^{2}(t) \left(p_k^d(t)\right)^{2}}$$
Under these two conditions, the algorithm provides a convergent estimate of the parameters of the total non-linear model.
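In practice, these two conditions can be enforced at each sample by capping the learning rates. The helper below is our illustration, not part of the paper's algorithm statement; it approximates the post-update denominator $\tilde{b}(t)$ by $b(t)$, which is reasonable for small steps:

```python
def capped_learning_rates(a, b, p_n_k, p_d_k, eta_n, eta_d, margin=0.5):
    """Cap the learning rates so that convergence conditions 1 and 2 hold
    for the current sample; b~(t) is approximated by b(t) (small update),
    and `margin` < 1 keeps the rates strictly inside the admissible range."""
    bound_n = 2.0 * b**2 / max(p_n_k**2, 1e-12)         # condition 1
    bound_d = 2.0 * b**4 / max(a**2 * p_d_k**2, 1e-12)  # condition 2, b~ = b
    return min(eta_n, margin * bound_n), min(eta_d, margin * bound_d)
```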

6. Simulation Results and Discussions

Consider a representative example of a total non-linear model:
$$y_1(t) = \frac{0.5\,y_1(t-1) + 0.8\,y_2^3(t-2) + u_1(t-1)}{1 + y_1^2(t-1) + u_2^2(t-1)} + r_1(t) = \frac{\theta_1\,y_1(t-1) + \theta_2\,y_2^3(t-2) + \theta_3\,u_1(t-1)}{1 + \theta_4\,y_1^2(t-1) + \theta_5\,u_2^2(t-1)} + r_1(t)$$
$$y_2(t) = \frac{0.2\,y_2(t-1) - 0.5\,y_1^2(t-2) + u_2(t-1)}{1 + y_2^2(t-1) + u_2^2(t-1)} + r_2(t) = \frac{\theta_6\,y_2(t-1) - \theta_7\,y_1^2(t-2) + \theta_8\,u_2(t-1)}{1 + \theta_9\,y_2^2(t-1) + \theta_{10}\,u_2^2(t-1)} + r_2(t) \tag{6.1}$$
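For reproducibility, the bench-test system can be simulated directly. The sketch below is our illustration; the waveform phase and the random number generator are assumptions not specified in the paper, while the amplitude-2 sine inputs, the 2000 samples over 20 cycles, and the zero-mean uniform noise follow the descriptions later in this section:

```python
import numpy as np

def simulate(T=2000, noise_var=0.0, period=100, seed=0):
    """Generate input/output data from the bench-test model (6.1).

    A sine period of 100 samples gives 20 full cycles over T = 2000 points.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(T)
    u1 = 2.0 * np.sin(2 * np.pi * t / period)  # amplitude-2 sine inputs
    u2 = 2.0 * np.sin(2 * np.pi * t / period)  # (sine-sine case)
    half = np.sqrt(3.0 * noise_var)            # uniform noise of given variance
    r1 = rng.uniform(-half, half, T)
    r2 = rng.uniform(-half, half, T)
    y1, y2 = np.zeros(T), np.zeros(T)
    for k in range(2, T):
        y1[k] = (0.5 * y1[k-1] + 0.8 * y2[k-2]**3 + u1[k-1]) \
                / (1 + y1[k-1]**2 + u2[k-1]**2) + r1[k]
        y2[k] = (0.2 * y2[k-1] - 0.5 * y1[k-2]**2 + u2[k-1]) \
                / (1 + y2[k-1]**2 + u2[k-1]**2) + r2[k]
    return u1, u2, y1, y2
```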
Because disturbances in the input data interfere with the estimation of parameters [25], in this section, parameter estimation was carried out for different inputs. First, for the simulation system without noise, 2000 pairs of input/output data, sampled uniformly over 20 cycles, were used as the data set, and the learning rate was designed as a linearly decaying sequence (over 50 iterations, the learning rate decreases from η0 = 0.5 to ηend = 0.02). The algorithm in this paper was used to estimate the 10 parameters simultaneously. Table 1 shows the estimated values and mean square deviations of the parameters after 50 iterations.
The inputs $u_1(t)$ and $u_2(t)$ of the system are either sine waves or square waves with an amplitude of 2. Figure 2 shows the difference between the measured value $y_1(t)$ (the real output value of Formula (6.1)) and the output value $\hat{y}_1(t)$ obtained using the parameter estimator when the inputs $u_1(t)$ and $u_2(t)$ are both sine waves. In the same way, Figure 3 shows the difference between the measured value $y_2(t)$ and the output value $\hat{y}_2(t)$ when both inputs are sine waves. Figure 4 shows the difference between the measured value $y_1(t)$ and the output value $\hat{y}_1(t)$ when the input $u_1(t)$ is a sine wave and $u_2(t)$ is a square wave.
Figure 5 shows the corresponding difference for $y_2(t)$ in the sine–square case, and Figure 6 and Figure 7 show the differences for $y_1(t)$ and $y_2(t)$, respectively, when both inputs are square waves. It can be seen that when both inputs are sine waves, the accuracy of parameter estimation is the highest, while with square-wave inputs, the estimation accuracy is relatively lower. This is because when the inputs are square waves, the output of the system overshoots, that is to say, the observed value of the system itself contains an error. Training the network with erroneous data inevitably leads to parameter estimation errors; in particular, the estimation error of $\theta_2$ is the largest. Here, $\theta_2$ is the parameter of $y_2^3(t-2)$, and because $y_2^3(t-2)$ is the third power of the output, it further amplifies the error; substituting the value of $y_2^3(t-2)$ into the update of $\theta_2$ inevitably produces an estimation error.
For a system with noise interference, parameter estimation is more difficult because the measured values themselves contain errors [26,27]. To further verify the performance of the estimator, a noisy system was selected for parameter estimation, that is, a noise signal was added to the original system. The noises $r_1(t)$ and $r_2(t)$ were random uniform sequences, each with zero mean, a variance of 0.1, and a signal-to-noise ratio of 10. We repeated the previous experiment for the noisy system; the experimental results are shown in Table 2, and the differences between the measured and estimated system outputs are shown in Figure 8, Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13.
To detect the structure of the model, 20 items, including the 10 items in Formula (6.1), were used as the whole item set of the model. The newly added numerator terms are of order 1, the denominator terms are of order 2, and the input and output lags are both of order 1. Using the knock-out algorithm of Section 4, the final 10 items of the model are in good agreement with those in Formula (6.1).
From the above experimental results, the estimation accuracy of the algorithm proposed in this paper is acceptable, with mean square deviations all less than 0.003.

7. Conclusions

In this paper, the parameter estimation of a SISO rational model was extended to that of a MIMO total non-linear model. A parameter estimation method for the MIMO non-linear rational model based on a gradient descent algorithm was proposed, and convergence conditions were derived to address the asymmetry of the network. Mathematical derivation and simulation showed that the estimator is effective. The estimation method has a strong generalization property and could be widely used in many fields, such as non-linear system modeling and control applications. Some systems that cannot use this method directly, such as the exponential model describing the change of a kinetic rate constant with temperature, could first be converted into a rational model, after which the developed estimation method applies. Future work could include (1) estimating the parameters of state space models based on an artificial neural network, (2) estimating the parameters of MIMO state space models, (3) estimating the parameters of non-linear state space models, and (4) estimating the parameters of total non-linear state space models.

Author Contributions

Conceptualization, L.L., D.M., Q.Z.; Formal analysis, L.L., D.M., Q.Z., A.T.A.; Funding acquisition, A.T.A.; Investigation, D.M., A.T.A., Q.Z.; Methodology, L.L., D.M., Q.Z., A.T.A.; Resources, L.L., Q.Z., A.T.A.; Software, L.L., D.M.; Supervision, Q.Z., A.T.A.; Visualization, L.L., D.M., A.T.A.; Writing – original draft, L.L., D.M., Q.Z.; Writing – review & editing, L.L., D.M., Q.Z., A.T.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Prince Sultan University, Riyadh, Kingdom of Saudi Arabia. The authors give special acknowledgement to the Robotics and Internet-of-Things Lab (RIOTU), Prince Sultan University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Billings, S.A.; Chen, S. Identification of non-linear rational systems using a prediction-error estimation algorithm. Int. J. Syst. Sci. 1989, 20, 467–494.
2. Billings, S.A.; Zhu, Q.M. Rational model identification using an extended least-squares algorithm. Int. J. Control 1991, 54, 529–546.
3. Sontag, E.D. Polynomial Response Maps; Lecture Notes in Control & Information Sciences; Springer: Berlin/Heidelberg, Germany, 1979; Volume 13.
4. Narendra, K.S.; Parthasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1990, 1, 4–27.
5. Zhu, Q.M.; Ma, Z.; Warwick, K. Neural network enhanced generalised minimum variance self-tuning controller for nonlinear discrete-time systems. IEE Proc. Control Theory Appl. 1999, 146, 319–326.
6. Billings, S.A.; Zhu, Q.M. A structure detection algorithm for nonlinear dynamic rational models. Int. J. Control 1994, 59, 1439–1463.
7. Zhu, Q.M.; Billings, S.A. Recursive parameter estimation for nonlinear rational models. J. Syst. Eng. 1991, 1, 63–67.
8. Zhu, Q.M.; Billings, S.A. Parameter estimation for stochastic nonlinear rational models. Int. J. Control 1993, 57, 309–333.
9. Aguirre, L.A.; Barbosa, B.H.G.; Braga, A.P. Prediction and simulation errors in parameter estimation for nonlinear systems. Mech. Syst. Signal Process. 2010, 24, 2855–2867.
10. Huo, M.; Duan, H.; Luo, D.; Wang, Y. Parameter estimation for a VTOL UAV using mutant pigeon inspired optimization algorithm with dynamic OBL strategy. In Proceedings of the 2019 IEEE 15th International Conference on Control and Automation (ICCA), Edinburgh, UK, 16–19 July 2019; pp. 669–674.
11. Zhu, Q.M.; Yu, D.; Zhao, D. An Enhanced Linear Kalman Filter (EnLKF) algorithm for parameter estimation of nonlinear rational models. Int. J. Syst. Sci. 2016, 48, 451–461.
12. Türksen, Ö.; Babacan, E.K. Parameter estimation of nonlinear response surface models by using genetic algorithm and unscented Kalman filter. In Chaos, Complexity and Leadership 2014; Erçetin, S., Ed.; Springer Proceedings in Complexity; Springer: Cham, Switzerland, 2016.
13. Billings, S.A.; Mao, K.Z. Structure detection for nonlinear rational models using genetic algorithms. Int. J. Syst. Sci. 1998, 29, 223–231.
14. Plakias, S.; Boutalis, Y.S. Lyapunov theory based fusion neural networks for the identification of dynamic nonlinear systems. Int. J. Neural Syst. 2019, 29, 1950015.
15. Kumar, R.; Srivastava, S.; Gupta, J.R.P.; Mohindru, A. Diagonal recurrent neural network based identification of nonlinear dynamical systems with Lyapunov stability based adaptive learning rates. Neurocomputing 2018, 287, 102–117.
16. Chen, S.; Liu, Y. Robust distributed parameter estimation of nonlinear systems with missing data over networks. IEEE Trans. Aerosp. Electron. Syst. 2019.
17. Zhu, Q.M. An implicit least squares algorithm for nonlinear rational model parameter estimation. Appl. Math. Model. 2005, 29, 673–689.
18. Zhu, Q.M. A back propagation algorithm to estimate the parameters of non-linear dynamic rational models. Appl. Math. Model. 2003, 27, 169–187.
19. Zhu, Q.M.; Wang, Y.; Zhao, D.; Li, S.; Billings, S.A. Review of rational (total) nonlinear dynamic system modelling, identification, and control. Int. J. Syst. Sci. 2015, 46, 2122–2133.
20. Leung, H.; Haykin, S. Rational function neural network. Neural Comput. 1993, 5, 928–938.
21. Jain, R.; Narasimhan, S.; Bhatt, N.P. A priori parameter identifiability in models with non-rational functions. Automatica 2019, 109, 108513.
22. Kambhampati, C.; Mason, J.D.; Warwick, K. A stable one-step-ahead predictive control of nonlinear systems. Automatica 2000, 36, 485–495.
23. Kumar, R.; Srivastava, S.; Gupta, J.R.P. Lyapunov stability-based control and identification of nonlinear dynamical systems using adaptive dynamic programming. Soft Comput. 2017, 21, 4465–4480.
24. Ge, H.W.; Du, W.L.; Qian, F.; Liang, Y.C. Identification and control of nonlinear systems by a time-delay recurrent neural network. Neurocomputing 2009, 72, 2857–2864.
25. Verdière, N.; Zhu, S.; Denis-Vidal, L. A distribution input–output polynomial approach for estimating parameters in nonlinear models. Application to a chikungunya model. J. Comput. Appl. Math. 2018, 331, 104–118.
26. Li, F.; Jia, L. Parameter estimation of Hammerstein–Wiener nonlinear system with noise using special test signals. Neurocomputing 2019, 344, 37–48.
27. Chen, C.Y.; Gui, W.H.; Guan, Z.H.; Wang, R.L.; Zhou, S.W. Adaptive neural control for a class of stochastic nonlinear systems with unknown parameters, unknown nonlinear functions and stochastic disturbances. Neurocomputing 2017, 226, 101–108.
Figure 1. Structure of a neural network corresponding to a total non-linear model.
Figure 2. Error of $y_1(t)$ with sine–sine input.
Figure 3. Error of $y_2(t)$ with sine–sine input.
Figure 4. Error of $y_1(t)$ with sine–square input.
Figure 5. Error of $y_2(t)$ with sine–square input.
Figure 6. Error of $y_1(t)$ with square–square input.
Figure 7. Error of $y_2(t)$ with square–square input.
Figure 8. Error of $y_1(t)$ with noise and sine–sine input.
Figure 9. Error of $y_2(t)$ with noise and sine–sine input.
Figure 10. Error of $y_1(t)$ with noise and sine–square input.
Figure 11. Error of $y_2(t)$ with noise and sine–square input.
Figure 12. Error of $y_1(t)$ with noise and square–square input.
Figure 13. Error of $y_2(t)$ with noise and square–square input.
Table 1. Parameter estimation of a noiseless system.

| $u_1(t)$ | $u_2(t)$ | $\theta_1$ | $\theta_2$ | $\theta_3$ | $\theta_4$ | $\theta_5$ | $\theta_6$ | $\theta_7$ | $\theta_8$ | $\theta_9$ | $\theta_{10}$ | MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sine | sine | 0.5002 | 0.8025 | 1.0003 | 1.0034 | 1.0000 | 0.2006 | 0.5010 | 1.0004 | 1.0018 | 0.9991 | 2.351E-06 |
| sine | square | 0.5000 | 0.8000 | 1.0000 | 1.0000 | 1.0000 | 0.1996 | 0.4982 | 1.0182 | 0.9677 | 1.0473 | 0.0003 |
| square | square | 0.4973 | 0.8760 | 1.0110 | 1.0031 | 1.0153 | 0.2013 | 0.5072 | 1.0354 | 0.9744 | 1.0840 | 0.0015 |

Table 2. Parameter estimation of a noisy system.

| $u_1(t)$ | $u_2(t)$ | $\theta_1$ | $\theta_2$ | $\theta_3$ | $\theta_4$ | $\theta_5$ | $\theta_6$ | $\theta_7$ | $\theta_8$ | $\theta_9$ | $\theta_{10}$ | MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sine | sine | 0.5003 | 0.8041 | 1.0005 | 1.0054 | 1.0001 | 0.2008 | 0.5014 | 1.0005 | 1.0016 | 0.9987 | 5.342E-06 |
| sine | square | 0.5000 | 0.8001 | 1.0000 | 1.0001 | 1.0000 | 0.2045 | 0.5019 | 1.073 | 1.1364 | 1.0898 | 0.0032 |
| square | square | 0.4953 | 0.8765 | 1.0085 | 1.0327 | 1.0095 | 0.2969 | 0.7030 | 0.9971 | 1.0007 | 0.9953 | 0.0058 |
