Review of rational (total) nonlinear dynamic system modelling, identification, and control

This paper summarises the research development in rational (total) nonlinear dynamic modelling over the last two decades. Total nonlinear dynamic systems are defined as those in which both the model parameters and the input (the controller output) are nonlinearly related to the output. Previously, this class of models has been known as the rational models, which belong to the nonlinear autoregressive moving average with exogenous input (NARMAX) model set and extend the well-known polynomial NARMAX model. The justification for using the rational model is that it provides a very concise and parsimonious representation for highly complex nonlinear dynamic systems and has excellent interpolation and extrapolation properties. However, model identification and controller design are much more challenging than for polynomial models. This has become a new and fascinating research direction in mathematical modelling, control, and applications, although still within a limited research community. This paper brings together several representative algorithms, developed by the authors and their colleagues, to form an easily referenced archive for promoting awareness, tutorials, applications, and further research expansion.


Introduction
The nonlinear autoregressive moving average with exogenous input (NARMAX) model set has been extensively studied in theory (Haber & Unbehauen, 1990; Sontag, 1978) and gradually adopted in applications (Proll & Karim, 1994; Wang, 1994). There are two main streams of sub-model sets: polynomial and rational models. A polynomial NARMAX model is linear in the parameters and nonlinear in the regression terms, and can represent a wide range of linear and nonlinear systems. The advantage in identifying polynomial models comes from the fact that the model is linear in the parameters. The rational model (Sontag, 1979) extends the polynomial model set and is defined as a ratio of two polynomial expressions. The rational model is therefore nonlinear in both the parameters and the regression terms, a property induced by the denominator polynomial. Accordingly, the identification and control of rational models are more challenging because of this total nonlinear structure.
Rational models have been gradually adopted in various applications of nonlinear system modelling and control (Ford, Titterington, & Kitsos, 1989; Kambhampati, …). It should be noted that it is still an open research question how to design rational model-based control systems analytically. It is hoped that the U-model framework explained in this paper will provide a concise analytical solution in future studies. The rest of the paper is organised as follows. In Section 2, the rational model is defined to provide a framework for the subsequent algorithm development; an extended least squares (ELS) parameter estimator is then presented, with a full step-by-step computational implementation listed as a guide for users' applications. In Section 3, a back-propagation (BP) algorithm is tailored for rational model parameter estimation, again with a step-by-step implementation. In Section 4, orthogonal computation is introduced to provide a solution for rational model structure detection (term selection) and parameter estimation. In Section 5, the latest development of the correlation-based model validation procedure is presented for validating identified linear and nonlinear models, including neural networks and fuzzy logic models. In Section 6, the U-model and U-control are explained conceptually, in terms of the U-transform, for control system design. In Section 7, a brief conclusion summarises the results.

Rational (total) model
Rational or total nonlinear dynamic systems can be mathematically described by a ratio of two polynomials and are commonly known as rational models:

y(t) = ŷ(t) + e(t) = a(t)/b(t) + e(t),
a(t) = a(u(t − 1), ..., u(t − n_nu), y(t − 1), ..., y(t − n_ny), e(t − 1), ..., e(t − n_ne)),
b(t) = b(u(t − 1), ..., u(t − n_du), y(t − 1), ..., y(t − n_dy), e(t − 1), ..., e(t − n_de)),   (2.1)

where y(t) and ŷ(t) are the measured and one-step-ahead predicted model outputs, respectively, u(t) is the input, e(t) is the model error, and t (= 1, 2, ...) is the time index. Generally, the numerator a(t) and denominator b(t) are functions of past inputs, outputs, and errors, and can be expressed as polynomials:

a(t) = Σ_j p_nj(t) θ_nj,   b(t) = Σ_j p_dj(t) θ_dj,   (2.2)

where the regression terms p_nj(t) and p_dj(t) are products of past inputs, outputs, and errors, such as u(t − 1)y(t − 3), u(t − 1)e(t − 2), y²(t − 1), and θ_nj, θ_dj are the associated parameters. The task of model structure detection is to select candidate regression terms from a pre-assigned large candidate base. The task of parameter estimation, for a given or detected model structure, is to estimate the associated parameters from the measured inputs and outputs. Model validation is the final step, diagnosing whether an identified model is a true representative of the underlying system. Correlation-based model validation uses the input, output, and residual sequences (the residual being the difference between the measured system output and the model predicted output) to form a series of correlation functions that determine whether the residual has been reduced to an uncorrelated sequence with zero mean and finite variance.
Several remarks relating to the characteristics of the rational model of Equation (2.1) are noted below.
(1) The rational model includes almost all the other smooth linear and nonlinear models as its subsets. For example, the polynomial NARMAX model is a special case of rational model (2.1) by setting denominator polynomial b(t) = 1. Many neuro-fuzzy systems have been expressed as nonlinear rational models (Wang, 1994), for example, fuzzy systems with a centre defuzzifier, product inference rule, singleton fuzzifier, and Gaussian membership function. The normalised radial basis function network is also a type of rational model. When the centres and widths are estimated, this can become a rational model parameter estimation problem.
(2) The rational model can be much more concise than a polynomial expansion; a low-order rational model may require a very high-order polynomial to approximate it to the same accuracy.
(3) The rational model can represent complicated system behaviour with a fairly low degree in both the numerator and denominator polynomials. For example, large and quick deviations in a system output can be captured by a model whose denominator contains a term of the form u(t − 1) + 1: when u(t − 1) approaches −1, the model output response increases rapidly. The power to capture quick and large changes is gained by introducing the denominator polynomial.
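The behaviour in remark (3) can be illustrated with a short simulation. The specific first-order model below is hypothetical (the paper's own example equation is not reproduced here); it simply places a denominator root near u(t − 1) = −1.11 so that inputs approaching it produce large, fast output deviations that a low-order polynomial model cannot match.

```python
# Simulate a hypothetical first-order rational model to illustrate how the
# denominator polynomial produces large, fast output deviations.
# Illustrative model (not from the paper):
#   y(t) = (0.3*y(t-1) + u(t-1)) / (1 + 0.9*u(t-1))
def simulate(u_seq, y0=0.0):
    y = y0
    out = []
    for u in u_seq:
        y = (0.3 * y + u) / (1.0 + 0.9 * u)  # denominator shrinks as u -> -1/0.9
        out.append(y)
    return out

# A small constant input gives a mild response, while an input close to the
# denominator root at u = -1/0.9 makes the response grow sharply.
mild = simulate([0.1] * 5)
sharp = simulate([-1.05] * 5)
```

The contrast between the two runs shows the interpolation/extrapolation power gained from the denominator: the same two-parameter structure covers both gentle and near-singular responses.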

Bias analysis using the ordinary least squares algorithm
Directly estimating the parameters of model (2.1) is difficult to formulate and time consuming to compute, even though a prediction error algorithm has been developed. This is because the model is nonlinear in the parameters. An alternative approach is to multiply out the denominator polynomial in (2.1) to yield a model that is linear in the parameters, to which least squares algorithms can be applied after proper reformulation. To this end, multiplying both sides of Equation (2.1) by the denominator b(t) and then moving all the terms except the first term y(t)p_d1(t)θ_d1 to the right-hand side gives (with the normalisation θ_d1 = 1, p_d1(t) = 1)

y(t) = a(t) − y(t) Σ_{j≥2} p_dj(t) θ_dj + b(t) e(t),   (2.5)

which can be written compactly in regression form as (2.6). To understand the noise effects on parameter estimation when applying least squares algorithms to Equation (2.5): in the noise-free case (e(t) = 0) the parameter estimates are unbiased. If noise exists (even white noise), a very practical situation, directly applying least squares algorithms gives biased estimates, simply because the noise-contaminated output appears both in the dependent variable and in the regression terms associated with the denominator polynomial b(t). This problem is discussed and resolved in the next section.

ELS parameter estimation
This has been derived into the following least squares formulation:

Θ̂ = (ΦᵀΦ − σ_e² Ψ)⁻¹ (ΦᵀΥ − σ_e² ψ),   (2.7)

where Θ̂ is the estimate of the parameter vector Θ; σ_e² is the model error variance; Φ is the regressor matrix formulated with data length N (2.8); Υ is the dependent variable vector (2.9); Ψ is the error matrix associated with ΦᵀΦ (2.10); and ψ is the error vector associated with ΦᵀΥ (2.11). An iterative procedure has been developed to obtain the unbiased parameter estimates, which is listed below.
Step 1. Use an ordinary least squares algorithm (setting Ψ to a null matrix and ψ to a null vector in Equation (2.7)) to compute the parameter vector Θ̂. This estimate provides initial values for the subsequent iterative computation.
Step 2. Use the current estimate Θ̂ to compute the model error sequence e(t) and its variance σ_e².
Step 3. Form the bias-compensation terms Ψ and ψ from the estimated errors.
Step 4. Update the parameter vector Θ̂ from Equation (2.7).
Step 5. Go back to Step 2 and repeat until the parameter estimate vector Θ̂ and the model error variance σ_e² converge to constant values.
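The skeleton of this procedure can be sketched on a small hypothetical rational model. The model, data lengths, and parameter values below are illustrative assumptions, and the data are noise-free, so the bias-compensation matrices Ψ and ψ vanish and the iteration reduces to the ordinary least squares of Step 1; the point is to show how multiplying out the denominator yields a linear-in-the-parameters fit.

```python
import random

# Hypothetical rational model: y(t) = theta_n*u(t-1) / (1 + theta_d*y(t-1)).
# Multiplying out the denominator gives the linear-in-the-parameters form
#   y(t) = theta_n*u(t-1) - theta_d*y(t)*y(t-1) + (noise terms),
# so the regressor is phi(t) = [u(t-1), -y(t)*y(t-1)].

def solve2(A, b):
    # Solve a 2x2 linear system by Cramer's rule.
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return [(b[0]*A[1][1] - A[0][1]*b[1]) / det,
            (A[0][0]*b[1] - b[0]*A[1][0]) / det]

def ls_fit(u, y):
    # Accumulate Phi^T Phi and Phi^T Y, then solve the normal equations.
    AtA = [[0.0, 0.0], [0.0, 0.0]]
    Atb = [0.0, 0.0]
    for t in range(1, len(y)):
        phi = [u[t-1], -y[t]*y[t-1]]
        for i in range(2):
            Atb[i] += phi[i]*y[t]
            for j in range(2):
                AtA[i][j] += phi[i]*phi[j]
    return solve2(AtA, Atb)

random.seed(0)
u = [random.uniform(0.0, 1.0) for _ in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.8*u[t-1] / (1.0 + 0.4*y[t-1]))   # true parameters (0.8, 0.4)
theta_n, theta_d = ls_fit(u, y)
```

With noisy data, Steps 2-4 would re-estimate σ_e², rebuild Ψ and ψ from the residuals, and re-solve Equation (2.7) until the estimates converge.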

Statistics of the estimator
The bias and variance of the estimates have been analysed for the case where the model error e(t) has been reduced to an uncorrelated noise sequence. In summary, E[Θ̂] = Θ, so the estimator is unbiased, and the covariance of the estimate is proportional to the noise variance σ_e², where E[·] denotes the expectation operator (2.12).

Summary
Expressing the rational model as a linear-in-the-parameters model makes it possible to use least squares algorithms for parameter estimation. On the other hand, the expansion induces inherently correlated errors that give biased estimates when ordinary least squares algorithms are applied directly. The ELS algorithm has therefore been developed to set up a basis for concise and unbiased rational model parameter estimation from data sequences.

BP parameter estimation
The rational model can be treated as a neural network, as shown in Figure 1, structured with an input layer of regression terms p_nj(t) and p_dj(t), a hidden layer of a(t) and b(t) with linear activation functions, and an output layer of y(t) whose activation function is the ratio of the numerator polynomial to the denominator polynomial. Leung and Haykin (1993) presented a rational function neural network. Compared with the current study, its shortcomings are that the network does not have a generalised rational model structure and that correlated errors are not accommodated. Hence Leung and Haykin's parameter estimation algorithm cannot provide unbiased estimates from noise-corrupted data and is essentially a special implementation of the procedure of Billings and Zhu (1991) and Zhu and Billings (1991) for the noise-free case.
To derive the parameter estimation algorithm, with reference to the model expressions of Equations (2.5) and (2.6), define the error between the measured output and the model output as

e(t) = y(t) − ŷ(t) = y(t) − a(t)/b(t).   (3.1)

The task is to determine a set of parameters in model (2.5) so that the squared error in Equation (3.1) is minimised; this parameter determination can be described as training when the model is interpreted as a neural network. To train the parameters θ_nj associated with the numerator polynomial a(t), the local gradients follow from ŷ(t) = a(t)/b(t) as

∂ŷ(t)/∂θ_nj = p_nj(t)/b(t).

To train the parameters θ_dj associated with the denominator polynomial b(t), the local gradients are

∂ŷ(t)/∂θ_dj = −ŷ(t) p_dj(t)/b(t).

To carry out recursive computation, a time index k is introduced to the parameter sets θ_nj(k) and θ_dj(k), as they are updated along the time sequence. Convergence has been proved, with lim_{k→∞} θ_nj(k) → θ_nj and lim_{k→∞} θ_dj(k) → θ_dj (Zhu, 2003). The parameter variations at time k + 1 can therefore be defined as

θ_nj(k + 1) = θ_nj(k) + η_nj e(k) p_nj(k)/b(k),
θ_dj(k + 1) = θ_dj(k) − η_dj e(k) ŷ(k) p_dj(k)/b(k),

where η_nj and η_dj are the learning rates. The integrated parameter estimation procedure can be summarised in terms of the BP computation as follows.
(1) Initialisation. Set all model parameters to uniformly distributed random numbers with zero mean and small variance, or set the initial parameters using a prior information extraction technique, such as a least squares estimator.
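The gradient-style updates above can be sketched as follows. The rational model, parameter values, learning rate, and epoch count are all illustrative assumptions (not the paper's example); the block simply applies the numerator and denominator update rules derived from ŷ = a/b and checks that the squared error falls.

```python
import random

# BP-style recursive update for a hypothetical rational model:
#   yhat(t) = theta_n*u(t-1) / (1 + theta_d*y(t-1))
# Gradients from yhat = a/b:
#   d(yhat)/d(theta_n) =  u(t-1)/b(t)
#   d(yhat)/d(theta_d) = -yhat*y(t-1)/b(t)

random.seed(1)
u = [random.uniform(0.2, 1.0) for _ in range(500)]
y = [0.1]
for t in range(1, len(u)):
    y.append(0.8*u[t-1] / (1.0 + 0.4*y[t-1]))   # true parameters (0.8, 0.4)

def mse(tn, td):
    return sum((y[t] - tn*u[t-1]/(1.0 + td*y[t-1]))**2
               for t in range(1, len(u))) / (len(u) - 1)

theta_n, theta_d = 0.2, 0.0      # rough initial guesses (e.g. from a LS estimator)
mse_start = mse(theta_n, theta_d)
eta = 0.05                        # constant learning rate, choice (1) below
for epoch in range(50):           # repeated passes over the training record
    for t in range(1, len(u)):
        b = 1.0 + theta_d*y[t-1]
        yhat = theta_n*u[t-1] / b
        e = y[t] - yhat
        theta_n += eta * e * (u[t-1] / b)           # numerator update
        theta_d += eta * e * (-yhat * y[t-1] / b)   # denominator update
mse_end = mse(theta_n, theta_d)
```

Note that no linear-in-the-parameters expansion is needed here; the updates work directly on the ratio model, which is the point of the BP approach.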

Initialisation
The first step in the above computation is to initialise the parameters. A sensible choice greatly helps the resulting parameter estimation, whereas a poor choice of initial parameters can lead to two critical phenomena: premature saturation and divergence. In premature saturation, the instantaneous sum of squared errors remains almost constant for some period during the learning process; in divergence, the sum of squared errors grows very large. A general initialisation technique (Haykin, 1994) is to uniformly randomise the initial parameters with zero mean and small variance. In this study, the initial parameters are determined using a prior information extraction technique: since the model structure is assumed known in advance, an ordinary or extended least squares algorithm can be applied to obtain initial estimates. Although biased, these should not be far from the true parameters in Euclidean distance.

Learning rate
The back-propagation estimator provides an approximation to the steepest-descent trajectory in parameter space. The smaller the learning rate, the smaller the change to the model parameters from one iteration to the next and the smoother the trajectory in parameter space; however, a small learning rate may slow the convergence of the parameter estimation. If the learning rate is set too large, the estimator may become unstable. The choices of learning rate are discussed below.
(1) Constant: η_nj(k) = η_nj and η_dj(k) = η_dj. (3.8) The disadvantage is that the learning rate cannot be properly adjusted between the early stage (where a large rate is wanted) and the later stage (where a small rate is wanted) of the learning process.
(2) Inversely decayed sequence: the learning rate decays in proportion to 1/k. The disadvantage is that the learning rate decays too fast, which may not be suitable when long training data sequences are available.
(3) Generalised delta rule (Jacobs, 1988; Rumelhart, Hinton, & Williams, 1986): the update is augmented with a momentum term (3.10), where α_nj and α_dj are positive numbers called momentum constants. This has been extensively studied and can increase the learning rate while avoiding instability; however, it involves more computation and requires know-how to set the parameters α_nj and α_dj.
(4) Linearly decayed sequence: to overcome the above disadvantages, a new learning rate sequence has been proposed (Zhu, 2003) in which the rate decays linearly from an initial value η_0 to a final value η_end over the training epochs, where L is the number of iteration epochs, N is the training data length, l (= 0, ..., L − 1) indexes the lth epoch, and η_0, η_end are positive constants for the initial and final learning rates, respectively. Figure 2 shows the learning rate variation.
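The three simple schedules can be compared side by side. The linear-decay formula below is one plausible reading of the Zhu (2003) sequence (the exact expression is not reproduced in this text): the rate falls from η_0 to η_end in equal steps over the L epochs.

```python
# Illustrative learning-rate schedules for the BP estimator.

def constant(eta0, steps):
    # Choice (1): a fixed rate for the whole run.
    return [eta0] * steps

def inverse_decay(eta0, steps):
    # Choice (2): eta(k) = eta0/k decays quickly -- often too fast
    # when long training records are available.
    return [eta0 / k for k in range(1, steps + 1)]

def linear_decay(eta0, eta_end, L):
    # Choice (4), one plausible form: fall linearly from eta0 to eta_end
    # across epochs l = 0, ..., L-1.
    return [eta0 + (eta_end - eta0) * l / (L - 1) for l in range(L)]

rates = linear_decay(0.5, 0.01, 10)
```

The linear sequence keeps a usefully large rate early on (unlike 1/k decay) while still guaranteeing a small final rate (unlike the constant choice).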

Stopping criterion
Two criteria (Kramer & Sangiovanni-Vincentelli, 1989) have been used to stop neural network training. These are formulated in terms of local or global minima of the error surface.
(1) Gradient stopping criterion. This states that the BP converges when the Euclidean norm of the gradient vector reaches a sufficiently small gradient threshold. The drawbacks of this criterion are a possibly long training time and the complexity of computing the gradient vector.
(2) Error measure stopping criterion. This states that the BP converges when the absolute rate of change in the mean squared error per epoch is sufficiently small; typically, 0.01% per epoch is used. The drawback is that the residual may still be correlated for nonlinear models, even though the mean squared error or its rate of change is very small (Billings & Voon, 1983; Billings & Zhu, 1994b).
To overcome the above drawbacks, a higher order correlation test (Billings & Zhu, 1994b, 1995), introduced for nonlinear model validation, can be used as a stopping criterion. With χ(t) a combined sequence defined from the measured and residual data (Billings & Zhu, 1994b), the tests require the higher order correlation functions R_χe²(τ) and R_χu²(τ) to satisfy

R_χe²(τ) = k₀ for τ = 0 and 0 otherwise,   (3.14)
R_χu²(τ) = 0 for all τ,

in which case the estimated parameters are considered to be unbiased. Otherwise, the training procedure continues until the above conditions are satisfied. In practice, the 95% confidence limits, ±1.96/√N, are used as the stopping thresholds.
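A correlation-based stopping test of this kind can be sketched as follows. This is a simplified illustration, not the paper's exact test: it checks the residual autocorrelation (rather than the full higher-order functions) against the ±1.96/√N band, which is the mechanism the criterion relies on.

```python
import math
import random

# Simplified correlation-based stopping check: training may stop once the
# residual sequence looks uncorrelated at all tested lags, judged against
# the 95% confidence band +/-1.96/sqrt(N).

def xcorr(a, b, tau):
    # Normalised cross-correlation of mean-removed sequences at lag tau.
    n = len(a)
    am, bm = sum(a)/n, sum(b)/n
    a0 = [x - am for x in a]
    b0 = [x - bm for x in b]
    num = sum(a0[t]*b0[t-tau] for t in range(tau, n))
    den = math.sqrt(sum(x*x for x in a0) * sum(x*x for x in b0))
    return num / den

def passes_whiteness(e, max_lag=5):
    bound = 1.96 / math.sqrt(len(e))
    return all(abs(xcorr(e, e, tau)) <= bound for tau in range(1, max_lag + 1))

random.seed(2)
white = [random.gauss(0, 1) for _ in range(2000)]     # ideal residual
colored = [0.0]
for t in range(1, 2000):
    colored.append(0.9*colored[t-1] + white[t])       # strongly autocorrelated
```

A residual like `colored` fails the band at lag 1 even when its mean squared error is small, which is exactly the failure mode of the error-measure criterion above.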

Statistics of the estimator
For the bias, E[θ_nj(k + 1)] = θ_nj and E[θ_dj(k + 1)] = θ_dj (3.15), so the estimator produces unbiased estimates of the nonlinear rational model parameters. For the variance,

Var[θ_nj] = E[(θ_nj(k + 1) − E[θ_nj(k + 1)])²] = σ_e² σ_nj²,   (3.17)

so the variance of each parameter estimate is associated with the noise variance and the second-order moment of the corresponding regression term.

Summary
The BP-based algorithm provides an alternative approach to estimating rational model parameters. Obviously, there is no need to expand the rational model into a linear-in-the-parameters expression. Techniques such as the learning rate and the stopping threshold play important roles in the iterative training and learning process and need to be considered carefully in the computations.

Orthogonal parameter estimation
Consider an orthogonally transformed expression of Equation (2.5),

Υ = W g + Ξ,

where g is the auxiliary parameter vector in the orthogonal space and Ξ is the residual vector. Define the orthogonal regression matrix W through the decomposition Φ = W T, where Φ is the regressor matrix defined in Equation (2.8) and T is a unit upper triangular transform matrix, so that the columns of W are mutually orthogonal.
There are several typical algorithms to compute the elements of T, such as the Gram-Schmidt, Householder, or Givens transforms (Billings & Zhu, 1994a). Because the columns of W are orthogonal, each auxiliary parameter can be estimated independently as

g_i = (w_iᵀ Υ)/(w_iᵀ w_i),   (4.6)

where the quantities are formed from time-averaged values; the computation of the elements in Equation (4.6) can be found in Zhu and Billings (1993). Accordingly, the original parameter vector in Equation (2.5) can be recovered by back-substituting through the triangular system TΘ = g.

Structure detection (term selection)
The orthogonal property can be applied to select regression terms and estimate the associated parameters simultaneously. An error reduction ratio (ERR) criterion has been introduced to select, from a large candidate term base, the regression terms that contribute most to reducing the estimation error (Billings & Zhu, 1994a). The individual ERR for each numerator or denominator candidate term is calculated as

ERR_i = g_i² (w_iᵀ w_i) / (ΥᵀΥ).   (4.8)

It should be noted that the larger the ERR, the greater the contribution of its regression term to reducing the estimation error. Therefore, the candidate term with the largest ERR at each selection step is taken into the model.
An iterative learning procedure for rational structure detection has been developed as follows.
Step 1. Fit a deterministic rational model (i.e. with no noise model) initially with three to four regression terms in the numerator and denominator polynomials, respectively (the choice of terms comes from experience and is not critical). The model error sequence e(t) and its variance σ_e² can thereby be obtained approximately.
Step 2. Set up a series of ascending weights on the model variance σ_e² (such as 0.1, 0.25, 0.5, 0.75, 0.95, 1, 1, ..., 1) and set up a cut-off point (COP), which can be determined by trial and error (4.9).
Step 3. Search through a pre-set full rational model candidate term base to select significant terms according to their ERR values and estimate the associated parameters.
Step 4. Repeat Steps 2 and 3 until the computation converges, the preset maximum number of iterations is exceeded, or a specified number of terms has been selected into the model.
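The ERR-driven forward selection can be sketched on a small linear-in-the-parameters candidate set. The candidate data and true terms below are synthetic assumptions; the rational-model version of the procedure interleaves numerator and denominator candidates in the same way.

```python
import random

# Forward selection by error reduction ratio (ERR), classical
# Gram-Schmidt style: orthogonalise each remaining candidate against the
# already-selected terms, compute ERR_i = g_i^2 (w_i.w_i)/(y.y), and take
# the largest at each step.

def dot(a, b):
    return sum(x*v for x, v in zip(a, b))

def err_select(candidates, y, n_terms):
    basis, chosen = [], []
    remaining = dict(enumerate(candidates))
    yy = dot(y, y)
    for _ in range(n_terms):
        best_i, best_err, best_w = None, -1.0, None
        for i, p in remaining.items():
            w = list(p)
            for wb in basis:                      # orthogonalise vs chosen terms
                c = dot(w, wb) / dot(wb, wb)
                w = [a - c*b for a, b in zip(w, wb)]
            ww = dot(w, w)
            if ww < 1e-12:
                continue                          # candidate already spanned
            g = dot(w, y) / ww
            e = g*g*ww/yy                         # ERR_i
            if e > best_err:
                best_i, best_err, best_w = i, e, w
        basis.append(best_w)
        chosen.append((best_i, best_err))
        del remaining[best_i]
    return chosen

random.seed(3)
cands = [[random.gauss(0, 1) for _ in range(300)] for _ in range(4)]
y = [2.0*a + 0.5*b for a, b in zip(cands[0], cands[2])]   # true terms: 0 and 2
picked = err_select(cands, y, 2)
```

With noise-free data spanned by two of the four candidates, the selected ERRs sum to one, i.e. the chosen terms explain the output completely.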

Summary
This algorithm delivers a more realistic procedure for fitting models to data when it is not known in advance which terms should be included in the model before estimating their associated parameters.

Introduction to correlation-based model validation
Consider a generalised single-input single-output (SISO) nonlinear parametric model

y(t) = f(y_{t−1}, u_{t−1}, ε_{t−1}) + ε(t),

where f(·) is the identified nonlinear model, and y_{t−1}, u_{t−1}, and ε_{t−1} are vectors of measured outputs, inputs, and residuals (the difference between the measured output and the one-step-ahead model predicted output), respectively, with delayed elements from 1 to r. It should be noted that when the above model is properly identified, the residual ε(t) should reduce to a random noise sequence, denoted by e(t) in this section (not to be confused with its use in other sections), with zero mean and finite variance (Billings & Zhu, 1994a). In other words, the residual sequence from a properly identified model should be uncorrelated with the delayed residuals, inputs, and outputs (Ljung, 1999). All validation methods developed for nonlinear models include linear model validation as a simplified case; however, linear model-based validation tests can, and often do, fail when applied to nonlinear model validation.

The combined omnidirectional cross-correlation functions (ODCCFs) are defined through normalised correlation functions r(τ) between the residual, input, and output data sequences (Equations (5.2)-(5.5)), where the prime in Equations (5.4) and (5.5) denotes that the mean level has been removed from the corresponding data sequence. The validity tests for a properly identified model then require these correlation functions to lie within the confidence band. Compared with other correlation-test-based methodologies, this approach enhances the power of nonlinear model validity tests and significantly reduces the number of correlation plots. For large N, the correlation function estimates in Equations (5.2) and (5.3) are asymptotically normal with zero mean and finite variance by the central limit theorem (Bowker & Lieberman, 1972). The standard deviations are 1/√N and the 95% confidence limits are therefore approximately ±1.96/√N.
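The omnidirectional idea can be sketched as follows. This is an illustrative reading, not the exact ODCCF definition from Zhang, Zhu and Longden (2009): it combines the correlations between the mean-removed residual/input sequences and their absolute-value counterparts, taking the largest magnitude against the ±1.96/√N band, so that even-order nonlinear dependence (invisible to a plain cross-correlation) is exposed.

```python
import math
import random

# Omnidirectional-style validity check (illustrative): correlate the
# mean-removed residual/input sequences and their absolute-value
# counterparts, and report the largest magnitude at a given lag.

def centred(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

def ncorr(a, b, tau):
    # Normalised correlation at lag tau (inputs already mean-removed).
    num = sum(a[t]*b[t-tau] for t in range(tau, len(a)))
    den = math.sqrt(sum(v*v for v in a) * sum(v*v for v in b))
    return num / den

def odccf_max(u, e, tau):
    u0, e0 = centred(u), centred(e)
    ua = centred([abs(v) for v in u0])
    ea = centred([abs(v) for v in e0])
    return max(abs(ncorr(x, z, tau)) for x in (e0, ea) for z in (u0, ua))

random.seed(4)
u = [random.gauss(0, 1) for _ in range(2000)]
e_white = [random.gauss(0, 1) for _ in range(2000)]   # ideal residual
e_dep = [v*v for v in u]      # residual hiding an even nonlinearity in u
bound = 1.96 / math.sqrt(2000)
```

The plain correlation between `e_dep` and `u` is near zero (odd moments vanish), yet the absolute-value channel flags the dependence immediately, which is why the combined test is more powerful than a single plot.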

Summary
It should be mentioned that the method is applicable not only to rational models but also to the validation of identified neural networks and fuzzy logic models (Zhang, Zhu, & Longden, 2009).

Introduction
So far, almost all control system design approaches, whether linear or nonlinear plant-based, have followed a single procedure: set up a specified control system performance index and then, by inverting in some way the integrated function of the plant and the design performance, obtain the controller output (i.e., the input to the plant).
There is no problem with such procedures for linear plant-based control system design, subject to stability considerations, because the linear superposition principle makes the inverse function easy to resolve. However, when the plant is governed by nonlinear dynamic equations (particularly non-affine models), the inverse function becomes complicated, intractable, or even impossible to obtain analytically, leaving only step-by-step numerical computation.
Research question 1: Can a class of nonlinear dynamic plants, described by smooth nonlinear models in terms of polynomials, be designed directly using the approaches developed from linear control systems?
Research question 2: Is there a general approach to resolve the inverse function of nonlinear plants?
With this insight, the U-model methodology has been proposed; its origin, though not the general U-model expression, appeared in one of the authors' PhD theses (Zhu, 1989). Progressing along this route, Zhu, Warwick and Douce (1991) proposed using a Newton-Raphson root solver (Gerald, 1978) to obtain the controller output, which makes U-model-type control systems feasible. The U-model was first named by Zhu and Guo (2002) in a study of pole-placement controller design for nonlinear plants; it is a simple mapping from ordinary linear and nonlinear difference equations to time-varying polynomials in terms of the plant input u(t − 1) (i.e., the controller output). The U-model covers almost all existing smooth nonlinear discrete-time models as subsets.
Since then, within the first decade, the U-model has been used in designing feedforward control of multi-input multi-output nonlinear systems (Ali, Fouad, Muhammad, & Jamil, 2010), a nonlinear leaky minimum square algorithm for inverse adaptive control (Butt, Muhammad, & Tahir, 2005), minimum square error internal model control (IMC) (Muhammad & Haseebiddon, 2005), and general predictive control of nonlinear plants (Du, Wu, & Zhu, 2012). It should be noted that U-model-associated publications are still at an early stage; to date they lack a rigorous analytical description and top-journal exposure.
The other research question: How is the U-model design procedure different from classical procedures in terms of efficiency and effectiveness?
The questions have been the justification for initiating the study. Xu, Zhu, Zhao and Li (2013) have presented a comprehensive survey for the first decade of development on U-model-based control system design.

U-model
Consider a general SISO discrete-time dynamic plant described by the following polynomial model (P-model):

y(t) = f(y(t − 1), ..., y(t − n), u(t − 1), ..., u(t − n)),

where y(t) ∈ R and u(t − 1) ∈ R are the output and input signals of the plant, respectively, at discrete time instant t (= 1, 2, ...), and n is the plant order. P(*) = P(y(t − 1), ..., y(t − n), u(t − 1), ..., u(t − n)) ∈ R^{L+1} is the regression variable vector spanned by the delayed outputs and inputs, and Θ = [θ_0 ··· θ_L] ∈ R^{L+1} is the associated parameter vector. The function f(·) is a smooth linear or nonlinear function. The P-model can be further expressed as a regression equation,

y(t) = Σ_{l=0}^{L} p_l(t) θ_l,

where the regression terms p_l(t) are products of past inputs and outputs, such as u(t − 1)y(t − 3), u(t − 1)u(t − 2), y²(t − 1), and θ_l are the associated parameters. Typical examples include linear time-invariant difference equation plant models and NARMA (nonlinear autoregressive moving average) models.
The U-model is defined, under a U-mapping from the P-model, as a control-oriented model expression whose regression equation is

y(t) = Σ_{j=0}^{M} λ_j(t) u^j(t − 1).

This is expanded from the nonlinear function f(·) as a polynomial in u(t − 1), where M is the degree of the model input u(t − 1) and each parameter λ_j(t) is a function of the past inputs and outputs (u(t − 2), ..., u(t − n), y(t − 1), ..., y(t − n)) and of the parameters (θ_0, ..., θ_L).
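The U-mapping can be made concrete with a small worked example. The polynomial plant below is a hypothetical illustration (not from the paper): regrouping its terms by powers of u(t − 1) yields the time-varying λ_j(t) coefficients, and both forms produce identical outputs by construction.

```python
# U-mapping sketch for a hypothetical polynomial plant:
#   y(t) = t0 + t1*y(t-1) + t2*u(t-1) + t3*u(t-1)**2 * y(t-1)
# Regrouping by powers of u(t-1) gives the U-model
#   y(t) = lam0(t) + lam1(t)*u(t-1) + lam2(t)*u(t-1)**2
# with time-varying parameters lam_j absorbing the past data and thetas.

def p_model(theta, y1, u1):
    t0, t1, t2, t3 = theta
    return t0 + t1*y1 + t2*u1 + t3*u1**2*y1

def u_model_lams(theta, y1):
    # lam0 = t0 + t1*y(t-1), lam1 = t2, lam2 = t3*y(t-1)
    t0, t1, t2, t3 = theta
    return [t0 + t1*y1, t2, t3*y1]

theta = (0.1, 0.5, 1.2, -0.3)
y1, u1 = 0.7, 0.4                     # past output and candidate input
lams = u_model_lams(theta, y1)
u_form = sum(l * u1**j for j, l in enumerate(lams))
```

The mapping changes nothing about the plant; it only reorganises the same polynomial so that u(t − 1) becomes the single unknown of a one-variable polynomial at each time step.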

U-control -U-model-based control system design procedure
Define the desired plant output as U(t); with this setting, the controller output u(t − 1) is obtained from

U(t) = Σ_{j=0}^{M} λ_j(t) u^j(t − 1).   (6.4)

Figure 3. A general U-model-based closed-loop control system.
Accordingly, the design task is to determine the desired plant output U(t) according to a specified performance index, for example, pole placement control (PPC) (Zhu & Guo, 2002), with RU(t) = Tw(t) − Sy(t), or general predictive control (GPC) (Du, Wu, & Zhu, 2012), and then to obtain the controller output by resolving one of the roots of Equation (6.4). The block diagram in Figure 3 shows a general U-model-based closed-loop control system structure. It should be noted that it is still an open research question how to use the U-model approach to design rational model-based control systems; fundamental research on this should be performed in future work.
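The root-resolving step can be sketched with the Newton-Raphson solver mentioned in the introduction of this section. The λ_j values and desired output below are illustrative assumptions; at each sampling instant the controller would rebuild the λ_j(t) from the latest data and re-solve.

```python
# Newton-Raphson root solve used in U-control: given the desired output
# U(t) from the linear design (e.g. pole placement), find the controller
# output u(t-1) satisfying  sum_j lam_j * u**j = U(t).

def solve_controller_output(lams, U, u0=0.0, iters=50, tol=1e-10):
    u = u0
    for _ in range(iters):
        f = sum(l*u**j for j, l in enumerate(lams)) - U     # residual
        df = sum(j*l*u**(j-1) for j, l in enumerate(lams) if j > 0)
        if abs(df) < 1e-12:
            break                                           # flat slope: stop
        step = f / df
        u -= step
        if abs(step) < tol:
            break
    return u

lams = [0.45, 1.2, -0.21]     # hypothetical lam_j(t) from the U-mapping
U_des = 1.0                   # desired plant output from the linear design
u_ctrl = solve_controller_output(lams, U_des)
```

When Equation (6.4) has several real roots, a practical implementation picks the one nearest the previous control action to keep the input smooth; that selection rule is a design choice, not part of the solver itself.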

Summary
Design of nonlinear dynamic control systems is widely recognised as a challenging issue. The key to the design is a general model prototype with conciseness, flexibility, and manipulability, with little loss of precision; this is the origin and insight of the U-model. In terms of time-varying parameter polynomials, the U-model covers almost all existing smooth nonlinear discrete-time models as subsets. Based on the U-prototype, without complex transformation of the original model and with no need for linearisation, a nonlinear control system can be designed directly with linear control system design approaches, such as PPC, GPC, and sliding mode control (SMC). Even for linear control system design, the U-model provides an alternative solution procedure.
The major contribution of the U-model-based design procedure can be listed in the following order.
(1) In methodology, well-known approaches developed for linear systems can be directly applied to nonlinear control system design, which significantly reduces the design complexity and provides straightforward computational algorithms.
(2) In design, it first obtains the desired plant output (rather than designing the desired controller output, as in the classical framework) and then works out the controller output from the U-model by a relaxed root-resolving routine (rather than resolving complex solutions from the inverse of the whole designed system).
(3) For linear control system design, it provides new insight and solutions within a more general and effective framework.

Conclusions
With regard to the presented algorithms, the ELS algorithm sets up a solid and concise basis for extending this classical approach to nonlinear rational model term selection and parameter estimation; the insight presented in this study could usefully stimulate the identification of other complex models through proper extension of classical least squares algorithms. The BP algorithm provides a different angle on the parameter estimation problem, although model structure detection (i.e., regression term selection) via BP computation remains unresolved. The OLS algorithm provides a step-by-step procedure to select model regression terms and estimate their associated parameters simultaneously. The new model validation method provides a generic routine to examine model validity effectively with significantly fewer correlation plots. In control system design, the U-model, and hence U-control, provide a new procedure within a classical framework, which could lead to a concise solution for rational model-based control system design in future studies. It should be noted again that the structure of the rational model brings various advantages that cannot easily be obtained with polynomial model sets; on the other hand, it also presents a number of new challenges for research, demonstration, and application.