Gradient-Based Particle Filter Algorithm for an ARX Model With Nonlinear Communication Output

—A stochastic gradient (SG)-based particle ﬁlter (SG-PF) algorithm is developed for an ARX model with non-linear communication output in this paper. This ARX model consists of two submodels, one is a linear ARX model and the other is a nonlinear output model. The process outputs (outputs of the linear submodel) transmitted over a communication channel are unmeasurable, while the communication outputs (outputs of the nonlinear submodel) are available, and both of the two-type outputs are contaminated by white noises. Based on the rich input data and the available communication output data, a SG-PF algorithm is proposed to estimate the unknown process outputs and parameters of the ARX model. Furthermore, a direct weight optimization method and the Epanechnikov kernel method are extended to modify the particle ﬁlter when the measurement noise is a Gaussian noise with unknown variance and the measurement noise distribution is unknown. The simulation results demonstrate that the SG-PF algorithm is effective.

computational efforts and cannot be used to estimate the parameters recursively by new data, while the RLS and SG algorithms can be used as online algorithms and can update the parameters based on new data [12]. Furthermore, these online algorithms have less computational efforts, thus widely used in engineering application.
The online algorithms often have the assumption that all the system data are available. However, in engineering practice, some data of the systems may be missing or unmeasurable. The auxiliary model is an important tool which is usually applied for systems with missing data [13]. Its basic idea is to replace the missing data with the outputs of an auxiliary model. For example, Wang and Ding [14] studied an auxiliary model-based RLS algorithm for multivariable systems. Jin et al. [13] investigated an auxiliary model-based identification algorithm for multivariable OE-like systems with missing outputs. Such methods only use an auxiliary model to predict the missing outputs, while the measurable outputs are not applied to improve the predicted outputs, which makes the auxiliary model have slow convergence rates.
With the development of communication networks, control systems often use network to transmit the sampled outputs via a communication channel [15]- [17]. However, the complex network and laboratory analysis process also bring some identification constraints, such as the packet dropouts, the communication time-delays, the nonlinear signals and the unmeasurable sampled outputs [18]- [20]. There exist many identification methods for systems with such constraints. For example, Guo et al. [21] proposed an identification algorithm for FIR systems, in which the outputs are constrained by both binary-valued quantization and communication unreliability. Xiong et al. [22] investigated an EM algorithm for nonlinear systems with unmeasurable outputs, in which those unmeasurable outputs were estimated by an auxiliary model. Xie et al. [23] developed an EM algorithm for multirate systems with random time-delays, where the output model is linear and the process model is not influenced by process noise, by using an auxiliary model and an offline algorithm, the missing outputs and the parameters were estimated simultaneously.
The Kalman filter and the particle filter are two effective tools which are often applied for state-space systems [24]- [26]. In [27], a multi-innovation gradient algorithm is developed for a linear state-space system with time-delay, in order to improve the convergence rate, a Kalman filtering-based multi-innovation gradient algorithm is proposed. In [3], an EM-based particle filter algorithm is proposed for nonlinear parameter varying state-space systems, where the proposed EM algorithm is an offline algorithm. Unlike the auxiliary model, the Kalman filter and the particle filter apply the measurable data to improve the priori estimates, which makes them be more accurate. However, these two filters are often used for state-space systems.
This paper takes the above described literature into study and develops an SG-based particle filter (SG-PF) algorithm for an ARX model with nonlinear communication output. The considered system in this paper is a nonstate-space system with different kinds of measurement noises, and the proposed algorithm is an online algorithm. A particle filter is derived to estimate the process outputs by the communication outputs. Then based on the estimated process output data and the rich input data, the unknown parameters can be estimated by an SG algorithm. The problem discussed in this paper is more complicated and challenging than those in [3], [4], and [23] and the main contributions are listed below.
1) Study an online identification method for an ARX model which consists of two submodels, one is an equationerror model to reflect process dynamics, and the other is an equation reflecting nonlinear output. 2) Propose a particle filter instead of an auxiliary model for nonlinear nonstate-space systems, which can estimate the unknown variables and can increase the estimation accuracy. 3) Apply the direct weight optimization method and the Epanechnikov kernel method to modify the particle filter for the situations that the measurement noise with unknown variance and unknown distribution. Briefly, the rest of this paper is organized as follows. Section II presents an SG-PF algorithm. Section III develops two modified particle filters. In Section IV, three illustrative examples are provided. Section V concludes this paper and gives future directions.

II. SG-BASED PARTICLE FILTER ALGORITHM
For the model depicted in Fig. 1, define the information vector ϕ(t) and the parameter vector θ as . . , a n , b 1 , . . . , b m ] T ∈ R n+m then the ARX model can be rewritten as follows: In (1), the information vector contains unknown process outputs x(i), i = 1, 2, . . . , thus the classical SG algorithm cannot be applied to (1) directly. The auxiliary model-based algorithms are often used to identify systems with unmeasurable variables [13], [14], [28]. For example, the corresponding auxiliary model-based SG algorithm for the system model in (1) and (2) is as follows: in whichθ (t − i − 1) is the estimate of θ at t − i − 1. However, here the SG-AM algorithm cannot be applied to deal with this model. Remark 1: Unlike the models in [13], [14], and [28], all the process outputs in this ARX model are unknown, and if we usex(t) =φ T (t)θ (t − 1) to replace x(t), then (t) in (6) becomes zero, thus the SG-AM algorithm proposed in (3)- (7) is invalid for this ARX model.
The key characteristic of our treatment is to develop a particle filter-based online algorithm for this ARX model.

A. Particle Filter
The particle filter is a tool for nonlinear state-space systems with unmeasurable states. In what follows we will extend this tool to nonlinear nonstate-space systems.
The basic idea of the particle filter is to use a series of particles with associated weights to approximate the posterior density function. Based on the particle filter method in [3] and [29], the posterior density function p( where ω j is the normalized weight associated with the jth particle and δ(·) denotes the Dirac delta function. Once the particles and their weights have been obtained, we can compute any desired statistical measure of the posterior density function. For example, the mean of the posterior density function can be computed asx is the weight associated with the jth particle and However, it is difficult to draw particles from this posterior density function directly. In order to get the new particles, an importance density q(·) is given by From (1) and (10), one gets Remark 2: If the variance ς is unknown, then the density function in (11) would not be computed, and the particles could not be updated. In this scenario, the auxiliary model can be applied to draw the particles, e.g., From (11), we can get the new particles. The next is to update the weight of each new particle. Based on [3], the weight ω j (t) can be updated as follows: According to (2), the density function p(y(t)|x j (t)) can be computed by Substituting (13) into (12) gets Normalizing Then we havex When apply the particle filter to estimate the unknown variables, the degeneracy phenomenon is an inevitable problem [30]. After a few iterations, the weights of some particles become small, which means that heavy computational efforts are utilized to update those particles whose contributions are negligible. In order to avoid this phenomenon, a sample size N eff is defined as Given a threshold N hold in prior, N eff < N hold means severe degeneracy. Thus we should use resampling, and then the weight of each particle is assigned asω j = (1/N). The particle filter proceeds by performing the following steps.
1) Initialization: At time t, collect the input-output data {u (1), 2) Let the process noise variance be ς 2 and the measurement noise variance be σ 2 , and assign a positive number N hold . (15). 7) Compute the missing outputx(t − i) by (16). 8) Compute N eff by (17) and compare N eff with N hold , if N eff < N hold , reset the weight of each particle withω j (t− i) = (1/N) and go to next step; otherwise, go to next step. 9) Let i = i − 1, if i ≥ 0 go to step 4; otherwise, terminate the procedure.

B. Identification Algorithm
In order to estimate x(t), two methods are discussed here. One is to keep the estimates of the unknown process outputs Draw N particles from the following importance density function: . . .
From (18), we can see that only one process output data Unlike the first method, in order to get the parameter estimates at t−i, the second method is to estimate all the unknown process outputs up to and including time t − i by using the parameter estimateθ (t − i − 1).
Draw N particles from in whichx . . .
From (16), we can obtain the estimatesx(t−i). Then according to the estimates The density function in (19) declares that one should estimate t − i process outputs at each sampling time t − i − 1.
Remark 3: The estimates of the process output data by using the second method are more accurate than those by using the first method. However, compared with the first method, the second method has heavier computational efforts.
In order to estimate the unmeasurable data more accurately, we use the second method in this paper. Then the SG-PF algorithm is given bŷ The SG-PF algorithm proceeds by performing the following steps.
ε, then obtain the estimated parameter vectorθ (t); otherwise, let t = t + 1 and go back to step 3. Remark 4: From the SG-PF algorithm in (20)-(23), we can see that the unknown process output is estimated by (22), obviously,x(t) −φ T (t)θ (t − 1) = 0. Thus the SG-PF algorithm can be utilized to estimate the unknown parameters.

III. TWO MODIFIED PARTICLE FILTERS
In the above section, we assume that the measurement noise is a Gaussian noise with known variance. However, in engineering practice, the variance of the Gaussian noise is often unknown. Moreover, the distribution of the measurement noise may be unknown. In this section, we consider the cases that the measurement noise is a Gaussian noise with unknown variance and the measurement noise distribution is unknown, and develop two modified particle filters to estimate the process outputs.

A. Gaussian Measurement Noise With Unknown Variance
Since the variance of the measurement noise is unknown, the weight w j (t) in (14) cannot be computed directly. In order to circumvent this difficulty, we introduce a direct weight optimization method to update the weight of each particle.
Draw N particlesx j (t), j = 1, . . . , N from (19) and define The next is to find the weights λ j (t), j = 1, . . . , N so that the nonlinear function f (x(t)) can be approximated by Based on the direct weight optimization method in [31], the weights λ j (t), j = 1, . . . , N can be obtained by minimizing the following function: Define Taking the conditional expectation on (24) yields Let

t)e(t) .
According to the direct weight optimization method proposed in [31], we can find the weights to make sure that Prob(z ≥ γ (t)) is minimized. In this paper, we choose γ (t) as The choice of γ (t) can keep the number of particles unchanged and can prevent sample impoverishment. Since the measurement noise e(t) is a Gaussian noise, the density function of z can be expressed as Minimizing Prob(z ≥ γ (t)) is equivalent to .
Then the weight can be computed by Substituting (27) into (12) gets and the normalized weight can be expressed as Remark 5: Equations (27) and (28) imply that the weight of each particle can be obtained without the knowledge of the variance of the measurement noise.

B. Measurement Noise With Unknown Distribution
When the distribution of the measurement noise is unknown, both density function in (13) and (26) cannot be obtained. In order to solve this problem, we use the Kernel density estimation method to update the weight of each particle. Rewrite Since f (·) is a continuous function, we can conclude that the smaller the γ j (t) is, the more important role of the jth particle at time t in estimating the true output x(t) plays. The Epanechnikov function is a significant type of kernel function and is optimal in a mean square error sense.
Here, the Epanechnikov function is introduced to estimate p(y(t)|x j (t)) [32]. Since then the density function is given by By normalizing the density function, one has Substituting (29) into (12) gets Normalizing ω j (t) yields .
It means that the sth particle plays a more important role in estimating the estimatex(t) than the lth particle. Theorem 1: Assume that p(y(t)|x j (t)) is expressed by (29) and the distribution of the measurement noise is unknown, then the following inequality holds: The detailed derivation is given in Appendix A. Theorem 1 shows that the estimation accuracy can be improved by using the updated weights.
Remark 7: When the measurement noise is a Gaussian noise with unknown variance, the Kernel density estimation method also can be applied to update the weight of each particle. On the other hand, the direct weight optimization method cannot be utilized to update the weight of each particle when the distribution of the measurement noise is unknown.
Theorem 2: Let λ d j (t) be the weight updated by (27), λ k j (t) be the weight updated by (29). Assume that the measurement noise is a Gaussian noise with unknown variance, then The proof is given in Appendix B. Theorem 2 means that the direct weight optimization method is better than the Kernel density estimation method when the measurement noise is a Gaussian noise with unknown variance.

IV. EXAMPLES
Example 1: Consider the following ARX model: in which x(t) is unmeasurable, y(t) is measurable, the variances of {v(t)} and {e(t)} are both 0.10 2 . In simulation, we choose 100 particles.
Apply the SG-PF algorithm to estimate the parameters of the ARX model. The simulation data of the inputs, process outputs, and nonlinear communication outputs are shown  in Fig. 2 (samples 1-200). The parameter estimation errors τ := θ − θ / θ versus t are depicted in Fig. 3, and the parameter estimates and their errors are presented in Table I. The estimated process outputs, the true process outputs, and their errors are shown in Fig. 4 (samples 800-900).
Example 2: Consider the ARX model in Example 1 The variance of {v(t)} is 0.10 2 , while the distribution of {e(t)} is unknown. In simulation, we also choose 100 particles.
Apply the Epanechnikov function to improve the weights of the 100 particles and the SG-PF algorithm to identify the ARX model. The estimation errors τ versus t are depicted in Fig. 5, and the parameter estimates and their errors are presented in Table II. The estimated process outputs, the true process outputs and their errors are shown in Fig. 6 (samples  800-900).    Tables I and II witness that the parameter estimation errors through the SG-PF algorithm with known measurement noise are smaller than those through the SG-PF algorithm with unknown measurement noise. Example 3: Consider an experiment setup of a water tank system in Fig. 7, where u(t) is the valve opening, and x(t) is the liquid level. There is a pressure sensor at the bottom of tank 2 which can transmit the liquid level of tank 2 over a communication network. The measurable output is y(t). The system can be expressed by the following ARX model:   and we manually amplify the output as This kind of nonlinear function f (x(t)) = x(t) + 0.2x 2 (t) can be seen in [22] and [33]. In simulation, the input {u(t)} is a filtered random binary signal sequence, {v(t)} and {e(t)} are Gaussian white noise sequences and both of their variances are 0.10 2 . In simulation, we use 50 particles.
The simulation data of the inputs, process outputs and nonlinear outputs are depicted in Fig. 8 (samples 1-200). The estimation errors τ versus t are depicted in Fig. 9, and the parameter estimates and their estimation errors are presented in Table III. The estimated process outputs, the true process outputs and their errors are shown in Fig. 10 (samples 800-900).   From Fig. 9 and Table III, we can conclude that as t increases, the parameter estimation errors approach zero. From Fig. 10, we find that the estimated process outputs can track the true unmeasurable outputs well.

V. CONCLUSION
An SG-PF algorithm is proposed for an ARX model mixed with a nonlinear communication output model in this paper. This model consists of two submodels, one is an ARX model and the other is a nonlinear communication output model. Since the outputs of the ARX model are unmeasurable, a particle filter is utilized to estimate these unknown outputs. Then based on the measurable input data and the estimated process output data, the unknown parameters of the ARX model can be estimated by the SG algorithm. Furthermore, two modified particle filters are developed for the ARX model with the assumptions that the measurement noise is a Gaussian noise with unknown variance and the measurement noise distribution is unknown.
The purpose of this paper is to develop an online identification method for systems with nonlinear communication output. There are still some interesting topics not discussed in this paper. For example, if the structure of the nonlinear function is unknown, how to adjust the weights of the particles? Another topic is how to prove the convergence properties of the SG-PF algorithm. These topics will remain as open issues in future.
According to (29), one can get Since the denominator of the last term of the above equation is greater than zero, one only need to consider the following equation: Clearly, the following inequality holds: The weight which updated by the direct weight optimization method is expressed as while the weight updated by the Kernel density estimation method is written by .