Parallel computations using multithreaded processors
Abstract
The Kalman filter and its variants have been highly successful in numerous technological applications. However, the Kalman filter carries a heavy computational burden, and under big data it becomes slow. Meanwhile, the computer industry has entered the multicore era, in which hardware computational capacity is increased by adding more processors (cores) on one chip; sequential processors will not be available in the near future, so we have to move to parallel computation.
This paper focuses on how to make the Kalman filter faster on multicore machines; our parallel Kalman filter achieves nearly linear speed-up.
Section (1)
Introduction
The Kalman filter (KF) is one of the most widely used state estimators; it is an optimal recursive data processing algorithm. "Optimal" here means that it minimizes error in some respect, and "recursive" means that it does not require all previous data to be kept in storage and reprocessed every time a new measurement is taken.
Applications of Kalman filter include spacecraft orbit determination, target tracking, vehicle guidance and navigation, geophysical subsurface estimation, demographic modelling and industrial processes control.
Practical implementation of Kalman filtering with a huge number (n) of variables requires expensive running time.
The arithmetic operations required for implementing the Kalman filter with (n) state variables are approximately O(n^3) [1] for each state update if the implementation is serial (sequential). This time complexity is not suitable for many applications, so parallel processing seems to be a means of easing this computational bottleneck.
In this paper we propose a natural parallel Kalman filter based on the SIMD model, in which we spread all the (n) states over a varying number of cores.
The algorithm was produced in the MathWorks MATLAB environment, version R2016b (9.1.0.441655), 64-bit, using the Parallel Computing Toolbox, version 6.9. It was tested on an 8-core server machine with an Intel(R) Xeon(R) Silver 4114 CPU @ 2.20 GHz, running Microsoft Windows NT 10.0 (Windows Server 2016), 64-bit.
The rest of this paper is organized as follows. In section (2), the estimated process and the computational origins of the Kalman filter equations are presented. In section (3), some concepts of parallel computation are introduced. A brief summary of the sound model and of how the Kalman filter in 1-D can readily be applied to remove noise from a sound wave is given in section (4). The Kalman filter algorithm in sequential form is described in section (5). In section (6), the utilization of the Kalman filter equations in parallel form is provided, and the parallel implementation is given in section (7). Results are provided in section (8), and finally the conclusion is given in section (9).
Section (2)
The Computational Origins of the Kalman Filter
The Kalman filter is named after R. E. Kalman, who in 1960 published his famous paper describing a recursive solution to the discrete-data linear filtering problem [2].
In that paper, Kalman addresses the general problem of trying to estimate the state x ∈ R^n of a discrete-time controlled process that is governed by the linear stochastic difference equation
x_k = A x_(k-1) + B u_k + w_(k-1)    (1)
With the measurement z ∈ R^m that is
z_k = H x_k + v_k    (2)
Where
z_k is the observation at time k,
x_k is the system state at time k,
A, B, and H are linear model matrices,
the random variable w_k is the process noise, and
the random variable v_k is the measurement noise.
The two noise sequences are assumed to be independent of each other, white, and normally distributed.
In practice, the process noise covariance Q and the measurement noise covariance R are assumed to be constant, with normal probability distributions
p(w) ~ N(0,Q) (3)
p(v) ~ N(0,R) (4)
The n×n matrix A in the difference equation (1) relates the state at the previous time step k-1 to the state at the current step k, in the absence of either a driving function or process noise. Note that in practice A might change with each time step, but it is assumed here to be constant.
The n×1 matrix B relates the optional control input u∈R^1 to the state x.
The m×n matrix H in the measurement equation (2) relates the state to the measurement z_k . In practice H might change with each time step, but it is assumed to be constant
Define x̂_k^- ∈ R^n to be the a priori state estimate at step k, given knowledge of the process prior to step k, and x̂_k ∈ R^n to be the a posteriori state estimate at step k, given the measurement z_k. The "^-" superscript denotes that an estimate is a priori.
We can then define the a priori and a posteriori estimate errors as
e_k^- ≡ x_k - x̂_k^-    (5)
e_k ≡ x_k - x̂_k    (6)
The a priori estimate error covariance is then
P_k^- = E[e_k^- (e_k^-)^T]    (7)
and the a posteriori estimate error covariance is
P_k = E[e_k e_k^T]    (8)
In deriving the equations for the Kalman filter, we begin with the goal of finding an equation that computes an a posteriori state estimate x̂_k as a linear combination of an a priori estimate x̂_k^- and a weighted difference between an actual measurement z_k and a measurement prediction H x̂_k^-, as shown in equation (9).
x̂_k = x̂_k^- + K (z_k - H x̂_k^-)    (9)
The difference (z_k - H x̂_k^-) in equation (9) is called the measurement innovation, or the residual. The residual reflects the discrepancy between the predicted measurement H x̂_k^- and the actual measurement z_k. A residual of zero means that the two are in complete agreement.
The n×m matrix K in equation (9) is chosen to be the gain, or blending factor, that minimizes the a posteriori error covariance of equation (8). This minimization can be accomplished by first substituting equation (9) into the above definition for e_k, substituting that into equation (8), performing the indicated expectations, taking the derivative of the trace of the result with respect to K, setting that result equal to zero, and then solving for K [3]. One form of the resulting K that minimizes equation (8) is given by
K_k = P_k^- H^T (H P_k^- H^T + R)^(-1)    (10)
    = (P_k^- H^T) / (H P_k^- H^T + R)
Equation (10) gives the Kalman gain. The effect of the weighting by K is that as the measurement error covariance R approaches zero, the actual measurement z_k is "trusted" more and more, while the predicted measurement H x̂_k^- is trusted less and less. On the other hand, as the a priori estimate error covariance P_k^- approaches zero, the actual measurement z_k is trusted less and less, while the predicted measurement H x̂_k^- is trusted more and more.
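These two limiting behaviours can be checked directly in the scalar case (n = m = 1, H = 1), where equation (10) reduces to K = P^-/(P^- + R). A minimal sketch in Python (illustrative values, not from the paper):

```python
def scalar_gain(P_prior, R):
    # Scalar Kalman gain: K = P^- H^T (H P^- H^T + R)^(-1) with H = 1
    return P_prior / (P_prior + R)

# As R -> 0, the measurement is trusted fully: K -> 1
print(scalar_gain(0.5, 1e-9))
# As P^- -> 0, the prediction is trusted fully: K -> 0
print(scalar_gain(1e-9, 0.5))
```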
The justification for equation (9) is rooted in the probability of the a priori estimate x̂_k^- conditioned on all prior measurements z_k (Bayes’ rule) [4]. For now let it suffice to point out that the Kalman filter maintains the first two moments of the state distribution:
E[x_k] = x̂_k    (11)
E[(x_k - x̂_k)(x_k - x̂_k)^T] = P_k    (12)
The a posteriori state estimate equation (9) reflects the mean (the first moment) of the state distribution— it is normally distributed if the conditions of equation (3) and equation (4) are met. The a posteriori estimate error covariance equation (8) reflects the variance of the state distribution (the second non-central moment).
The discrete Kalman filter algorithm estimates a process by using a form of feedback control: the filter estimates the process state at some time and then obtains feedback in the form of (noisy) measurements. As such, the equations for the Kalman filter fall into two groups: time update equations and measurement update equations. The time update equations are responsible for projecting forward (in time) the current state and error covariance estimates to obtain the a priori estimates for the next time step. The measurement update equations are responsible for the feedback, i.e. for incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate.
The time update equations can also be thought of as predictor equations, while the measurement update equations can be thought of as corrector equations. Indeed, the final estimation algorithm resembles a predictor-corrector algorithm for solving numerical problems.
The specific equations for the time and measurement updates are presented below in table (1) and table (2)
Table (1) : Discrete Kalman filter time update equations
x̂_k^- = A x̂_(k-1) + B u_k    (13)
P_k^- = A P_(k-1) A^T + Q    (14)
Table (2) : Discrete Kalman filter measurement update equations
K_k = P_k^- H^T (H P_k^- H^T + R)^(-1)    (15)
x̂_k = x̂_k^- + K_k (z_k - H x̂_k^-)    (16)
P_k = (I - K_k H) P_k^-    (17)
Again notice how the time update equations in table (1) project the state and covariance estimates forward from time step k-1 to step k. A and B are from equation (1), while Q is the process noise covariance from equation (3).
The first task during the measurement update is to compute the Kalman gain K_k. Notice that the equation given here as equation (15) is the same as equation (10).
The next step is to actually measure the process to obtain z_k , and then to generate an a posteriori state estimate by incorporating the measurement as in equation (16). Again equation (16) is simply equation (9) repeated here for completeness.
The final step is to obtain an a posteriori error covariance estimate via equation (17).
After each time and measurement update pair, the process is repeated, with the previous a posteriori estimates used to project or predict the new a priori estimates. This recursive nature is one of the very appealing features of the Kalman filter; it makes practical implementations much more feasible than implementations of filters that are designed to operate on all of the data directly for each estimate. The Kalman filter instead recursively conditions the current estimate on all of the past measurements [5].
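The two-phase recursion of tables (1) and (2) can be sketched for the scalar case (a Python sketch rather than the paper's MATLAB; A, B, H, Q, R and the measurement sequence below are illustrative assumptions):

```python
def kf_step(x, P, z, u=0.0, A=1.0, B=0.0, H=1.0, Q=1e-4, R=0.1):
    # Time update (predict), equations (13)-(14)
    x_prior = A * x + B * u
    P_prior = A * P * A + Q
    # Measurement update (correct), equations (15)-(17)
    K = P_prior * H / (H * P_prior * H + R)
    x = x_prior + K * (z - H * x_prior)
    P = (1.0 - K * H) * P_prior
    return x, P

# Repeated noisy measurements of a constant level pull the estimate toward it
x, P = 0.0, 1.0
for z in [1.02, 0.98, 1.01, 0.99, 1.0]:
    x, P = kf_step(x, P, z)
```

Each pass performs one predict-correct cycle; the shrinking P reflects the recursive conditioning on all past measurements described above.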
Section (3)
Parallel computation
Any computer, whether sequential or parallel, operates by executing instructions on data. A stream of instructions (the algorithm) tells the computer what to do at each step. A stream of data (the input to the algorithm) is affected by these instructions. Depending on whether there is one or several of each of these streams, we can distinguish four classes of computers:
(SISD)-Single Instruction stream, Single Data stream.
(MISD)-Multiple Instruction stream, Single Data stream.
(SIMD)-Single Instruction stream, Multiple Data stream.
(MIMD)-Multiple Instruction stream, Multiple Data stream.
In (SIMD) model, a parallel computer consists of N identical processors, each of the N processors possesses its own local memory where it can store both programs and data, all processors operate under the control of a single instruction stream issued by a central control unit. Equivalently, the N processors may be assumed to hold identical copies of a single program, each processor's copy being stored in its local memory. There are N data streams, one per processor. [6]
We can classify the SIMD (Single Instruction stream, Multiple Data stream) model, according to the communication between processors, into
Shared Memory (SM) SIMD computers
Interconnection-network SIMD computers
The computer industry has now entered the multicore era, with hardware computational capacity increased by adding more processors (cores) on one chip. All major manufacturers of processor chips have moved to a multicore (many-core) design, and sequential processors will soon no longer be available. Almost any personal computer and a lot of phones bought today contain two or more processors, and the number of processors is predicted to increase drastically in the near future, as shown in the next figure.
Figure (1)
In a shared-memory multicore architecture, several CPUs can access a shared RAM; however, each CPU has a private cache memory, and all CPUs share a common bus to communicate with the RAM. The figure shows that two processors can block memory accesses for each other, since only one processor at a time can access the bus.
Performance of a parallel program can be measured in various ways.
Execution time
Scalability when the number of processors is increased
Scalability when the problem size grows
Amdahl’s law tells how the execution time is scaled when the number of processors is increased (problem size stays constant):
S_p = (W_p + W_r) / (W_p + W_r/P)
Where
S_p is the ratio of the execution times with one and with P processors,
W_p is the execution time of the sequential part of the program, and
W_r is the execution time of the parallel part when using one processor.
By normalizing the one-processor execution time to one (W_p + W_r = 1) we get
S_p = 1 / (α + (1 - α)/P)
where W_p = α and W_r = 1 - α.
Performance can also be measured by the efficiency, which is defined as
e = S_p / P
In the ideal case, e = 1.
In Gustafson’s law we keep the execution time constant and increase the problem size:
S_p = (W_p + P·W_r) / (W_p + W_r)
By normalizing (W_p + W_r = 1, α′ = W_p) we get the law in the form
S_p = P - α′(P - 1)
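Both laws are easy to evaluate numerically. A sketch in Python (the 5% sequential fraction is an illustrative assumption):

```python
def amdahl_speedup(alpha, P):
    # Amdahl: S_p = 1 / (alpha + (1 - alpha)/P), alpha = sequential fraction
    return 1.0 / (alpha + (1.0 - alpha) / P)

def gustafson_speedup(alpha, P):
    # Gustafson: S_p = P - alpha * (P - 1)
    return P - alpha * (P - 1)

def efficiency(speedup, P):
    # e = S_p / P; e = 1 in the ideal case
    return speedup / P

# With a 5% sequential part on P = 8 processors:
print(amdahl_speedup(0.05, 8))     # ~5.93: fixed problem size
print(gustafson_speedup(0.05, 8))  # 7.65: problem size grows with P
```

Amdahl's law bounds the speed-up of a fixed-size problem, which is the setting of the fixed-size experiments reported later in this paper; Gustafson's law is more optimistic because the parallel workload grows with P.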
Section (4)
Sound model and the Kalman filter in 1-D
Sound can be represented by an autoregressive (AR) process; thus the sound signal at the k-th time instant is given by
s(k) = a_1 s(k-1) + ⋯ + a_p s(k-p) + u(k)    (1)
So it can be represented by the state-space model as shown below
X(k)= A X(k-1)+G u(k) (2)
Where
X(k) is the state vector
A is the state transition matrix
G is the input matrix
Which are defined as follows
X^T(k) = [s(k-p+1), …, s(k-1), s(k)]    (3)
A = [0  I; -a_p, …, -a_1]    (4)
G = [0, …, 0, 1]    (5)
When only the noise corrupted signal y(k) is available, the observation process can be written in the following form
y(k)=s(k)+n(k) (6)
This equation can be written in the matrix form as follows
y(k)=H X(k)+n(k) (7)
Where the matrix H relates the state to the observation; it has the same form as G and is given by
H = [0, …, 0, 1]    (8)
The noise sequences {u(k)} and {n(k)} are zero mean white noise processes and are uncorrelated. The observation noise n(k) is also uncorrelated to the state vector. For all k and l , we can write
E{u(k)} = 0,   E{u(k)u(l)} = Q(k)δ_kl    (9)
E{n(k)} = 0,   E{n(k)n(l)} = R(k)δ_kl    (10)
E{u(k)n(l)} = 0,   E{X(k)n(l)} = 0    (11)
It is also assumed that the initial estimate of X is X̂(0) = X_0 and is unbiased, i.e.
E{X(0) - X_0} = 0    (12)
The initial estimate of the error covariance matrix P_0 is known from the following relation
P_0= E{[X(0)- X_0 ][X(0)- X_0 ]^T } (13)
The state and observation equations (2) and (7) clearly suggest that the Kalman filter can readily be applied to estimate the state vector X(k) [7].
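The companion structure of equations (3)-(5) and (8) can be sketched in plain Python (not the paper's MATLAB code). The sketch follows the positive-coefficient convention of equation (1), so the last row of A is [a_p, …, a_1]; the minus signs in equation (4) correspond to the opposite sign convention for the a_i. The AR(3) coefficients below are hypothetical:

```python
def ar_state_space(a):
    # a = [a_1, ..., a_p]: AR coefficients of s(k) = a_1 s(k-1) + ... + a_p s(k-p) + u(k)
    p = len(a)
    A = [[0.0] * p for _ in range(p)]
    for i in range(p - 1):
        A[i][i + 1] = 1.0          # shifted identity: the upper block of equation (4)
    A[-1] = list(reversed(a))      # last row [a_p, ..., a_1]
    G = [0.0] * (p - 1) + [1.0]    # equation (5): noise enters the last component only
    H = list(G)                    # equation (8): H has the same form as G
    return A, G, H

A, G, H = ar_state_space([0.5, -0.2, 0.1])  # hypothetical AR(3) coefficients
```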
Section (5)
Kalman filter equations in sequential form
The Kalman filter algorithm in sequential form is illustrated as follows.
The gain, state, and covariance updates (lines (4) to (6) of the algorithm) are written as
K_t= (P_(t-1)* H_t^T)/(H_t* P_(t-1)*H_t^T+R )
X_t= X_(t-1)+ K_t (z_t- H_t* X_(t-1))
P_t=P_(t-1)- (P_(t-1)* H_t^T* H_t* P_(t-1))/(H_t* P_(t-1)* H_t^T+R)
To minimize redundancy in the computations of the three equations above, we first calculate the following terms
C_t= P_(t-1)* H_t^T (1)
b_t= H_t* C_t (2)
d_t=R_t+ b_t (3)
Next, we complete the algorithm by computing
K_t = C_t / d_t    (4)
P_t = P_(t-1) - (C_t * C_t^T)/d_t + Q_t    (5)
S = K_t * H_t    (6)
W = K_t * z_t    (7)
X_t = (I - S) * X_(t-1) + W    (8)
We iterate this algorithm, including all the previous steps (equations (1) to (8)), according to the number of points to be estimated.
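For the 1-D sound-denoising case used later, where H_t = 1 and P_t is scalar, steps (1)-(8) can be sketched in Python (a sketch only; the initial values mirror the later MATLAB initialization, and the constant test signal is an illustrative assumption):

```python
def kalman_sequential(z, Q=1e-4, R=0.1, x0=0.0, P0=0.1):
    # Scalar form of steps (1)-(8) with H_t = 1
    x, P = x0, P0
    estimates = []
    for z_t in z:
        C = P                      # (1) C_t = P_{t-1} H_t^T
        b = C                      # (2) b_t = H_t C_t
        d = R + b                  # (3) d_t = R + b_t
        K = C / d                  # (4) K_t = C_t / d_t
        P = P - (C * C) / d + Q    # (5) covariance update
        S = K                      # (6) S = K_t H_t
        W = K * z_t                # (7) W = K_t z_t
        x = (1.0 - S) * x + W      # (8) state update
        estimates.append(x)
    return estimates

est = kalman_sequential([1.0] * 50)  # the estimate converges toward the constant level
```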
Section (6)
Utilizing Kalman filter in parallel form
Let M be the number of CPUs (cores) used for this parallel utilization. The parallelization is achieved by splitting the i-loop, whose length equals the number of iterations, into equally large segments of size N/M, and letting each CPU (core) process just one of these segments.
We use the function d = distributed(originalSignal) to split all N points into equal segments.
We also use the spmd construct and the function L = getLocalPart(d) to apply the Kalman filter algorithm simultaneously on each segment.
To calculate the first and last indexes of each segment, we use the following statements.
segment = ceil(length(x)/numlabs)
startIndex = 1 + segment * (labindex-1)
endIndex = min(segment * (labindex),length(x))
Where numlabs is the function that returns the total number of cores operating in the parallel pool and labindex returns the ID for this core.
After that, we merge the filtered points again in one large signal.
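The same split-filter-merge bookkeeping can be sketched outside MATLAB; the Python sketch below mirrors the three index statements above, with num_workers playing the role of numlabs and worker the role of labindex (the doubling filter is an illustrative stand-in for the Kalman filter):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def segment_bounds(n, num_workers, worker):
    # Mirrors: segment = ceil(length(x)/numlabs);
    #          startIndex = 1 + segment*(labindex-1);
    #          endIndex = min(segment*labindex, length(x))   (1-based, inclusive)
    segment = math.ceil(n / num_workers)
    start = 1 + segment * (worker - 1)
    end = min(segment * worker, n)
    return start, end

def filter_in_parallel(x, num_workers, filt):
    # Apply `filt` to every segment simultaneously, then merge the results
    bounds = [segment_bounds(len(x), num_workers, w) for w in range(1, num_workers + 1)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = list(pool.map(lambda se: filt(x[se[0] - 1:se[1]]), bounds))
    return [v for part in parts for v in part]

merged = filter_in_parallel(list(range(10)), 4, lambda seg: [2 * v for v in seg])
```

Filtering each segment independently implies that the filter state is re-initialized at every segment boundary, which is the price of this embarrassingly parallel splitting.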
Figure (2)
Section (7)
Implementation
We implement the Kalman filter in parallel form to denoise a 1-D sound wave of length N = 220500 points.
Our program reads a sound wave and adds some random noise to the original signal as follows:
[x, fs] = audioread('sound.wav');
x = x(44100*10 : 44100*15);
x = x(1 : 220500);
v = 1/10 * randn(size(x));
orig = x + v;
The number of iterations equals the number of points N. The program's initial values are P = 0.1 * eye(m), R = 0.1, Q = 0.0001 * eye(m,m), where m is the matrix size.
We run our parallel program on various numbers of cores, from 2 to 8.
Section (8)
Results
It can be seen from table (3) that the computation time decreases as we increase the number of cores from 2 to 8, and table (4) shows that the Kalman filter can achieve nearly linear speed-up by increasing the number of cores; both results are illustrated in figures (3) and (4), respectively.
Table (3): Computation time (s) for different numbers of cores
Cores:     1         2         3         4         5        6         7         8
Time (s):  10.96947  9.531564  7.257728  5.580195  3.62024  1.706838  2.958344  1.696797
Table (4): Speedup for different numbers of cores
Cores:    1  2         3         4         5         6         7         8
Speedup:  1  1.150857  1.511419  1.965786  3.030039  6.426779  3.707976  6.46481
Figure (3)
Figure (4)
Section (9)
Conclusions
By applying the parallel form of the Kalman filter equations to a noisy sound wave using various numbers of cores, it is found that the Kalman filter algorithm can be implemented efficiently in parallel by splitting all the signal points into large segments of data and applying the equations to each segment simultaneously.
References
[1] M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice Using MATLAB, 3rd ed. John Wiley & Sons, Inc., 2008.
[2] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," Transactions of the ASME, Journal of Basic Engineering, 1960.
[3] R. G. Brown and P.Y.C. Hwang, Introduction to Random Signals and Applied Kalman Filter, 4th ed. John Wiley & Sons, Inc.,2012.
[4] T. T. Soong, Fundamentals of probability and statistics for engineers, John Wiley & Sons, Inc., 2004.
[5] G. Welch and G. Bishop, An Introduction to the Kalman Filter, ACM, Inc., 2001.
[6] Selim G. Akl, The Design and Analysis of Parallel Algorithms, Prentice-Hall, Inc., 1989.
[7] K. K. Paliwal and A. Basu, "A Speech Enhancement Method Based on Kalman Filtering," Proc. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, 1987.