Calling the Kalman Filter a useful tool would be an understatement: it is the mathematically optimal filter for reconstructing the state of a system from measurements, provided a few conditions are satisfied:
- the model matches the real system perfectly
- the system noise and the measurement noise are uncorrelated
- the covariance matrices of the system noise and the measurement noise are known.
There are many good introductions to the Kalman Filter, which is why I won’t explain it here any further. This paper helped me a lot: An Introduction to the Kalman Filter by Welch and Bishop.
In this post, I want to focus on the covariance matrices of the Kalman Filter. I noticed that people tend to concentrate on the “basic functionality” of the filter and to ignore how the covariance matrices are propagated through the system. I will stick to the notation convention of the above-mentioned paper to explain the meaning of the covariance matrices of the filter.
\(\mathbf{Q}\) is the system noise covariance matrix, which means it represents the modelling noise of the system. Here is one example: imagine our KF is used to estimate the velocity of a vehicle based on acceleration sensors and GPS data. The model equation would integrate the acceleration over time to estimate the velocity. But what if the vehicle is driving on a bumpy road? A component of \(\sin(\alpha) \cdot g\) is added to the acceleration in x-direction whenever the car is rotated by \(\alpha\) w.r.t. the flat road. Assuming this error to be zero-mean (i.e. we are not driving uphill or downhill), the system noise covariance matrix can be estimated from the variance of the additional acceleration. The KF cannot compensate for a systematic error like driving uphill! If that were the case, the engineer would have to devise a method to compensate for the systematic error in the first place.
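To make that concrete, here is a minimal sketch of how such a variance, and thus an entry of \(\mathbf{Q}\), could be estimated. The 2° pitch spread, the variable names and the whole scenario are invented for illustration and not taken from the paper:

```python
import numpy as np

g = 9.81                                   # gravitational acceleration in m/s^2
rng = np.random.default_rng(0)

# Hypothetical bumpy road: zero-mean pitch angle with about 2 degrees standard deviation
alpha = rng.normal(0.0, np.deg2rad(2.0), size=100_000)

# Acceleration error in x-direction induced by the pitch angle
a_err = np.sin(alpha) * g

# Its sample variance is a candidate entry of Q for the acceleration state
q_acc = np.var(a_err)
print(q_acc)                               # roughly 0.12 (m/s^2)^2
```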
\(\mathbf{R}\) is the measurement noise covariance matrix, which can be obtained by estimating the variance of the measurement noise. This is the small intrinsic randomness of every acceleration sensor plus the quantisation noise (which should normally be several orders of magnitude below the analog measurement noise).
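A common way to get such a number is to record the sensor while the measured quantity is known to be constant and take the sample variance. The sketch below assumes a resting accelerometer and an invented noise figure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for raw accelerometer data recorded while the sensor is at rest,
# so every deviation from the mean is measurement noise (assumed sigma = 0.05 m/s^2)
samples = rng.normal(0.0, 0.05, size=1_000)

# Sample variance of the noise -> diagonal entry of R for this sensor channel
r_acc = np.var(samples, ddof=1)
print(r_acc)                               # close to 0.05**2 = 2.5e-3 (m/s^2)^2
```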
\(\mathbf{P}_k^-\) is the prediction covariance matrix (also called the covariance matrix of the a-priori density) and \(\mathbf{P}_k\) is the filter covariance matrix (also called the a-posteriori covariance matrix). \(\mathbf{P}_k^-\) represents the uncertainty about the estimated state after the Kalman prediction step (also: “before the measurement”, thus a-priori), while \(\mathbf{P}_k\) represents the uncertainty about the estimated state after the innovation step (also: “after the measurement”, thus a-posteriori).
Now, how are these matrices linked? To answer that question, we look at the Kalman equations involving them:
\(\mathbf{P}_k^- = \mathbf{A P}_{k-1} \mathbf{A}^T + \mathbf{Q}\)
The prediction covariance matrix of the current step is calculated by propagating the last filter covariance matrix through the system (expressed by the matrix \(\mathbf{A}\)). Why? Because the filter covariance matrix represents the state uncertainty after the last step, it is converted into the uncertainty after the prediction by propagating it through the system \(\mathbf{A}\). In addition, \(\mathbf{Q}\) is added, because this is the intrinsic modelling uncertainty we have. Setting \(\mathbf{P}_{k-1}\) to \(\mathbf{0}\) shows us: even with perfect knowledge of the last state, our uncertainty about the predicted state is bounded from below by \(\mathbf{Q}\), our ability to predict the state evolution.
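In code the propagation is a single line; the constant-velocity model and the numbers below are a made-up example, only there to show the mechanics:

```python
import numpy as np

dt = 0.1
# Toy constant-velocity model: state = [position, velocity]
A = np.array([[1.0, dt],
              [0.0, 1.0]])
Q = np.diag([1e-4, 1e-2])                  # assumed process noise covariance
P_post = np.diag([0.5, 0.2])               # filter covariance from the previous step

# Prediction covariance: propagate the old uncertainty through A and add Q
P_pred = A @ P_post @ A.T + Q
print(P_pred)

# With perfect knowledge of the last state (P_post = 0) the prediction covariance is just Q
print(A @ np.zeros((2, 2)) @ A.T + Q)
```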
The filter covariance matrix is calculated through
\(\mathbf{P}_k = (\mathbf{I} - \mathbf{K}_k \mathbf{H}) \mathbf{P}_k^-\)
where \(\mathbf{K}_k\) is the Kalman gain matrix and \(\mathbf{H}\) is the measurement matrix, which relates the (not directly observable) internal state to the measurement. What can we see in this equation? The a-posteriori covariance is smaller than or equal to the a-priori covariance. That makes sense: the measurement gives us additional information, which we use to reduce the uncertainty. To analyse this further, we need to consider the possible extrema of \(\mathbf{K}_k\), which are \(\mathbf{0}\) and \(\mathbf{H}^{-1}\) (the latter assuming \(\mathbf{H}\) is invertible). The Kalman gain becomes \(\mathbf{0}\) if the prediction is perfect or the measurement is extremely noisy. In this case the measurement is completely ignored, which means that \(\mathbf{P}_k = \mathbf{P}_k^-\). The other extremum is reached when there is no measurement noise: the prediction is discarded and only the measurement is used to estimate the current state. In this case \(\mathbf{P}_k = \mathbf{0}\), so there is no uncertainty about the state.
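The two limits are easy to reproduce numerically. The sketch below uses the standard gain formula \(\mathbf{K}_k = \mathbf{P}_k^- \mathbf{H}^T (\mathbf{H P}_k^- \mathbf{H}^T + \mathbf{R})^{-1}\) with invented matrices and simply makes \(\mathbf{R}\) very large or very small:

```python
import numpy as np

H = np.array([[1.0, 0.0]])                 # we only measure the first state component
P_pred = np.diag([0.5, 0.2])               # assumed a-priori covariance
I = np.eye(2)

def update_covariance(P_pred, H, R):
    # Kalman gain and a-posteriori covariance of the update step
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    return (I - K @ H) @ P_pred

# Very noisy measurement -> K close to 0 -> P_k stays close to P_pred
print(update_covariance(P_pred, H, R=np.array([[1e6]])))

# Almost noise-free measurement -> the uncertainty of the measured component collapses to ~0
print(update_covariance(P_pred, H, R=np.array([[1e-9]])))
```

Note that this toy \(\mathbf{H}\) only observes the position, so only the position variance vanishes in the second case; with an invertible \(\mathbf{H}\) the whole matrix would go to \(\mathbf{0}\), as described above.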
In practical applications, we hope that the Kalman gain stays far away from these two extreme values (the KF would be useless at either of them, because it completely ignores either the measurement or the prediction), but looking at them gives an impression of how the filter behaves if we lean towards one side or the other.