I wrote this applet to explore how particle filters work -- partially in an effort to improve my FastSLAM implementation and partially for another project I am working on.
Wikipedia has a nice article on particle filters if you would like more background.
In FastSLAM, the particle filter is used to represent the robot's localization. The filter as a whole represents the probability distribution over the robot's pose (pose = XY coordinates and heading). Each particle in the filter represents a single pose. A history of particles represents the path the robot has traversed.
Given a previous pose distribution, a current sensor reading, and a likelihood distribution of the robot's pose based on the sensor reading, the particle filter calculates a new pose distribution. This new distribution is conditioned on both the previous pose and the current sensor reading.
The particle filter algorithm is:

1. Assign each particle a strength: the likelihood of the current observation given the state that particle represents.
2. Resample the particle set, selecting each new particle with probability proportional to its strength.
3. Add process error to the resampled particles so that duplicates can spread apart.
This demo includes a few different methods for some of these steps. The variants are discussed in more detail in the experiments section.
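As a rough sketch of these steps in Python (my own illustration, not the applet's source; for simplicity it adds process noise to every particle rather than only to duplicates), a one-dimensional filter can be written in a few lines:

```python
import math
import random

random.seed(0)

def gaussian(x, mu, sigma):
    """Unnormalized likelihood of x under a Gaussian observation model."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def update(particles, z_mu, z_sigma, x_omega):
    """One particle-filter update: weight, resample, add process noise."""
    # 1. Weight each particle by the observation likelihood (its "strength").
    weights = [gaussian(p, z_mu, z_sigma) for p in particles]
    # 2. Resample with probability proportional to strength.
    resampled = random.choices(particles, weights=weights, k=len(particles))
    # 3. Add process noise so identical copies can explore nearby states.
    return [p + random.gauss(0, x_omega) for p in resampled]

# Start from a uniform prior, as the applet does, and track an observation at 3.0.
particles = [random.uniform(-15, 15) for _ in range(500)]
for _ in range(20):
    particles = update(particles, z_mu=3.0, z_sigma=1.0, x_omega=0.2)
```

After a few updates the particle cloud concentrates around the observation mean.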
The particle filter solves a very similar problem to the Kalman filter. Welch and Bishop describe the type of process to which the Kalman filter is often applied. At each time step, the process is observed. The observation is used to update the expected state of the process. Each observation increases the confidence in the expected state. As time passes, the process is believed to change in an unknown way, so between each observation the confidence in the expected value is reduced. The Kalman filter represents state with a Gaussian function. The mean of the function is the expected value, and the standard deviation describes the confidence. As the standard deviation decreases, the expected range of the state shrinks and the confidence in the mean increases. As the standard deviation increases, the range of probable state values increases and confidence decreases. The observation update combines two Gaussians (expectation and observation) in a way that the resulting Gaussian always has a smaller standard deviation than either of the two source Gaussians. The time update simply adds a small value to the variance of the state Gaussian to represent process drift.
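For the one-dimensional case, the two Kalman updates can be written out directly (a hypothetical sketch of my own, not taken from Welch and Bishop):

```python
def measurement_update(mu, var, z, z_var):
    """Fuse a Gaussian state estimate with a Gaussian observation.
    The resulting variance is always smaller than either input variance."""
    new_mu = (mu * z_var + z * var) / (var + z_var)
    new_var = (var * z_var) / (var + z_var)
    return new_mu, new_var

def time_update(mu, var, drift_var):
    """Process drift: the mean stays put, confidence decreases."""
    return mu, var + drift_var

# Two equally confident Gaussians: the fused mean lands halfway between
# them and the fused variance is half of either input.
mu, var = measurement_update(0.0, 4.0, 2.0, 4.0)
```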
Instead of using a closed form equation to represent expectation, the particle filter uses a set of particles to represent expectation. In a Cartesian plane, each particle might be a point. The density of the points in an area represents the expectation that the actual system state is in that area. One could take a histogram of the particles to get an alternative representation of the state expectation. In fact, I do just that in this demo to display the particle filter distribution.
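For example, a crude histogram over a handful of particles (hypothetical values) already reads as a density estimate:

```python
from collections import Counter

# A few one-dimensional particles clustered near zero.
particles = [-1.2, -0.7, 0.1, 0.3, 0.4, 0.5, 1.8]

# Bin each particle to the nearest integer to build a crude histogram.
hist = Counter(round(p) for p in particles)

# Denser bins mean higher estimated probability that the true state is there:
# here the bin at 0 holds the most particles, so 0 is the most likely state.
```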
The following experiments will guide you through the different features of this demo and hopefully give you some insight into how to use a particle filter.
Experiment 1: Introduction
The applet initializes with approximately 500 particles, each containing a single floating point number, uniformly distributed between -15 and 15. The distribution is displayed graphically in blue with a histogram. The histogram is always normalized to the same display height. Overlaying the histogram in black is a Gaussian function. This represents the likelihood that a particle accurately models the system state. Like the histogram, the display for this Gaussian is always normalized to the same height.
Press the "Update" button. The histogram will adapt to fit the Gaussian. Since we started with a uniform distribution and select particles with probability proportional to their likelihood under the Gaussian, the new distribution will nearly exactly match the Gaussian.
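You can check this numerically. The sketch below (my own, not the applet's code) resamples a uniform particle set with weights from a unit Gaussian and recovers roughly that Gaussian's mean and standard deviation:

```python
import math
import random

random.seed(1)

# Uniform prior over [-15, 15], as in the applet's initial state.
particles = [random.uniform(-15, 15) for _ in range(5000)]

# Weight by an unnormalized Gaussian likelihood with mean 0, sigma 1.
weights = [math.exp(-p * p / 2) for p in particles]

# One weighted resampling step.
resampled = random.choices(particles, weights=weights, k=len(particles))

mean = sum(resampled) / len(resampled)
std = (sum((p - mean) ** 2 for p in resampled) / len(resampled)) ** 0.5
# mean should be near 0 and std near 1: the uniform prior has snapped
# onto the likelihood function after a single update.
```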
Press the "Run" button and watch the particle distribution change over time. It will fluctuate around the likelihood function, but will not deviate much.
The zμ slider changes the mean of the likelihood function. z is used as a variable to represent the observation. Drag the slider back and forth and watch the particle filter follow. Try dragging the slider both slowly and quickly.
The zσ slider changes the standard deviation of the likelihood function. When you make the standard deviation very small, the particle filter still bleeds outside of it significantly. This is due to the error added after each resampling period. The smaller the standard deviation, the more confidence we are putting in the system observations. Play with the observation error slider and watch how the filter responds.
zμ will be determined by the state of your system and how you observe it, but depending on how you define your likelihood function, zσ may be a parameter that you can tune to make your system work better. In fact, it was this parameter that Dieter Fox helped me tune to a very large number that made my FastSLAM implementation actually work.
Notice that when zσ is large, the filter does not look as smooth. We can change that with the N slider. N changes the number of particles in the filter. Some applications dynamically adapt the filter size to effectively represent their system. Too many particles consume computation time without improving the results; too few can cause the system to lose important state information.
Experiment 2: Resampling Error
The xω slider tells the particle filter how much process error to add to each particle after resampling. The added error prevents too many particles from sitting on the same point and allows the filter to explore new points. This implementation only adds error to particles that are duplicated during resampling.
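A sketch of this variant (hypothetical code, not the applet's source) might track which indices have already been drawn and perturb only the repeats:

```python
import random

def resample_with_noise(particles, weights, x_omega):
    """Resample; add process noise only to particles chosen more than once."""
    chosen = random.choices(range(len(particles)), weights=weights,
                            k=len(particles))
    seen = set()
    out = []
    for i in chosen:
        if i in seen:
            # A duplicate: perturb it so it can explore nearby states.
            out.append(particles[i] + random.gauss(0, x_omega))
        else:
            # First copy of this particle is kept exactly as-is.
            seen.add(i)
            out.append(particles[i])
    return out
```

Keeping one exact copy of each selected particle means that with zero process error the filter degenerates to reshuffling existing values, which is precisely the failure mode the next experiment demonstrates.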
Set zσ to 0.50 and drag xω back and forth to see how the distribution changes. The width of the distribution depends more on this parameter than on the confidence of the likelihood function. This is because each resampling update narrows the distribution towards the likelihood mean. With no process error, the particle distribution would converge to the observation mean; the variance of the likelihood function only determines how quickly it converges. xω provides pressure that prevents the distribution from collapsing.
xω serves another important purpose. Set xω to zero and run until there is only one bar in the histogram. Now increase zσ to 30. The histogram does not adapt because there are only a few particle values from which to select. Very slowly increase xω and watch the filter adapt. You can pause the particle filter if you want while you set the process error. Repeat this experiment a couple times with different values for the process error.
xω is another parameter that will need to be tuned to your specific application.
Experiment 3: Likelihood Error and Process Error
Set the following parameters:
Run the filter until it matches the likelihood function. Now pause the filter and move zμ to +10.0. Press Update a few times and watch the filter slide right over to the peak. Our likelihood function is perfect with no local maxima. Even though the likelihood values are very small where the filter started, the function provides a very nice slope on which the filter can hill-climb. When you design your own likelihood function, it probably will not be perfectly smooth and continuous. It is worthwhile seeing what happens when small errors are introduced.
The zω slider sets the likelihood error. Pause the system and set zω to 0.01. Set zμ back to -10.0 and run the system. The filter slowly expands as it searches the space for a likely state; the rate at which it expands is determined by the process noise xω. When the particles in the filter reach the tail of the bell curve, the entire filter suddenly jumps to model the distribution. You may want to step through this with the Update button to see how quickly it really happens.
Try increasing the likelihood error even more to watch how the filter behavior changes (it doesn't change much).
Experiment 4: Bimodal distributions
Everything we have done to this point could also have been done by directly modeling the system state with a Gaussian and using the Kalman filter update equations. Particle filters have another property that makes them particularly well suited for robot localization: they can conform to any distribution shape. Imagine a robot whose observations lead it to believe it could be in one of two or three places. Using a Kalman filter, you would need special code to detect bi- and tri-modal distributions and use the appropriate number of Gaussians. The particle filter gives this to you for free (as long as you have enough particles).
Set the following parameters and run the system until it stabilizes:
While the system is running, set the Strength function to f2 and slowly drag zμ to 8.00. If you are a little lucky, the filter distribution will be split between two peaks. That distribution will probably stick around, but there is a chance that one of the peaks will lose its representation through the luck of the draw.
f3 is not so nice. Use the mean slider to massage the filter until both peaks are well represented. Pause the system and select f3. Notice that the peak on the left is slightly lower than the peak on the right. This slight bias is going to cause the particles in the right peak to be selected more and will eventually deplete the left peak of all representation. Run the system to watch this happen.
Once one peak is gone, you can use the mean slider to swap the positions of the two peaks. If the peaks are too far apart, and the process noise too low, the distribution will not rediscover the lost peak even if it is a better choice.
Imagine the problems this might cause in robot localization. A robot might have two theories on its location. One of the theories might currently be stronger than the other, but not by a large amount. The robot sits still and looks at its static surroundings and ponders its position. At each time step it becomes more and more confident that it is in the slightly more likely place. In time it forgets the second hypothesis even though it has received no new information. This problem leads to an important variant in the resampling method.
A particle filter can be adjusted so that it attempts to model the likelihood distribution rather than converge to the maximum of the likelihood function. Once the filter is accurately modeling the likelihood function, we want to resample from it uniformly. When the likelihood function changes, we want to increase the filter representation in places that became more likely and decrease it in places that became less likely. Accomplishing this is simple: instead of using the likelihood function as the particle strength directly, we calculate the strength by dividing the particle's current likelihood by its likelihood at the previous update. The resulting resampling will not cause the filter to forget slightly less probable hypotheses until there is extra data discounting them.
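A minimal sketch of the ratio strength calculation (my own illustration; the `eps` floor is an assumption to avoid dividing by zero, not something taken from the applet):

```python
import math

def ratio_strengths(particles, likelihood, prev_likelihoods, eps=1e-9):
    """Selection strength = current likelihood / previous likelihood.
    A particle whose state became MORE likely gets extra representation;
    a particle that is merely still as likely as before gets strength 1."""
    return [likelihood(p) / max(prev, eps)
            for p, prev in zip(particles, prev_likelihoods)]

# Two particles that previously both had likelihood 1.0: the one whose
# state stayed at the peak keeps strength 1, the other is discounted.
strengths = ratio_strengths([0.0, 1.0],
                            lambda x: math.exp(-x * x),
                            [1.0, 1.0])
```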
But this resampling method has its own problems.
Experiment 5: Ratio Sampling
Before turning the "Ratio Selection" checkbox on, set the particle filter so that it covers the bimodal distribution for f2. Set xω to 1.0. Run the filter.
The bimodal filter distribution should be more stable than it is without ratio sampling (although I have not done empirical tests to verify this). Switch to f3. Ratio sampling helps to prevent the slightly more likely hypothesis from dominating the hypothesis space.
With a well-represented bimodal distribution and the peaks as far apart as possible, increase xω to 4.0. The bimodal distribution will become less stable. The ratio selection depends on the previous likelihood of each particle; you could plot each particle's state against its last likelihood value to sketch a graph of the likelihood function. However, the previous likelihood value is not updated when process error is added to the filter, so the accuracy of the selection strength depends on how similar the likelihood function is between the original state estimate and the state estimate with process error added. As long as the likelihood function is well shaped, smaller values of xω make the ratio selection more accurate, but small values of xω also make it more difficult for the filter to evenly cover the distribution area and lengthen the search when a primary hypothesis is lost.
From a good bimodal distribution, set zω to 0.02 and switch to f1. If the distribution at the lost peak is not immediately destroyed, it may take some time for the filter to forget the missing hypothesis. The noise in the likelihood function allows some particles to switch between unlikely and slightly less unlikely, and at such small magnitudes a small change can produce a large selection ratio.
Turn off ratio sampling, select f1 and center the likelihood function. Run the filter until you have a good representation. Set the process error to a small value like 1.0. Stop the filter, turn on ratio sampling, and move zμ so that it partially overlaps the filter distribution. Run.
Notice that the filter distribution does not center itself quickly on the likelihood function. Once the filter gets close enough to the desired representation, there is little selective pressure to push it the rest of the way. There are still particles representing the mean value, but the bulk of the particles are not there.
It was my subjective experience that FastSLAM worked better with ratio sampling. It was also my interpretation of Montemerlo's paper that he used ratio sampling in his original FastSLAM implementation (although he does not use that name). Most descriptions of particle filters that I have seen do not include the ratio sampling technique.
Experiment 6: Non-Gaussianity
I mentioned above that particle filters have a greater range of representation than a simple Gaussian model. f4 is a stepwise function. The discontinuous nature of this function suggests that ratio sampling might do a poor job of representing this distribution. In fact, both sampling methods tend to look like Gaussians under this likelihood function. I believe the reason for this can be explained by the central limit theorem.
The central limit theorem states that the sum of many independent random values tends toward a Gaussian distribution. The process noise added at each step causes the filter to represent the uniform areas of f4 as a Gaussian.
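A quick numeric illustration of this (hypothetical, not part of the applet): summing 30 independent uniform steps yields sums whose variance is close to 30 times the per-step variance of 1/3, and whose overall shape is close to Gaussian:

```python
import random

random.seed(3)

# Each sum is a random walk of 30 uniform steps in [-1, 1].
sums = [sum(random.uniform(-1, 1) for _ in range(30)) for _ in range(2000)]

mean = sum(sums) / len(sums)
var = sum((s - mean) ** 2 for s in sums) / len(sums)
# A uniform step on [-1, 1] has variance 1/3, so the sum's variance
# should be near 30 / 3 = 10, and the histogram of sums looks Gaussian.
```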
Experiment 7: Adaptive Filter Sizes
Play with the number of particles while the filter is representing different distributions. The filter works best when it accurately represents the likelihood distribution. Narrower likelihoods (which represent more accurate world models and sensors) need fewer particles than wider distributions, and bimodal distributions need more particles than unimodal ones. Dieter Fox has used KLD-sampling to adapt the particle count to both the sensor accuracy and the number of currently valid state hypotheses.
In all honesty, I was never quite able to understand how KLD-sampling works. There may be other ways to adapt the particle filter size. A very simple method might require a minimum density of particles within an area of the space. One could use a histogram to count particles in different bins during resampling and stop the resampling process when one of the bins reaches a certain amount. This method would not automatically adapt to sensor or world model error and would require manual tuning.
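The density-capped idea might be sketched like this (entirely hypothetical; `bin_cap`, `bin_width`, and `max_n` are made-up tuning parameters):

```python
import random

random.seed(4)

def adaptive_resample(particles, weights, bin_width=1.0, bin_cap=50,
                      max_n=5000):
    """Draw particles until one histogram bin fills up (or max_n is hit).
    A narrow distribution fills a bin quickly and gets few particles;
    a wide or multimodal distribution keeps drawing and gets many."""
    bins = {}
    out = []
    while len(out) < max_n:
        p = random.choices(particles, weights=weights, k=1)[0]
        out.append(p)
        b = int(p // bin_width)
        bins[b] = bins.get(b, 0) + 1
        if bins[b] >= bin_cap:
            break
    return out

# A tight cluster stops early; a wide spread keeps sampling.
narrow = adaptive_resample([0.1, 0.2], [1, 1])
wide = adaptive_resample([float(i) for i in range(100)], [1] * 100)
```

As noted above, this heuristic would still need manual tuning and would not adapt automatically to sensor or world model error.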
This is the end of this tutorial. The source code for this applet is available below.