In this project, an acquired video sample contains several ground vehicles moving through traffic. The video is processed one frame at a time to track each vehicle's motion. Background estimation, combined with a differencing operation, yields silhouette images of the moving vehicles. These binary images of vehicles are processed into blobs, and the center of each blob serves as a tracking point. The motion trace and tracking data for each vehicle are successfully obtained. The final video overlays the tracking data on top of the original video. The related algorithms, code, result analysis, and future work are presented.
The main purpose of this project is to analyze a video sequence of vehicles navigating traffic and to provide information about the moving objects, such as their location, size, and the number of vehicles in the frame. This type of information is useful in many different applications and is not limited to vehicular tracking.
Tracking object motion is a useful way to extract objects or regions of interest from a background scene. It is widely used in robotic applications, autonomous navigation, and dynamic scene analysis, and tracking cars applies to all three. The motion analysis of a tracked traffic scene can be used in many ways, such as by a transportation traffic controller. Imagine placing cheap video cameras and digital systems at high-traffic intersections and having the distributed network of traffic systems decide when to change the traffic lights instead of a dumb timer.
The goal of this project is to analyze a short video sample and track the vehicles' movements. At least two vehicles should be tracked at once, with the potential for an unlimited number of tracked vehicles. Because a form of background subtraction will be used, an accurate background image must be extracted. There should also be a way to export the number of vehicles being tracked, as well as their absolute positions in the scene, to accommodate further scene analysis. If the background can be reconstructed correctly, the targets can be extracted cleanly and tracked reliably.
This project used the paper "Multiple Human Tracking and Gait Based Human Recognition" as a starting point. Among other things, that paper deals with motion tracking of human targets in front of a static background.
Background extraction can be accomplished in many different ways, including taking a finite number of frames and computing the mean or mode image across them. The result is a good background, since the background is assumed to be present for the whole video and is only occasionally obstructed by a foreground moving vehicle. However, for this project I chose to simply use a known, good example of the background image from a specific frame of the sample video that contained no vehicles.
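The multi-frame idea can be sketched as follows. This is a minimal NumPy example, not the project's actual code; it uses the per-pixel median (a robust relative of the mean/mode approach mentioned above), and the frame values are a toy illustration:

```python
import numpy as np

def estimate_background(frames):
    """Estimate a static background as the per-pixel median of a stack of
    frames; the median rejects transient foreground vehicles as long as
    each pixel shows the background in most of the frames."""
    stack = np.stack(frames, axis=0).astype(np.float64)
    return np.median(stack, axis=0)

# Toy demo: a 4x4 "road" of value 100 with a passing "vehicle" (value 255)
# obstructing a different pixel in each frame.
frames = [np.full((4, 4), 100.0) for _ in range(5)]
for i, f in enumerate(frames[:4]):
    f[i, 0] = 255.0  # vehicle covers one background pixel per frame
bg = estimate_background(frames)
print(bg[0, 0])  # median ignores the single obstructed frame -> 100.0
```

Because each pixel is obstructed in at most one of the five frames, the median recovers the clean background everywhere.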
The technique used to locate a moving object in the video is simple background subtraction. The constant background image extracted earlier is subtracted from each frame of the video, and the result is an image containing only foreground (moving) data. This image is then thresholded to a binary image to ease later processing. At this point, however, the result contains a lot of noise due to varying irradiance, and it may include shadows or small moving objects in addition to the moving vehicles.
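The subtraction-and-threshold step can be sketched in a few lines of NumPy. The threshold value and pixel values here are illustrative assumptions, not values from the project:

```python
import numpy as np

THRESHOLD = 40  # illustrative; tuned per scene in practice

def foreground_mask(frame, background, threshold=THRESHOLD):
    """Absolute difference of frame and background, thresholded to a
    binary mask of moving (foreground) pixels."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > threshold

background = np.full((5, 5), 100, dtype=np.uint8)
frame = background.copy()
frame[2, 2] = 200   # a bright moving pixel (a vehicle)
frame[0, 0] = 110   # small irradiance change, below threshold
mask = foreground_mask(frame, background)
print(mask.sum())   # only the moving pixel survives -> 1
```

Note that the small irradiance change is suppressed by the threshold, while genuine foreground pixels pass through; larger residual noise is handled by the morphological stage that follows.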
To remove the unwanted data and prepare the wanted data, a series of morphological operations is executed. The sequence begins with an "erosion", which removes the unwanted "salt and pepper" specks by shrinking them out of existence. It continues with a few "dilations", which reconnect the disjointed blobs that should represent a single car but are split by a dam of sorts (typically the windshield, which does not pass the chosen threshold value). The sequence ends with a final erosion that shrinks the vehicle blobs slightly. The resulting image contains a blob at the location of each moving vehicle.
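The erosion and dilation steps can be sketched with hand-rolled 3x3 operators in NumPy. This is a simplified illustration of the erode/dilate sequence described above, not the project's implementation; the blob positions and structuring-element size are assumptions:

```python
import numpy as np

def erode(mask):
    """3x3 erosion: a pixel survives only if its whole neighbourhood is set."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= p[1 + dy : 1 + dy + mask.shape[0],
                     1 + dx : 1 + dx + mask.shape[1]]
    return out

def dilate(mask):
    """3x3 dilation: a pixel is set if any neighbour is set."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy : 1 + dy + mask.shape[0],
                     1 + dx : 1 + dx + mask.shape[1]]
    return out

# Erosion removes isolated "salt" noise; dilations then bridge the gap a
# windshield leaves between the front and rear halves of a car blob.
mask = np.zeros((7, 9), dtype=bool)
mask[2:5, 1:4] = True   # front half of a car blob
mask[2:5, 6:9] = True   # rear half, split by an unthresholded windshield
mask[0, 0] = True       # salt noise
cleaned = erode(mask)
print(cleaned[0, 0])    # noise pixel eroded away -> False
bridged = dilate(dilate(cleaned))  # the two halves now touch
```

After the two dilations, the pixels between the two halves are filled in, so a single connected blob remains per vehicle.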
If a video frame has a newly created blob in it, we want to track that blob. This tracking can be accomplished by calculating the centroid of each blob. With the centroid, we now know the location of the moving vehicles. A "tracker" image is constructed which consists of a square target centered at the centroid.
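The centroid computation can be sketched with a small flood-fill labeling pass over the binary image. The 4-connectivity choice and the toy blob layout are my assumptions for illustration:

```python
import numpy as np
from collections import deque

def blob_centroids(mask):
    """Label 4-connected blobs with a flood fill and return the centroid
    (row, col) of each; one centroid per tracked vehicle."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    centroids = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        queue, pixels = deque([(r0, c0)]), []
        seen[r0, c0] = True
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not seen[nr, nc]):
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        rows, cols = zip(*pixels)
        centroids.append((float(sum(rows)) / len(rows),
                          float(sum(cols)) / len(cols)))
    return centroids

mask = np.zeros((8, 8), dtype=bool)
mask[1:4, 1:4] = True   # blob for vehicle 1
mask[5:7, 5:8] = True   # blob for vehicle 2
print(blob_centroids(mask))  # -> [(2.0, 2.0), (5.5, 6.0)]
```

Each centroid is then the anchor point at which the square target of the tracker image is drawn.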
The final step in this project is to overlay the tracker image onto the original source image. The result can be achieved by a simple addition operation, adding the tracker to the source image. The resulting image and video shows traffic flowing as usual; however, each vehicle now has a square target on its body.
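A minimal sketch of the overlay step, assuming a grayscale frame and a hollow-square marker; the marker size and the saturating addition are my choices for illustration, not details specified by the project:

```python
import numpy as np

def draw_target(shape, centroid, half=2):
    """Tracker image: a hollow square (value 255) centred on the centroid."""
    img = np.zeros(shape, dtype=np.uint8)
    r, c = (int(round(v)) for v in centroid)
    r0, r1 = max(r - half, 0), min(r + half, shape[0] - 1)
    c0, c1 = max(c - half, 0), min(c + half, shape[1] - 1)
    img[r0, c0:c1 + 1] = 255   # top edge
    img[r1, c0:c1 + 1] = 255   # bottom edge
    img[r0:r1 + 1, c0] = 255   # left edge
    img[r0:r1 + 1, c1] = 255   # right edge
    return img

frame = np.full((10, 10), 60, dtype=np.uint8)
tracker = draw_target(frame.shape, (4.0, 4.0))
# Saturating add keeps the overlay visible without wrapping past 255.
overlay = np.minimum(frame.astype(np.int32) + tracker, 255).astype(np.uint8)
print(overlay[2, 4], overlay[4, 4])  # square edge vs untouched interior
```

Pixels under the square edges saturate to white while the rest of the frame is unchanged, which is what produces the target marker riding on each vehicle in the output video.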
Based on the algorithm stated above, we chose some sample frames and calculated the output results. An accurate background image was extracted, accurate silhouette images of vehicles were extracted, and all vehicles in the video were successfully tracked simultaneously, even with varying irradiances. Following is a typical frame from the video without any processing:
The first image created is the background extraction result:
The next image is the binary image of the foreground resulting from subtracting the background image from the source frame. This image contains the moving vehicles:
The binary foreground image then underwent the series of morphological operations. The resulting image contains blobs at the approximate locations of the vehicles, which are easy to use in the subsequent analysis:
Tracking was performed on our blobs which resulted in a target tracker image. This image contains targets positioned at the approximate center of the blobs (and therefore the vehicles):
The final result was obtained by adding the tracker image to the original source image. This effectively placed targets (our tracking system) on each vehicle in each frame of the video:
The proposed algorithm was implemented successfully and met all goals and objectives. It was able to track multiple moving vehicles at once under varying irradiance. This type of implementation may be useful for dynamic scene analysis by companies or government programs interested in transportation traffic control and beyond. An alternative to dumb timers controlling traffic flow could be a huge cost savings: place cheap video cameras and digital systems at high-traffic intersections and let the distributed network of traffic systems determine when to change the traffic lights. This is just one example of an application of this type of motion tracking algorithm.
There are plenty of additional features that could be added to this algorithm as well as a few changes that could make it work more accurately. Here is a list of possible additions; some suggestions may be related to others: