Pedestrian Tracking using Multiple Cues
One traditional approach to object tracking is online tracking, in which the tracker searches a neighborhood of the object's current position and, based on appearance features, selects the best candidate pixel as the object's location in the next frame. Cluster-based tracking using static color histograms has also proved useful for object tracking. This work attempts to combine the two methods in a principled way. We view tracking as a structured output prediction task in which the goal is to predict the whole trajectory given the entire video. We build a score function that incorporates both cluster-based tracking and adaptive (online) appearance modeling, and that lets us compare the goodness of different tracks. We then estimate the parameters of this function using a max-margin formalism.
Pedestrian Tracking Features
We run a human detector over the entire video to gather bounding boxes containing pedestrians. We then group the boxes into clusters based on the color distributions of several body patches. Next, we compute distance maps based on the similarity of the color distribution at each pixel to the color distribution of the cluster center to which the target belongs, and use a weighted combination of these maps as a feature. The following figure summarizes these steps.
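As a rough illustration of the feature described above, the sketch below compares the color histogram of each body patch against the histogram of the corresponding patch in a cluster center and combines the distances with weights. It is only a sketch under stated assumptions: the chi-square distance, the single-channel histogram, and the names `color_histogram`, `chi_square`, and `combined_feature` are hypothetical choices made for this example, not details taken from the method itself.

```python
from typing import List, Sequence

Histogram = List[float]


def color_histogram(pixels: Sequence[int], bins: int = 4, max_value: int = 256) -> Histogram:
    """Normalized histogram of quantized values (one channel, for brevity)."""
    counts = [0.0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def chi_square(h1: Histogram, h2: Histogram, eps: float = 1e-12) -> float:
    """Chi-square distance between two histograms (assumed metric)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def combined_feature(
    patch_hists: Sequence[Histogram],
    center_hists: Sequence[Histogram],
    weights: Sequence[float],
) -> float:
    """Weighted combination of per-patch distances to the cluster center."""
    return sum(
        w * chi_square(p, c)
        for w, p, c in zip(weights, patch_hists, center_hists)
    )
```

Evaluating `combined_feature` at every candidate pixel location would give one value per pixel, which corresponds to the distance map used as a feature.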
Learning and Inference
Because the problem is structured, learning and inference both involve finding the most likely explanation (Viterbi). Specifically, learning amounts to finding a set of parameters that best describes the patterns of the ground truth trajectories. Several "bad" trajectories are generated from the ground truth trajectories to characterize the most confusing "wrong" tracking results. The idea is to explore possible failure modes in order to obtain a trajectory estimator in a discriminative fashion. These trajectories are treated as negative examples to be avoided by the tracker. A max-margin criterion then imposes constraints on the parameter space such that these negative examples are avoided in proportion to their badness. The following figure shows some negative examples generated using ground truth tracks.
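The max-margin criterion can be read as requiring the ground truth trajectory to outscore each negative trajectory by a margin that grows with the negative's badness. The sketch below shows one subgradient step on such a structured hinge loss. It assumes a linear score `dot(w, features)` in which higher is better; the optimizer, the regularization constant, and the names `hinge_violation` and `subgradient_step` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, f))


def hinge_violation(w: Vector, feat_gt: Vector, feat_bad: Vector, loss_bad: float) -> float:
    """How far score(gt) >= score(bad) + loss(bad) is from being satisfied."""
    return max(0.0, loss_bad + dot(w, feat_bad) - dot(w, feat_gt))


def subgradient_step(
    w: Vector,
    feat_gt: Vector,
    negatives: Sequence[Tuple[Vector, float]],  # (features, badness) pairs
    reg: float = 0.01,
    lr: float = 0.1,
) -> Vector:
    """One subgradient step on the regularized structured hinge loss."""
    # Pick the most violated negative trajectory under the current weights.
    worst_feat, worst_loss = max(
        negatives, key=lambda n: hinge_violation(w, feat_gt, n[0], n[1])
    )
    grad = [reg * wi for wi in w]
    if hinge_violation(w, feat_gt, worst_feat, worst_loss) > 0.0:
        grad = [g + fb - fg for g, fb, fg in zip(grad, worst_feat, feat_gt)]
    return [wi - lr * g for wi, g in zip(w, grad)]
```

Repeating this step over all training sequences pushes the weights toward satisfying the margin constraints, with worse negatives requiring a larger gap.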
Inference involves finding a good trajectory given the model parameters.
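A minimal sketch of Viterbi-style inference over a trajectory follows. It assumes that each frame offers the same discrete set of candidate locations, that `unary[t][s]` is the appearance score of candidate `s` in frame `t`, and that `pairwise(p, s)` scores the move between consecutive frames. These names and the decomposition into unary and pairwise terms are assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi(
    unary: Sequence[Sequence[float]],
    pairwise: Callable[[int, int], float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring state sequence and its total score."""
    n_states = len(unary[0])
    best = list(unary[0])          # best score of any path ending in each state
    back: List[List[int]] = []     # back-pointers, one list per transition

    for t in range(1, len(unary)):
        new_best, ptr = [], []
        for s in range(n_states):
            # Best predecessor for state s at frame t.
            prev = max(range(n_states), key=lambda p: best[p] + pairwise(p, s))
            new_best.append(best[prev] + pairwise(prev, s) + unary[t][s])
            ptr.append(prev)
        best = new_best
        back.append(ptr)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    total = best[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, total
```

For example, with `unary = [[1, 0, 0], [0, 0, 1], [0, 0, 1]]` and `pairwise = lambda p, s: -abs(p - s)`, the motion penalty outweighs the strong first-frame response at location 0, so the best trajectory stays at location 2 in all three frames.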
Video of Detection and Tracking Results
Fully automatic detection and tracking is shown in the video below.
Bahman Yari Saeed Khanloo, Ferdinand Stefanus, Mani Ranjbar, Ze-Nian Li, Nicolas Saunier, Tarek Sayed, and Greg Mori, "Max-Margin Offline Pedestrian Tracking with Multiple Cues", 7th Canadian Conference on Computer and Robot Vision (CRV), 2010.
Bahman Yari Saeed Khanloo, Ferdinand Stefanus, Mani Ranjbar, Ze-Nian Li, Nicolas Saunier, Tarek Sayed, and Greg Mori, "A Large Margin Framework for Single Camera Offline Tracking with Hybrid Cues", Computer Vision and Image Understanding (CVIU), 116, pp. 676-689, 2012.