The Vision and Media Lab, Simon Fraser University

We are researching automatic methods for estimating the pose of human figures in still images and video sequences.

For still images of human figures, we have been exploring two complementary approaches to the problem of localizing 2D joint positions. The first performs a top-down search, matching the input image against a set of stored "exemplar" human figures (Mori and Malik, 2006). Matching is based on shape, using "shape contexts" as the descriptor.
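To make the idea of a shape context concrete, here is a minimal Python sketch, not the code used in the cited work. It assumes the common formulation: for each sampled edge point, a log-polar histogram of the relative positions of all other points, compared with a chi-squared cost. The function names, bin counts, and radius limits are illustrative assumptions.

```python
import math

def shape_contexts(points, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Return one log-polar histogram per point (needs >= 2 distinct points).

    Bin counts and radius limits are illustrative assumptions.
    """
    n = len(points)
    dists = [[math.dist(p, q) for q in points] for p in points]
    # Normalize radii by the mean pairwise distance for scale invariance.
    mean_d = sum(map(sum, dists)) / (n * (n - 1))
    log_min = math.log(r_min)
    step = (math.log(r_max) - log_min) / n_r
    descriptors = []
    for i, (xi, yi) in enumerate(points):
        hist = [0] * (n_r * n_theta)
        for j, (xj, yj) in enumerate(points):
            r = dists[i][j] / mean_d
            # Skip the point itself, duplicates, and points beyond r_max.
            if i == j or r == 0 or r >= r_max:
                continue
            # Radii below r_min fall into the innermost ring.
            r_bin = min(n_r - 1, max(0, int((math.log(r) - log_min) / step)))
            theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            t_bin = int(theta / (2 * math.pi) * n_theta) % n_theta
            hist[r_bin * n_theta + t_bin] += 1
        descriptors.append(hist)
    return descriptors

def chi2_cost(h1, h2):
    """Chi-squared distance between two histograms after normalization."""
    s1, s2 = sum(h1) or 1, sum(h2) or 1
    cost = 0.0
    for a, b in zip(h1, h2):
        a, b = a / s1, b / s2
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost
```

In an exemplar-matching setting, a cost like this would be computed between points sampled from the query image and points on each stored exemplar, and low-cost correspondences would then suggest where the exemplar's labeled joints fall in the query.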

The second approach searches still images for a 2D model of a human. Its novel aspect is the segmentation of the image into "superpixels" as a pre-processing step, which reduces the complexity of the search (Mori, 2005).
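The following Python sketch illustrates why a superpixel pre-processing step shrinks the search space. It assumes a label map has already been produced by some segmentation method (the map below is hypothetical, and the function name is our own) and summarizes each segment by its area and centroid, so that later stages can reason over a small number of segments rather than every pixel.

```python
def summarize_superpixels(labels):
    """Map each label in a 2D label map to (area, (row_centroid, col_centroid))."""
    sums = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            area, sr, sc = sums.get(lab, (0, 0, 0))
            sums[lab] = (area + 1, sr + r, sc + c)
    return {lab: (a, (sr / a, sc / a)) for lab, (a, sr, sc) in sums.items()}

# Hypothetical 2x3 label map with two segments.
labels = [[0, 0, 1],
          [0, 0, 1]]
segments = summarize_superpixels(labels)
```

Here six pixels collapse to two candidate units. On a real image the reduction is far larger, which is what makes a search over body-part hypotheses built from segments more tractable.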

MATLAB source code for computing "superpixels" is available here.

We have also applied the exemplar-based approach to video sequences (McIntosh et al., 2007). The exemplar matches are used to initialize a kinematic tracker, which automatically discovers part appearance models from the video sequence. An appearance model for each half-limb is obtained by performing a local segmentation of each image, and the best models are then used for tracking.

Results of applying this technique to the CMU MoBo dataset are shown below (click image to see video sequence).