In this work, we try to estimate the body configuration of a person in a
monocular video. We use a motion correlation technique to measure the motion
similarity in various space-time locations between the input video and stored
video templates. This correlation is computed at coarse-to-fine scales around the
joint positions to handle the variation in size and motion between
subjects. These observations are used to predict the conditional state
distributions both for exemplars and joint positions. The graphical model that
represents the relations between joints across a sequence of frames is so complex
that inference becomes impractical. To overcome this problem, we represent the
body configuration at every frame with an exemplar. In the images below, the
first row shows the input video and the second row shows the best-matching sequence
of exemplars found in the training data.
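As an illustration of how a best-matching exemplar sequence could be selected, the sketch below runs Viterbi decoding over a chain of per-frame exemplar states. This is only a sketch under stated assumptions: the text above does not say which inference algorithm is used for the exemplars, and the names `best_exemplar_sequence`, `frame_scores` and `transition_scores` are hypothetical. The per-frame scores stand in for log-likelihoods derived from the motion correlation, and the transition scores stand in for how plausibly one exemplar follows another.

```python
def best_exemplar_sequence(frame_scores, transition_scores):
    """Viterbi decoding over exemplar indices (illustrative sketch).

    frame_scores[t][e]      : assumed log-score of exemplar e at frame t
    transition_scores[a][b] : assumed log-score of exemplar a followed by b
    Returns the highest-scoring list of exemplar indices, one per frame.
    """
    n_exemplars = len(frame_scores[0])
    best = list(frame_scores[0])   # best score of any path ending in each exemplar
    back = []                      # back-pointers, one list per later frame

    for scores in frame_scores[1:]:
        new_best, pointers = [], []
        for e in range(n_exemplars):
            # Best predecessor for exemplar e at this frame.
            prev = max(range(n_exemplars),
                       key=lambda a: best[a] + transition_scores[a][e])
            pointers.append(prev)
            new_best.append(best[prev] + transition_scores[prev][e] + scores[e])
        best = new_best
        back.append(pointers)

    # Trace the best path backwards from the best final exemplar.
    path = [max(range(n_exemplars), key=lambda e: best[e])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy data: three exemplars that form a walking cycle 0 -> 1 -> 2.
frame_scores = [[0, -5, -5], [-5, 0, -5], [-5, -5, 0]]
transition_scores = [[-1, -0.5, -10], [-10, -1, -0.5], [-0.5, -10, -1]]
print(best_exemplar_sequence(frame_scores, transition_scores))  # [0, 1, 2]
```

Collapsing each frame to a single exemplar index keeps the state space small enough for exact decoding, which is the practical benefit of the exemplar representation described above.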
Joint position estimation is then solved using Gibbs sampling and gradient
ascent.
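To make the combination of Gibbs sampling and gradient ascent concrete, here is a minimal sketch on a toy problem. Everything specific in it is an assumption made for illustration: the scoring function (a Gaussian term tying each 2D joint to an observed position plus a limb-length term), the candidate grid, the parameter values, and all function names. It is not the model used in this work. Gibbs sampling explores discrete candidate positions one joint at a time, and gradient ascent then refines the best sample in continuous coordinates.

```python
import math
import random


def log_score(joints, observed, limb_lengths, sigma_obs=2.0, sigma_limb=1.0):
    """Unnormalized log-probability of a chain of 2D joints (toy model)."""
    total = 0.0
    for (x, y), (ox, oy) in zip(joints, observed):
        total -= ((x - ox) ** 2 + (y - oy) ** 2) / (2 * sigma_obs ** 2)
    for k, length in enumerate(limb_lengths):
        (x0, y0), (x1, y1) = joints[k], joints[k + 1]
        total -= (math.hypot(x1 - x0, y1 - y0) - length) ** 2 / (2 * sigma_limb ** 2)
    return total


def gibbs_sweep(joints, observed, limb_lengths, rng, radius=2):
    """Resample each joint from its conditional over a fixed grid of candidates."""
    joints = list(joints)
    for k, (ox, oy) in enumerate(observed):
        candidates = [(ox + dx, oy + dy)
                      for dx in range(-radius, radius + 1)
                      for dy in range(-radius, radius + 1)]
        scores = []
        for cand in candidates:
            joints[k] = cand
            scores.append(log_score(joints, observed, limb_lengths))
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        joints[k] = rng.choices(candidates, weights=weights, k=1)[0]
    return joints


def gradient_ascent(joints, observed, limb_lengths, step=0.1, iters=200, eps=1e-4):
    """Refine joints with finite-difference gradients; halve the step on failure."""
    def f(v):
        pts = [(v[i], v[i + 1]) for i in range(0, len(v), 2)]
        return log_score(pts, observed, limb_lengths)

    flat = [c for point in joints for c in point]
    for _ in range(iters):
        grad = []
        for i in range(len(flat)):
            up, down = flat[:], flat[:]
            up[i] += eps
            down[i] -= eps
            grad.append((f(up) - f(down)) / (2 * eps))
        candidate = [v + step * g for v, g in zip(flat, grad)]
        if f(candidate) > f(flat):
            flat = candidate      # accept only improving steps
        else:
            step *= 0.5
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def estimate_joints(observed, limb_lengths, sweeps=50, seed=0):
    """Keep the best Gibbs sample, then polish it with gradient ascent."""
    rng = random.Random(seed)
    joints = list(observed)
    best, best_score = joints, log_score(joints, observed, limb_lengths)
    for _ in range(sweeps):
        joints = gibbs_sweep(joints, observed, limb_lengths, rng)
        score = log_score(joints, observed, limb_lengths)
        if score > best_score:
            best, best_score = joints, score
    return gradient_ascent(best, observed, limb_lengths)


observed = [(0.0, 0.0), (0.0, 6.0), (0.0, 9.0)]   # noisy joint detections
limb_lengths = [5.0, 5.0]                          # assumed known limb lengths
refined = estimate_joints(observed, limb_lengths)
```

The sampling stage helps avoid poor local optima, while the gradient stage removes the quantization imposed by the candidate grid.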
Sample code for running Shechtman and
Irani's CVPR 2005 method is available.
Some videos of results are available: side view fast walk,
side view incline, side view slow walk,
and 45° view fast walk.