The Vision and Media Lab
Real-time Motion-based Gesture Recognition using the GPU
Mark Bayazit, Alex Couture-Beil, Greg Mori
Video Preview

Download higher quality version (17MB)

This is a video of me playing a game that uses our technology. The game tells you which action to perform (shown in yellow at the top). If you perform the action within 10 seconds, a checkmark appears and you earn points based on how quickly you completed it; if you do not perform the action in time, an X appears and you lose 100 points.

About halfway through the video, I perform the actions incorrectly to demonstrate that our program will not accept the wrong action. For example, at 1:45 I do right-wax-out instead of right-wax-in, which involves motion in the same region of the frame, but our software cannot be tricked so easily. As soon as I switch directions and perform the correct action, our algorithm picks it up immediately.

The game randomly chooses one of 13 actions for you to perform:

  • idle (stand still)
  • right-wax-in (like wiping the screen in a circular motion)
  • right-wax-out
  • left-wax-in
  • left-wax-out
  • punch-right (an upwards punch)
  • punch-left
  • wave-left
  • wave-right
  • sway (like you are at a mellow concert and are holding a lighter in each hand)
  • watch (tap your left wrist like you're asking for the time)
  • waves (roll your arms like water waves)
  • junk (jump around like a maniac)

We developed a method to recognize gestures in real time, making use of the GPU. To demonstrate our technology, we built a simple computer game that asks the player to perform a certain action and awards points based on how quickly they perform it. Other applications include human-computer interaction and surveillance.

Our algorithm examines a localized region of optical flow centred on the user's face. Motion features are built from optical flow that has been blurred over several frames; this temporal blur captures the full range of motion needed to complete a single phase of a gesture.
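The temporal blur described above can be sketched as an exponential blend of per-frame flow fields. This is a minimal illustration, not our actual implementation; the decay weight `alpha` and the array layout are assumptions made for the example.

```python
import numpy as np

def temporal_blur(flow_frames, alpha=0.5):
    """Blend per-frame optical-flow fields into one motion-feature map.

    flow_frames: list of (H, W, 2) arrays, oldest first; the last axis
    holds the x and y flow components. The exponential weight `alpha`
    is an illustrative choice, not a value from our system.
    """
    blurred = np.zeros_like(flow_frames[0], dtype=np.float64)
    for flow in flow_frames:
        # Newer frames dominate; older motion decays but still contributes,
        # so the feature map spans a whole phase of the gesture.
        blurred = alpha * blurred + (1.0 - alpha) * flow
    return blurred
```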

A multi-class variant of AdaBoost learns the key flow-intensity features that best discriminate each action. Once a sufficient number of weak classifiers have been learned, they can classify gestures from new input in real time: on each frame we compute a classifier score for every possible gesture, then label the frame with the gesture that received the highest score.

The arrows in the following image correspond to the top 50 weak classifiers that contribute to the punch-right gesture score. Each classifier compares one type of flow (indicated by the arrow's direction) at a given pixel against a threshold value. Only classifiers whose motion features exceed the threshold (or fall below it, for negative parity) are displayed on the image.
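A thresholded weak classifier of this kind, and the per-gesture score accumulation described earlier, can be sketched as follows. The field names (`y`, `x`, `channel`, `threshold`, `parity`, `weight`, `gesture`) are illustrative placeholders, not identifiers from our code.

```python
import numpy as np

def classify_gesture(features, classifiers):
    """Sum weak-classifier votes per gesture and return the winner.

    features: (H, W, C) array of blurred flow intensities, with one
    channel per flow type (e.g. up / down / left / right).
    classifiers: list of dicts describing learned weak classifiers;
    the dict schema here is an assumption made for this sketch.
    """
    scores = {}
    for clf in classifiers:
        value = features[clf["y"], clf["x"], clf["channel"]]
        # parity = +1 fires when the feature exceeds the threshold,
        # parity = -1 fires when it falls below the threshold.
        if clf["parity"] * (value - clf["threshold"]) > 0:
            scores[clf["gesture"]] = scores.get(clf["gesture"], 0.0) + clf["weight"]
    # Label the frame with the highest-scoring gesture.
    return max(scores, key=scores.get) if scores else None
```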