Background edges (those that remained stationary over prolonged time periods) are subsequently filtered out. What remains are the edges corresponding to the human figure.
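A minimal sketch of this filtering step, assuming binary per-frame edge maps; the stationarity threshold and its value are illustrative, not taken from the original system:

```python
import numpy as np

def filter_background_edges(edge_maps, stationary_fraction=0.8):
    """Remove edges that stayed stationary over a prolonged period.

    edge_maps: list of binary (H, W) arrays, one per frame.
    An edge pixel that is 'on' in at least `stationary_fraction`
    of the frames is treated as background and suppressed in
    every frame; the remaining edges belong to moving objects.
    """
    stack = np.stack(edge_maps).astype(np.float32)   # (T, H, W)
    frequency = stack.mean(axis=0)                   # per-pixel on-rate
    background = frequency >= stationary_fraction    # stationary edges
    return [np.logical_and(e, ~background) for e in edge_maps]
```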
Pose recovery is now formulated as a search problem in the pose-parameter space of the 3-D model. We seek to align the 3-D human model so that its projected contours match the actual edges of the scene image in all four views. Once a good fit is found in all camera views, we assume the correct 3-D positioning of the body has been found.
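The search formulation can be sketched generically. Here `project` and `dissimilarity` are hypothetical callables standing in for the model-contour projection and the edge-matching cost; the candidate set stands in for whatever enumeration the search strategy produces:

```python
def best_pose(candidate_poses, views, project, dissimilarity):
    """Pick the pose whose projected model contours best match the
    scene edges, summed over all camera views.

    project(pose, view)        -> projected model contours in that view
    dissimilarity(contour, v)  -> matching cost against that view's edges
    A lower total cost over all views means a better fit.
    """
    def total_cost(pose):
        return sum(dissimilarity(project(pose, v), v) for v in views)
    return min(candidate_poses, key=total_cost)
```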
The similarity measure between model edges (shown grey in the left image) and scene edges (shown black in the left image) is based on the so-called distance transform (see right image: the distance image of the edges on the left). It computes the average distance between the model edges and the closest scene edges. A standard non-greedy algorithm searches through possible body configurations using this similarity measure. Because the search space corresponding to 22 degrees of freedom is infeasibly large, the search space is decomposed.
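Assuming binary edge maps, the measure can be sketched as follows; this brute-force version computes nearest scene-edge distances directly, whereas a real system would precompute the distance image once per frame and simply read it out along the projected model contours:

```python
import numpy as np

def chamfer_distance(model_edges, scene_edges):
    """Average distance from each model-edge pixel to the nearest
    scene-edge pixel; lower values mean a better match.

    Both inputs are binary (H, W) arrays. This is the quantity a
    precomputed distance image provides by lookup.
    """
    model_pts = np.argwhere(model_edges).astype(float)
    scene_pts = np.argwhere(scene_edges).astype(float)
    # pairwise Euclidean distances, then nearest scene edge per model pixel
    d = np.linalg.norm(model_pts[:, None, :] - scene_pts[None, :, :], axis=2)
    return d.min(axis=1).mean()
```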
Thus, the torso is aligned first, then the arms, and finally the feet. Once the body pose has been determined, the system predicts, based on the current pose and those of the last few frames, what the body pose will be in the next image frame. This prediction is the starting point for the search at the next time iteration.
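The prediction step can be sketched with a simple constant-velocity extrapolation of the pose-parameter vector; this is a minimal stand-in, not necessarily the predictor used by the original system:

```python
import numpy as np

def predict_next_pose(pose_history):
    """Extrapolate the next pose-parameter vector from recent frames.

    pose_history: sequence of pose vectors, oldest first. With fewer
    than two frames there is no motion estimate, so the current pose
    is returned; otherwise a constant-velocity step is applied. The
    result seeds the search at the next time iteration.
    """
    history = [np.asarray(p, dtype=float) for p in pose_history]
    if len(history) < 2:
        return history[-1]
    velocity = history[-1] - history[-2]
    return history[-1] + velocity
```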
In summary, while in tracking mode the system performs the following steps: it preprocesses the image, aligns the 3-D models so that their projections match the preprocessed image, and predicts the body pose. To initialize tracking, the system assumes an upright and unoccluded body pose and determines the 3-D torso axis by triangulating the 2-D torso axes visible in the various views (see dark lines in Figure below).
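One common way to carry out such a triangulation is linear (DLT) triangulation of each axis endpoint from its 2-D projections; triangulating both endpoints yields the 3-D axis. This is a generic sketch (two views shown, calibrated 3x4 projection matrices assumed), not necessarily the exact procedure of the original system:

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point.

    P1, P2: 3x4 camera projection matrices for two views.
    x1, x2: the point's 2-D image coordinates in those views.
    Each view contributes two linear constraints on the homogeneous
    3-D point; the solution is the null vector of the stacked system.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize
```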
Once the 3-D torso axis has been determined, the other body pose parameters are found by a search process similar to that used during tracking.
So how well does this work? Take a look at the video clips below.
Multi-view 3-D Tracking of Humans (1996) |
Copyright © 2001-2013 Gavrila. All rights reserved. Disclaimer. |