Object Localization using Dynamic Template Warping



Abstract

A simple method is presented for detecting, localizing and recognizing classes of objects while accommodating wide variation in object pose. A small two-dimensional template is warped into the image, which converts localization into a one-dimensional sub-problem; the search for a match between image and template is then carried out by dynamic programming. The method recovers three of the six degrees of freedom of motion (two translations and one rotation), accommodates two more in the search process (one rotation and one translation), and can be extended to the remaining degree of freedom. Experiments show that the method provides an efficient search strategy that outperforms normalized correlation. This is demonstrated in the example domain of face detection and localization, and then extended to more general detection tasks. An additional technique recovers a rough object pose from the match results; it is used in a two-stage recognition experiment together with maximization of mutual information.
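To illustrate how a column-wise match can be searched by dynamic programming, the sketch below aligns template columns to a run of image columns with a subsequence dynamic-time-warping recurrence: the match may start and end at any image column, and each step may advance the template, the image, or both. This is a generic technique chosen for illustration, not the authors' exact formulation. The function name `align_columns`, the per-column feature vectors it takes as input, and the squared-Euclidean column cost are all hypothetical assumptions.

```python
import numpy as np


def align_columns(template_cols, image_cols):
    """Subsequence DP alignment of template columns to a run of image columns.

    template_cols: (T, d) array, one feature vector per template column (assumed).
    image_cols:    (I, d) array, one feature vector per image column (assumed).
    Returns (total_cost, path), where path lists (template_column, image_column)
    pairs in increasing order.
    """
    T, I = len(template_cols), len(image_cols)
    # Squared-Euclidean cost between every template column and image column.
    diff = template_cols[:, None, :] - image_cols[None, :, :]
    cost = np.sum(diff ** 2, axis=2)

    # D[t, i]: cheapest warp of template columns 0..t that ends at image column i.
    # Row 0 has no predecessor, so the match may start at any image column.
    D = np.empty((T, I))
    D[0] = cost[0]
    for t in range(1, T):
        for i in range(I):
            best = D[t - 1, i]                      # image column i reused by two template columns
            if i > 0:
                best = min(best, D[t - 1, i - 1],   # advance template and image together
                           D[t, i - 1])             # template column t stretched over two image columns
            D[t, i] = cost[t, i] + best

    # Free end: the match may finish at any image column.
    t, i = T - 1, int(np.argmin(D[-1]))
    total = float(D[t, i])
    path = [(t, i)]
    while t > 0:
        # Step back to whichever predecessor the recurrence used (the cheapest one).
        steps = [(D[t - 1, i], t - 1, i)]
        if i > 0:
            steps += [(D[t - 1, i - 1], t - 1, i - 1), (D[t, i - 1], t, i - 1)]
        _, t, i = min(steps)
        path.append((t, i))
    path.reverse()
    return total, path
```

The first and last image columns in `path` bound the matched region horizontally, and the pairs themselves form a template-to-image column mapping of the kind a pose solver could consume.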

Some Results


Detection:


The color coding indicates the mapping between template columns and image columns returned by the system. This mapping is then used to solve for a partial pose.


Partial Pose Solution:


This is a rendering of the partial pose returned by the pose solver.



Final Pose Solution using Mutual Information:


This figure shows the initial pose, taken from the partial pose solution, that is passed to the mutual information (MI) pose solver. The final pose is illustrated in the next row, both by a rendering of the 3D model and by an overlay of points subsampled from the object surface. The MI alignment corrects the x-rotation error, and the pose in the final rendered image agrees with the input pose.
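The quantity being maximized can be illustrated with a histogram estimate of mutual information between two equally sized intensity arrays, for example intensities predicted from the model at a candidate pose and the observed image intensities at the same points. This is a minimal sketch assuming a plain joint-histogram estimator; the function name `mutual_information` and the default bin count are hypothetical, and the actual solver's sampling and pose optimization are not shown.

```python
import numpy as np


def mutual_information(a, b, bins=32):
    """Joint-histogram estimate of mutual information (in nats) between two
    equally sized arrays, e.g. model-predicted and observed intensities."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p_ab = joint / joint.sum()               # empirical joint distribution
    p_a = p_ab.sum(axis=1, keepdims=True)    # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)    # marginal of b, shape (1, bins)
    nz = p_ab > 0                            # empty bins contribute 0 (0 * log 0 = 0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))
```

A pose search would evaluate this score for each candidate pose and keep the pose with the highest value; MI rewards a consistent statistical relationship between model and image rather than equal intensities, which is why it tolerates lighting and reflectance differences.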

Natural Scene Classification

Scene classification is an open problem in machine vision, with applications in image and video database indexing. Systems that categorize images by visual similarity or content require flexible image representations. We investigate two methods for learning, from a small set of positive and negative examples, visual concepts that encode the properties of a scene class. The learned concepts are simple templates that capture the color and spatial properties of the class.
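As a rough illustration of a template that captures both color and spatial layout, the sketch below summarizes each image as a coarse grid of mean colors and classifies a new image with a nearest-mean rule over positive and negative examples. The grid size, the helper names (`color_grid`, `learn_template`, `score`) and the nearest-mean rule are hypothetical choices for illustration; they do not reproduce either of the two learning methods investigated here.

```python
import numpy as np


def color_grid(image, grid=(4, 4)):
    """Summarize an (H, W, 3) image as a flattened grid of mean colors.

    np.array_split is used so H and W need not divide evenly by the grid."""
    rows = np.array_split(image, grid[0], axis=0)
    cells = [cell.reshape(-1, image.shape[2]).mean(axis=0)
             for row in rows
             for cell in np.array_split(row, grid[1], axis=1)]
    return np.concatenate(cells)


def learn_template(positives, negatives, grid=(4, 4)):
    """Hypothetical 'concept': mean grid features of the positive and negative examples."""
    pos = np.mean([color_grid(im, grid) for im in positives], axis=0)
    neg = np.mean([color_grid(im, grid) for im in negatives], axis=0)
    return pos, neg


def score(image, template, grid=(4, 4)):
    """Positive when the image is closer to the positive mean than to the negative mean."""
    pos, neg = template
    f = color_grid(image, grid)
    return float(np.linalg.norm(f - neg) - np.linalg.norm(f - pos))
```

Because each feature is tied to a grid cell, a blue region above a green region yields a different feature vector than the same colors arranged the other way round, which is what lets such a template encode spatial as well as color properties.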

Extracting Templates for Scene Classification Using a Few Examples


  • Abstract



    Multiple Instance Learning for Natural Scene Classification


  • Abstract


    Image Similarity Experiment


  • Click here to participate in an image-similarity experiment.