Abstract:
Object localization is one of the core tasks in computer vision, as it
is used in many real-world applications such as autonomous vehicles
and robotics. It refers to the task of locating an object in an image using a bounding box. Most of the existing object localization methods
require a huge amount of annotations for training and are highly time-consuming. Thus, it is worth developing object localization methods for
unlabeled images. However, this is far more challenging than typical co-localization or weakly supervised localization tasks. To tackle this problem, a novel attention-based method is proposed that takes advantage of
CNN models, attention mechanisms, and data mining. Specifically, the
proposed method first converts the feature maps from a new feature map
extractor model, VggCBAM, into a set of transactions and then discovers
frequent patterns from the transaction database through pattern mining
techniques. The experimental results show that the extracted feature maps contain meaningful activations that increase focus on
the object of interest while suppressing the background, and that the discovered
patterns typically exhibit appearance and spatial consistency. Motivated by
this observation, the method can discover and localize possible objects
by merging meaningful patterns. This approach does not need any annotations yet still shows promising localization ability, which provides a new
perspective to solve the localization problem.
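
As an illustration of the pipeline summarized above (feature maps, then transactions, then frequent patterns, then a merged localization), the following is a minimal sketch under stated assumptions, not the proposed implementation. It assumes each channel of a feature map becomes one transaction whose items are the spatial positions activated above the channel mean, and it counts only single-position support rather than running a full frequent-itemset miner such as Apriori or FP-growth. The function names, the mean threshold, the `min_support` value, and the toy feature maps (which stand in for the output of a real extractor such as VggCBAM) are all hypothetical.

```python
from collections import Counter


def feature_maps_to_transactions(fmaps):
    """Convert each channel (a 2-D list) into a set of flat position indices.

    Assumption: a position is an item if its activation exceeds the channel mean.
    """
    transactions = []
    for channel in fmaps:
        values = [v for row in channel for v in row]
        threshold = sum(values) / len(values)
        width = len(channel[0])
        items = {
            r * width + c
            for r, row in enumerate(channel)
            for c, v in enumerate(row)
            if v > threshold
        }
        transactions.append(items)
    return transactions


def frequent_positions(transactions, min_support):
    """Keep positions that occur in at least `min_support` of the transactions."""
    counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    return {item for item, count in counts.items() if count / n >= min_support}


def bounding_box(positions, width):
    """Merge frequent positions into one box: (x_min, y_min, x_max, y_max)."""
    if not positions:
        return None
    rows = [p // width for p in positions]
    cols = [p % width for p in positions]
    return (min(cols), min(rows), max(cols), max(rows))


def toy_feature_maps(channels=8, height=6, width=6):
    """Hypothetical stand-in for extracted feature maps: weak background
    activations plus a strong response on rows 2-3, columns 1-4."""
    fmaps = []
    for k in range(channels):
        channel = [
            [0.01 * ((r + c + k) % 5) for c in range(width)]
            for r in range(height)
        ]
        for r in range(2, 4):
            for c in range(1, 5):
                channel[r][c] += 1.0
        fmaps.append(channel)
    return fmaps


transactions = feature_maps_to_transactions(toy_feature_maps())
box = bounding_box(frequent_positions(transactions, min_support=0.9), width=6)
print(box)  # (1, 2, 4, 3)
```

On the toy input, every channel responds strongly on the same rectangle, so those positions have full support and the merged box covers exactly that region. On real feature maps, the support threshold and the choice of miner would determine which patterns survive.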