Oriane Siméoni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Renaud Marlet, Jean Ponce

Abstract
Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, requires neither external object proposals nor any exploration of the image collection; it operates on a single image. Yet, we outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012. We also show that training a class-agnostic detector on the discovered objects boosts results by another 7 points. Moreover, we show promising results on the unsupervised object detection task. The code to reproduce our results can be found at https://github.com/valeoai/LOST.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| single-object-discovery-on-coco-20k | LOST | CorLoc: 50.7 |
| single-object-discovery-on-coco-20k | LOST + CAD | CorLoc: 57.5 |
| weakly-supervised-object-localization-on-cub | LOST | Top-1 Localization Accuracy: 71.3 |
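
CorLoc, the metric reported for the first two rows, is commonly defined as the percentage of images in which the predicted box overlaps at least one ground-truth box with an intersection over union (IoU) above 0.5. The following is a minimal sketch of that computation, assuming one predicted box per image and boxes given as (x1, y1, x2, y2). The function names `iou` and `corloc` and the strict threshold are illustrative assumptions and are not taken from the LOST repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(predictions, ground_truths, threshold=0.5):
    """Percentage of images whose predicted box matches a ground-truth box.

    predictions: one box per image.
    ground_truths: one list of boxes per image.
    threshold: assumed IoU cut-off; a prediction counts if IoU > threshold.
    """
    hits = sum(
        any(iou(pred, gt) > threshold for gt in gts)
        for pred, gts in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(predictions)
```

For instance, with two images where the first prediction coincides with a ground-truth box and the second misses entirely, `corloc` returns 50.0. Whether the comparison is strict or non-strict varies between evaluation scripts, so the threshold handling here should be checked against the benchmark being reproduced.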