Transformer-based Dual Relation Graph for Multi-label Image Recognition
Jiawei Zhao, Ke Yan, Yifan Zhao, Xiaowei Guo, Feiyue Huang, Jia Li

Abstract
The simultaneous recognition of multiple objects in one image remains a challenging task, facing several difficulties in the recognition field such as various object scales, inconsistent appearances, and confused inter-class relationships. Recent research efforts mainly resort to statistical label co-occurrences and linguistic word embeddings to enhance the unclear semantics. Unlike these works, in this paper we propose a novel Transformer-based Dual Relation learning framework, constructing complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph. The structural relation graph aims to capture long-range correlations from object context by developing a cross-scale transformer-based architecture. The semantic graph dynamically models the semantic meanings of image objects with explicit semantic-aware constraints. In addition, we incorporate the learnt structural relationship into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two effective relation graphs, our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks, i.e., the MS-COCO and VOC 2007 datasets.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| multi-label-classification-on-ms-coco | TDRG-R101(448×448) | mAP: 84.6 |
| multi-label-classification-on-ms-coco | TDRG-R101(576×576) | mAP: 86.0 |
| multi-label-classification-on-pascal-voc-2007 | TDRG-R101(448×448) | mAP: 95.0 |
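The mAP figures in the table are mean average precision over classes. As a rough illustration of what that metric measures, the sketch below computes a non-interpolated per-class average precision and averages it across classes. This is one common definition, assumed here for illustration; the exact evaluation protocol behind the reported numbers is not stated on this page and may differ (for example, interpolated variants exist). The function names and the toy data are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class (an assumed definition).

    scores: predicted confidence per image for this class.
    labels: 1 if the class is present in the image, else 0.
    """
    # Rank images from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision measured at the rank of each positive image.
            precisions.append(hits / rank)
    # A class with no positives contributes 0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP, as a percentage.

    score_matrix[i][c] and label_matrix[i][c] refer to image i, class c.
    """
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        aps.append(average_precision(class_scores, class_labels))
    return 100.0 * sum(aps) / len(aps)


# Toy data: 4 images, 2 classes (made up for illustration only).
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

On the toy data, class 0 has positives at ranks 1 and 3 (AP = 5/6) and class 1 has positives at ranks 1 and 2 (AP = 1), so `toy_map` is about 91.67.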