Cops-Ref Object Reference Understanding Dataset

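Cops-Ref (Compositional Referring Expression Comprehension) is a visual reasoning dataset for referring expression comprehension. In this task, a model is given a natural-language description and must locate the object in the image that the description refers to. The dataset contains 75,299 real images, 148,712 text descriptions (referring expressions), and 1,307,885 candidate regions.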
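As a quick sense of scale, the reported totals can be turned into simple ratios. These are only ratios of the aggregate counts. They are not per-sample statistics read from the dataset files, and the constant names below are illustrative.

```python
# Aggregate totals as reported for Cops-Ref (constant names are illustrative).
NUM_IMAGES = 75_299
NUM_EXPRESSIONS = 148_712
NUM_CANDIDATE_REGIONS = 1_307_885


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator rounded to two decimal places."""
    return round(numerator / denominator, 2)


# Ratios of the totals only; the real per-image / per-expression counts vary.
expressions_per_image = ratio(NUM_EXPRESSIONS, NUM_IMAGES)
regions_per_expression = ratio(NUM_CANDIDATE_REGIONS, NUM_EXPRESSIONS)

if __name__ == "__main__":
    print(f"expressions per image (ratio of totals): {expressions_per_image}")
    print(f"candidate regions per expression (ratio of totals): {regions_per_expression}")
```

On average, this works out to roughly two descriptions per image and just under nine candidate regions per description.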
The dataset has two main features. First, a new expression generation engine combines reasoning logic with visual features to produce descriptions of varying complexity. Second, a new test setting adds semantically similar distractor images at test time, so a model must distinguish the referred object from similar objects that appear in other images.
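The effect of the distractor setting can be illustrated with a minimal sketch. It assumes that candidate regions from the target image and from the distractor images are pooled, that the model assigns each region a score, and that the top-scoring region counts as the prediction. The `Candidate` class, its fields, and the `accuracy` metric are hypothetical. This is not the official Cops-Ref evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Candidate:
    region_id: str  # hypothetical identifier for a candidate region
    image_id: str   # image the region comes from (target or a distractor)
    score: float    # model's matching score for the referring expression


def is_correct(candidates: Sequence[Candidate], referent_id: str) -> bool:
    """Return True if the highest-scoring pooled candidate is the referent.

    Because regions from distractor images are pooled with those from the
    target image, a high score on a similar object in a distractor image
    counts as an error.
    """
    if not candidates:
        raise ValueError("no candidate regions")
    best = max(candidates, key=lambda c: c.score)
    return best.region_id == referent_id


def accuracy(samples: Sequence[Tuple[Sequence[Candidate], str]]) -> float:
    """Fraction of (candidates, referent_id) samples answered correctly."""
    if not samples:
        raise ValueError("no samples")
    return sum(is_correct(c, r) for c, r in samples) / len(samples)
```

With target regions only, the model below would pick the referent `t1`. Once a distractor region `d1` with a higher score is added to the pool, the same prediction becomes wrong:

```python
pool = [
    Candidate("t1", "target", 0.62),
    Candidate("t2", "target", 0.30),
    Candidate("d1", "distractor_a", 0.71),
]
is_correct(pool[:2], "t1")  # target image only
is_correct(pool, "t1")      # with the distractor region
```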