Date: 3 years ago
Organization:
Publish URL: github.com
Paper URL: arxiv.org
License: Other
Tags: Object Detection

Cops-Ref (Compositional Referring Expression Comprehension) is a visual reasoning dataset for referring expression comprehension, the task of locating the object in an image that a text description refers to. It contains 75,299 real images, 148,712 text descriptions, and 1,307,885 candidate regions. The dataset has two main features. The first is a new text generation engine that combines reasoning logic with visual features to produce descriptions of varying complexity. The second is a new test setting that adds semantically similar images as distractors at test time.
