Maarten Sap∗ ◊♥ Hannah Rashkin∗ ◊♥ Derek Chen♥ Ronan Le Bras◊ Yejin Choi◊♥

Abstract
We introduce Social IQa, the first large-scale benchmark for commonsense reasoning about social situations. Social IQa contains 38,000 multiple-choice questions for probing emotional and social intelligence in a variety of everyday situations (e.g., Q: "Jordan wanted to tell Tracy a secret, so Jordan leaned towards Tracy. Why did Jordan do this?" A: "Make sure no one else could hear"). Through crowdsourcing, we collect commonsense questions along with correct and incorrect answers about social interactions, using a new framework that mitigates stylistic artifacts in incorrect answers by asking workers to provide the right answer to a different but related question. Empirical results show that our benchmark is challenging for existing question-answering models based on pretrained language models, compared to human performance (>20% gap). Notably, we further establish Social IQa as a resource for transfer learning of commonsense knowledge, achieving state-of-the-art performance on multiple commonsense reasoning tasks (Winograd Schemas, COPA).
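The distractor idea described in the abstract (using the right answer to a different but related question as an incorrect answer) can be illustrated with a minimal sketch. This is not the authors' collection pipeline: the `QA` class, the `build_item` helper, the related question, and its answers are all hypothetical, and the choice of three options per item is an assumption taken from the 33.3% random-chance baseline.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QA:
    """A question about a context, with crowdsourced correct answers (hypothetical schema)."""
    question: str
    correct_answers: List[str]


def build_item(context: str, target: QA, related: QA, n_choices: int = 3) -> Dict:
    """Build one multiple-choice item whose distractors are correct answers
    to a related question about the same context."""
    correct = target.correct_answers[0]
    # Borrow answers that are right for the related question but not for the target one.
    distractors = [a for a in related.correct_answers
                   if a not in target.correct_answers][: n_choices - 1]
    if len(distractors) < n_choices - 1:
        raise ValueError("not enough distinct answers to fill the distractor slots")
    # The correct answer is placed first here; a real dataset would shuffle the choices.
    return {"context": context, "question": target.question,
            "choices": [correct] + distractors, "label": 0}


context = ("Jordan wanted to tell Tracy a secret, "
           "so Jordan leaned towards Tracy.")
why = QA("Why did Jordan do this?", ["Make sure no one else could hear"])
# Hypothetical related question and answers, invented for illustration only.
after = QA("What will Tracy want to do next?",
           ["Listen closely to Jordan", "Promise to keep the secret"])

item = build_item(context, why, after)
```

Because the borrowed answers were written as correct answers, they share the style of the true answer, which is the property the abstract credits with mitigating stylistic artifacts.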
Benchmarks
| Benchmark | Model | Accuracy (%) |
|---|---|---|
| coreference-resolution-on-winograd-schema | BERT-large 340M | 67 |
| coreference-resolution-on-winograd-schema | BERT-SocialIQA 340M | 72.5 |
| question-answering-on-copa | BERT-large 340M | 80.8 |
| question-answering-on-copa | BERT-SocialIQA 340M | 83.4 |
| question-answering-on-social-iqa | Random chance baseline | 33.3 |
| question-answering-on-social-iqa | BERT-base 110M (fine-tuned) | 63.1 |
| question-answering-on-social-iqa | BERT-large 340M (fine-tuned) | 64.5 |
| question-answering-on-social-iqa | GPT-1 117M (fine-tuned) | 63 |
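To make the comparisons in the table concrete, the short sketch below computes the accuracy differences between rows. The numbers are copied from the table; the dictionary keys and the `gain` helper are illustrative names, not part of any released code.

```python
# Accuracies (%) copied from the benchmark table above.
ACCURACY = {
    ("winograd", "BERT-large"): 67.0,
    ("winograd", "BERT-SocialIQA"): 72.5,
    ("copa", "BERT-large"): 80.8,
    ("copa", "BERT-SocialIQA"): 83.4,
    ("social-iqa", "random"): 33.3,
    ("social-iqa", "BERT-large"): 64.5,
}


def gain(benchmark: str, model: str, baseline: str) -> float:
    """Accuracy difference in percentage points between a model and a baseline."""
    return round(ACCURACY[(benchmark, model)] - ACCURACY[(benchmark, baseline)], 1)


# Transfer effect of the BERT-SocialIQA rows relative to plain BERT-large.
winograd_gain = gain("winograd", "BERT-SocialIQA", "BERT-large")
copa_gain = gain("copa", "BERT-SocialIQA", "BERT-large")
# Margin of fine-tuned BERT-large over random guessing on Social IQa itself.
over_chance = gain("social-iqa", "BERT-large", "random")

print(winograd_gain, copa_gain, over_chance)
```

The differences are reported in percentage points rather than relative percentages, which matches how the abstract states the gap to human performance.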