Chi Sun; Xipeng Qiu; Yige Xu; Xuanjing Huang

Abstract
Language model pre-training has proven to be useful for learning universal language representations. As a state-of-the-art pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers) has achieved impressive results on many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning. The proposed solution obtains new state-of-the-art results on eight widely studied text classification datasets.
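To make the fine-tuning setup described in the abstract concrete, below is a minimal sketch of fine-tuning a pre-trained BERT checkpoint for text classification. It assumes the Hugging Face `transformers` and `datasets` libraries rather than the paper's own codebase; the dataset, checkpoint name, and hyperparameters are illustrative, not the paper's exact configuration.

```python
# Minimal sketch: fine-tuning BERT for text classification.
# Assumes Hugging Face `transformers`/`datasets`; dataset, checkpoint,
# and hyperparameters are illustrative, not the paper's exact setup.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

dataset = load_dataset("imdb")  # binary sentiment classification
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Truncate to BERT's 512-token limit; longer documents need a
    # truncation strategy (e.g. head/tail) as studied in the paper.
    return tokenizer(batch["text"], truncation=True, max_length=512)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-imdb",
    learning_rate=2e-5,              # small LR to limit catastrophic forgetting
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["test"],
                  tokenizer=tokenizer)   # enables dynamic padding per batch
trainer.train()
```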
Benchmarks
| Benchmark | Methodology | Metric |
|---|---|---|
| sentiment-analysis-on-imdb | BERT_base+ITPT | Accuracy: 95.63% |
| sentiment-analysis-on-imdb | BERT_large+ITPT | Accuracy: 95.79% |
| sentiment-analysis-on-yelp-binary | BERT_large+ITPT | Error: 1.81% |
| sentiment-analysis-on-yelp-binary | BERT_base+ITPT | Error: 1.92% |
| sentiment-analysis-on-yelp-fine-grained | BERT_base+ITPT | Error: 29.42% |
| sentiment-analysis-on-yelp-fine-grained | BERT_large+ITPT | Error: 28.62% |
| text-classification-on-ag-news | BERT-ITPT-FiT | Error: 4.8% |
| text-classification-on-dbpedia | BERT-ITPT-FiT | Error: 0.68% |
| text-classification-on-sogou-news | BERT-ITPT-FiT | Accuracy: 98.07% |
| text-classification-on-trec-6 | BERT-ITPT-FiT | Error: 3.2% |
| text-classification-on-yahoo-answers | BERT-ITPT-FiT | Accuracy: 77.62% |
| text-classification-on-yelp-2 | BERT-ITPT-FiT | Accuracy: 98.08% |
| text-classification-on-yelp-5 | BERT-ITPT-FiT | Accuracy: 70.58% |
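The ITPT entries above refer to within-task pre-training: BERT is first further pre-trained with the masked language model objective on the task's own unlabeled text, and then fine-tuned on the labels (BERT-ITPT-FiT). The sketch below illustrates that first stage, again assuming Hugging Face `transformers`/`datasets`; the corpus, masking probability, and training settings are illustrative assumptions.

```python
# Sketch of within-task further pre-training (the ITPT stage), assuming
# Hugging Face `transformers`/`datasets`; names and settings are illustrative.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Only the raw task text is needed at this stage; labels are ignored.
corpus = load_dataset("imdb", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

corpus = corpus.map(tokenize, batched=True,
                    remove_columns=corpus.column_names)

# Randomly mask 15% of tokens, as in standard BERT pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="bert-itpt",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=corpus,
    data_collator=collator,
)
trainer.train()
# The checkpoint saved in "bert-itpt" can then be loaded with
# AutoModelForSequenceClassification and fine-tuned as in the earlier sketch.
```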