6 months ago

FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction

Lvxiaowei Xu Jianwang Wu Jiawei Peng Jiayu Fu Ming Cai

Abstract

Grammatical Error Correction (GEC) has recently been widely applied in automatic correction and proofreading systems. However, Chinese GEC remains immature because high-quality data from native speakers is limited in both category coverage and scale. In this paper, we present FCGEC, a fine-grained corpus for detecting, identifying, and correcting grammatical errors. FCGEC is a human-annotated corpus with multiple references, consisting of 41,340 sentences collected mainly from multiple-choice questions in public school Chinese examinations. Furthermore, we propose a Switch-Tagger-Generator (STG) baseline model to correct grammatical errors in low-resource settings. Experimental results show that STG outperforms other GEC benchmark models on FCGEC. However, a significant gap remains between benchmark models and humans, which future models are encouraged to bridge.

Code Repositories

Official

pytorch

Mentioned in GitHub

xlxwalex/FCGEC†

Benchmarks

Benchmark	Methodology	Metrics
grammatical-error-correction-on-fcgec	STG-Joint	F0.5: 45.48; exact match: 34.10
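
For readers unfamiliar with the two metrics in the table, the sketch below shows how an F0.5 score and an exact-match rate are commonly computed. It is a minimal illustration, not the FCGEC evaluation script: the function names are hypothetical, and treating a prediction as correct when it equals any one of a sentence's references is an assumption based on the corpus having multiple references.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision above recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def exact_match(predictions: list[str], references: list[list[str]]) -> float:
    """Fraction of predictions identical to at least one reference.

    Assumption: a sentence counts as matched if the prediction equals
    any of its reference corrections.
    """
    if not predictions:
        return 0.0
    hits = sum(pred in refs for pred, refs in zip(predictions, references))
    return hits / len(predictions)
```

With beta = 0.5, precision counts more heavily than recall, which is the usual choice in GEC: a system that proposes a wrong correction is generally considered worse than one that leaves an error untouched.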
