OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
Ivana Lučić, Sowmya Vajjala

Abstract
This paper describes the collection and compilation of the OneStopEnglish corpus of texts written at three reading levels, and demonstrates its usefulness through two applications: automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total). The corpus is now freely available under a CC BY-SA 4.0 license, and we hope that it will foster further research on readability assessment and text simplification.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| text-classification-on-onestopenglish | SMO (Sequential Minimal Optimization) | Accuracy (5-fold): 0.781 |
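
The "Accuracy (5-fold)" metric above is an average over five train/test splits. The following is a minimal sketch of how such a score is typically computed, not the authors' evaluation code. The function names `k_fold_indices` and `cross_val_accuracy` and the `fit` callback are hypothetical, and the sketch uses plain contiguous folds rather than any particular toolkit's splitting scheme.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    # Fold sizes differ by at most one when n is not divisible by k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def cross_val_accuracy(
    texts: Sequence,
    labels: Sequence,
    fit: Callable[[list, list], Callable],
    k: int = 5,
) -> float:
    """Mean accuracy over k folds.

    `fit(train_texts, train_labels)` is a hypothetical callback that trains
    any classifier and returns a function mapping one text to a label.
    """
    scores = []
    for train, test in k_fold_indices(len(texts), k):
        predict = fit([texts[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(texts[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

In practice the examples would be shuffled (and usually stratified by reading level) before splitting, since contiguous folds over label-sorted data would give misleading scores.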