{Trieu Trinh Chinh Ngo}
Abstract
We collect data from open sources on the Internet, and classify them into different categories, each labeled with a specific language style 3. In total, there are 3.3 million pairs of English and Vietnamese texts, ranging from single sentences to paragraphs. A model trained with our dataset outperforms Google Translate on a selected set of diverse text sources. On IWSLT'15 we achieved a BLEU score of 37.84.
Code Repositories
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| machine-translation-on-iwslt2015-english-1 | Tall Transformer with Style-Augmented Training | BLEU: 37.8 |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.