RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model
Milan Straka Jakub Náplava Jana Straková David Samuel

Abstract
We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses the previous state of the art in all five evaluated NLP tasks, and achieves new state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.
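Since the model is published on the Hugging Face hub, its contextualized representations can be obtained with standard tooling. Below is a minimal sketch, assuming the Hugging Face `transformers` library and PyTorch; the model ID comes from the URL above, and the input sentence is a hypothetical example, not taken from the paper.

```python
# Minimal sketch: load RobeCzech from the Hugging Face hub and extract
# contextualized token representations. Assumes `transformers` and `torch`
# are installed; the example sentence is illustrative only.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
model = AutoModel.from_pretrained("ufal/robeczech-base")

# Tokenize an example Czech sentence into model inputs.
inputs = tokenizer("RobeCzech je jazykový model pro češtinu.", return_tensors="pt")

# Run the encoder without gradient tracking (inference only).
with torch.no_grad():
    outputs = model(**inputs)

# One contextualized vector per subword token: (batch, tokens, hidden_size).
print(outputs.last_hidden_state.shape)
```

These hidden states are what a downstream task head (tagger, parser, etc.) would consume when fine-tuning the model.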
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Semantic Parsing on PTG (Czech, MRP 2020) | PERIN + RobeCzech | F1: 92.36 |