René Haas, Leon Derczynski

Abstract
Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which existing state-of-the-art tools often miscategorise. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| language-identification-on-nordic-langid | FastText | Accuracy: 0.9711 |
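To make the accuracy metric in the table concrete, the following is a minimal sketch of how a language identifier can be trained on character n-grams and scored. It is not the authors' FastText setup: it swaps in a small multinomial Naive Bayes classifier written with the Python standard library only. The names `char_ngrams`, `NgramNaiveBayes` and `accuracy` are hypothetical, and the three training sentences are toy data invented for illustration, not drawn from the benchmark.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams (lengths n_min..n_max) of the lowercased,
    space-padded text."""
    padded = f" {text.lower()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative stand-in,
    not the FastText model reported in the table)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                   # additive smoothing constant
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.totals = Counter()              # label -> total n-grams seen
        self.docs = Counter()                # label -> number of documents
        self.vocab = set()                   # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        n_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label in self.docs:
            # Log prior plus smoothed log likelihood of every n-gram.
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for gram in grams:
                score += math.log((self.counts[label][gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label equals the gold label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data, invented for illustration only (one sentence per language).
train_texts = [
    "jeg hedder søren og bor i københavn",   # Danish
    "jag heter sören och bor i göteborg",    # Swedish
    "ég heiti þór og bý í reykjavík",        # Icelandic
]
train_labels = ["da", "sv", "is"]

model = NgramNaiveBayes().fit(train_texts, train_labels)
print(accuracy(model, train_texts, train_labels))
```

Scoring on the training sentences, as done here, only checks that the pipeline runs; a figure such as the one in the table would be computed with the same `accuracy` function on held-out test data.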