A reproduction of Apple's bi-directional LSTM models for language identification in short strings
Mads Toftrup, Søren Asger Sørensen, Manuel R. Ciosici, Ira Assent

Abstract
Language Identification is the task of identifying a document's language. For applications like automatic spell checker selection, language identification must use very short strings such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms current open-source language identifiers. We further find that its language identification mistakes are due to confusion between related languages.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| language-identification-on-opensubtitles | Apple bi-LSTM | Accuracy: 91.37 |
| language-identification-on-universal | Apple bi-LSTM | Accuracy: 86.93 |