MDPBench Multilingual Document Parsing Benchmark Dataset
License: Apache 2.0
MDPBench is a benchmark dataset for parsing multilingual digital and photographed documents. The related research paper is "MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios". The benchmark aims to evaluate and improve models' ability to parse multilingual documents in complex, real-world scenarios. The dataset contains 3,400 document images covering 17 languages, including Simplified Chinese, Traditional Chinese, English, Arabic, German, Spanish, French, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Thai, and Vietnamese. To ensure high-quality annotations, each image was first annotated by expert models, then manually corrected, and finally manually verified.
