DocBank Text Dataset

DocBank is a text dataset for document layout analysis. It contains 500,000 document pages with fine-grained, token-level annotations. The annotations are produced in a simple and effective way, using weak supervision from the LaTeX documents available on arXiv.
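To make "token-level annotation" concrete, the sketch below models one annotated token as its text, a bounding box, and a layout label, then counts tokens per label on a page. The tab-separated layout, the field order, the label names, and the helper names (`TokenAnnotation`, `parse_line`, `label_counts`) are all assumptions for illustration and are not taken from the dataset's documentation; check the released files before relying on any of them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class TokenAnnotation:
    """One annotated token (hypothetical layout, not the official schema)."""
    text: str
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page coordinates
    label: str                       # layout category, e.g. "title"


def parse_line(line: str) -> TokenAnnotation:
    """Parse one assumed tab-separated record: text, x0, y0, x1, y1, label."""
    text, x0, y0, x1, y1, label = line.rstrip("\n").split("\t")
    return TokenAnnotation(text, (int(x0), int(y0), int(x1), int(y1)), label)


def label_counts(lines: Iterable[str]) -> Counter:
    """Count how many tokens on a page carry each layout label."""
    return Counter(parse_line(line).label for line in lines if line.strip())


# Made-up sample records, only to exercise the functions above.
sample_page = [
    "Document\t100\t50\t220\t70\ttitle",
    "Layout\t230\t50\t310\t70\ttitle",
    "We\t100\t120\t125\t135\tparagraph",
    "propose\t130\t120\t200\t135\tparagraph",
    "a\t205\t120\t215\t135\tparagraph",
]

counts = label_counts(sample_page)
print(counts.most_common())
```

Keeping each token as its own record is what allows layout labels to be assigned at a finer granularity than whole regions, at the cost of many more rows per page.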