WikiLinks Wikipedia Link Dataset
License: CC BY-NC-SA 3.0

WikiLinks is a dataset that covers the full text of Wikipedia and can be searched by paragraph, by phrase, or by part of a paragraph. It treats each Wikipedia page as representing an entity (or a concept or idea), collects hyperlinks to those pages found through web searches, and uses the anchor text of each hyperlink as a mention of the entity. This approach can provide large-scale labeled data without manual annotation.
The dataset includes:
- Nearly 1.9 billion words from more than 4 million articles
- 40 million references to 3 million entities
- 10 gzip-compressed text files named data-0000[0-9]-of-00010.gz
This dataset was created on September 29, 2012.
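
The file naming pattern above can be expanded and read with a short script. The following is a minimal sketch in Python, assuming the 10 files have been downloaded into one local directory and that each file is plain text once decompressed. The function names `shard_names` and `iter_lines` are hypothetical helpers for illustration, and the record layout inside each file is not described here, so the sketch only yields raw lines.

```python
import gzip
from pathlib import Path
from typing import Iterator, List

NUM_SHARDS = 10  # the dataset ships as 10 compressed files


def shard_names(num_shards: int = NUM_SHARDS) -> List[str]:
    """Expand the pattern data-0000[0-9]-of-00010.gz into file names."""
    return [f"data-{i:05d}-of-{num_shards:05d}.gz" for i in range(num_shards)]


def iter_lines(data_dir: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield raw text lines from every shard found in data_dir, in order.

    Assumption: the decompressed content is text in the given encoding.
    Undecodable bytes are replaced rather than raising an error.
    Shards that are missing from data_dir are skipped.
    """
    for name in shard_names():
        path = Path(data_dir) / name
        if not path.exists():
            continue
        with gzip.open(path, "rt", encoding=encoding, errors="replace") as fh:
            for line in fh:
                yield line.rstrip("\n")
```

Reading the shards lazily, one line at a time, keeps memory use small, which matters for a corpus of this size. A caller would typically loop over `iter_lines("path/to/wikilinks")` and parse each line according to the format documented by the dataset publisher.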