LongBlocks Long Context Multilingual Question Answering Dataset

LongBlocks is a long-context multilingual synthetic dataset released in 2026 by the University of Lisbon, the Instituto de Telecomunicações, TransPerfect, and other institutions. It contains approximately 194,000 long-context question-answer examples built on long documents drawn from sources such as books, web text, Wikipedia, arXiv papers, programming code, and community Q&A.

Data Fields:

  • id: String, a unique instance identifier (only used to recover restricted book data; null for other sources).
  • document: String, the long source document (null for restricted book data).
  • source: String, the name of the source corpus.
  • language: String, the natural language or programming language of the example.
  • question: String, the synthesized long-context question.
  • answer: String, a reference answer that has been filtered for authenticity.
  • response_Qwen3-Next-80B-A3B / response_Qwen3.5-27B / response_Nemotron-3-Nano-30B-A3B: Strings, the responses generated by the corresponding teacher models.
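
The field list above can be mirrored in a small record type. The following is a minimal sketch, not an official loader: the class name, helper functions, and sample row are hypothetical, and it assumes each example arrives as a plain dictionary keyed by the field names listed above. Because the capitalization of the question field is uncertain, the parser accepts either spelling.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Teacher-response field names as given in the field list.
TEACHER_FIELDS = (
    "response_Qwen3-Next-80B-A3B",
    "response_Qwen3.5-27B",
    "response_Nemotron-3-Nano-30B-A3B",
)


@dataclass
class LongBlocksExample:
    """Hypothetical container for one dataset row."""
    id: Optional[str]        # set only for restricted book data
    document: Optional[str]  # null for restricted book data
    source: str
    language: str
    question: str
    answer: str
    responses: Dict[str, Optional[str]] = field(default_factory=dict)


def parse_example(row: dict) -> LongBlocksExample:
    """Convert a raw row (assumed to be a dict) into a typed record."""
    # Assumption: the question key may be spelled "question" or "Question".
    question = row["question"] if "question" in row else row["Question"]
    return LongBlocksExample(
        id=row.get("id"),
        document=row.get("document"),
        source=row["source"],
        language=row["language"],
        question=question,
        answer=row["answer"],
        responses={name: row.get(name) for name in TEACHER_FIELDS},
    )


def needs_document_recovery(example: LongBlocksExample) -> bool:
    """True when the document text is withheld and must be recovered via id."""
    return example.document is None


# Made-up row for illustration only; not taken from the dataset.
sample_row = {
    "id": "book-0001",
    "document": None,
    "source": "books",
    "language": "pt",
    "question": "Placeholder question?",
    "answer": "Placeholder answer.",
    "response_Qwen3.5-27B": "Placeholder teacher response.",
}
example = parse_example(sample_row)
```

Rows for which `needs_document_recovery` returns true have no document text and carry an `id` for recovering it from the original book source; all other rows carry the document directly and a null `id`.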
