Date

2 years ago

Size

4.21 GB

Organization

Paper URL

arxiv.org

Tags

Natural Language Processing

TinyStories is a synthetic dataset of short stories generated by GPT-3.5 and GPT-4, using only vocabulary that a typical 3- to 4-year-old child would understand. It is designed for training and evaluating small language models (LMs). Even models that are very small (fewer than 10 million parameters) or architecturally simple (a single transformer block) can, when trained on this dataset, produce fluent, consistent, and diverse short stories with nearly perfect grammar. The TinyStories dataset was proposed by Microsoft Research in 2023 in the paper "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?"
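To give a feel for the model sizes discussed above, the following is a minimal sketch that estimates the parameter count of a GPT-style decoder. The function name `estimate_transformer_params` and all hyperparameter values (vocabulary size, hidden width, context length) are illustrative assumptions, not settings taken from the paper. The formula ignores biases and LayerNorm weights, so it is only a rough approximation.

```python
def estimate_transformer_params(vocab_size, d_model, n_layers, context_len,
                                tie_embeddings=True):
    """Rough parameter count for a GPT-style decoder.

    Assumptions (hypothetical, for illustration only): learned position
    embeddings, an MLP hidden width of 4 * d_model, and no biases or
    LayerNorm weights counted.
    """
    token_embedding = vocab_size * d_model
    position_embedding = context_len * d_model
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    mlp = 8 * d_model * d_model         # two linear layers, 4x hidden width
    blocks = n_layers * (attention + mlp)
    # With tied embeddings the output head reuses the token embedding matrix.
    output_head = 0 if tie_embeddings else vocab_size * d_model
    return token_embedding + position_embedding + blocks + output_head


# Hypothetical configuration: a single transformer block with a small vocabulary.
total = estimate_transformer_params(
    vocab_size=10_000, d_model=256, n_layers=1, context_len=512
)
print(f"{total:,} parameters")  # 3,477,504 parameters
```

In this hypothetical configuration most of the parameters sit in the token embedding table rather than in the transformer block itself, which is typical for very small models.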

TinyStories.torrent

Seeding: 2, Downloading: 0, Completed: 200, Total Downloads: 459

TinyStories/
- README.md
  1.36 KB
- README.txt
  2.72 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

TinyStories Short Story Synthesis Dataset

