TreeSynth Is a Synthetic Data Method Based on Tree-Guided Subspaces
TreeSynth was jointly proposed in March 2025 by a research team from the University of Hong Kong and the Chinese University of Hong Kong. The work is described in the paper "TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning".
TreeSynth is a synthetic data method built on tree-guided subspaces and inspired by decision trees. It constructs a spatial partitioning tree that recursively divides the complete data space for a specific task (the root node) into multiple atomic subspaces (the leaf nodes). These subspaces are mutually exclusive and collectively exhaustive, so no two of them overlap and together they cover the entire space. Samples are then synthesized within each atomic subspace. Extensive experiments on various benchmarks consistently show that TreeSynth outperforms both manually constructed datasets and comparable data synthesis methods in data diversity, model performance, and scalability, with an average performance improvement of 10%.
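The partition-then-synthesize idea can be illustrated with a small Python sketch. This is not the authors' implementation: the names `Node`, `toy_propose`, `build_tree`, `leaves`, and `synthesize` are hypothetical, the math-problem task is an assumed example, and a fixed table of criteria stands in for whatever mechanism proposes the splits in the actual method. The sketch only shows how a recursive split yields leaves that do not overlap and together cover the root space, and how one generation request can be formed per leaf.

```python
from dataclasses import dataclass, field

# Hypothetical, hard-coded split criteria (one per tree depth).
# Assumption: these stand in for the split proposals of the real method.
TOY_CRITERIA = [
    ("topic", ["algebra", "geometry"]),
    ("difficulty", ["easy", "hard"]),
]


@dataclass
class Node:
    # (dimension, value) pairs accumulated from the root down to this node
    path: tuple = ()
    children: list = field(default_factory=list)


def toy_propose(node):
    """Return (dimension, values) for splitting this node, or None to stop."""
    depth = len(node.path)
    return TOY_CRITERIA[depth] if depth < len(TOY_CRITERIA) else None


def build_tree(node, propose, max_depth):
    """Recursively partition the space under `node` until max_depth."""
    if len(node.path) >= max_depth:
        return node
    criterion = propose(node)
    if criterion is None:
        return node
    dimension, values = criterion
    # Each value defines one child subspace. Distinct values keep the
    # children mutually exclusive; listing every value keeps them exhaustive.
    for value in values:
        child = Node(path=node.path + ((dimension, value),))
        node.children.append(build_tree(child, propose, max_depth))
    return node


def leaves(node):
    """Collect the atomic subspaces (leaf nodes) in left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def synthesize(leaf):
    """Build a generation request constrained to one atomic subspace."""
    constraints = ", ".join(f"{dim}={val}" for dim, val in leaf.path)
    return f"Write one math problem with {constraints}."


root = build_tree(Node(), toy_propose, max_depth=2)
prompts = [synthesize(leaf) for leaf in leaves(root)]
for prompt in prompts:
    print(prompt)
```

With two binary criteria the tree has four leaves, so the sketch produces four differently constrained requests. Raising `max_depth` or adding criteria grows the number of atomic subspaces, which is the lever this kind of approach uses to scale the dataset while keeping coverage even.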