APM Protein Generation Dataset
This is a protein generation dataset released in 2025 by Hunan University, the University of Chinese Academy of Sciences, and the ByteDance Seed Team. It accompanies the paper "An All-Atom Generative Model for Designing Protein Complexes".
Dataset composition
- Single-chain protein dataset: contains 187,494 samples covering a variety of protein types and functions, drawn from PDB (18,684), Swiss-Prot (140,769), and AFDB (28,041).
- Multi-chain protein dataset: contains 11,620 samples covering protein complexes of 2 to 6 chains, supporting multi-chain modeling. The data is derived from PDB biological assemblies, with three types of samples excluded: (1) samples that appear in the SAbDab antibody database; (2) samples containing any chain shorter than 30 residues (treated as peptides); (3) samples longer than 2,048 residues or lacking cluster IDs. During training, the researchers randomly cropped the multi-chain samples: a sample with more than 384 residues was cropped around an inter-chain binding interface residue pair, retaining the 384 nearest residues.
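The multi-chain filtering and cropping rules above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the 10 Å interface cutoff, the use of one coordinate per residue, reading the 2,048 limit as the total residue count, and centering the crop on the midpoint of a randomly chosen interface pair are all assumptions made for the example.

```python
import math
import random

MIN_CHAIN_LEN = 30    # chains shorter than this are treated as peptides
MAX_TOTAL_LEN = 2048  # assumed to apply to the total residue count
CROP_SIZE = 384       # residues retained after cropping


def keep_sample(chain_lengths, in_sabdab, cluster_id):
    """Apply the three exclusion rules to one multi-chain sample (hypothetical helper)."""
    if in_sabdab:                                        # rule 1: antibody in SAbDab
        return False
    if any(n < MIN_CHAIN_LEN for n in chain_lengths):    # rule 2: peptide-length chain
        return False
    if sum(chain_lengths) > MAX_TOTAL_LEN or cluster_id is None:  # rule 3
        return False
    return 2 <= len(chain_lengths) <= 6                  # dataset covers 2-6 chains


def interface_pairs(coords, chain_ids, cutoff=10.0):
    """Residue index pairs on different chains within `cutoff` (assumed value, in angstroms)."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if chain_ids[i] != chain_ids[j] and math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


def crop_around_interface(coords, chain_ids, crop_size=CROP_SIZE, cutoff=10.0, rng=random):
    """Return sorted indices of the residues kept after cropping."""
    n = len(coords)
    if n <= crop_size:
        return list(range(n))  # small samples are left untouched
    pairs = interface_pairs(coords, chain_ids, cutoff)
    if pairs:
        i, j = rng.choice(pairs)
        # Assumption: the crop center is the midpoint of the chosen interface pair.
        center = tuple((a + b) / 2 for a, b in zip(coords[i], coords[j]))
    else:
        # Assumption: fall back to a random residue if no interface is found.
        center = coords[rng.randrange(n)]
    nearest = sorted(range(n), key=lambda k: math.dist(coords[k], center))[:crop_size]
    return sorted(nearest)
```

Here `coords` would be a list of `(x, y, z)` tuples (for example, one Cα position per residue) and `chain_ids` a parallel list of chain labels. Passing a seeded `random.Random` instance as `rng` makes the crop reproducible.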