MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics
Xinchen Yan, Akash Rastogi, Ruben Villegas, Kalyan Sunkavalli, Eli Shechtman, Sunil Hadap, Ersin Yumer, Honglak Lee

Abstract
Long-term human motion can be represented as a series of motion modes (motion sequences that capture short-term temporal dynamics) with transitions between them. We leverage this structure and present a novel Motion Transformation Variational Auto-Encoder (MT-VAE) for learning motion sequence generation. Our model jointly learns a feature embedding for motion modes, from which the motion sequence can be reconstructed, and a feature transformation that represents the transition from one motion mode to the next. Given the same input, our model can generate multiple diverse and plausible future motion sequences. We apply our approach to both facial and full-body motion, and demonstrate applications such as analogy-based motion transfer and video synthesis.
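To make the idea of "one input, several futures" concrete, here is a toy sketch in plain Python. It is not the authors' implementation: the averaging encoder, the additive linear transition, and all names (`embed_mode`, `apply_transformation`, `sample_future_embeddings`, `W`) are assumptions chosen for illustration, standing in for the learned networks described in the abstract.

```python
import random

def embed_mode(frames):
    """Stand-in encoder: average the frames of a short motion sequence.
    (The model learns this embedding; averaging is only a placeholder.)"""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def apply_transformation(e_current, z, weights):
    """Hypothetical additive transition: e_next = e_current + W z."""
    return [
        e + sum(w * zi for w, zi in zip(row, z))
        for e, row in zip(e_current, weights)
    ]

def sample_future_embeddings(frames, weights, num_samples, rng):
    """Draw several latent codes z ~ N(0, I); each one yields a
    different embedding for the next motion mode."""
    e_current = embed_mode(frames)
    latent_dim = len(weights[0])
    futures = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        futures.append(apply_transformation(e_current, z, weights))
    return futures

rng = random.Random(0)
observed = [[0.0, 1.0], [2.0, 3.0]]        # two frames, two features each
W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]    # assumed map: 3-d latent -> 2-d embedding
futures = sample_future_embeddings(observed, W, num_samples=3, rng=rng)
```

In this sketch a zero latent code leaves the embedding unchanged, while each random draw moves it somewhere different. A real model would additionally decode each sampled embedding back into a motion sequence.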
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| human-pose-forecasting-on-human36m | MT-VAE | ADE: 457, APD: 403, FDE: 595, MMADE: 716, MMFDE: 883 |
| human-pose-forecasting-on-humaneva-i | MT-VAE | ADE@2000ms: 345, APD@2000ms: 21, FDE@2000ms: 403 |
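
The table does not define its metrics or state their units. The sketch below assumes the definitions commonly used in diverse human motion forecasting: ADE is the lowest average per-frame distance between any sampled sequence and the ground truth, FDE is the same for the final frame only, and APD is the average pairwise distance between samples. The function names and toy data are illustrative, and the multi-modal variants (MMADE, MMFDE) are not covered.

```python
import math

def pose_distance(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ade(samples, target):
    """Average Displacement Error: best (minimum) mean per-frame
    distance to the target over all sampled sequences."""
    return min(
        sum(pose_distance(p, t) for p, t in zip(s, target)) / len(target)
        for s in samples
    )

def fde(samples, target):
    """Final Displacement Error: best last-frame distance to the target."""
    return min(pose_distance(s[-1], target[-1]) for s in samples)

def apd(samples):
    """Average Pairwise Distance between distinct samples, each flattened
    into one vector. Needs at least two samples."""
    flat = [[v for frame in s for v in frame] for s in samples]
    k = len(flat)
    total = sum(
        pose_distance(flat[i], flat[j])
        for i in range(k) for j in range(k) if i != j
    )
    return total / (k * (k - 1))

# Toy data: a 2-frame target and two sampled futures (2 values per frame).
target = [[0.0, 0.0], [1.0, 0.0]]
samples = [
    [[0.0, 0.0], [1.0, 4.0]],   # right at frame 0, off by 4 at the end
    [[3.0, 4.0], [1.0, 0.0]],   # off by 5 at frame 0, right at the end
]
```

On this toy data, `ade(samples, target)` is 2.0 and `fde(samples, target)` is 0.0. Lower ADE and FDE indicate more accurate samples, while higher APD indicates more diverse ones.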