Chain-of-frames
Chain-of-frames (CoF) was proposed in May 2025 by a team from NYU Abu Dhabi, ETH Zurich, and the U.S. Army Research Laboratory. The findings were published in the paper "Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning".
In large language models (LLMs), chain-of-thought (CoT) reasoning lets a model work through a problem step by step. Chain-of-frames plays the analogous role for video: it enables video models to solve visual problems that require step-by-step reasoning across time and space, with each reasoning step tied to specific frames. Unlike existing video CoT methods, CoF does not rely on additional networks to select or describe relevant frames. Experiments show that CoF-based models generate reasoning chains that accurately reference keyframes, improving performance and significantly reducing hallucination rates on multiple video understanding benchmarks. CoF is presented as a step toward video models becoming unified, general-purpose visual foundation models.