Gagan Mundada, Yash Vishe, Amit Namburi, Xin Xu, Zachary Novack, Julian McAuley, Junda Wu

Abstract
Recent advances in Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, their reasoning abilities in the multimodal symbolic music domain remain largely unexplored. We introduce WildScore, the first in-the-wild multimodal symbolic music reasoning and analysis benchmark, designed to evaluate MLLMs' capacity to interpret real-world music scores and answer complex musicological queries. Each instance in WildScore is sourced from genuine musical compositions and accompanied by authentic user-generated questions and discussions, capturing the intricacies of practical music analysis. To facilitate systematic evaluation, we propose a taxonomy comprising both high-level and fine-grained musicological ontologies. Furthermore, we frame complex music reasoning as multiple-choice question answering, enabling controlled and scalable assessment of MLLMs' symbolic music understanding. Empirical benchmarking of state-of-the-art MLLMs on WildScore reveals intriguing patterns in their visual-symbolic reasoning, uncovering both promising directions and persistent challenges for MLLMs in symbolic music reasoning and analysis. We release the dataset and code.