MemLens Multimodal Long Context Benchmark Dataset
MemLens is a benchmark dataset for evaluating long-range conversational memory in vision-language models. It tests a model's ability to retrieve, recall, update, and reason over visual and textual information embedded in dialogues that span multiple conversations, at context windows of 32K, 64K, 128K, and 256K. The dataset contains 789 questions covering five evaluation types: information extraction, knowledge updating, temporal reasoning, multi-conversation reasoning, and abstention (rejection). Each of the four context lengths is provided as a separate configuration. A separate fixed, stratified subset of 195 questions is provided for evaluating memory-augmented agents at lower inference cost.
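To illustrate what a stratified subset means here, the following is a minimal sketch of proportional stratified sampling by question type. It is not the authors' selection procedure, which this card does not describe. The field names (`id`, `type`), the type labels, the even spread of questions across types, and the helper `stratified_subset` are all assumptions made for the example; only the totals 789 and 195 come from the description above.

```python
import random
from collections import Counter, defaultdict


def stratified_subset(items, key, k, seed=0):
    """Pick k items so each stratum's share roughly matches its share of `items`."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)

    total = len(items)
    # Largest-remainder allocation: floor each proportional quota, then hand
    # the remaining slots to the strata with the largest fractional parts.
    exact = {g: k * len(members) / total for g, members in groups.items()}
    quota = {g: int(x) for g, x in exact.items()}
    leftover = k - sum(quota.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - quota[g], reverse=True)
    for g in by_remainder[:leftover]:
        quota[g] += 1

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for g in sorted(groups):
        subset.extend(rng.sample(groups[g], quota[g]))
    return subset


# Placeholder records: hypothetical labels and an even spread across types,
# NOT the real MemLens distribution.
TYPES = [
    "information_extraction",
    "knowledge_updating",
    "temporal_reasoning",
    "multi_conversation_reasoning",
    "abstention",
]
questions = [{"id": i, "type": TYPES[i % len(TYPES)]} for i in range(789)]

subset = stratified_subset(questions, key=lambda q: q["type"], k=195)
counts = Counter(q["type"] for q in subset)
print(len(subset), dict(counts))
```

Fixing the seed keeps the subset identical across runs, which matters when a reduced set is used to compare agents under the same budget.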
Citation
@inproceedings{ren2026memlens,
title={{MemLens}: Benchmarking Multimodal Long-Context Conversational Memory in Vision-Language Models},
author={Ren, Xiyu and Wang, Zhaowei and Du, Yiming and Xie, Zhongwei and Liu, Chi and Yang, Xinlin and Feng, Haoyue and Pan, Wenjun and Zheng, Tianshi and Xu, Baixuan and Li, Zhengnan and Song, Yangqiu and Wong, Ginny and See, Simon},
booktitle={Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track},
year={2026}
}