MemLens Multimodal Long Context Benchmark Dataset

License

CC BY 4.0

MemLens is a benchmark dataset for evaluating long-range conversational memory in vision-language models. It tests a model's ability to retrieve, recall, update, and reason over visual and textual information embedded in multi-session dialogues, with context windows of 32K, 64K, 128K, and 256K.

The dataset contains 789 questions covering five evaluation types: information extraction, knowledge updating, temporal reasoning, multi-session reasoning, and abstention. It is provided in four context-length configurations (32K / 64K / 128K / 256K). A fixed stratified subset of 195 questions is additionally provided for evaluating memory-augmented agents while keeping inference costs manageable.
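Results on a benchmark organized along these two axes (question type and context length) are typically reported as a grid of accuracies. The following is a minimal Python sketch of that tabulation, assuming each scored answer has already been reduced to a correct/incorrect judgment. The `Result` record, its field names, and the category labels are hypothetical; consult the released files for the actual schema and the official scoring procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout; the real MemLens files may use different field names.
@dataclass
class Result:
    question_type: str   # e.g. "information_extraction", "abstention"
    context_length: str  # e.g. "32K", "64K", "128K", "256K"
    correct: bool        # whether the model's answer was judged correct


def accuracy_table(results):
    """Return {(question_type, context_length): accuracy} over the given results."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = (r.question_type, r.context_length)
        totals[key] += 1
        hits[key] += int(r.correct)
    # Only cells that actually received results appear in the table.
    return {key: hits[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Toy scored answers, invented purely to exercise the function.
    demo = [
        Result("temporal_reasoning", "32K", True),
        Result("temporal_reasoning", "32K", False),
        Result("abstention", "256K", True),
    ]
    for (qtype, length), acc in sorted(accuracy_table(demo).items()):
        print(f"{qtype:<20} {length:>5} {acc:.2f}")
```

Keeping per-cell counts rather than a single overall score makes it easy to see whether accuracy degrades as the context grows from 32K to 256K for a given question type.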

Citation

@inproceedings{ren2026memlens,
title={{MemLens}: Benchmarking Multimodal Long-Context Conversational Memory in Vision-Language Models},
author={Ren, Xiyu and Wang, Zhaowei and Du, Yiming and Xie, Zhongwei and Liu, Chi and Yang, Xinlin and Feng, Haoyue and Pan, Wenjun and Zheng, Tianshi and Xu, Baixuan and Li, Zhengnan and Song, Yangqiu and Wong, Ginny and See, Simon},
booktitle={Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track},
year={2026}
}
