CHI-Bench: Healthcare AI Agent Benchmark Dataset
CHI-Bench (Clinical Healthcare Intelligence Benchmark) is a dataset for evaluating healthcare AI agents, released by Actava AI in 2026. The accompanying research paper is "CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?"

The dataset evaluates an AI agent's planning, reasoning, tool use, and cross-system collaboration in end-to-end US healthcare workflows. It is built on a high-fidelity simulation of healthcare business operations that integrates 20 medical application systems through the Model Context Protocol (MCP), together with a knowledge base of 1,279 medical operations documents.

The evaluation scenarios cover three major areas of the US healthcare system: Prior Authorization, Citation Management, and Population Care Management. The benchmark contains 101 evaluation tasks: 75 basic tasks, 23 end-to-end two-agent tasks, and 3 long-horizon Marathon tasks. It can be used for research and evaluation on large medical models, medical agents, multi-agent collaboration, and medical process automation.
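Because the simulated systems are exposed over MCP, an agent interacts with them by sending JSON-RPC 2.0 messages, discovering tools with `tools/list` and invoking them with `tools/call`. The sketch below only illustrates that message shape under general MCP conventions; it is not CHI-Bench's client code, and the tool name `lookup_member_eligibility` and its `member_id` argument are hypothetical placeholders, not tools defined by the benchmark.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for a single client session.
_request_ids = count(1)


def make_tool_list() -> str:
    """Build an MCP `tools/list` request so the agent can discover available tools."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/list",
    }
    return json.dumps(request)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request invoking one tool with JSON arguments."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Hypothetical tool on one of the simulated healthcare systems.
    print(make_tool_list())
    print(make_tool_call("lookup_member_eligibility", {"member_id": "M-0001"}))
```

Keeping request construction separate from transport means the same messages could be sent over whichever MCP transport a given simulated system uses (for example stdio or HTTP); in a real agent an MCP client library would normally handle this framing.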
Citation
@misc{chen2026chibenchaiagentsautomate,
title={CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?},
author={Haolin Chen and Deon Metelski and Leon Qi and Tao Xia and Joonyul Lee and Steve Brown and Kevin Riley and Frank Wang and T. Y. Alvin Liu and Hank Capps MD and Zeyu Tang and Xiangchen Song and Lingjing Kong and Fan Feng and Tianyi Zeng and Zhiwei Liu and Zixian Ma and Hang Jiang and Fangli Geng and Yuan Yuan and Chenyu You and Qingsong Wen and Hua Wei and Yanjie Fu and Yue Zhao and Carl Yang and Biwei Huang and Kun Zhang and Caiming Xiong and Sanmi Koyejo and Eric P. Xing and Philip S. Yu and Weiran Yao},
year={2026},
eprint={2605.16679},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.16679},
}