HalluQA: A Hallucination Evaluation Dataset for Chinese Large Language Models

This repository contains the data and evaluation scripts for the HalluQA (Chinese Hallucination Question-Answering) benchmark. The full dataset is in HalluQA.json. HalluQA is introduced in the accompanying paper, which also reports detailed experimental results on several Chinese large language models. HalluQA contains 450 carefully designed adversarial questions that span multiple domains and take into account Chinese historical culture, customs, and social phenomena.
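As a quick sanity check after cloning, the data file can be inspected with a few lines of Python. This is a minimal sketch, not part of the repository's evaluation scripts: it assumes HalluQA.json holds a top-level JSON array of question records, and the helper names load_questions and summarize are hypothetical. The field names are not assumed; the sketch only reports whichever keys it finds.

```python
import json
from pathlib import Path


def load_questions(path="HalluQA.json"):
    """Read the data file as UTF-8 and return the parsed JSON value."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(data):
    """Return (record count, sorted field names seen across records).

    Assumption: the top-level JSON value is a list of objects.
    """
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of records")
    keys = sorted({key for item in data if isinstance(item, dict) for key in item})
    return len(data), keys


if __name__ == "__main__":
    data_file = Path("HalluQA.json")
    if data_file.exists():
        count, keys = summarize(load_questions(data_file))
        print(f"{count} records; fields: {keys}")
    else:
        print("HalluQA.json not found; run this from the repository root.")
```

If the file matches the description above, the reported record count should be 450. A different count, or a ValueError, suggests the file layout differs from what this sketch assumes.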