
Multi-Task Language Understanding on BBH-nlp

Evaluation metric

Average (%)
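The page does not say how "Average (%)" is computed. A common convention for multi-task suites, assumed here, is the unweighted mean of per-task accuracies, reported as a percentage. The sketch below illustrates that assumption only; the task names and accuracy values are hypothetical placeholders, not real BBH-nlp results.

```python
from statistics import mean


def average_percent(task_accuracies: dict[str, float]) -> float:
    """Unweighted (macro) mean of per-task accuracies, returned as a percentage.

    Assumption: each task counts equally regardless of how many examples it has,
    and each accuracy is a fraction in [0, 1].
    """
    if not task_accuracies:
        raise ValueError("need at least one task")
    for name, acc in task_accuracies.items():
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"accuracy for {name!r} must be in [0, 1], got {acc}")
    return 100.0 * mean(task_accuracies.values())


# Hypothetical per-task accuracies, for illustration only.
scores = {"task_a": 0.80, "task_b": 0.70, "task_c": 0.90}
print(round(average_percent(scores), 2))
```

Under this assumption a model's score is not weighted by task size, so a small task moves the average as much as a large one.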

Evaluation results

Performance of each model on this benchmark

Model | Average (%) | Paper Title
Qwen2.5-72B | 86.3 | -
Jiutian-大模型 (Jiutian large model) | 86.1 | -
Llama-3-405B | 85.9 | -
Jiutian-57B | 84.07 | -
Qwen2-72B | 82.4 | -
Llama-3-70B | 81.0 | -
Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | 78.4 | Scaling Instruction-Finetuned Language Models
PaLM 540B (CoT + self-consistency) | 78.2 | Scaling Instruction-Finetuned Language Models
code-davinci-002 175B (CoT) | 73.5 | Evaluating Large Language Models Trained on Code
Flan-PaLM 540B (3-shot, fine-tuned, CoT) | 72.4 | Scaling Instruction-Finetuned Language Models
PaLM 540B (CoT) | 71.2 | Scaling Instruction-Finetuned Language Models
Flan-PaLM 540B (5-shot, fine-tuned) | 70.0 | Scaling Instruction-Finetuned Language Models
PaLM 540B | 62.7 | Scaling Instruction-Finetuned Language Models
Orca 2-13B | 50.18 | Orca 2: Teaching Small Language Models How to Reason
Orca 2-7B | 45.93 | Orca 2: Teaching Small Language Models How to Reason
Source: HyperAI超神经, SOTA leaderboard