
Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron* Louis Martin† Kevin Stone† Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov Thomas Scialom*


Abstract

In this work, we develop and release Llama 2, a family of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned models, called Llama 2-Chat, are optimized for dialogue use cases. These models outperform open-source chat models on most of the benchmarks we tested, and based on our human evaluations of helpfulness and safety, they may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and improving the safety of Llama 2-Chat so that the community can build on our work and contribute to the responsible development of LLMs.
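For orientation, here is a minimal sketch of querying a Llama 2-Chat model for dialogue through the Hugging Face transformers library. The checkpoint name meta-llama/Llama-2-7b-chat-hf and the [INST] ... [/INST] prompt format follow the commonly distributed Hub release and are assumptions of this sketch, not details taken from this page.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub checkpoint name; the 13B and 70B chat variants follow the same pattern.
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Llama 2-Chat is instruction-tuned around an [INST] ... [/INST] dialogue format.
prompt = "[INST] Explain what a foundation model is in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))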

Code Repositories

The following repositories reference this paper on GitHub:

xverse-ai/xverse-13b (pytorch)
coastalcph/eu-politics-llms (pytorch)
IBM/Dromedary (pytorch)
squeezeailab/squeezellm (pytorch)
zurichnlp/contradecode (pytorch)
xuetianci/pacit (pytorch)
young-geng/easylm (jax)
llamafamily/llama-chinese (pytorch)
glb400/Toy-RecLM (pytorch)
rijgersberg/geitje (pytorch)
flagalpha/llama2-chinese (pytorch)
usyd-fsalab/fp6_llm (pytorch)
idiap/abroad-re (pytorch)
ninglab/ecellm (pytorch)
Lightning-AI/lit-gpt (pytorch)
xzhang97666/alpacare

Benchmarks

Benchmark | Method | Metrics
arithmetic-reasoning-on-gsm8k | LLaMA 2 70B (one-shot) | Accuracy: 56.8, Parameters (Billion): 70
code-generation-on-mbpp | Llama 2 34B (0-shot) | Accuracy: 33
code-generation-on-mbpp | Llama 2 7B (0-shot) | Accuracy: 20.8
code-generation-on-mbpp | Llama 2 70B (0-shot) | Accuracy: 45
code-generation-on-mbpp | Llama 2 13B (0-shot) | Accuracy: 30.6
math-word-problem-solving-on-mawps | LLaMA 2-Chat | Accuracy (%): 82.4
math-word-problem-solving-on-svamp | LLaMA 2-Chat | Execution Accuracy: 69.2
multi-task-language-understanding-on-mmlu | LLaMA 2 13B (5-shot) | Average (%): 54.8
multi-task-language-understanding-on-mmlu | LLaMA 2 34B (5-shot) | Average (%): 62.6
multi-task-language-understanding-on-mmlu | LLaMA 2 7B (5-shot) | Average (%): 45.3
multiple-choice-question-answering-mcqa-on-25 | Llama2-7B | Accuracy: 43.38
multiple-choice-question-answering-mcqa-on-25 | Llama2-7B-chat | Accuracy: 40.07
question-answering-on-boolq | LLaMA 2 13B (0-shot) | Accuracy: 81.7
question-answering-on-boolq | LLaMA 2 34B (0-shot) | Accuracy: 83.7
question-answering-on-boolq | LLaMA 2 7B (0-shot) | Accuracy: 77.4
question-answering-on-boolq | LLaMA 2 70B (0-shot) | Accuracy: 85
question-answering-on-multitq | LLaMA2 | Hits@1: 18.5
question-answering-on-natural-questions | LLaMA 2 70B (one-shot) | EM: 33.0
question-answering-on-piqa | LLaMA 2 13B (0-shot) | Accuracy: 80.5
question-answering-on-piqa | LLaMA 2 34B (0-shot) | Accuracy: 81.9
question-answering-on-piqa | LLaMA 2 7B (0-shot) | Accuracy: 78.8
question-answering-on-piqa | LLaMA 2 70B (0-shot) | Accuracy: 82.8
question-answering-on-pubchemqa | Llama2-7B-chat | BLEU-2: 0.075, BLEU-4: 0.009, METEOR: 0.149, ROUGE-1: 0.184, ROUGE-2: 0.043, ROUGE-L: 0.142
question-answering-on-triviaqa | LLaMA 2 70B (one-shot) | EM: 85
question-answering-on-uniprotqa | Llama2-7B-chat | BLEU-2: 0.019, BLEU-4: 0.002, METEOR: 0.052, ROUGE-1: 0.103, ROUGE-2: 0.060, ROUGE-L: 0.009
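Several rows above report EM (exact match) for TriviaQA and Natural Questions. As a reference point, here is a minimal sketch of the SQuAD-style normalized exact-match computation commonly used for these datasets; the exact evaluation harness behind the numbers above may differ in detail.

import re
import string

def normalize(text: str) -> str:
    # Standard normalization: lowercase, strip punctuation and English articles,
    # collapse whitespace. Assumed here; harnesses vary on these details.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, references: list[str]) -> bool:
    # A prediction counts as correct if it matches any reference answer.
    return any(normalize(prediction) == normalize(ref) for ref in references)

# Hypothetical predictions paired with reference answer sets.
preds = [("Paris", ["Paris", "paris, france"]), ("The Nile", ["Nile"])]
em = 100.0 * sum(exact_match(p, refs) for p, refs in preds) / len(preds)
print(f"EM: {em:.1f}")  # -> EM: 100.0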
