Hugo Touvron* Louis Martin† Kevin Stone† Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenying Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov Thomas Scialom*

Abstract
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned models, called Llama 2-Chat, are optimized for dialogue use cases. These models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations of helpfulness and safety, they may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat, in order to enable the community to build on our work and contribute to the responsible development of LLMs.
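For orientation, below is a minimal sketch of querying a Llama 2-Chat checkpoint through the Hugging Face `transformers` API. The hub id `meta-llama/Llama-2-7b-chat-hf` and the gated-access setup are assumptions about the hosting arrangement, not part of the paper itself.

```python
# Minimal sketch (not the paper's code): generate a reply from a Llama 2-Chat
# checkpoint via Hugging Face transformers. The model id below is an assumed
# hub id and requires an accepted license to download.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumption: gated Hugging Face hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Llama 2-Chat was instruction-tuned with an [INST] ... [/INST] prompt format.
prompt = "[INST] Explain what a 7B-parameter language model is. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```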
Code Repositories

| Repository | Framework | Note |
|---|---|---|
| xverse-ai/xverse-13b | pytorch | Mentioned in GitHub |
| coastalcph/eu-politics-llms | pytorch | Mentioned in GitHub |
| facebookresearch/llama | pytorch | Official |
| IBM/Dromedary | pytorch | Mentioned in GitHub |
| squeezeailab/squeezellm | pytorch | Mentioned in GitHub |
| zurichnlp/contradecode | pytorch | Mentioned in GitHub |
| eternityyw/tram-benchmark | | Mentioned in GitHub |
| xuetianci/pacit | pytorch | Mentioned in GitHub |
| young-geng/easylm | jax | Mentioned in GitHub |
| meetyou-ai-lab/can-mc-evaluate-llms | pytorch | Mentioned in GitHub |
| llamafamily/llama-chinese | pytorch | Mentioned in GitHub |
| glb400/Toy-RecLM | pytorch | Mentioned in GitHub |
| rijgersberg/geitje | pytorch | Mentioned in GitHub |
| flagalpha/llama2-chinese | pytorch | Mentioned in GitHub |
| usyd-fsalab/fp6_llm | pytorch | Mentioned in GitHub |
| idiap/abroad-re | pytorch | Mentioned in GitHub |
| ninglab/ecellm | pytorch | Mentioned in GitHub |
| Lightning-AI/lit-gpt | pytorch | Mentioned in GitHub |
| xzhang97666/alpacare | | Mentioned in GitHub |
Benchmarks

| Benchmark | Method | Metrics |
|---|---|---|
| arithmetic-reasoning-on-gsm8k | LLaMA 2 70B (one-shot) | Accuracy: 56.8; Parameters (Billion): 70 |
| code-generation-on-mbpp | Llama 2 34B (0-shot) | Accuracy: 33 |
| code-generation-on-mbpp | Llama 2 7B (0-shot) | Accuracy: 20.8 |
| code-generation-on-mbpp | Llama 2 70B (0-shot) | Accuracy: 45 |
| code-generation-on-mbpp | Llama 2 13B (0-shot) | Accuracy: 30.6 |
| math-word-problem-solving-on-mawps | LLaMA 2-Chat | Accuracy (%): 82.4 |
| math-word-problem-solving-on-svamp | LLaMA 2-Chat | Execution Accuracy: 69.2 |
| multi-task-language-understanding-on-mmlu | LLaMA 2 13B (5-shot) | Average (%): 54.8 |
| multi-task-language-understanding-on-mmlu | LLaMA 2 34B (5-shot) | Average (%): 62.6 |
| multi-task-language-understanding-on-mmlu | LLaMA 2 7B (5-shot) | Average (%): 45.3 |
| multiple-choice-question-answering-mcqa-on-25 | Llama2-7B | Accuracy: 43.38 |
| multiple-choice-question-answering-mcqa-on-25 | Llama2-7B-chat | Accuracy: 40.07 |
| question-answering-on-boolq | LLaMA 2 13B (0-shot) | Accuracy: 81.7 |
| question-answering-on-boolq | LLaMA 2 34B (0-shot) | Accuracy: 83.7 |
| question-answering-on-boolq | LLaMA 2 7B (0-shot) | Accuracy: 77.4 |
| question-answering-on-boolq | LLaMA 2 70B (0-shot) | Accuracy: 85 |
| question-answering-on-multitq | LLaMA2 | Hits@1: 18.5 |
| question-answering-on-natural-questions | LLaMA 2 70B (one-shot) | EM: 33.0 |
| question-answering-on-piqa | LLaMA 2 13B (0-shot) | Accuracy: 80.5 |
| question-answering-on-piqa | LLaMA 2 34B (0-shot) | Accuracy: 81.9 |
| question-answering-on-piqa | LLaMA 2 7B (0-shot) | Accuracy: 78.8 |
| question-answering-on-piqa | LLaMA 2 70B (0-shot) | Accuracy: 82.8 |
| question-answering-on-pubchemqa | Llama2-7B-chat | BLEU-2: 0.075; BLEU-4: 0.009; METEOR: 0.149; ROUGE-1: 0.184; ROUGE-2: 0.043; ROUGE-L: 0.142 |
| question-answering-on-triviaqa | LLaMA 2 70B (one-shot) | EM: 85 |
| question-answering-on-uniprotqa | Llama2-7B-chat | BLEU-2: 0.019; BLEU-4: 0.002; METEOR: 0.052; ROUGE-1: 0.103; ROUGE-2: 0.060; ROUGE-L: 0.009 |
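The 0-shot accuracies above (e.g., on BoolQ and PIQA) are typically computed by comparing the log-likelihood the model assigns to each candidate answer, rather than by free-form generation. The sketch below illustrates this common scoring scheme; it is an assumption about the general methodology, not the exact evaluation harness behind these leaderboard entries, and the hub id `meta-llama/Llama-2-7b-hf` is likewise assumed.

```python
# Minimal sketch (assumed methodology, not the paper's harness): score a
# BoolQ-style yes/no item 0-shot by comparing answer log-likelihoods.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumption: gated hub id for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

def answer_logprob(prompt: str, answer: str) -> float:
    """Sum of token log-probabilities the model assigns to `answer` after `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i predicts token i+1, so shift logits/targets by one.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the answer tokens (those after the prompt prefix).
    start = prompt_ids.shape[1] - 1
    return token_lp[0, start:].sum().item()

# One item; benchmark accuracy is the fraction of items where the
# higher-likelihood candidate matches the gold label.
prompt = (
    "Passage: The sky appears blue due to Rayleigh scattering.\n"
    "Question: is the sky blue?\nAnswer:"
)
prediction = max([" yes", " no"], key=lambda a: answer_logprob(prompt, a))
print(prediction)
```

Few-shot settings such as MMLU (5-shot) follow the same scheme, except that several worked examples are prepended to the prompt before scoring the candidates.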