Gemini Team

Abstract
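This report introduces Gemini, a new family of multimodal models that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device, memory-constrained use cases. Evaluation on a broad range of benchmarks shows that our most capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks. Notably, it is the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and it improves the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We also discuss our approach to post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.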
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| arithmetic-reasoning-on-gsm8k | Gemini Pro (maj1@32) | Accuracy: 86.5 |
| chart-question-answering-on-chartqa | Gemini Ultra | 1:1 Accuracy: 80.8 |
| long-context-understanding-on-mmneedle | Gemini Pro 1.0 | 1 Image, 2*2 Stitching, Exact Accuracy: 29.53; 1 Image, 4*4 Stitching, Exact Accuracy: 24.78; 1 Image, 8*8 Stitching, Exact Accuracy: 2.11; 10 Images, 1*1 Stitching, Exact Accuracy: 16.25; 10 Images, 2*2 Stitching, Exact Accuracy: 4.82; 10 Images, 4*4 Stitching, Exact Accuracy: 0.4; 10 Images, 8*8 Stitching, Exact Accuracy: 0 |
| math-word-problem-solving-on-math | Gemini Pro (4-shot) | Accuracy: 32.6 |
| math-word-problem-solving-on-math | Gemini Ultra (4-shot) | Accuracy: 53.2 |
| temporal-casual-qa-on-next-qa | Gemini Ultra (zero-shot) | WUPS: 29.9 |
| temporal-casual-qa-on-next-qa | Gemini Pro (zero-shot) | WUPS: 28.0 |
| visual-question-answering-on-mm-vet | Gemini 1.0 Pro Vision (gemini-pro-vision) | GPT-4 score: 64.3±0.4 |
| visual-question-answering-on-mm-vet-v2 | Gemini Pro Vision | GPT-4 score: 57.2±0.2 |
| visual-question-answering-vqa-on | Gemini Ultra (pixel only) | ANLS: 80.3 |
| visual-question-answering-vqa-on-ai2d | Gemini Ultra | EM: 79.5 |
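
The `maj1@32` label on the GSM8K row is commonly read as majority voting: 32 answers are sampled per question, and the most frequent one is scored against the reference. The sketch below illustrates that reading only and is not the evaluation code behind the numbers in the table. The function names `majority_vote` and `maj1_at_k_accuracy` are hypothetical, answers are assumed to be already normalized to comparable strings, and breaking ties in favor of the answer seen first is an assumption.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the sampled completions."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) yields [(answer, count)]; on a tie, the answer that
    # appeared first in `answers` wins (an assumption of this sketch).
    return counts.most_common(1)[0][0]


def maj1_at_k_accuracy(sampled_answers, references):
    """Percentage of questions whose majority-voted answer equals the reference.

    sampled_answers: one list of k sampled answers per question
    references: one reference answer per question
    """
    if len(sampled_answers) != len(references):
        raise ValueError("need exactly one reference per question")
    if not references:
        raise ValueError("need at least one question")
    correct = sum(
        majority_vote(samples) == reference
        for samples, reference in zip(sampled_answers, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy data with k = 3 samples per question instead of 32.
    samples = [["18", "18", "20"], ["5", "7", "7"]]
    refs = ["18", "5"]
    print(maj1_at_k_accuracy(samples, refs))
```

With `k = 32`, each inner list would hold 32 sampled answers. The voting step is the same for any `k`.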