Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks with an encoder-decoder structure. The best-performing models also connect the encoder and decoder through an attention mechanism. We propose a new, simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show that the model is superior in translation quality while being more parallelizable and requiring significantly less time to train. On the WMT 2014 English-to-German translation task, our model achieves a BLEU score of 28.4, improving over the existing best results, including ensembles, by more than 2 BLEU. On the WMT 2014 English-to-French translation task, after training for 3.5 days on eight GPUs, the model reaches a single-model BLEU score of 41.8, setting a new state of the art at a small fraction of the training cost of the best models in the literature. We further show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing with both large and limited training data.
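To make the attention mechanism the abstract refers to concrete, here is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, the core operation of the Transformer. The function name and toy shapes are illustrative, not the authors' reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k), V: (seq_len, d_v). Returns (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # attention-weighted sum of values

# Toy usage: 4 tokens, d_k = d_v = 8 (illustrative sizes only)
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

In the full model, several such attention heads run in parallel (multi-head attention) and their outputs are concatenated and linearly projected.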
Code Repositories
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| abstractive-text-summarization-on-cnn-daily | Transformer | ROUGE-1: 39.50, ROUGE-2: 16.06, ROUGE-L: 36.63 |
| constituency-parsing-on-penn-treebank | Transformer | F1 score: 92.7 |
| coreference-resolution-on-winograd-schema | Subword-level Transformer LM | Accuracy: 54.1 |
| image-guided-story-ending-generation-on-lsmdc | Transformer | BLEU-1: 15.35, BLEU-2: 4.49, BLEU-3: 1.82, BLEU-4: 0.76, CIDEr: 9.32, METEOR: 11.43, ROUGE-L: 19.16 |
| image-guided-story-ending-generation-on-vist | Transformer | BLEU-1: 17.18, BLEU-2: 6.29, BLEU-3: 3.07, BLEU-4: 2.01, CIDEr: 12.75, METEOR: 6.91, ROUGE-L: 18.23 |
| machine-translation-on-iwslt2014-german | Transformer | BLEU score: 34.44 |
| machine-translation-on-iwslt2015-english | Transformer | BLEU score: 28.50 |
| machine-translation-on-wmt2014-english-french | Transformer Big | BLEU score: 41.0, Hardware Burden: 23G, Operations per network pass: 2300000000.0G |
| machine-translation-on-wmt2014-english-french | Transformer Base | BLEU score: 38.1, Hardware Burden: 23G, Operations per network pass: 330000000.0G |
| machine-translation-on-wmt2014-english-german | Transformer Base | BLEU score: 27.3, Operations per network pass: 330000000.0G |
| machine-translation-on-wmt2014-english-german | Transformer Big | BLEU score: 28.4, Hardware Burden: 871G, Operations per network pass: 2300000000.0G |
| multimodal-machine-translation-on-multi30k | Transformer | BLEU (DE-EN): 29.0 |
| natural-language-understanding-on-pdp60 | Subword-level Transformer LM | Accuracy: 58.3 |
| supervised-only-3d-point-cloud-classification | Transformer | GFLOPs: 4.8, Number of params (M): 22.1, Overall Accuracy (PB_T50_RS): 77.24 |
| text-summarization-on-gigaword | Transformer | ROUGE-1: 37.57, ROUGE-2: 18.90, ROUGE-L: 34.69 |