CCMT 2019-BSTC Speech Translation Corpus
Date
Size
Publish URL

BSTC (Baidu Speech Translation Corpus) is a large-scale dataset for building automatic simultaneous interpretation systems.
The corpus is divided into three subsets: training set, development set and test set. Each subset includes:
- Sound signal files, named baidu_XX.wav
- Description files, containing descriptive information for each sound signal, with each sentence encoded in JSON format
- Supplementary documentation, including detailed descriptions of the speeches and reports
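
As a rough illustration of how one audio file and its description file might be read together, here is a minimal Python sketch. It assumes the description file holds one JSON object per non-empty line and that the audio is standard PCM WAV; neither detail is confirmed above, and the function names are hypothetical. The JSON field names are not documented here, so the sketch returns the parsed objects without interpreting them.

```python
import json
import wave


def read_wav_info(wav_path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(wav_path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def read_sentences(description_path):
    """Parse a description file.

    Assumption: one JSON object per non-empty line. The keys are left
    uninterpreted because the page does not list them.
    """
    sentences = []
    with open(description_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                sentences.append(json.loads(line))
    return sentences
```

Calling `read_wav_info` on a baidu_XX.wav file and `read_sentences` on the matching description file gives the audio length and the per-sentence records, which can then be inspected to learn the actual field names.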