Xingyu Cai, Jiahong Yuan, Renjie Zheng, Liang Huang, Kenneth Church
Abstract
Speech emotion recognition (SER) classifies speech into emotion categories such as Happy, Angry, Sad, and Neutral. Recently, deep learning has been applied to the SER task. This paper proposes a multi-task learning (MTL) framework that simultaneously performs speech-to-text recognition and emotion classification, using an end-to-end deep neural model based on wav2vec-2.0. Experiments on the IEMOCAP benchmark show that the proposed method achieves state-of-the-art performance on the SER task. In addition, an ablation study establishes the effectiveness of the proposed MTL framework.
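As a rough illustration of what training two tasks at once can look like, the sketch below combines an emotion classification loss with a speech-to-text loss into a single objective. The abstract does not give the exact formulation, so the additive form, the weight `alpha`, and the names `cross_entropy` and `multitask_loss` are assumptions made for illustration. The speech-to-text loss is passed in as a precomputed number rather than derived from a real recognizer.

```python
import math

# Emotion categories named in the abstract.
EMOTIONS = ["Happy", "Angry", "Sad", "Neutral"]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def multitask_loss(emotion_logits, emotion_label, asr_loss, alpha=0.1):
    """Hypothetical joint objective: emotion loss plus a weighted ASR loss.

    `asr_loss` stands in for the speech-to-text loss, and `alpha` is an
    assumed weighting hyperparameter; neither is specified in the abstract.
    """
    emotion_loss = cross_entropy(emotion_logits, EMOTIONS.index(emotion_label))
    return emotion_loss + alpha * asr_loss
```

Under this assumed form, setting `alpha=0` reduces the objective to emotion classification alone, which is one way an ablation of the speech-to-text task could be expressed.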
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-emotion-recognition-on-iemocap | SER with MTL | F1: -; UA CV: 0.7815 |
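The table does not define its metric. Assuming "UA" denotes unweighted accuracy as it is commonly used in SER work (the mean of per-class recalls, so every emotion counts equally regardless of how many utterances it has), it can be computed as in the minimal sketch below. The function name `unweighted_accuracy` and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy example with made-up labels (not IEMOCAP data).
y_true = ["Happy", "Happy", "Happy", "Sad"]
y_pred = ["Happy", "Happy", "Angry", "Sad"]
ua = unweighted_accuracy(y_true, y_pred)
```

If "CV" refers to cross-validation, the reported figure would be this quantity averaged over held-out folds, but the table itself does not say so.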