Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill

Abstract
We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.
Benchmarks
| Benchmark | Methodology | DER (%) | FA | Miss |
|---|---|---|---|---|
| speaker-diarization-on-ami | pyannote (MFCC) | 6.3 | 3.5 | 2.7 |
| speaker-diarization-on-ami | pyannote (waveform) | 6.0 | 3.6 | 2.4 |
| speaker-diarization-on-dihard-1 | pyannote (MFCC) | 10.5 | 6.8 | 3.7 |
| speaker-diarization-on-dihard-1 | Baseline (the best result in the literature as of Oct. 2019) | 11.2 | 6.5 | 4.7 |
| speaker-diarization-on-dihard-1 | pyannote (waveform) | 9.9 | 5.7 | 4.2 |
| speaker-diarization-on-etape | Baseline | 7.7 | 7.5 | 0.2 |
| speaker-diarization-on-etape | pyannote (MFCC) | 5.6 | 5.2 | 0.4 |
| speaker-diarization-on-etape | pyannote (waveform) | 4.9 | 4.2 | 0.7 |
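The page does not say how the three reported figures relate to each other. In every row of the table above, though, the DER value matches FA + Miss to within rounding, which suggests that no further error term is folded into these numbers. The following is a minimal sketch that checks this observation. The values are copied from the table, while the names `ROWS`, `residual`, `residuals` and `max_abs_residual` are illustrative and not part of pyannote.audio.

```python
# (DER, FA, Miss) triples copied from the benchmark table.
# Keys are (dataset, system); the naming is illustrative only.
ROWS = {
    ("ami", "pyannote (MFCC)"): (6.3, 3.5, 2.7),
    ("ami", "pyannote (waveform)"): (6.0, 3.6, 2.4),
    ("dihard-1", "pyannote (MFCC)"): (10.5, 6.8, 3.7),
    ("dihard-1", "Baseline"): (11.2, 6.5, 4.7),
    ("dihard-1", "pyannote (waveform)"): (9.9, 5.7, 4.2),
    ("etape", "Baseline"): (7.7, 7.5, 0.2),
    ("etape", "pyannote (MFCC)"): (5.6, 5.2, 0.4),
    ("etape", "pyannote (waveform)"): (4.9, 4.2, 0.7),
}


def residual(der: float, fa: float, miss: float) -> float:
    """Return DER - (FA + Miss), rounded to one decimal place."""
    return round(der - (fa + miss), 1)


# Per-row gap between the reported DER and the sum of its two components.
residuals = {key: residual(*values) for key, values in ROWS.items()}

# Largest gap in absolute value; a small value is compatible with rounding.
max_abs_residual = max(abs(r) for r in residuals.values())

for (dataset, system), gap in residuals.items():
    print(f"{dataset:10s} {system:22s} DER - (FA + Miss) = {gap:+.1f}")
```

A gap of at most one unit in the last reported digit is what independent rounding of each figure would produce, so the check supports the reading above but does not prove it.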