ranchlai/awesome-speaker-embedding

awesome-speaker-embedding

A curated list of speaker embedding/verification resources

Must-read papers

Benchmarks (not very accurate)

Results reported (by the authors) on VoxCeleb1, VoxCeleb1-E and VoxCeleb1-H.

VoxCeleb1 public results (continuously updated)

| Name | Feature, model, activation/loss | VoxCeleb1 | VoxCeleb1-E | VoxCeleb1-H | Link | Affiliation | Year |
|---|---|---|---|---|---|---|---|
| X205 | DPN68, Res2Net50 | 0.7712% | 0.8968% | 1.637% | report | AISpeech | 2020 |
| Veridas | ResNet152 | 1.08% | - | - | report | das-nano | 2020 |
| DKU-DukeECE | ResNet, ECAPA-TDNN | 0.888% | 1.133% | 2.008% | report | Duke University | 2020 |
| IDLAB | ResNet, ECAPA-TDNN | - | - | - | report | Ghent University | 2020 |
| speechbrain | ECAPA-TDNN | 0.69% | - | - | link | - | 2021 |
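
The table does not name its metric; results on the VoxCeleb test lists are usually reported as equal error rate (EER), so the sketch below assumes that is what the percentages are. It shows one simple way to estimate EER from trial scores. The function name `equal_error_rate` and the threshold sweep are illustrative choices, not taken from any of the listed systems.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false-acceptance
    and false-rejection rates are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False acceptance: a non-target (impostor) trial is accepted.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: a target (same-speaker) trial is rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, made up for illustration only.
    targets = [0.9, 0.8, 0.7, 0.4]
    nontargets = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(targets, nontargets)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

Evaluation toolkits typically interpolate the ROC/DET curve instead of picking the closest observed threshold, so their numbers can differ slightly from this estimate on small trial lists.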

Must-read technical reports

VOXSRC 2019 reports

Datasets

Commonly-used speaker datasets:

  • TIMIT: a small dataset for speaker recognition and ASR, non-free
  • Free ST: Mandarin speech corpus for speaker recognition and ASR, free
  • NIST SRE: NIST Speaker Recognition Evaluation, non-free
  • AIShell-1: Mandarin speech corpus, divided into train/dev/test, free
  • AIShell-2: free for education, non-free for commercial use
  • AIShell-3: free, for speaker recognition, ASR and TTS
  • AIShell-4: to be released soon
  • HI-MIA: free, for far-field text-dependent speaker verification and keyword spotting
  • SITW: Speakers in the Wild
  • VoxCeleb 1&2: celebrity interview video/audio extracted from YouTube
  • CN-Celeb 1&2: multi-genre speaker dataset in the wild; utterances are from Chinese celebrities

Challenges

Great Talks / Tutorials

Code/Tools/Frameworks/Libraries

  • VGGVox: the first baseline system for the VoxCeleb dataset, originally implemented in MATLAB.
  • DeepSpeaker: an end-to-end neural speaker embedding system.
  • SincNet, also available in SpeechBrain
  • 3D CNN: TensorFlow implementation of 3D convolutional neural networks for speaker verification
  • GE2E, also implemented in TensorFlow
  • asv-subtools: an open-source toolkit based on PyTorch and Kaldi for speaker recognition/language identification, from XMU Speech Lab.
  • Resemblyzer: high-level representation of a voice through a deep learning model (referred to as the voice encoder).
  • voxceleb: audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube
  • Triplet-loss: triplet loss and online triplet mining in TensorFlow.
  • Res2Net: the Res2Net architecture commonly used in the VoxCeleb Speaker Recognition Challenge.
  • voxceleb_trainer: a very good speaker recognition framework written in PyTorch, with pretrained models.
  • SpeechBrain: VoxCeleb recipe.
  • kaldi: Kaldi recipe for VoxCeleb.
  • pytorch_xvectors: PyTorch implementation of x-vectors.

More-recent papers

  • Attention Back-end: compares PLDA and cosine scoring with the proposed attention back-end; models: TDNN, ResNet; data: CN-Celeb
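
Of the back-ends mentioned in the entry above, cosine scoring is the simplest to illustrate. The following is a minimal sketch of how a verification trial can be scored with it; the function name `cosine_score` and the toy vectors are assumptions for illustration and do not come from the paper.

```python
import math


def cosine_score(enroll_embedding, test_embedding):
    """Cosine similarity between two speaker embeddings (plain lists of floats).

    Higher values suggest the two utterances come from the same speaker.
    """
    if len(enroll_embedding) != len(test_embedding):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(enroll_embedding, test_embedding))
    norm_e = math.sqrt(sum(a * a for a in enroll_embedding))
    norm_t = math.sqrt(sum(b * b for b in test_embedding))
    if norm_e == 0.0 or norm_t == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_e * norm_t)


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real systems use hundreds of dimensions.
    print(cosine_score([0.2, 0.9, 0.1], [0.25, 0.8, 0.0]))
```

The resulting score is compared against a threshold tuned on a development set to make the accept/reject decision.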

Winning solutions of Challenges

VoxSRC2019

  • Rank 1: FBank, "r-vectors" using ResNet, AAM loss. From Brno University of Technology, REPORT
  • Rank 2: 80-dim FBank features, E-TDNN/F-TDNN models, various classification losses including softmax/AM-softmax/PLDA-softmax. From Johns Hopkins University, REPORT
  • Rank 3: FBank, ResNet + attentive pooling + phonetic attention, BLSTM + ResNet, loss unclear(?). From Microsoft, REPORT
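
The AAM (additive angular margin) loss named above can be sketched for a single training example as follows. This is a simplified illustration, not the implementation from any of the reports: the function name, the margin of 0.2 and the scale of 30 are assumed values, and the handling of angles that exceed π after the margin is added is omitted.

```python
import math


def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Cross-entropy over scaled cosine logits with an angular margin on the target class.

    cosines: cosine similarity between the embedding and each class weight vector.
    target:  index of the true speaker class.
    """
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if i == target:
            # Add the margin to the angle of the true class only,
            # which makes the true class harder to satisfy.
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    cosines = [0.8, 0.1, -0.3]  # toy values for three speaker classes
    print("with margin:   ", aam_softmax_loss(cosines, target=0))
    print("without margin:", aam_softmax_loss(cosines, target=0, margin=0.0))
```

With `margin=0.0` this reduces to ordinary softmax cross-entropy on scaled cosines, so the margin can be read as an extra penalty that pushes embeddings of the same speaker closer together in angle.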

VoxSRC2020

  • Rank 1: 60-dim log-FBank, ECAPA-TDNN/SE-ResNet34, S-Norm, AAM-Softmax. From IDLab, REPORT
  • Rank 2: 40-dim FBank/mean-normalized, no VAD, ResNet/Res2Net, S-Norm, CM-Softmax. From AISpeech, REPORT, Kaldi recipe for data-aug
  • Rank 3: Report not available
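
Both listed systems apply S-Norm (symmetric score normalization) to their trial scores. A minimal sketch of the basic, non-adaptive form is given below, assuming the cohort scores have already been computed; the function name and the toy numbers are illustrative, and the reports may use an adaptive variant that keeps only the top-scoring cohort entries.

```python
import statistics


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric normalization of one trial score.

    enroll_cohort_scores: scores of the enrollment embedding against a cohort
                          of impostor embeddings.
    test_cohort_scores:   scores of the test embedding against the same cohort.
    """
    mean_e = statistics.mean(enroll_cohort_scores)
    std_e = statistics.pstdev(enroll_cohort_scores)
    mean_t = statistics.mean(test_cohort_scores)
    std_t = statistics.pstdev(test_cohort_scores)
    if std_e == 0.0 or std_t == 0.0:
        raise ValueError("cohort scores must not all be identical")
    # Average of the two z-normalized views of the same raw score.
    return 0.5 * ((raw_score - mean_e) / std_e + (raw_score - mean_t) / std_t)


if __name__ == "__main__":
    # Toy cohort scores, made up for illustration only.
    print(s_norm(0.5, [0.0, 0.2], [0.1, 0.5]))
```

Normalizing against a cohort makes scores from different enrollment/test pairs more comparable, so a single global threshold works better.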

Please let me know if your code/repo is not listed here (ranchlai at 163.com)
