philipperemy/speaker-change-detection

Speaker Change Detection

Implementation of the paper: https://arxiv.org/abs/1702.02285

The mechanism proposed in the paper performs real-time speaker change detection in conversations. It first trains a text-independent neural network speaker classifier on in-domain speaker data.

The accuracy is very high and close to 100%, as reported in the paper.
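This README does not spell out what happens after the speaker classifier is trained. The script name `3_train_distance_classifier.py` and the `distance_test` plots suggest that a distance between classifier outputs is involved. The sketch below illustrates that general idea only; it is not the repository's code or the paper's exact method. The cosine distance, the fixed threshold, and the function names are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Treat an all-zero vector as maximally uninformative.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def detect_changes(window_outputs, threshold):
    """Return indices i where window i differs strongly from window i - 1.

    window_outputs: list of per-window classifier output vectors (hypothetical).
    threshold: distance above which a change is flagged (hypothetical value).
    """
    changes = []
    for i in range(1, len(window_outputs)):
        if cosine_distance(window_outputs[i - 1], window_outputs[i]) > threshold:
            changes.append(i)
    return changes


# Toy example: two windows dominated by speaker A, then two by speaker B.
windows = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
print(detect_changes(windows, threshold=0.5))  # [2]
```

In practice the threshold would have to be tuned on held-out data, and the repository trains a classifier for this step rather than using a fixed cutoff.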

Get Started

Because it takes a very long time to generate cache and inputs, I packaged them and uploaded them here:

After downloading and unpacking them, you should have these files:

  • /tmp/speaker-change-detection-data.pkl
  • /tmp/speaker-change-detection-norm.pkl
  • /tmp/speaker-change-detection/*.pkl
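Before running the training scripts, it may help to confirm that the files listed above are in place. The following is a minimal sketch of such a check; the helper name `missing_inputs` is made up for this example and is not part of the repository.

```python
import glob
import os

# Paths copied from the list above; adjust them if you moved the data out of /tmp/.
DATA_FILE = "/tmp/speaker-change-detection-data.pkl"
NORM_FILE = "/tmp/speaker-change-detection-norm.pkl"
CACHE_GLOB = "/tmp/speaker-change-detection/*.pkl"


def missing_inputs(data_file=DATA_FILE, norm_file=NORM_FILE, cache_glob=CACHE_GLOB):
    """Return descriptions of the expected inputs that are not present."""
    missing = [path for path in (data_file, norm_file) if not os.path.isfile(path)]
    if not glob.glob(cache_glob):
        missing.append(cache_glob + " (no cache files matched)")
    return missing


problems = missing_inputs()
if problems:
    print("Missing inputs:")
    for item in problems:
        print("  " + item)
else:
    print("All expected inputs are present.")
```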

The final plots are generated as /tmp/distance_test_ID.png where ID is the id of the plot.

Make sure you have enough free space in /tmp/, because the cache and inputs can fill it up. If space is a problem, change every /tmp/ reference in the codebase to a folder of your choice.
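A quick way to see how much room is left is `shutil.disk_usage` from the standard library. This README does not state how much space the files need, so the sketch below leaves the required amount to the caller; the helper names are hypothetical.

```python
import shutil


def free_gib(path="/tmp"):
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_enough_space(path, required_gib):
    """True if at least `required_gib` GiB are free under `path`."""
    return free_gib(path) >= required_gib


print("Free space in /tmp: {:.1f} GiB".format(free_gib("/tmp")))
```

For example, `has_enough_space("/tmp", 10)` checks for 10 GiB, where 10 is an arbitrary placeholder rather than a measured requirement.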

Now run these commands to reproduce the results:

git clone git@github.com:philipperemy/speaker-change-detection.git
cd speaker-change-detection
virtualenv -p python3.6 venv # probably will work on every python3 impl.
source venv/bin/activate
pip install -r requirements.txt
# download the cache and all the files specified above (you can re-generate them yourself if you wish).
cd ml/
export PYTHONPATH=..:$PYTHONPATH; python 1_generate_inputs.py
export PYTHONPATH=..:$PYTHONPATH; python 2_train_classifier.py
export PYTHONPATH=..:$PYTHONPATH; python 3_train_distance_classifier.py
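If you prefer to drive the three steps from Python instead of the shell, the commands above can be wrapped roughly as follows. This is only a sketch: the script names come from the commands above, while `build_env` and `run_pipeline` are hypothetical helpers, and the code assumes it is launched from the repository root inside the activated virtualenv.

```python
import os
import subprocess
import sys

# Script names taken from the shell commands above, in the order they are run.
STEPS = [
    "1_generate_inputs.py",
    "2_train_classifier.py",
    "3_train_distance_classifier.py",
]


def build_env(base_env=None):
    """Copy the environment and prepend '..' to PYTHONPATH, as the shell lines do."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = ".." if not existing else ".." + os.pathsep + existing
    return env


def run_pipeline(ml_dir="ml", steps=STEPS):
    """Run each step inside `ml_dir`, stopping at the first failure."""
    env = build_env()
    for script in steps:
        print("Running " + script)
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run([sys.executable, script], cwd=ml_dir, env=env, check=True)
```

Calling `run_pipeline()` from the repository root then runs the three scripts in order with the same interpreter that launched it.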

To regenerate only the VCTK cache, run:

cd audio/
export PYTHONPATH=..:$PYTHONPATH; python generate_all_cache.py

Contributions

Contributions are welcome! Some ways to improve this project:

  • Given any audio file, is it possible to test it and detect any speaker change?

Questions

  • Given any audio file, is it possible to test it and detect any speaker change? Yes, as long as it follows the same structure as the VCTK Corpus dataset.

  • Is there any way to test the trained model to detect speaker changes of our audio files? Yeah, it's possible, but it's going to be a bit difficult. I guess you have to choose a dataset and convert it to VCTK format.
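As a starting point for such a conversion, here is a rough sketch that copies your own recordings into a VCTK-like folder layout. It assumes the layout used by the VCTK Corpus release, `wav48/<speaker_id>/<speaker_id>_<NNN>.wav`. Whether this repository also needs the transcript files or a specific sample rate is not stated here, so check the code under `audio/` before relying on it. The function name `arrange_like_vctk` is hypothetical.

```python
import os
import shutil


def arrange_like_vctk(recordings, out_root):
    """Copy wav files into out_root/wav48/<speaker_id>/<speaker_id>_<NNN>.wav.

    recordings: dict mapping a speaker id (e.g. "p001") to a list of wav paths.
    Returns the list of destination paths that were written.
    """
    copied = []
    for speaker_id, wav_paths in sorted(recordings.items()):
        speaker_dir = os.path.join(out_root, "wav48", speaker_id)
        os.makedirs(speaker_dir, exist_ok=True)
        # Number utterances from 001 in sorted order so reruns give the same names.
        for index, src in enumerate(sorted(wav_paths), start=1):
            dst = os.path.join(speaker_dir, "{}_{:03d}.wav".format(speaker_id, index))
            shutil.copyfile(src, dst)
            copied.append(dst)
    return copied
```

The sketch only copies files. It does not resample audio or create transcripts, so additional steps may be needed before the cache can be regenerated.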