Speaker Change Detection

Implementation of the paper: https://arxiv.org/abs/1702.02285

The mechanism proposed here is for real-time speaker change detection in conversations, which firstly trains a neural network text-independent speaker classifier using indomain speaker data.

The accuracy is very high and close to 100%, as reported in the paper.

Get Started

Because it takes a very long time to generate cache and inputs, I packaged them and uploaded them here:

Cache uploaded at cache-speaker-change-detection.zip (unzip it in /tmp/)
speaker-change-detection-data.pkl (place it in /tmp/)
speaker-change-detection-norm.pkl (place it in /tmp/)

You should have this:

/tmp/speaker-change-detection-data.pkl
/tmp/speaker-change-detection-norm.pkl
/tmp/speaker-change-detection/*.pkl

The final plots are generated as /tmp/distance_test_ID.png where ID is the id of the plot.

Be careful you have enough space in /tmp/ because you might run out of disk space there. If it's the case, you can modify all the /tmp/ references inside the codebase to any folder of your choice.

Now run those commands to reproduce the results.

git clone [email protected]:philipperemy/speaker-change-detection.git
cd speaker-change-detection
virtualenv -p python3.6 venv # probably will work on every python3 impl.
source venv/bin/activate
pip install -r requirements.txt
# download the cache and all the files specified above (you can re-generate them yourself if you wish).
cd ml/
export PYTHONPATH=..:$PYTHONPATH; python 1_generate_inputs.py
export PYTHONPATH=..:$PYTHONPATH; python 2_train_classifier.py
export PYTHONPATH=..:$PYTHONPATH; python 3_train_distance_classifier.py

To regenerate only the VCTK cache, run:

cd audio/
export PYTHONPATH=..:$PYTHONPATH; python generate_all_cache.py

Contributions

Contributions are welcome! Some ways to improve this project:

Given any audio file, is it possible to test it and detect any speaker change?

Questions

Given any audio file, is it possible to test it and detect any speaker change? Yes, as long as it follows the same structure as the VCTK Corpus dataset.
Is there any way to test the trained model to detect speaker changes of our audio files? Yeah it's possible but it's going to be a bit difficult. I guess you have to choose a dataset and converts it to VCTK format.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
audio		audio
helpers		helpers
misc		misc
ml		ml
.gitignore		.gitignore
README.md		README.md
conf.json		conf.json
constants.py		constants.py
download_vctk.sh		download_vctk.sh
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

audio

audio

helpers

helpers

misc

misc

ml

ml

.gitignore

.gitignore

README.md

README.md

conf.json

conf.json

constants.py

constants.py

download_vctk.sh

download_vctk.sh

requirements.txt

requirements.txt

Repository files navigation

Speaker Change Detection

Get Started

Contributions

Questions

About

Releases

Packages

Languages

philipperemy/speaker-change-detection

Folders and files

Latest commit

History

Repository files navigation

Speaker Change Detection

Get Started

Contributions

Questions

About

Topics

Resources

Stars

Watchers

Forks

Languages