
Speaker Diarization and Sentiment Analysis

This project was built in 4 hours by Kenson Hui, Rishik Raj, Teresa Tien, Nicolas Avramidis, and Naveed Khan as part of the WSIB Hackathon on November 23, 2023.

The goal of this project is to speed up the process of reviewing the call quality of customer service representatives and companies. The current method has been to manually listen to calls and evaluate them against various criteria. Our proposed method is to transcribe the calls and perform diarization (the process of recognizing who is speaking at any given time), then run sentiment analysis on each sentence spoken to understand the emotions the customer is feeling and the tone of the customer service representatives.

Running our project on an A100 with OpenAI Whisper large-v3 achieves a real-time factor of 0.5.

Set up:

Ensure you are in a Conda environment with Python >= 3.8 and that the appropriate GPU drivers are installed.

You should also be sure to add your HuggingFace token as the HF_TOKEN environment variable.

Install the project packages: pip install -r requirements.txt

We'll use the pre-trained diarization model from pyannote.audio: pip install --upgrade pyannote.audio
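
As a rough sketch of how the diarization pipeline can be loaded and run (the checkpoint name "pyannote/speaker-diarization-3.1" and the input file name are assumptions; the project may pin a different version):

```python
import os
from pyannote.audio import Pipeline

# Sketch only: load a pre-trained diarization pipeline from the Hub.
# The checkpoint name is an assumption; check pyannote.audio's docs
# for the version you have installed.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=os.environ["HF_TOKEN"],
)

# "call_recording.wav" is a placeholder input file.
diarization = pipeline("call_recording.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```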

We use Speechbox to find the minimal alignment between the transcription generated by Whisper and the diarization produced by pyannote.audio: pip install git+https://github.com/huggingface/speechbox
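
A minimal sketch of this combination, assuming Speechbox's ASRDiarizationPipeline with the model and keyword arguments below (exact arguments may differ; see the speechbox README):

```python
import os
from speechbox import ASRDiarizationPipeline

# Sketch only: Speechbox wires a Whisper ASR pipeline to a pyannote
# diarization pipeline and aligns their outputs. The keyword
# arguments here are assumptions.
pipeline = ASRDiarizationPipeline.from_pretrained(
    "openai/whisper-large-v3",
    use_auth_token=os.environ["HF_TOKEN"],
)

# Each returned segment pairs a speaker label with the text spoken,
# roughly: {"speaker": "SPEAKER_00", "text": "...", "timestamp": (0.0, 4.2)}
segments = pipeline("call_recording.wav")
for seg in segments:
    print(f'{seg["speaker"]}: {seg["text"]}')
```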

You can now run a GUI demo with: python gradio_script.py
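
For reference, a hypothetical skeleton of what gradio_script.py might do (the actual script may differ):

```python
import os
import gradio as gr
from speechbox import ASRDiarizationPipeline

# Hypothetical skeleton, not the actual gradio_script.py: upload an
# audio file, run the combined pipeline, show speaker-attributed text.
pipeline = ASRDiarizationPipeline.from_pretrained(
    "openai/whisper-large-v3",
    use_auth_token=os.environ["HF_TOKEN"],
)

def transcribe(audio_path):
    segments = pipeline(audio_path)
    return "\n".join(f'{s["speaker"]}: {s["text"]}' for s in segments)

gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),
    outputs="text",
).launch()
```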

After obtaining the JSON representation of the conversation, you can proceed to perform sentiment analysis through the sentiment_analysis.ipynb notebook.
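
The notebook's core idea can be sketched as follows, assuming the JSON is a list of segments with "speaker" and "text" keys and using a generic sentiment checkpoint (the notebook may use a different model, and "conversation.json" is a placeholder file name):

```python
import json
from transformers import pipeline

# Sketch only: score each speaker turn with a text sentiment model.
sentiment = pipeline("sentiment-analysis")

with open("conversation.json") as f:
    segments = json.load(f)

for seg in segments:
    result = sentiment(seg["text"])[0]
    print(f'{seg["speaker"]}: {result["label"]} ({result["score"]:.2f})')
```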

Next Steps:

If we had more time, we would make the following additional improvements:

  • Sentiment analysis not only on the text, but also on the original audio. This would significantly improve accuracy: currently only the transcribed text is used to determine emotion, while the audio modality carries nuances of speech, such as tone and prosody, that are essential to conveying emotion. A rough sketch of this idea follows below.
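
One way to prototype this would be a speech-emotion-recognition checkpoint via the transformers audio-classification pipeline (the model name is an assumed example; this is not part of the current project):

```python
from transformers import pipeline

# Hypothetical: classify emotion directly from the waveform rather
# than from the transcript. The checkpoint is an assumed example.
emotion = pipeline(
    "audio-classification",
    model="superb/wav2vec2-base-superb-er",
)
print(emotion("call_recording.wav"))  # placeholder input file
```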
