Speech-Emotion-Recognition

An emotion classifier built with standard audio processing and deep learning techniques. It is trained on 4 different datasets totalling 12,000+ audio files, recorded by a wide range of voice actors, so that the model generalizes and does not overfit to a particular accent. Because SER (Speech Emotion Recognition) is a hard problem, accuracy is only around 60-70%. We have also included a brief comparison of how various design decisions affect accuracy.

Datasets used:

Crowd-sourced Emotional Multimodal Actors Dataset (Crema-D)
Ryerson Audio-Visual Database of Emotional Speech and Song (Ravdess)
Surrey Audio-Visual Expressed Emotion (Savee)
Toronto emotional speech set (Tess)

Algorithm used: Sequential model with 1D convolution (Conv1D) and max-pooling layers
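
To make the two layer types concrete, here is a minimal pure-Python sketch of what a single Conv1D filter followed by max pooling computes on a one-dimensional feature vector. It is an illustration only, not this repository's model: the helper names (conv1d, relu, max_pool1d), the input values, the kernel, and the pool size are all assumptions chosen for the example. A real model would use a deep learning library and stack many learned filters.

```python
def conv1d(signal, kernel, bias=0.0):
    """Stride-1 convolution without padding, computed the way deep learning
    libraries do it (a sliding dot product, i.e. cross-correlation)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, pool_size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [
        max(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]


# Hypothetical feature vector standing in for features extracted from audio.
features = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, 1.0, 4.0]
# Hypothetical kernel; in a trained network these weights are learned.
kernel = [1.0, 0.0, -1.0]

conv_out = conv1d(features, kernel)          # 8 inputs, kernel of 3 -> 6 outputs
activated = relu(conv_out)                   # negative responses become 0
pooled = max_pool1d(activated, pool_size=2)  # 6 values -> 3 values

print(conv_out)  # [-3.0, -1.0, 3.0, 3.0, -1.0, -5.0]
print(pooled)    # [0.0, 3.0, 0.0]
```

The convolution responds where the signal matches the kernel's pattern, and pooling halves the sequence length while keeping the strongest response in each window. That is why the two are usually stacked in alternation.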
