# Speech-Emotion-Recognition

An emotion classifier built using standard audio data processing and deep learning techniques. We combine 4 different datasets totalling 12,000+ audio files from a wide range of voice actors, which helps the model generalize and avoids overfitting to any one accent. Due to the sheer complexity of SER (Speech Emotion Recognition), accuracy is limited to roughly 60-70%. However, we've included a brief comparison of how various design decisions affect accuracy.
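For context, a common preprocessing step in SER pipelines is extracting a fixed-length feature vector per clip. The sketch below uses MFCC features via librosa; the specific choices (`n_mfcc=40`, time-averaging, the `extract_features` helper name) are illustrative assumptions, not necessarily what this repo does:

```python
import librosa
import numpy as np

def extract_features(path, sr=22050, n_mfcc=40):
    """Load one audio clip and return a fixed-length MFCC feature vector."""
    # Load at a fixed sample rate so clips from different datasets are comparable.
    signal, sample_rate = librosa.load(path, sr=sr)
    # MFCC matrix has shape (n_mfcc, n_frames).
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    # Average over time to get a single n_mfcc-dimensional vector per clip.
    return np.mean(mfcc.T, axis=0)
```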

Datasets used:

  1. Crowd-sourced Emotional Multimodal Actors Dataset (Crema-D)
  2. Ryerson Audio-Visual Database of Emotional Speech and Song (Ravdess)
  3. Surrey Audio-Visual Expressed Emotion (Savee)
  4. Toronto emotional speech set (Tess)

Algorithm used: Sequential model with 1D convolution (Conv1D) and max-pooling layers
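
A minimal sketch of such a model in Keras is below. Only "Sequential with Conv1D and max pooling" comes from this README; the layer counts, filter sizes, dropout rate, and `num_classes=7` are illustrative assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dropout, Dense

def build_model(input_length=40, num_classes=7):
    """Sequential 1D-CNN classifier over per-clip feature vectors."""
    model = Sequential([
        # Treat the feature vector as a 1-channel sequence of shape (input_length, 1).
        Conv1D(64, kernel_size=5, activation='relu', padding='same',
               input_shape=(input_length, 1)),
        MaxPooling1D(pool_size=2),
        Conv1D(128, kernel_size=5, activation='relu', padding='same'),
        MaxPooling1D(pool_size=2),
        Flatten(),
        Dropout(0.3),  # assumed regularization to curb overfitting
        Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

With the feature extractor above, each 40-dimensional MFCC vector would be reshaped to `(40, 1)` before `model.fit`, so the Conv1D filters slide across the MFCC coefficients.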