Speech Emotion Recognition

The project aims to develop a classifier that can predict the emotion of a speaker from an audio file. For this project, the RAVDESS dataset is used. RAVDESS contains recordings of 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. The speech emotions include calm, happy, sad, angry, fearful, surprise, and disgust expressions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression.

Dataset available at https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio
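
Each RAVDESS file name encodes its metadata as seven dash-separated numeric fields, with the emotion in the third field. Below is a minimal sketch of recovering the label from a file name; the emotion_from_filename helper is illustrative, not part of this repository:

```python
from pathlib import Path

# RAVDESS file names encode metadata as seven dash-separated numeric
# fields; the third field is the emotion code.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(path: str) -> str:
    """Return the emotion label encoded in a RAVDESS file name."""
    return EMOTIONS[Path(path).stem.split("-")[2]]

print(emotion_from_filename("03-01-06-01-02-01-12.wav"))  # -> fearful
```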

Multi-Layer Perceptron Model -

This project is built using a Multi-Layer Perceptron (MLP) model, a type of feedforward artificial neural network. The MLP model used in this project exposes several attributes that can be customized for optimal performance, such as batch_size, alpha, epsilon, hidden_layer_sizes, learning_rate, and max_iter.
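
These attribute names match the constructor parameters of scikit-learn's MLPClassifier. Here is a minimal sketch, assuming scikit-learn is the implementation used; the values and the random stand-in data are illustrative placeholders, not this project's tuned settings:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder data: 200 feature vectors of 40 dimensions each with
# random emotion labels, standing in for real audio features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.choice(["calm", "happy", "sad", "angry"], size=200)

model = MLPClassifier(
    hidden_layer_sizes=(300,),  # one hidden layer with 300 units
    alpha=0.01,                 # L2 regularization strength
    batch_size=32,              # minibatch size used during training
    epsilon=1e-8,               # numerical-stability term for the adam solver
    learning_rate="adaptive",   # schedule used by the sgd solver
    max_iter=500,               # upper bound on training iterations
)
model.fit(X, y)
print(model.score(X, y))  # mean accuracy on the training data
```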

The waveform of a sound file is a visual depiction of the audio signal over time: it shows how the amplitude of the signal changes at each point in time, providing insight into characteristics of the audio such as its intensity, duration, and variations in amplitude.

Some waveforms of sound files are -

[Figures: waveform-1, waveform-2, waveform-3]
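
A minimal sketch of how such a waveform can be plotted, assuming the librosa (>= 0.9, for waveshow) and matplotlib libraries and a placeholder RAVDESS file path:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Placeholder path to one RAVDESS recording.
signal, sr = librosa.load("03-01-06-01-02-01-12.wav", sr=None)

# Plot the amplitude of the signal against time, as in the figures above.
plt.figure(figsize=(10, 3))
librosa.display.waveshow(signal, sr=sr)
plt.title("Waveform")
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.tight_layout()
plt.show()
```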
