younesslanda/Speech-Emotion-Recognition

Speech-Emotion-Recognition

Introduction

This repo contains the code for speech preprocessing and feature extraction for speech emotion recognition using torchaudio, an audio and signal processing library for PyTorch that provides I/O, signal and data processing functions, datasets, model implementations, and application components.

Dataset

The dataset used is the RAVDESS dataset (The Ryerson Audio-Visual Database of Emotional Speech and Song), which can be downloaded free of charge at this link.

The full dataset has 7356 files (total size: 24.8 GB) covering 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16-bit, 48 kHz .wav), Audio-Video (720p H.264, AAC 48 kHz, .mp4), and Video-only (no sound). We only considered the Audio-only files.
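RAVDESS filenames encode the labels described above as seven dash-separated numeric fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), with odd actor numbers male and even female. The helper below is not from this repo; it is a minimal sketch of how such a filename can be parsed into labels.

```python
# Emotion codes used in the third field of a RAVDESS filename.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_filename(name: str) -> dict:
    """Parse a RAVDESS filename like '03-01-06-01-02-01-12.wav'."""
    parts = name.removesuffix(".wav").split("-")
    actor = int(parts[6])
    return {
        "emotion": EMOTIONS[parts[2]],
        "intensity": "normal" if parts[3] == "01" else "strong",
        "actor": actor,
        "sex": "female" if actor % 2 == 0 else "male",
    }

print(parse_ravdess_filename("03-01-06-01-02-01-12.wav"))
# {'emotion': 'fearful', 'intensity': 'normal', 'actor': 12, 'sex': 'female'}
```

A parser like this is typically used to build the (audio path, emotion label) pairs that a PyTorch `Dataset` serves during training.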

Model

The model is trained on the Google Cloud Platform (GCP). The scripts for training are located in the gcp folder.
