Skip to content

This work consists in the developement of several Machine Learning and Deep Learning models for recognition of audio and images in real time.

Notifications You must be signed in to change notification settings

enigarv/Audio_Image_Management

Repository files navigation

Audio and Image Recognition

This work consists in the developement of several Machine Learning and Deep Learning models for recognition of audio and images. We develop two tasks:

  • Audio Recognition, we try to identify the identity of the speaker starting from a mono-dimensional signal;
  • Face Recognition, we aims to identify the indentity of the subject into an image.

First, we collected manually the audios and the images used for train our models. In particular we recorded 300 audio of five seconds (next segmented into overlapping smaller fragments of 2 seconds) and 300 images containing faces with different facial expressions and different light conditions.

For the task of audio recognition we experimented different approaches:

  • SVM;
  • RF;
  • Neural Network from Scratch;
  • CNN using the spectrograms of the audios.

For the task of face recognition we used the Fine Tuning technique testing different CNN including:

  • MobileNetV2;
  • VGG;
  • VGGface.

About us:

Ginevra Mariani - enigarv

Lorenzo Mora - lomProg

Riccardo Confalonieri - rconfa

About

This work consists in the developement of several Machine Learning and Deep Learning models for recognition of audio and images in real time.

Topics

Resources

Stars

Watchers

Forks