ART generation using speech emotions

Translating speech to an image directly, without text, is an interesting and useful problem with potential applications in computer-aided design, human-computer interaction, and the creation of art. We have therefore focused on developing a deep learning and GAN-based model that takes speech as input from the user, analyzes the emotions associated with it, and generates the requested artwork accordingly, providing a personalized experience.

The approach uses a convolutional VQGAN to learn a codebook of context-rich visual parts, whose composition is subsequently modeled with an autoregressive transformer architecture. The project also uses CLIP (Contrastive Language-Image Pre-training), a transformer-based model trained to determine which caption from a set of captions best fits a given image.
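The two ideas above can be illustrated with a minimal, dependency-free sketch. It is not the code from the notebooks: the function names `quantize`, `cosine`, and `best_caption` are hypothetical, and the tiny hand-written vectors stand in for the learned VQGAN codebook and the CLIP embeddings, which in practice come from trained networks.

```python
import math


def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    This is the vector-quantization lookup behind a VQGAN-style codebook:
    every continuous feature is replaced by the closest learned "visual part".
    """
    indices = []
    for v in vectors:
        # Squared Euclidean distance from v to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, code)) for code in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def best_caption(image_embedding, caption_embeddings):
    """Return the index of the caption whose embedding best matches the image.

    CLIP-style scoring: the caption with the highest cosine similarity wins.
    """
    scores = [cosine(image_embedding, c) for c in caption_embeddings]
    return scores.index(max(scores))


# Toy data (assumed for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
features = [[0.1, -0.2], [4.0, 6.0], [0.9, 1.2]]
print(quantize(features, codebook))
print(best_caption([1.0, 0.0], [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]))
```

In a VQGAN+CLIP setup, a similarity score of this kind is typically used as the signal that steers the generated image toward the text prompt, while the codebook indices are what the generator decodes into pixels.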

The input speech is classified into one of 8 emotions by an MLP classifier trained on the RAVDESS emotional speech audio dataset, and the predicted emotion acts as a base filter for the VQGAN model. Text converted from the speech plays an important role in producing the final output image through the CLIP model. Together, the VQGAN+CLIP model uses both the emotion and the text to generate a more personalized artwork.
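As a rough sketch of how the labels and the prompt could fit together: RAVDESS encodes the emotion as the third field of each file name, which gives the 8 training classes. The helper names `emotion_from_filename` and `build_prompt`, and the way the emotion is appended to the transcript, are assumptions made for illustration; the notebooks may combine emotion and text differently.

```python
# RAVDESS emotion codes (third hyphen-separated field of the file name).
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def emotion_from_filename(path):
    """Read the emotion label from a RAVDESS file name.

    Example name: 03-01-06-01-02-01-12.wav, where the third field (06)
    identifies the emotion.
    """
    stem = path.rsplit("/", 1)[-1].split(".")[0]
    return RAVDESS_EMOTIONS[stem.split("-")[2]]


def build_prompt(transcript, emotion):
    """Combine the speech transcript and the predicted emotion into one prompt.

    Hypothetical combination rule: the emotion is appended as a mood hint.
    """
    return f"{transcript.strip()}, in a {emotion} mood"


label = emotion_from_filename("Actor_12/03-01-06-01-02-01-12.wav")
print(label)
print(build_prompt("  a castle by the sea ", "calm"))
```

Labels extracted this way can serve as training targets for the MLP classifier, and the resulting prompt string is the kind of text a CLIP-guided generator would be scored against.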

Group members: Rishabh Patil, Omkar Narvekar, Bhagawatiraj Yadav
