Sentiment Analysis and Topic Modelling on Covid Vaccine Tweets

This project mines public opinion on Covid vaccination. People worldwide have been skeptical about the vaccination drive, so the main objective was to discover the important topics of discussion and to analyze the ratio of negative to positive opinions among the public.

Country-level vaccination data is also analyzed to track the progress of Covid vaccination.

Description

Dataset Used

Covid Vaccine Tweets Dataset
This Twitter dataset is taken from Kaggle and consists of tweets extracted with the hashtag #CovidVaccine. It comprises more than 200k tweets with 13 attributes: 'user_name', 'user_location', 'user_description', 'user_created', 'user_followers', 'user_friends', 'user_favourites', 'user_verified', 'date', 'text', 'hashtags', 'source', 'is_retweet'.
Covid-19 World Vaccination Progress Dataset
Data is collected daily from the Our World in Data GitHub repository for Covid-19, then merged and uploaded. Country-level vaccination data is gathered and assembled into a single file, which is then merged with the locations data file to include information on vaccination sources. A second file, with manufacturer information, is also included. The dataset comprises 15 attributes, of which 5 are mainly used in this work: 'total_vaccinations', 'country', 'date', 'daily_vaccinations', 'vaccines'.

EDA Covid Vaccine Sentiments

Pre-processed tweets by removing special symbols (#, @), retweets and emoticons
Tokenized tweets to separate each token from the complete sentence
Removed stop words using NLTK's English stop-word list
Extracted nouns and verbs using POS tagging
Applied a lemmatizer to reduce each word to its root form
Converted the data in the location field to the respective country
Performed exploratory data analysis
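
The cleaning, tokenizing and stop-word steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code. The regular expressions, the helper names (`clean_tweet`, `tokenize`, `preprocess`) and the tiny `STOP_WORDS` set are assumptions; the set stands in for NLTK's English stop-word list so that the sketch runs without downloading NLTK data. It also assumes that "removing retweets" means stripping the leading `RT @user:` marker, although it could equally mean dropping rows flagged by `is_retweet`. POS tagging and lemmatization are left out here; with NLTK they would typically go through `nltk.pos_tag` and `WordNetLemmatizer`.

```python
import re

# Assumption: a tiny stand-in for NLTK's English stop-word list, so the sketch
# runs without downloading NLTK data.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "for", "and", "my", "i", "it"}

# Leading "RT @user:" marker.
RETWEET_RE = re.compile(r"^RT\s+@\w+:?\s*")
# Stand-alone ASCII emoticons such as :) ;-) :D (must be surrounded by whitespace).
EMOTICON_RE = re.compile(r"(?<!\S)[:;=][\-']?[)(DPp](?!\S)")
# Common emoji and symbol ranges (not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_tweet(text):
    """Strip the retweet marker, emoticons/emoji and the # and @ symbols."""
    text = RETWEET_RE.sub("", text)
    text = EMOTICON_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = text.replace("#", " ").replace("@", " ")
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def preprocess(text):
    """Clean, tokenize and drop stop words."""
    return [tok for tok in tokenize(clean_tweet(text)) if tok not in STOP_WORDS]


sample = "RT @who: Got my #CovidVaccine today :) 💉 Thanks to the @NHS staff"
tokens = preprocess(sample)
print(tokens)
```

Keeping the hashtag and mention words while dropping only the `#` and `@` symbols follows the wording of the first step; removing whole mentions would be a different design choice.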

Classification Covid Vaccine Sentiments

Vectorized data using the TF-IDF vectorizer
Trained models using three classifiers
i. Gaussian Naive Bayes (Accuracy: 72%)
ii. SVM (Accuracy: 87%)
iii. LSTM (Accuracy: 96%)
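
As a rough illustration of the vectorize-then-classify steps above, the sketch below fits a Gaussian Naive Bayes model and an SVM on TF-IDF features with scikit-learn. It is not the notebook's code: the toy tweets and labels are made up, the linear kernel is an assumption (the kernel used in the project is not stated), and the LSTM is omitted. How the project labelled its tweets is not shown in this section, so the labels here are simply given.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Made-up, already pre-processed toy tweets (1 = positive, 0 = negative).
train_texts = [
    "got vaccine today feel great",
    "vaccine rollout going well thankful",
    "happy grateful second dose done",
    "worried vaccine side effects bad",
    "vaccine rushed not safe scared",
    "terrible reaction dose feel awful",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Turn each tweet into a TF-IDF weighted bag-of-words vector (sparse matrix).
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# GaussianNB expects dense input, so the sparse matrix is converted first.
nb = GaussianNB()
nb.fit(X_train.toarray(), train_labels)

# SVC accepts sparse input directly; the linear kernel is an assumption.
svm = SVC(kernel="linear")
svm.fit(X_train, train_labels)

# New text must go through the same fitted vectorizer before prediction.
new_texts = ["feel great after dose"]
X_new = vectorizer.transform(new_texts)
nb_pred = nb.predict(X_new.toarray())
svm_pred = svm.predict(X_new)
print(nb_pred, svm_pred)
```

On a dataset of 200k tweets, converting the full TF-IDF matrix to a dense array for `GaussianNB` can be memory-hungry; limiting the vocabulary with `max_features` on the vectorizer is one common way to keep it manageable.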

Topic Modelling LDA

Tokenized tweets
Removed Stop words
Extracted useful terms using POS Tagging
Applied Lemmatizer
Vectorized using Count vectorizer
Trained LDA Model
Extracted the top 7 topics discussed

Libraries Used

NumPy
Pandas
Matplotlib
Seaborn
Plotly
VADER
NLTK
scikit-learn

Results

Contact Info

📧 E-Mail
🤝 LinkedIn



Malvi-M/Sentiment-Analysis-and-Topic-Modelling-on-Covid-Vaccine-Tweets
