This project mines public opinion about Covid vaccination. People worldwide have been dubious about the vaccination drive, so the main objective was to discover the important topics of discussion and to analyze the ratio of negative to positive public opinions.
Country-level vaccination progress is also analyzed to track how the Covid vaccination drive is advancing.
- Covid Vaccine Tweets Dataset
This Twitter dataset is taken from Kaggle and consists of tweets extracted with #CovidVaccine. It comprises more than 200k tweets with 13 attributes: 'user_name', 'user_location', 'user_description', 'user_created', 'user_followers', 'user_friends', 'user_favourites', 'user_verified', 'date', 'text', 'hashtags', 'source', 'is_retweet'.
- Covid-19 World Vaccination Progress Dataset
Data is collected daily from the Our World in Data GitHub repository for Covid-19, then merged and uploaded. Country-level vaccination data is gathered and assembled into a single file. This file is then merged with the locations data file to include information on vaccination sources. A second file, with manufacturer information, is also included. The dataset comprises 15 attributes, of which 5 are mainly used in this work: 'total_vaccinations', 'country', 'date', 'daily_vaccinations', 'vaccines'.
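As a rough sketch of how the five columns named above could be used to track progress, the snippet below picks the most recent `total_vaccinations` value per country. The rows are made-up placeholder values, the helper name `latest_totals` is hypothetical, and the real file would be loaded with `pd.read_csv` instead of being built inline.

```python
import pandas as pd

# Made-up sample rows using the column names listed above (not real data).
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"],
    "total_vaccinations": [100, 250, 40, 90],
    "daily_vaccinations": [100, 150, 40, 50],
    "vaccines": ["X", "X", "Y", "Y"],
})
df["date"] = pd.to_datetime(df["date"])


def latest_totals(frame):
    """Return the most recent total_vaccinations per country, largest first."""
    # Sort by date so that the last row of each group is the newest record.
    latest = frame.sort_values("date").groupby("country").tail(1)
    totals = latest.set_index("country")["total_vaccinations"]
    return totals.sort_values(ascending=False)


totals = latest_totals(df)
print(totals)
```

Taking the last record per country, rather than summing `daily_vaccinations`, avoids double counting when some daily values are missing.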
- Pre-processed tweets by removing special symbols (#, @), retweets, and emoticons
- Tokenized tweets to separate each token from the complete sentence
- Removed stop words using NLTK's English stop-word list
- Extracted nouns and verbs using POS tagging
- Applied a lemmatizer to get the root word
- Converted data in the location field to the respective country
- Exploratory Data Analysis
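The first three steps above (cleaning, tokenizing, stop-word removal) can be sketched with the standard library alone. This is only an illustration under several assumptions: the stop-word set is a tiny stand-in for NLTK's English list, the emoji ranges are approximate, retweets are detected by a leading "RT @", and the function names are hypothetical. POS tagging and lemmatization are not reproduced here.

```python
import re

# Small illustrative stop-word set; the project used NLTK's English list instead.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "i", "my", "in", "for"}

# Rough emoji/pictograph ranges; an assumption, not an exhaustive definition.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)
TOKEN_RE = re.compile(r"[a-z0-9']+")


def is_retweet(text):
    """Treat tweets starting with 'RT @' as retweets (a simplifying assumption)."""
    return text.startswith("RT @")


def clean_tweet(text):
    """Replace emoji with spaces, drop '#' and '@' symbols, and lowercase."""
    text = EMOJI_RE.sub(" ", text)
    text = re.sub(r"[#@]", "", text)
    return text.lower()


def preprocess(tweets):
    """Skip retweets, then tokenize each tweet and remove stop words."""
    result = []
    for tweet in tweets:
        if is_retweet(tweet):
            continue
        tokens = TOKEN_RE.findall(clean_tweet(tweet))
        result.append([t for t in tokens if t not in STOP_WORDS])
    return result


sample = [
    "Got my #CovidVaccine today 😀 thanks @nurse!",
    "RT @someone: the vaccine is here",
]
processed = preprocess(sample)
print(processed)
```

Only the symbols are stripped, so the hashtag and mention words themselves stay available as tokens for later steps.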
- Vectorized data using a TF-IDF vectorizer
- Trained models using three classifiers
i. Gaussian Naive Bayes (Accuracy: 72%)
ii. SVM (Accuracy: 87%)
iii. LSTM (Accuracy: 96%)
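The vectorize-then-classify flow above might look roughly like the following with scikit-learn. The texts and labels are invented toy examples, and `LinearSVC` is an assumption, since the kernel and settings of the SVM used in the project are not stated. The accuracies reported above come from the project's own data and are not reproduced by this sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy tweets (already pre-processed) with made-up sentiment labels.
texts = [
    "vaccine safe effective grateful",
    "happy got vaccine today",
    "great news vaccine works",
    "thankful vaccine protects family",
    "vaccine dangerous side effects",
    "scared vaccine bad reaction",
    "refuse vaccine not trust",
    "vaccine terrible worried harm",
]
labels = ["positive"] * 4 + ["negative"] * 4

# TF-IDF features feed a linear SVM; LinearSVC is an assumed stand-in.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

predictions = model.predict(["grateful happy vaccine", "worried dangerous vaccine"])
print(predictions)
```

To try Gaussian Naive Bayes on the same features, note that scikit-learn's `GaussianNB` expects dense input, so the sparse TF-IDF matrix would first need converting with `.toarray()`.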
- Tokenized tweets
- Removed Stop words
- Extracted useful terms using POS Tagging
- Applied Lemmatizer
- Vectorized using Count vectorizer
- Trained LDA Model
- Extracted the top 7 topics discussed
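The last three steps above (count vectorization, LDA training, topic extraction) can be sketched with scikit-learn as follows. The documents are invented, the helper `top_terms` is a hypothetical name, and the sketch uses 2 topics only because the toy corpus is tiny; the project extracted 7.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Invented, already pre-processed documents standing in for the tweets.
docs = [
    "vaccine dose appointment clinic",
    "second dose appointment booked",
    "clinic appointment dose today",
    "side effect fever arm",
    "arm sore fever side effect",
    "fever headache side effect",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# n_components is the number of topics; 2 here only because the corpus is tiny.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Rebuild the vocabulary in column order from the fitted term-to-index mapping.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)


def top_terms(model, vocab, n_terms=3):
    """Return the n_terms highest-weighted words for each topic."""
    topics = []
    for weights in model.components_:
        best = weights.argsort()[::-1][:n_terms]
        topics.append([vocab[i] for i in best])
    return topics


topics = top_terms(lda, vocab)
for number, terms in enumerate(topics, start=1):
    print(number, terms)
```

Fixing `random_state` keeps the topics reproducible between runs, which helps when comparing different topic counts.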
- Numpy
- Pandas
- Matplotlib
- Seaborn
- Plotly
- VADER
- NLTK
- Sklearn