
NLP Project of Sentiment Analysis and Topic Modelling on Covid-19 Vaccine Tweets for Opinion Mining


Malvi-M/Sentiment-Analysis-and-Topic-Modelling-on-Covid-Vaccine-Tweets


Sentiment Analysis and Topic Modelling on Covid Vaccine Tweets

This project mines public opinion on Covid vaccination. People worldwide have been dubious about the vaccination drive, so the main objective was to discover the important topics of discussion and analyze the ratio of negative to positive public opinion.

Country-level vaccination progress is also analyzed to track the Covid vaccination rollout.

Description

Dataset Used

  1. Covid Vaccine Tweets Dataset
    This Twitter dataset is taken from Kaggle and consists of tweets extracted with #CovidVaccine. It comprises more than 200k tweets with 13 attributes: 'user_name', 'user_location', 'user_description', 'user_created', 'user_followers', 'user_friends', 'user_favourites', 'user_verified', 'date', 'text', 'hashtags', 'source', 'is_retweet'.

  2. Covid-19 World Vaccination Progress Dataset
    Data is collected daily from the Our World in Data GitHub repository for Covid-19, then merged and uploaded. Country-level vaccination data is gathered and assembled into a single file, which is then merged with the locations data file to include information on vaccination sources. A second file, with manufacturer information, is also included. The dataset comprises 15 attributes, of which only 5 are mainly used in our work: 'total_vaccinations', 'country', 'date', 'daily_vaccinations', 'vaccines'.
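
As a rough illustration of how the five attributes above can be used to track country-level progress, here is a minimal pandas sketch. The rows are made-up placeholder values, not figures from the dataset, and taking the most recent 'total_vaccinations' per country is an assumption about the analysis, not a description of the project's notebook.

```python
import pandas as pd

# Placeholder rows (hypothetical values) using the five attributes named above.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"],
    "total_vaccinations": [100, 250, 40, 90],
    "daily_vaccinations": [100, 150, 40, 50],
    "vaccines": ["Vaccine X", "Vaccine X", "Vaccine Y", "Vaccine Y"],
})

# Parse dates so rows can be ordered chronologically.
df["date"] = pd.to_datetime(df["date"])

# Most recent cumulative total per country.
latest = (
    df.sort_values("date")
      .groupby("country")["total_vaccinations"]
      .last()
)
print(latest)
```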

EDA Covid Vaccine Sentiments

  1. Pre-processed Tweets by removing special symbols (#,@), retweets and emoticons
  2. Tokenized Tweets to separate each token from the complete sentence
  3. Removed Stop Words using NLTK's English stop word list
  4. Extracted Nouns and Verbs using POS Tagging
  5. Applied Lemmatizer to get the root word
  6. Converted data in location field to respective Country
  7. Exploratory Data Analysis
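
The first three steps above can be sketched with the standard library alone. This is an illustrative approximation, not the project's code: the helper names, the emoji ranges, and the reading of "retweets" as stripping a leading "RT @user:" marker are assumptions, and the small STOP_WORDS set is a stand-in for NLTK's English stop word list. POS tagging and lemmatization are left out because they need NLTK's downloadable data.

```python
import re

# Tiny stand-in stop word set (assumption); the project uses NLTK's English list.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "for", "my", "i", "it"}

# Rough emoji ranges and a simple text-emoticon pattern (assumptions, not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
EMOTICON_RE = re.compile(r"(?<!\w)[:;=]-?[)(DPp](?!\w)")


def clean_tweet(text):
    """Strip a leading retweet marker, emoji, emoticons, and the # and @ symbols."""
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)
    text = EMOJI_RE.sub("", text)
    text = EMOTICON_RE.sub("", text)
    text = re.sub(r"[#@]", "", text)
    return text.strip()


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word set."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = remove_stop_words(tokenize(clean_tweet("RT @who: Got my #CovidVaccine today 😀 :)")))
print(tokens)
```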

Classification Covid Vaccine Sentiments

  1. Vectorized data using Tf-idf Vectorizer
  2. Trained model using three classifiers
    i. Gaussian Naive Bayes (Accuracy: 72%)
    ii. SVM (Accuracy: 87%)
    iii. LSTM (Accuracy: 96%)
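
A minimal scikit-learn sketch of the Tf-idf vectorization followed by the first two classifiers is shown here. The toy tweets and labels are invented for illustration, the linear SVM kernel is an assumption (the kernel used in the project is not stated), and the LSTM is omitted. The sketch makes no attempt to reproduce the accuracies listed above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Invented toy data; the real project trains on the labeled tweet dataset.
texts = [
    "vaccine works great feel safe",
    "happy got vaccine today",
    "grateful for vaccine rollout",
    "vaccine side effects scary",
    "worried about vaccine safety",
    "do not trust vaccine",
]
labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

# Turn each tweet into a Tf-idf weighted term vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# GaussianNB needs a dense array, so the sparse matrix is converted first.
nb = GaussianNB().fit(X.toarray(), labels)

# SVC accepts sparse input directly; kernel choice is an assumption.
svm = SVC(kernel="linear").fit(X, labels)

# New text must go through the same fitted vectorizer.
new = vectorizer.transform(["feel safe after vaccine"])
nb_pred = nb.predict(new.toarray())
svm_pred = svm.predict(new)
print(nb_pred[0], svm_pred[0])
```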

Topic Modelling LDA

  1. Tokenized tweets
  2. Removed Stop words
  3. Extracted useful terms using POS Tagging
  4. Applied Lemmatizer
  5. Vectorized using Count vectorizer
  6. Trained LDA Model
  7. Extracted Top 7 topics discussed

Libraries Used

  • Numpy
  • Pandas
  • Matplotlib
  • Seaborn
  • Plotly
  • Vader
  • NLTK
  • Sklearn

Results

[Figure: Word cloud]
[Figure: Topics]
[Figure: Topic word cloud]

Contact Info

📧 E-Mail
🤝 LinkedIn
