- Scrape Historic Tweets
- Create labelled dataset using Amazon Comprehend
- Train a BERT model using the labelled Dataset
- Create a microservice using the trained model
- Scrape live data on a user specified topic
- Ingest data into Kafka
- Set up a Kafka producer to produce the event stream
- Set up Kafka consumers to process the events, extract essential information, and perform sentiment analysis on each tweet
- Stream this live data in a topic
- Read the live stream into Druid
- Flatten the data and store it as rows and columns in a database
- Visualize and Analyze data using Turnilo
- Dockerize various components
- Use Kubernetes (K8s) to manage containers
- Deploy to EC2
If you already have an account, skip this step.
Go to this link and follow the instructions. You will need a valid debit or credit card. You will not be charged; the card is only used to verify your identity.
Install the AWS CLI Version 1 for your operating system. Please follow the appropriate link below based on your operating system.
** Please make sure you add the AWS CLI version 1 executable to your command-line PATH.
Verify that the AWS CLI is installed correctly by running:
```
aws --version
```
- You should see something similar to:
```
aws-cli/1.17.0 Python/3.7.4 Darwin/18.7.0 botocore/1.14.0
```
You need to retrieve AWS credentials that allow your AWS CLI to access AWS resources.
- Sign into the AWS console. This simply requires that you sign in with the email and password you used to create your account. If you already have an AWS account, be sure to log in as the root user.
- Choose your account name in the navigation bar at the top right, and then choose My Security Credentials.
- Expand the Access keys (access key ID and secret access key) section.
- Press Create New Access Key.
- Press Download Key File to download a CSV file that contains your new AccessKeyId and SecretKey. Keep this file somewhere you can find it easily.
Now, you can configure your AWS CLI with the credentials you just created and downloaded.
- In your Terminal, run:
```
aws configure
```
  i. Enter your AWS Access Key ID from the file you downloaded.
  ii. Enter the AWS Secret Access Key from the file.
  iii. For Default region name, enter `us-east-1`.
  iv. For Default output format, enter `json`.
- Run `aws s3 ls` in your Terminal. If your AWS CLI is configured correctly, you should see nothing (because you do not have any existing AWS S3 buckets), or, if you have created S3 buckets before, they will be listed in your Terminal window.
** If you get an error, then please try to configure your AWS CLI again.
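The same credentials can be exercised from Python; a minimal sketch, assuming the boto3 library is installed (the helper names here are illustrative, not part of this project's code):

```python
def bucket_names(list_buckets_response):
    """Pull the bucket names out of an S3 ListBuckets response dict."""
    return [bucket["Name"] for bucket in list_buckets_response.get("Buckets", [])]

def check_s3_access():
    """Call AWS using the credentials configured by `aws configure`."""
    import boto3  # third-party; `pip install boto3`
    s3 = boto3.client("s3")
    return bucket_names(s3.list_buckets())  # [] if you have no buckets yet
```

Calling `check_s3_access()` should return the same bucket list that `aws s3 ls` prints.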
-
Create a free Twitter user account. This will allow you to access the Twitter developer portal.
-
Navigate to Twitter Dev Site, sign in, and create a new application. After that, fill out all the app details. Once you do this, you should have your access keys.
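Once you have your keys, authenticating from Python could look roughly like this: a sketch assuming the tweepy library, with placeholder credentials and helper names of our own choosing:

```python
def tweet_record(status_json):
    """Keep only the fields the pipeline needs from a raw tweet dict."""
    return {
        "id": status_json["id_str"],
        "text": status_json["text"],
        "user": status_json["user"]["screen_name"],
        "created_at": status_json["created_at"],
    }

def make_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Authenticate against Twitter with the keys from your app's dashboard."""
    import tweepy  # third-party; `pip install tweepy`
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)
```

`make_api(...).verify_credentials()` is a quick way to confirm the four keys work.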
Follow the instructions for your operating system:
Install Docker Desktop. Use one of the links below to download the proper Docker application depending on your operating system. Create a DockerHub account if asked.
i. Execute the files "first.bat" and "second.bat", in order, as administrator.
ii. Restart your computer.
iii. Execute the following commands in a terminal, as administrator.
```
REG ADD "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /f /v EditionID /t REG_SZ /d "Professional"
REG ADD "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /f /v ProductName /t REG_SZ /d "Windows 10 Pro"
```
iv. Follow this link to install Docker.
v. Restart your computer; do not just log out.
vi. Execute the following commands in a terminal, as administrator.
```
REG ADD "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v EditionID /t REG_SZ /d "Core"
REG ADD "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v ProductName /t REG_SZ /d "Windows 10 Home"
```
Open a Terminal window and run `docker run hello-world` to make sure Docker is installed properly. You should see the following message:
```
Hello from Docker!
This message shows that your installation appears to be working correctly.
```
Finally, in the Terminal window, execute `docker pull tensorflow/tensorflow:2.1.0-py3-jupyter`.
Follow the instructions for your operating system.
Follow the instructions for your operating system.
If you already have a preferred text editor, skip this step.
Follow these instructions to install ZooKeeper and Kafka on your system.
Once done, you can use the following commands to run the Kafka server.
Start ZooKeeper:
```
bin/zookeeper-server-start.sh config/zookeeper.properties
```
Start Kafka:
```
bin/kafka-server-start.sh config/server.properties
```
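With the broker up, publishing an analysed tweet as an event could be sketched as follows, assuming the kafka-python client and an illustrative topic name `tweets` (neither is mandated by the project):

```python
import json

def tweet_event(text, sentiment, score):
    """Serialize one analysed tweet as a JSON-encoded Kafka event."""
    return json.dumps({"text": text, "sentiment": sentiment,
                       "score": score}).encode("utf-8")

def publish(event_bytes, topic="tweets", servers="localhost:9092"):
    """Send one event to the broker started above."""
    from kafka import KafkaProducer  # third-party; `pip install kafka-python`
    producer = KafkaProducer(bootstrap_servers=servers)
    producer.send(topic, event_bytes)
    producer.flush()
```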
Follow these instructions to install Druid on your system.
- Java 8 (8u92+) or later
- Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
- Node.js 10.x or 8.x
- npm 6.5.0
Once you have the pre-requisite packages:
Install the Turnilo distribution using npm:
```
npm install -g turnilo
```
Connect to an existing Druid broker using the --druid command-line option. Turnilo will automatically introspect your Druid broker and figure out the available datasets.
```
turnilo --druid http[s]://druid-broker-hostname[:port]
```
- Docker Client
Use the following commands to install Superset (incubating):
```
git clone https://github.com/apache/incubator-superset/
cd incubator-superset
docker-compose up
```
Then open http://localhost:8088 to access the Superset portal.
- Install the requirements:
```
pip install -U -r requirements.txt
```
This command will install all the required packages and update any older ones.
- Now that we have our environment set up, we will create an S3 bucket.
Follow this link and create an S3 bucket.
- Scraping Tweets: To run the scraping pipeline, follow the detailed instructions in the Scraping Pipeline folder.
This pipeline scrapes historic tweets using the tweepy library, labels the dataset, and saves it to the S3 bucket.
Run the scraping pipeline using the following command:
```
python annotation_pipeline.py --environment=conda run
```
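The labelling step inside this pipeline relies on Amazon Comprehend; in isolation it could look roughly like this (a sketch assuming boto3, with helper names of our own, not the pipeline's actual code):

```python
def label(detect_sentiment_response):
    """Map a Comprehend DetectSentiment response to (label, confidence)."""
    sentiment = detect_sentiment_response["Sentiment"]  # e.g. "POSITIVE"
    confidence = detect_sentiment_response["SentimentScore"][sentiment.capitalize()]
    return sentiment, confidence

def label_tweet(text, region="us-east-1"):
    """Ask Amazon Comprehend to label one tweet."""
    import boto3  # third-party; `pip install boto3`
    comprehend = boto3.client("comprehend", region_name=region)
    return label(comprehend.detect_sentiment(Text=text, LanguageCode="en"))
```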
- Training Pipeline: To run the training pipeline, follow the detailed instructions in the Training Pipeline folder.
This pipeline reads the labelled dataset from S3 and trains an ML sentiment analysis model (BERT), which we then use to serve a Flask API. Run the training pipeline using the following command:
```
python training.py run
```
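Before fine-tuning, the labelled rows need to be split into training and evaluation sets; one self-contained way to sketch that step (this is illustrative, not what `training.py` necessarily does):

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle labelled rows and split them into train/test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]
```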
- Run the Flask App: You can use a Docker Hub image to run this app or run it locally; you will find detailed instructions on how to run the API here.
This is a sentiment analysis API, which takes a text input (a tweet, in our case) and returns a sentiment and its score. Run the API using the following command:
```
python app.py
```
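The shape of such an API can be sketched as below, assuming Flask and a `/predict` route (the route name and helpers are ours; `app.py` holds the real implementation wired to the trained model):

```python
def format_prediction(sentiment, score):
    """Shape the JSON body the API returns for one piece of text."""
    return {"sentiment": sentiment, "score": round(score, 4)}

def create_app(predict_fn):
    """Build a Flask app around any (text) -> (sentiment, score) function."""
    from flask import Flask, request, jsonify  # third-party; `pip install flask`
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        sentiment, score = predict_fn(request.get_json()["text"])
        return jsonify(format_prediction(sentiment, score))

    return app
```

`create_app(model.predict).run(port=5000)` would then serve the trained model.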
- Analysis Pipeline: This is a Kafka pipeline which ingests real-time tweets, performs sentiment analysis on them, and processes each tweet as an event. We then store these events in Druid, flatten the data, and use Turnilo for visualization.
Detailed instructions on how to run this pipeline can be found here.
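The consumer side of this pipeline could be sketched as follows, again assuming kafka-python and an illustrative `tweets` topic (field names match the producer sketch, not necessarily the project's schema):

```python
import json

def extract(event_bytes):
    """Decode one Kafka event and keep the fields Druid will ingest."""
    event = json.loads(event_bytes.decode("utf-8"))
    return {"text": event["text"], "sentiment": event["sentiment"],
            "score": event["score"]}

def consume(topic="tweets", servers="localhost:9092"):
    """Read events off the topic and print the flattened records."""
    from kafka import KafkaConsumer  # third-party; `pip install kafka-python`
    for message in KafkaConsumer(topic, bootstrap_servers=servers):
        print(extract(message.value))
```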
Now that we have our Kafka stream running, we will start Druid and configure it to ingest the Kafka stream.
To start Druid, use the following command:
```
./bin/start-micro-quickstart
```
Configure Druid to consume the Kafka stream using the following steps.
Once configured, Druid will ingest real-time data from Kafka and store it in a database.
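The configuration can also be submitted programmatically; a minimal sketch of a Kafka ingestion supervisor spec, assuming the requests library and our example field names (the micro-quickstart router listens on port 8888):

```python
def kafka_supervisor_spec(topic, bootstrap_servers, datasource):
    """A minimal Druid Kafka ingestion spec (field names per the Druid docs)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "auto"},
                "dimensionsSpec": {"dimensions": ["text", "sentiment", "score"]},
            },
            "ioConfig": {
                "topic": topic,
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
            },
        },
    }

def submit(spec, router="http://localhost:8888"):
    """POST the spec to the Druid supervisor endpoint via the router."""
    import requests  # third-party; `pip install requests`
    return requests.post(router + "/druid/indexer/v1/supervisor", json=spec)
```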
Now that we have our data in the Druid database, we use Turnilo for data visualization and analysis.
To start Turnilo, use the following command:
```
turnilo --druid DRUIDPORT
```
DRUIDPORT is the address where Druid is running, which is http://localhost:8888 by default.
Load the Superset Dashboard
Once you open Superset, load the Druid dataset into it using the following link.
Then select Import, and import the `analysis.json` file, which will start up the dashboard.
- Create a react web app as the front end of the system
- Currently we have our Kafka cluster and micro-service running on EC2; we would like to host our database in the cloud too, so that it is remotely accessible