Disaster Tweets Prediction: NLP, Deep Learning, LSTM

Project Background

Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

But it’s not always clear whether a person’s words are actually announcing a disaster. A Twitter user may use the word "ablaze" to describe an amazing view of the sky or a horrific forest fire.

There is an opportunity to use machine learning to predict whether a tweet is discussing a disaster or not. Lucky for me, Kaggle has compiled a labeled dataset of disaster and non-disaster tweets that is ready for some supervised learning.

Approach

I used a bidirectional LSTM augmented by pre-trained word embeddings as my primary means of predicting whether a tweet is about a disaster. In the disaster_tweets_nb.ipynb notebook you can see the following steps in my process:

EDA (exploratory data analysis)
Pre-processing tweets (removing stopwords, spell checking, removing punctuation, etc.)
Prep data for LSTM model (tokenizing tweets, importing word embeddings)
Define LSTM model (create model structure and define hyperparameters)
Train/Tune Model
Make Predictions on Kaggle dataset
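The pre-processing and tokenizing steps listed above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions, not the notebook's code: the stopword list is a tiny hypothetical sample, spell checking is left out, `build_vocab`, `encode`, `parse_vectors` and `build_embedding_matrix` are hypothetical helper names, and the pretrained vectors are assumed to be in a GloVe-style text format (a word followed by its space-separated components). ID 0 is reserved for padding and ID 1 for out-of-vocabulary words; `encode` left-pads and keeps the last `max_len` tokens, which mirrors the defaults of Keras' `pad_sequences`.

```python
import re
import string

# Hypothetical, abbreviated stopword list (the notebook's actual list is not shown).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "in", "on", "and", "or", "at", "my", "i"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
PAD, OOV = 0, 1  # reserved token IDs (assumption)


def preprocess(tweet):
    """Lowercase, drop URLs, @mentions and punctuation, then remove stopwords."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))  # "#fire" -> "fire"
    return [tok for tok in text.split() if tok not in STOPWORDS]


def build_vocab(token_lists):
    """Map each word to an ID >= 2; frequent words first, ties broken alphabetically."""
    counts = {}
    for toks in token_lists:
        for t in toks:
            counts[t] = counts.get(t, 0) + 1
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 2 for i, w in enumerate(ordered)}


def encode(tokens, vocab, max_len):
    """Convert tokens to IDs, keep the last max_len of them, and left-pad with PAD."""
    ids = [vocab.get(t, OOV) for t in tokens]
    if len(ids) > max_len:
        ids = ids[-max_len:]
    return [PAD] * (max_len - len(ids)) + ids


def parse_vectors(lines):
    """Parse GloVe-style lines: a word followed by its float components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(vocab, vectors, dim):
    """Row i holds the pretrained vector for token ID i; words without a vector stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(vocab) + 2)]
    for word, idx in vocab.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = list(vec)
    return matrix


tweets = ["The sky is ABLAZE tonight! http://t.co/xyz", "@user Forest fire near #LaRonge"]
tokens = [preprocess(t) for t in tweets]
print(tokens)  # → [['sky', 'ablaze', 'tonight'], ['forest', 'fire', 'near', 'laronge']]
vocab = build_vocab(tokens)
print(encode(tokens[0], vocab, max_len=5))  # → [0, 0, 7, 2, 8]
vectors = parse_vectors(["fire 0.1 0.2", "sky 0.3 0.4"])  # toy stand-in for a real vector file
matrix = build_embedding_matrix(vocab, vectors, dim=2)
```

Words missing from the pretrained vectors keep an all-zero row here; initialising them randomly is a common alternative.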
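To show what the model-definition step computes, here is a minimal forward pass of a bidirectional LSTM classifier in plain Python. It is a sketch under stated assumptions: the weights are random and untrained, and the class names, gate layout (input, forget, candidate, output) and single sigmoid output unit are illustrative choices; the notebook's actual framework, layer sizes and hyperparameters are not reproduced. Token ID 0 is assumed to be padding, and row `i` of the embedding matrix is assumed to hold the vector for token ID `i`. One LSTM reads the embedded tweet left to right, a second reads it right to left, and their final hidden states are concatenated before the output layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LSTMCell:
    """One LSTM direction with small random (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.h = hidden_dim

        def rand(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # The four gates are stacked as: input, forget, candidate, output.
        self.W = rand(4 * hidden_dim, input_dim)
        self.U = rand(4 * hidden_dim, hidden_dim)
        self.b = [0.0] * (4 * hidden_dim)

    def run(self, xs):
        """Consume the sequence of vectors xs and return the final hidden state."""
        H = self.h
        h, c = [0.0] * H, [0.0] * H
        for x in xs:
            z = [a + b + bias for a, b, bias in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
            i = [sigmoid(v) for v in z[0:H]]
            f = [sigmoid(v) for v in z[H:2 * H]]
            g = [math.tanh(v) for v in z[2 * H:3 * H]]
            o = [sigmoid(v) for v in z[3 * H:4 * H]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


class BiLSTMClassifier:
    """Embedding lookup -> forward + backward LSTM -> concatenation -> sigmoid unit."""

    def __init__(self, embedding_matrix, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.emb = embedding_matrix
        dim = len(embedding_matrix[0])
        self.fwd = LSTMCell(dim, hidden_dim, rng)
        self.bwd = LSTMCell(dim, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def predict_proba(self, token_ids):
        """Return P(disaster) for one padded sequence of token IDs."""
        xs = [self.emb[t] for t in token_ids if t != 0]  # skip padding positions
        h = self.fwd.run(xs) + self.bwd.run(list(reversed(xs)))  # concat: 2 * hidden_dim
        logit = sum(w * v for w, v in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


# Toy 2-dimensional embeddings; rows 0 and 1 stand for padding and out-of-vocabulary.
emb = [[0.0, 0.0], [0.0, 0.0], [0.5, -0.2], [0.1, 0.9]]
model = BiLSTMClassifier(emb, hidden_dim=3, seed=42)
p = model.predict_proba([0, 0, 2, 3])
print(p)
```

In practice a deep learning framework computes this forward pass and trains the weights with backpropagation through time; the sketch only makes the data flow explicit.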

*Update

After experimenting with a new approach using BERT (via the Hugging Face transformers package), I improved my score (0.81887) and ranking (824) in the Kaggle competition, with no changes to preprocessing. The code is in disaster_tweets_nb_BERT.ipynb.

TO DO

Improve preprocessing of tweets, for example by mapping common shorthand such as 'lol' to 'laugh out loud', as I have seen others do.
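A minimal sketch of the shorthand mapping described in this to-do item, assuming a small hypothetical dictionary (`SHORTHAND`) and whole-word, case-insensitive matching so that words like "lollipop" are left alone. It would run early in the pre-processing pipeline, before stopword removal, so the expanded words are then handled like any other text.

```python
import re

# Hypothetical shorthand dictionary; extend it with abbreviations found in the training tweets.
SHORTHAND = {
    "lol": "laugh out loud",
    "omg": "oh my god",
    "idk": "i do not know",
    "smh": "shaking my head",
    "u": "you",
}

# One alternation of all keys, anchored on word boundaries so only whole words match.
_SHORTHAND_RE = re.compile(r"\b(" + "|".join(map(re.escape, SHORTHAND)) + r")\b", re.IGNORECASE)


def expand_shorthand(text):
    """Replace whole-word shorthand (any casing) with its expansion."""
    return _SHORTHAND_RE.sub(lambda m: SHORTHAND[m.group(0).lower()], text)


print(expand_shorthand("LOL u saw that?"))  # → laugh out loud you saw that?
```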

sdurancmu/disaster_tweets
