Web Scraping and Data Analysis: Tweet Archive of Twitter User @dog_rates

I gather data from three sources: a file provided by Udacity, a file downloaded programmatically from a URL, and tweet data queried from the Twitter API.
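The programmatic download can be sketched with `requests`, assuming the file is served at a plain HTTP(S) URL. The URL is not given in this README, so it is a parameter here; the helper name `download_file` is hypothetical and not taken from the notebook.

```python
import requests


def download_file(url: str, path: str, timeout: float = 30.0) -> int:
    """Download url and save the raw bytes to path; return the number of bytes written.

    Hypothetical helper: raise_for_status() raises requests.HTTPError on a
    4xx/5xx response instead of silently saving an error page to disk.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    with open(path, "wb") as f:
        f.write(response.content)
    return len(response.content)
```

A call such as `download_file(url, "image-predictions.tsv")` would save the predictions file next to the notebook; writing `response.content` (bytes) rather than `response.text` avoids re-encoding the TSV.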

The project covers the following sections:

Introduction
Data Gathering
Data Assessing
Data Cleaning
Data Analysis and Visualization
Proposal for Next Steps

Language, Packages and Libraries

The project uses Jupyter Notebook with Python 3.7. The libraries used are numpy, pandas, requests, tweepy, matplotlib.pyplot, seaborn, and scipy.stats, plus the standard-library modules json, timeit, and re.
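The API-gathering step can be sketched as a loop that fetches each tweet's JSON, writes one object per line to tweet_json.txt, records the IDs that fail (for example, deleted tweets), and times the run with `timeit.default_timer`. To keep the sketch testable, the fetch function is passed in. `make_tweepy_fetcher` assumes an authenticated Tweepy 3.x/4.x `tweepy.API` object for the Twitter v1.1 API, where `api.get_status(id, tweet_mode="extended")` returns a status whose raw dict is in `._json`. Both function names are hypothetical.

```python
import json
from timeit import default_timer as timer
from typing import Callable, Dict, Iterable


def make_tweepy_fetcher(api) -> Callable[[int], dict]:
    """Wrap an authenticated tweepy.API object (assumed Tweepy 3.x/4.x, Twitter API v1.1)."""

    def fetch(tweet_id: int) -> dict:
        # tweet_mode="extended" asks for the untruncated tweet text
        status = api.get_status(tweet_id, tweet_mode="extended")
        return status._json  # the raw JSON dict behind the Status model

    return fetch


def gather_tweets(
    tweet_ids: Iterable[int],
    fetch: Callable[[int], dict],
    path: str = "tweet_json.txt",
) -> Dict[int, str]:
    """Write one JSON object per line to path.

    Returns {tweet_id: error message} for tweets that could not be fetched
    (for example, deleted or protected tweets).
    """
    failures: Dict[int, str] = {}
    written = 0
    start = timer()
    with open(path, "w", encoding="utf-8") as f:
        for tweet_id in tweet_ids:
            try:
                tweet = fetch(tweet_id)
            except Exception as exc:  # keep going; record the failure instead
                failures[tweet_id] = str(exc)
                continue
            f.write(json.dumps(tweet) + "\n")
            written += 1
    elapsed = timer() - start
    print(f"Wrote {written} tweets, {len(failures)} failures, in {elapsed:.1f} s")
    return failures
```

With real credentials, the intended call would be along the lines of `gather_tweets(archive.tweet_id, make_tweepy_fetcher(api))`, where `api` is built as `tweepy.API(auth, wait_on_rate_limit=True)` so the loop pauses instead of failing when the rate limit is hit; `archive` stands for the hypothetical DataFrame loaded from the Udacity archive file.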

Metadata

Definitions of the tweet data fields can be found on the Twitter website.
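Reading tweet_json.txt back can be sketched as follows, assuming it holds one JSON object per line and uses the Twitter v1.1 field names `id`, `retweet_count`, and `favorite_count`. The helper name `load_tweet_counts` and the output column names are hypothetical.

```python
import json

import pandas as pd


def load_tweet_counts(path: str = "tweet_json.txt") -> pd.DataFrame:
    """Read a JSON-lines file of tweets into tweet_id / retweet_count / favorite_count columns."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweet = json.loads(line)
            rows.append(
                {
                    "tweet_id": tweet["id"],  # renamed to match the archive's key column
                    "retweet_count": tweet["retweet_count"],
                    "favorite_count": tweet["favorite_count"],
                }
            )
    return pd.DataFrame(rows, columns=["tweet_id", "retweet_count", "favorite_count"])
```

Renaming `id` to `tweet_id` at load time lets the counts be merged directly with the archive and the image predictions, which both key on `tweet_id`.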

The image predictions were generated by a neural network applied to the image in each tweet. The variables are defined in the following table.

Variable Name	Definition
tweet_id	the last part of the tweet URL after "status/"
p1	the algorithm's #1 prediction for the image in the tweet
p1_conf	how confident the algorithm is in its #1 prediction
p1_dog	whether or not the #1 prediction is a breed of dog
p2	the algorithm's #2 prediction for the image in the tweet
p2_conf	how confident the algorithm is in its #2 prediction
p2_dog	whether or not the #2 prediction is a breed of dog
p3	the algorithm's #3 prediction for the image in the tweet
p3_conf	how confident the algorithm is in its #3 prediction
p3_dog	whether or not the #3 prediction is a breed of dog
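Because each tweet has three ranked predictions, a common use of this table is to keep the highest-ranked prediction that is flagged as a dog breed. The sketch below does that with pandas; it is an illustration assuming the columns in the table above, not necessarily how the notebook handles it, and `top_dog_prediction`, `breed`, and `breed_conf` are hypothetical names.

```python
import pandas as pd


def top_dog_prediction(df: pd.DataFrame) -> pd.DataFrame:
    """Return tweet_id plus the highest-ranked prediction flagged as a dog breed.

    Rows where none of p1..p3 is a dog get NaN for both breed and confidence.
    """
    breed = pd.Series(float("nan"), index=df.index, dtype=object)
    conf = pd.Series(float("nan"), index=df.index)
    # Walk from rank 3 up to rank 1 so that higher ranks overwrite lower ones.
    for rank in (3, 2, 1):
        is_dog = df[f"p{rank}_dog"].astype(bool)
        breed = df[f"p{rank}"].where(is_dog, breed)
        conf = df[f"p{rank}_conf"].where(is_dog, conf)
    return pd.DataFrame({"tweet_id": df["tweet_id"], "breed": breed, "breed_conf": conf})
```

Checking the ranks in reverse order means the final value always comes from the best-ranked dog prediction, without writing nested conditionals.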

Methodology

Descriptive Statistics
Predictive Modeling (in development)
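As an illustration of the descriptive-statistics step, the sketch below summarizes engagement and compares favorites with retweets. It assumes the master table has `retweet_count`, `favorite_count`, and `breed` columns; those names and the helper `describe_engagement` are assumptions, and no results from the actual data are implied.

```python
import pandas as pd


def describe_engagement(master: pd.DataFrame) -> dict:
    """Summary statistics for engagement (column names are assumptions)."""
    counts = master[["retweet_count", "favorite_count"]]
    return {
        # count, mean, std, min, quartiles, max for both columns
        "summary": counts.describe(),
        # Pearson correlation between retweets and favorites
        "pearson_r": counts["retweet_count"].corr(counts["favorite_count"]),
        # average favorites per predicted breed, highest first
        "mean_favorites_by_breed": (
            master.groupby("breed")["favorite_count"].mean().sort_values(ascending=False)
        ),
    }
```

`Series.corr` defaults to the Pearson coefficient; `scipy.stats.pearsonr` would give the same coefficient plus a p-value if a significance test is wanted.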

Reports

wrangle_act.ipynb: the complete workflow, from data wrangling to data analysis
wrangle_report.ipynb, wrangle_report.html: documentation of the data wrangling steps (gather, assess, and clean)
act_report.ipynb, act_report.html: documentation of the analysis and the insights drawn from the final data

Datasets

twitter-archive-enhanced.csv: file provided by Udacity
image-predictions.tsv: file downloaded programmatically
tweet_json.txt: file constructed by querying the Twitter API
twitter_archive_master.csv: combined and cleaned data
image_predictions_tp.csv: cleaned image prediction data
Twitter.db: database containing the cleaned data from twitter_archive_master.csv and image_predictions_tp.csv
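Combining the sources and storing the cleaned tables can be sketched as below. Two points are assumptions rather than facts from this README: that the archive and the API counts are joined with an inner join on `tweet_id`, and that Twitter.db is an SQLite file. The helper and table names are hypothetical.

```python
import sqlite3
from typing import Dict

import pandas as pd


def build_master(archive: pd.DataFrame, counts: pd.DataFrame) -> pd.DataFrame:
    """Join the cleaned archive with the API counts on tweet_id.

    An inner join keeps only tweets present in both sources (an assumption;
    a left join would instead keep archive rows with missing counts).
    validate="one_to_one" raises if either side has duplicate tweet_ids.
    """
    return archive.merge(counts, on="tweet_id", how="inner", validate="one_to_one")


def save_to_db(tables: Dict[str, pd.DataFrame], db_path: str = "Twitter.db") -> None:
    """Write each DataFrame to its own table in an SQLite file (SQLite is an assumption)."""
    conn = sqlite3.connect(db_path)
    try:
        for name, df in tables.items():
            df.to_sql(name, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()
```

For example, `save_to_db({"twitter_archive_master": master, "image_predictions_tp": predictions})` would store both cleaned tables in one file, mirroring the two CSV outputs listed above.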


jemc36/Udacity-DAND-DataWrangling-TwitterAPI-WeRateDogs
