Data Analysis on Reddit data, using Python

The Python Reddit API Wrapper (PRAW) is used to extract data from Reddit, and the data is then analysed with several Python libraries.

Overview

Reddit is a widely used social media website with an emphasis on social news aggregation, discussion and user-generated content. From Wikipedia:

Reddit is an American social news aggregation, web content rating, and discussion website. Registered members submit content to the site such as links, text posts, images, and videos, which are then voted up or down by other members. Posts are organized by subject into user-created boards called "communities" or "subreddits", which cover a variety of topics such as news, politics, religion, science, movies, video games, music, books, sports, fitness, cooking, pets, and image-sharing. Submissions with more up-votes appear towards the top of their subreddit and, if they receive enough up-votes, ultimately on the site's front page.

Data is retrieved from the website using PRAW. The analysis is split across two Jupyter notebooks: reddit-post-sentiment-analysis.ipynb analyses a specific Reddit post, while subreddit-analysis.ipynb analyses two subreddits as a whole.
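A minimal sketch of setting up PRAW for read-only data collection, assuming you have registered a "script" app with Reddit. The environment-variable names and both helper functions are hypothetical and not taken from the notebooks; `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` is PRAW's own constructor, and without a username or password the client runs in read-only mode.

```python
import os

# Hypothetical environment-variable names; the notebooks may store credentials differently.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def make_reddit_client(env=None):
    """Build a read-only PRAW client (no username/password) from credentials in env."""
    env = os.environ if env is None else env
    missing = missing_credentials(env)
    if missing:
        raise RuntimeError("missing Reddit credentials: " + ", ".join(missing))
    import praw  # imported lazily so the helper above works without praw installed

    return praw.Reddit(
        client_id=env["REDDIT_CLIENT_ID"],
        client_secret=env["REDDIT_CLIENT_SECRET"],
        user_agent=env["REDDIT_USER_AGENT"],
    )
```

Keeping credentials outside the notebooks lets them be shared without leaking secrets.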

Dependencies used

praw
pandas
matplotlib
seaborn
spacy
textblob
numpy

Details

Sentiment Analysis on a specific Reddit post

We select the Daily Discussion post of 31 May 2021 from the r/soccer subreddit as our specific post.
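Reading every comment under a single post with PRAW could look like the sketch below. It assumes `reddit` is a `praw.Reddit` instance and that you supply the post's URL yourself (the thread's URL is not given here); the function name is illustrative. `replace_more(limit=None)` expands every "load more comments" stub, which can take many API requests on a busy Daily Discussion thread.

```python
def fetch_comment_bodies(reddit, url):
    """Return the raw text of every comment under the submission at ``url``.

    ``reddit`` is assumed to be a ``praw.Reddit`` instance (or any object
    with the same ``submission(url=...)`` method).
    """
    submission = reddit.submission(url=url)
    # Resolve all MoreComments placeholders so nested replies are included too.
    submission.comments.replace_more(limit=None)
    # CommentForest.list() flattens the whole reply tree into a single list.
    return [comment.body for comment in submission.comments.list()]
```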

We carry out the following operations:

1. Read all comments beneath the post.
2. Compute a sentiment score for each comment using TextBlob.
3. Clean the resulting data and store it in a pandas DataFrame.
4. Count the positive and negative comments.
5. Find the 10 most frequent proper nouns, which give an idea of the most-discussed topics.
6. Find the 10 most frequent positive words.
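The steps above might be sketched as follows. This is not the notebook's code: the placeholder filtering, the neutral label for a polarity of exactly 0, the reading of "positive words" as words TextBlob scores above 0 on their own, the DataFrame column names and the `en_core_web_sm` spaCy model (which must be downloaded separately) are all assumptions. The scoring function is a parameter so the logic can be exercised without TextBlob installed.

```python
import re
from collections import Counter

# Assumed cleaning rule: Reddit shows these placeholders for deleted/removed comments.
PLACEHOLDERS = {"[deleted]", "[removed]"}


def clean_comments(bodies):
    """Strip whitespace and drop empty or deleted/removed comments."""
    stripped = (body.strip() for body in bodies)
    return [body for body in stripped if body and body not in PLACEHOLDERS]


def textblob_polarity(text):
    """TextBlob polarity, a float in [-1.0, 1.0] (textblob is imported lazily)."""
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


def label(polarity):
    """Map a polarity score to a label; exactly 0.0 is treated as neutral."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def sentiment_frame(bodies, score=textblob_polarity):
    """pandas DataFrame with one row per cleaned comment (column names are illustrative)."""
    import pandas as pd

    cleaned = clean_comments(bodies)
    polarity = [score(body) for body in cleaned]
    return pd.DataFrame(
        {"comment": cleaned, "polarity": polarity, "label": [label(p) for p in polarity]}
    )


def sentiment_counts(bodies, score=textblob_polarity):
    """Number of positive, negative and neutral comments."""
    return Counter(label(score(body)) for body in clean_comments(bodies))


def top_positive_words(bodies, score=textblob_polarity, n=10):
    """The n most frequent words that score as positive on their own."""
    words = Counter(
        word
        for body in clean_comments(bodies)
        for word in re.findall(r"[a-z']+", body.lower())
    )
    positive = []
    for word, count in words.most_common():
        if len(positive) >= n:
            break
        if score(word) > 0:
            positive.append((word, count))
    return positive


def spacy_tags(texts, model="en_core_web_sm"):
    """Yield (token text, coarse POS tag) pairs; the spaCy model must be installed."""
    import spacy

    nlp = spacy.load(model)
    for doc in nlp.pipe(texts):
        for token in doc:
            yield token.text, token.pos_


def top_proper_nouns(tagged_tokens, n=10):
    """The n most common tokens tagged PROPN among (text, tag) pairs."""
    return Counter(text for text, tag in tagged_tokens if tag == "PROPN").most_common(n)
```

For example, with `bodies` a list of comment strings fetched through PRAW, `top_proper_nouns(spacy_tags(clean_comments(bodies)))` would list the ten most frequent proper nouns.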

Data Analysis and Comparison between two subreddits

For this comparison we select two subreddits: r/india and r/politics.

We carry out the following operations:

1. For each subreddit:
   a. Extract details of the top 100 posts from the last year, such as the account name, upvote ratio, total score (upvotes - downvotes) and number of awards.
   b. Analyse the source URLs of these posts.
   c. Plot attributes such as score, number of comments and number of awards against one another to see whether any relationship exists between them.
2. For r/politics, find the most-discussed topics among the top posts, using the titles of the top 100 posts.
3. Compare the two subreddits using descriptive statistics of their top 100 posts from the last year, such as mean score, mean upvote ratio and mean number of comments.
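The per-subreddit steps could look roughly like the sketch below. `Subreddit.top(time_filter="year", limit=100)` is PRAW's listing call; the chosen fields, the helper names and the `getattr(..., 0)` fallback are assumptions rather than the notebook's code. Reddit's award system has changed since 2021, so `total_awards_received` may be missing or zero on recent data, which the fallback guards against.

```python
import statistics
from collections import Counter
from urllib.parse import urlparse

# Numeric Submission attributes to compare; this selection is an assumption.
FIELDS = ("score", "upvote_ratio", "num_comments", "total_awards_received")


def top_posts(reddit, name, limit=100):
    """Collect one dict per top post of the past year in r/<name>.

    ``reddit`` is assumed to be a ``praw.Reddit`` instance.
    """
    rows = []
    for post in reddit.subreddit(name).top(time_filter="year", limit=limit):
        # Fall back to 0 if an attribute (e.g. awards) is absent from the API data.
        row = {field: getattr(post, field, 0) for field in FIELDS}
        # post.author is None when the account has been deleted.
        row["author"] = post.author.name if post.author else None
        row["url"] = post.url
        row["title"] = post.title
        rows.append(row)
    return rows


def domain_of(url):
    """Host part of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_counts(rows):
    """How many posts link to each source domain."""
    return Counter(domain_of(row["url"]) for row in rows)


def summary(rows):
    """Mean of each numeric field across the given posts."""
    return {field: statistics.mean(row[field] for row in rows) for field in FIELDS}


def compare(rows_by_subreddit):
    """Map each subreddit name to its summary for a side-by-side comparison."""
    return {name: summary(rows) for name, rows in rows_by_subreddit.items()}


def scatter(rows, x="num_comments", y="score"):
    """Scatter plot of one attribute against another (matplotlib imported lazily)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([row[x] for row in rows], [row[y] for row in rows])
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return fig
```

For example, `compare({name: top_posts(reddit, name) for name in ("india", "politics")})` would return the mean score, upvote ratio, comment count and award count for each subreddit.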

pillaikartik10/python-reddit-analysis
