Gmail Scraper for Word Analysis

TL;DR

This small script scrapes your Gmail and creates a word-analysis visualization from the contents of the queried emails.

The purpose is to get a general picture of the interactions in a correspondence and to gauge the participants' overall emotion or reaction towards one another.

Setup

Install virtualenv

pip install virtualenv

Create your virtual environment

python<version> -m venv <virtual-environment-name>

Activate the virtual environment

source <environment name>/bin/activate

To get this repository, run the following command in a git-enabled terminal

git clone https://github.com/TeaZea/Gmail-Scraper_Word-Analysis.git

Change (cd) into the folder and install the packages listed in requirements.txt

pip install -r /<path>/<to>/requirements.txt

Open gmail_scraper_analysis.ipynb in Jupyter Notebook

Setting up your Gmail API

Instructions for setting up your Gmail API can be found on the official documentation page.

Overview of the code

I decided to use the getpass library for some basic security. You can hardcode your Gmail API password, but I recommend not doing so.
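As a rough illustration of that choice, the sketch below prompts for the password at run time with getpass.getpass instead of storing it in the notebook. The function name read_password and the prompt text are assumptions made for this example, not names taken from the notebook.

```python
import getpass


def read_password(prompt=getpass.getpass):
    """Ask for the password interactively so it never lives in the source.

    `prompt` defaults to getpass.getpass, which hides what is typed.
    It is a parameter only so the function can be exercised without a terminal.
    """
    password = prompt("Gmail password: ")
    if not password:
        raise ValueError("No password entered")
    return password
```

In a notebook cell this would be called once, for example password = read_password(), and the result passed to the login step.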

This is the main loop of the script. It uses the imaplib and email libraries to iterate through the results of the query (which is assigned before this section) and places the contents of each email into the body variable. After tokenizing that variable with the nltk library, I loop through the resulting variable (token) and remove any words that appear in the toDropAll list.
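The loop described above might look roughly like the following sketch. It is not the notebook's actual code: the function names, the imap.gmail.com host, the sample TO_DROP_ALL words, and the regex tokenizer (a simplified stand-in for nltk's tokenizer, used so the example runs without extra downloads) are all assumptions.

```python
import email
import imaplib
import re

# Hypothetical stand-in for the toDropAll stopword list.
TO_DROP_ALL = {"the", "a", "is", "to", "of"}


def fetch_raw_messages(user, password, query="ALL"):
    """Yield the raw bytes of every message matching an IMAP search query."""
    with imaplib.IMAP4_SSL("imap.gmail.com") as conn:  # assumed host
        conn.login(user, password)
        conn.select("INBOX", readonly=True)
        _, data = conn.search(None, query)
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            yield parts[0][1]


def extract_body(raw):
    """Return the text/plain content of one raw message as a string."""
    msg = email.message_from_bytes(raw)
    chunks = []
    for part in msg.walk():  # inner loop: every MIME part of the message
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def tokenize(text):
    """Lowercase the text and split it into words (simplified tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def filter_tokens(tokens, stopwords=TO_DROP_ALL):
    """Drop every token that appears in the stopword list."""
    return [t for t in tokens if t not in stopwords]


# Try the parsing steps on a hand-written message, no network needed.
sample = (
    b"Subject: hi\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Is this the real life\r\n"
)
print(filter_tokens(tokenize(extract_body(sample))))
```

fetch_raw_messages needs real credentials and a network connection, so it is defined here but not called. The "loop within the loop" shows up as the outer iteration over messages and the inner walk over each message's parts.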

This list is a custom stopword list I created for this example, but you can edit it to include whatever other words you want. It can also be replaced by STOPWORDS from the wordcloud library or by another preferred library. The result is appended to the bar list variable, which is then converted to a string and written to a CSV file to begin the visualization process.
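The hand-off to CSV could be sketched as follows. How the notebook lays out the file is not shown here, so the single-column layout, the words header, and the tokens_to_csv name are assumptions. An in-memory buffer stands in for a real file.

```python
import csv
import io


def tokens_to_csv(tokens, fileobj):
    """Join the filtered tokens into one string and write it as a CSV row."""
    text = " ".join(tokens)
    writer = csv.writer(fileobj)
    writer.writerow(["words"])  # assumed header name
    writer.writerow([text])
    return text


# io.StringIO behaves like an open text file, which keeps the example
# free of side effects on disk.
buffer = io.StringIO()
tokens_to_csv(["real", "life", "fantasy"], buffer)
print(buffer.getvalue())
```

To write an actual file, the same function can be given the result of open("words.csv", "w", newline="") (a hypothetical file name).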

This loop is similar to the previous one, but instead it puts the tokenized words into a list that is sorted and then printed. This is useful if you'd like to create a .py file to run from the terminal. Since this was created in Jupyter Notebook, I decided to take advantage of its visualizations to show the output.
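For a terminal-only version, a sorted word count could be produced along these lines. This is a minimal sketch that assumes the goal is a frequency list. The word_counts name and the tie-breaking rule (alphabetical) are choices made for the example.

```python
from collections import Counter


def word_counts(tokens):
    """Return (word, count) pairs, most frequent first, ties in A-Z order."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


for word, count in word_counts(["mama", "life", "mama", "go", "life", "mama"]):
    print(f"{word}: {count}")
```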

Example output

This example used the lyrics of Queen's Bohemian Rhapsody, sent across a number of emails from one person.

Challenges

One of the more challenging parts was iterating through the emails to tokenize the words in their contents. The library made it easy, but grasping the loop within the loop was difficult at first.

This project was also the first time I was content with my use of list comprehensions. I've always had trouble with them, but with this project my grasp of list comprehensions really grew.
