This small script will scrape your gmail and will create a word analysis visualization from the contents of the queried emails.
The purpose is to gather a general outlook on interactions in the correspondence to try and decipher a general emotion/reaction of the participants towards one another.
Install virtual environment
pip install virtualenv
Create your virtual enviornment
python<version> -m venv <virtual-environment-name>
Open the virtual environment
source <environment name>/bin/activate
To get this repository, run the following command inside your git enabled terminal
git clone https://github.com/TeaZea/Gmail-Scraper_Word-Analysis.git
Cd into the folder and install items on requirement.txt
pip install -r /<path>/<to>/requirements.txt
Open in Jupyter Notebook
Documentation on how to set up your gmail API can be found here at the official documentation page.
I decided to use getpass library for some basic security. You can hardcode your gmail api password but I reccomend not doing so.
This is the main iteration of the script. It uses the imaplib and email libraries to iterate through the query (which would have been assigned before this section) and places the contents into the body variable. After converting the variable using the nltk library, I loop through the new variable (token) and remove any words from the toDropAll list.
This list is custom stopwords list I created for this example, but you can edit it with whatever other words you want. This can also be replaced by STOPWORDS library from wordcloud or another prefered library. The result is appended into the bar list variable before is converted to a string and passed into a CSV file to begin the visualization process.
This loop is similar to the previous one but instead it throws the tokenized words to a list that is sorted and then printed. This is useful if you'd like to create a .py file to run from the terminal window. Since this was created using jupyter notebooks, I decided to leverage the fact that I can use visualizations to show to output.
This example had the lyrics of Queen's Bohemian Rhapsody sent through a number emails from 1 person.
One of the more challenging parts was the iterating through emails to tokenize the words within the contents of the email. The library made it easy, but grasping the loop within the loop was difficult at first.
This project was also the first time I was content with my utelization of list comprehension. I've always had trouble with it, but with this project, my grasp of list comprehension really grew.