GitHub - IyanuSh/Python-Project: Text Analysis & Sentiment Analysis of Charles Dicken's 'Oliver Twist' on Python

Project Title

Text Analysis and Sentiment Analysis of Charles Dickens’ Oliver Twist using Natural Language Tool Kit and TextBlob

Project Background

This project will focus on conducting a text analysis of the famous novel, ‘Oliver Twist’ written by renowned writer, Charles Dickens. With the aid of Natural Language Toolkit (NLTK) and TextBlob, the text will be analyzed to know the word that was mentioned frequently in the novel, the number of times the main character was mentioned. Also, a brief sentiment analysis will be carried out on the novel.

Prerequisites

For the purpose of this project, python needs to be downloaded. This project will be using NLTK for text analysis, so NLTK has to be downloaded. The dataset - Oliver Twist written by Charles Dickens- used for this project was sourced from Project Gutenberg.

Install & Setup

To analyse the text, follow this step on:

< !pip install nltk >

< !pip install wordcloud >

< !pip install texblob >

Also, download all the NLTK packages to import some of them into the code and install TextBlob for Sentiment Analysis on the text.

Obtaining of Dataset

The dataset - Oliver Twist by Charles Dickens- used for this project was sourced the online digital library, Project Gutenberg. The webpage of the textfile is http://www.gutenberg.org/cache/epub/730/pg730.txt. It was loaded into the program and the text was opened to be read so that the text contents will be analysed. The text was converted to UTF-8 (8-bit Unicode Transformation Format).

Project Goal

The goal of this project is to find out the most frequent word that was mentioned in the text. Also, this project focuses on revealing the sentiment of the text.

Code Implememtation

After loading the text from the web, the text was cleaned for effective analysis to remove stopwords, numbers and punctuation marks. Tokenization, splitting of whitespace, normalization (converting text to lowercase), stemming, and lemmatization operations were done on the text. To get the most frequent word mentioned in the text, frequency distribution was performed. The ten most frequent words mentioned in the text and their frequencies were represented on a grap using the Matplot Library and Word Cloud. To perform sentiment analysis on the text, Textblob was installed to know the subjectivity and polarity results of the text. Also, SentimentIntensityAnalyzer was imported using Vader to know the sentiment of the text.

Resources

Several online resources were used to carried out the text and sentiment analysis of text. The webpages are listed below:

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
PROJECT INFORMATION.docx		PROJECT INFORMATION.docx
Python Project - Text Analysis & Sentiment Analysis.ipynb		Python Project - Text Analysis & Sentiment Analysis.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project Title

Table of Contents

Project Background

Prerequisites

Install & Setup

Obtaining of Dataset

Project Goal

Code Implememtation

Resources

About

Releases

Packages

Languages

IyanuSh/Python-Project

Folders and files

Latest commit

History

Repository files navigation

Project Title

Table of Contents

Project Background

Prerequisites

Install & Setup

Obtaining of Dataset

Project Goal

Code Implememtation

Resources

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages