Repo to analyze large swathes of text data for lexical richness

harsharaman/python-book-analyzer

Python Book Analyzer

A command-line tool that analyzes large bodies of text and reports on symbols, total and distinct word counts, lexical richness, word dispersion, hapax legomena (words that occur only once), and collocations.
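As a rough illustration of two of the less familiar terms above, the sketch below finds hapax legomena and counts adjacent word pairs as a crude stand-in for collocation. The function names `hapax_legomena` and `top_bigrams` are hypothetical and are not taken from this repository; raw bigram frequency is only one simple way to approximate collocation.

```python
from collections import Counter


def hapax_legomena(words):
    """Return the words that occur exactly once, sorted alphabetically."""
    counts = Counter(words)
    return sorted(w for w, c in counts.items() if c == 1)


def top_bigrams(words, n=3):
    """Return the n most frequent adjacent word pairs.

    Assumption: raw pair frequency is used as a simple proxy for collocation.
    """
    pairs = Counter(zip(words, words[1:]))
    return pairs.most_common(n)


sample = "the cat sat on the mat and the cat slept".split()
print(hapax_legomena(sample))
print(top_bigrams(sample, 1))
```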

The project is to be completed in the following steps:

  1. Read the text data from a .txt file.
  2. Determine the number of total words.
  3. Determine the number of distinct words.
  4. Compute the lexical richness of the text, defined as the ratio of distinct words to total words.
  5. Find the most commonly used words.
  6. (Maybe) Character correlation
  7. (Maybe) Sentiment Analysis

Steps marked (Maybe) will be completed only if time permits.
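Steps 1 to 5 could be sketched roughly as follows. This is a minimal illustration and not the project's actual implementation: the names `tokenize`, `analyze`, and `analyze_file` are hypothetical, and the tokenizer assumes that a word is any lowercase run of letters and apostrophes, which is a simplification for English text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract runs of letters and apostrophes.

    Assumption: this simple pattern is good enough for plain English prose.
    """
    return re.findall(r"[a-z']+", text.lower())


def analyze(text, top_n=5):
    """Compute total words, distinct words, lexical richness, and top words."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words)
    distinct = len(counts)
    # Lexical richness is the ratio of distinct words to total words.
    richness = distinct / total if total else 0.0
    return {
        "total": total,
        "distinct": distinct,
        "richness": richness,
        "most_common": counts.most_common(top_n),
    }


def analyze_file(path, top_n=5):
    """Read a UTF-8 .txt file and analyze its contents."""
    with open(path, encoding="utf-8") as fh:
        return analyze(fh.read(), top_n)
```

Calling `analyze_file("book.txt")` (with a placeholder file name) would return a dictionary holding the four measurements. Keeping `analyze` separate from the file reading makes it easy to try on short strings first.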
