A command-line tool that analyzes large bodies of text and reports symbols, total and distinct word counts, lexical richness, word dispersion, hapax legomena, and collocations.
- Read the text data from a .txt file.
- Determine the number of total words.
- Determine the number of distinct words.
- Determine the lexical richness of the text, defined as the ratio of distinct words to total words.
- Find the most commonly used words.
- (Maybe) Character correlation
- (Maybe) Sentiment Analysis
Items marked (Maybe) will be completed if time permits.
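
The core counting steps listed above could be sketched in Python roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function names `tokenize`, `analyze`, and `analyze_file` are hypothetical, and the tokenization rule (lowercase everything, treat runs of letters, digits, and apostrophes as words) is an assumption that a real tool would probably refine.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def analyze(text, top_n=5):
    """Return basic word statistics for a string of text."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words)
    distinct = len(counts)
    return {
        "total_words": total,
        "distinct_words": distinct,
        # Lexical richness: distinct words divided by total words.
        "lexical_richness": distinct / total if total else 0.0,
        "most_common": counts.most_common(top_n),
        # Hapax legomena: words that occur exactly once.
        "hapax_legomena": sorted(w for w, c in counts.items() if c == 1),
    }


def analyze_file(path, top_n=5):
    """Read a .txt file (assumed UTF-8) and analyze its contents."""
    with open(path, encoding="utf-8") as f:
        return analyze(f.read(), top_n=top_n)
```

For example, `analyze("The cat and the hat. The end!")` counts 7 total words and 5 distinct words, so the lexical richness is 5/7. Word dispersion, collocation, and the (Maybe) items are left out of this sketch.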