
English Corpus Text-Visualization using Word2Vec Model from Gensim. A mini project under the mentorship of Prof. Sandipan Ganguly, HIT-K.

Rajspeaks/Deep-Learning-Approach-to-English-Corpus-Text-Visualization-using-Word2Vec-Model

English Corpus Text Visualization using Word2Vec Model

A machine learning approach to visualizing an English text corpus with the Word2Vec model from the Gensim NLP library. The project was done to test the accuracy of the Word2Vec model on an English corpus.

Library requirements:

  1. Scikit-learn (sklearn): used for data preprocessing, model selection, classification, regression, and clustering.
  2. Matplotlib: used for 2D and 3D plotting, such as histograms and bar charts.
  3. Gensim: open-source library for text analysis, including Word2Vec and Doc2Vec.
  4. Other assets: the Melon Honey font is used, and the sample texts were collected from the Internet.

Word2Vec

The Word2Vec model is used to produce word embeddings. Here I used the Gensim library and Matplotlib's pyplot for 2D visualization of the corpus.

Methodology:

  1. First, I took an English corpus and applied a punctuation remover.
  2. Split the data and visualized the corpus.
  3. Repeated the process with a larger corpus.
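The preprocessing steps above can be sketched with the standard library alone. This is one possible reading of "punctuation remover" and "split", not the project's actual code: the helper names `remove_punctuation` and `tokenize` are hypothetical, and lowercasing and one-sentence-per-line splitting are assumptions added for illustration.

```python
import string


def remove_punctuation(text):
    """Delete every ASCII punctuation character from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    """Turn raw text into a list of token lists, one per non-empty line."""
    sentences = []
    for line in text.splitlines():
        # Assumption: lowercase so "Word" and "word" share one vector.
        tokens = remove_punctuation(line).lower().split()
        if tokens:
            sentences.append(tokens)
    return sentences


sample = "Hello, world! This is a test.\nWord2Vec learns from context."
corpus = tokenize(sample)
print(corpus)
```

The resulting list of token lists is the shape Gensim's `Word2Vec` accepts as its `sentences` argument, so the same function can be reused unchanged when repeating the process on a larger corpus.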

Tools:

  1. Google Colab/Jupyter Notebook
  2. Language: Python
  3. Word2Vec from Gensim
  4. Matplotlib (pyplot)

Mentor

Prof. Sandipan Ganguly, HIT-K.

Developer

Rajdeep Das

Thank you