
aak-ash/Sentimental-Analysis

Sentimental-Analysis

Sentiment Analysis of Movie Reviews Using PyTorch


Preparing Data

  • One of the main concepts of TorchText is the Field. Fields define how your data should be processed. In our sentiment classification task, each example consists of the raw review string and its sentiment, either "pos" or "neg".

  • The arguments passed to a Field, such as the tokenizer, specify exactly how that processing is done.

  • We use the TEXT field to define how the review should be processed, and the LABEL field to process the sentiment.

  • The TEXT field has tokenize='spacy' as an argument. This specifies that tokenization (splitting the string into discrete "tokens") should be done with the spaCy tokenizer. If no tokenize argument is passed, the default is simply to split the string on whitespace.

  • LABEL is defined by a LabelField, a subclass of Field intended for class labels.
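Field and LabelField belong to TorchText's legacy API, which newer TorchText releases have removed, so the following is a minimal pure-Python sketch of the processing the two fields perform rather than the repository's code. It assumes the default whitespace tokenizer in place of spaCy and a fixed "neg"/"pos" to 0/1 mapping (a LabelField builds its own mapping from the data); the helper names `process_text` and `process_label` and the sample reviews are hypothetical.

```python
from typing import List

# Assumption: whitespace tokenization (the Field default when no `tokenize`
# argument is passed) stands in for the spaCy tokenizer.
def process_text(review: str) -> List[str]:
    """Split a raw review string into tokens (hypothetical helper)."""
    return review.split()

# Hypothetical fixed label mapping; a LabelField builds its own from the data.
LABEL_TO_ID = {"neg": 0, "pos": 1}

def process_label(sentiment: str) -> int:
    """Map a 'pos'/'neg' sentiment to an integer class id (hypothetical helper)."""
    return LABEL_TO_ID[sentiment]

examples = [
    ("This film is great , I loved it", "pos"),
    ("A dull and lifeless movie", "neg"),
]
processed = [(process_text(text), process_label(label)) for text, label in examples]
for tokens, label_id in processed:
    print(tokens, label_id)
```

A real spaCy tokenizer would also split punctuation attached to words (for example "great," into "great" and ","), which plain whitespace splitting does not.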

Building Vocabulary

  • The training set contains over 100,000 unique words, so one-hot vectors over the full vocabulary would have over 100,000 dimensions.
  • To reduce the dimensionality, we keep only the 25,000 most common words.
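Capping the vocabulary can be sketched in pure Python as below. In the legacy TorchText API this step is done by the field's `build_vocab` method with a `max_size` argument; the helpers here (`build_vocab`, `numericalize`, `one_hot`) are hypothetical stand-ins, and the `<unk>` and `<pad>` special tokens mirror TorchText's defaults as an assumption. A toy `max_size=2` keeps the example readable.

```python
from collections import Counter
from typing import Dict, List

def build_vocab(tokenized_reviews: List[List[str]], max_size: int) -> Dict[str, int]:
    """Keep only the `max_size` most frequent tokens (hypothetical helper).

    Index 0 is reserved for <unk> (any word outside the vocabulary) and
    index 1 for <pad> (used to pad reviews in a batch to equal length).
    """
    counts = Counter(token for review in tokenized_reviews for token in review)
    vocab = {"<unk>": 0, "<pad>": 1}
    # Ties in frequency keep the order in which tokens were first seen.
    for token, _ in counts.most_common(max_size):
        vocab[token] = len(vocab)
    return vocab

def numericalize(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, sending out-of-vocabulary words to <unk>."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]

def one_hot(index: int, size: int) -> List[int]:
    """One-hot vector whose length equals the vocabulary size."""
    vector = [0] * size
    vector[index] = 1
    return vector

reviews = [["a", "great", "film"], ["a", "bad", "film"], ["a", "great", "cast"]]
vocab = build_vocab(reviews, max_size=2)  # the README keeps 25,000 words
print(vocab)
print(numericalize(["a", "bad", "film"], vocab))
print(one_hot(vocab["a"], len(vocab)))
```

The one-hot length is the vocabulary size plus the special tokens, which is why capping the vocabulary at 25,000 words directly bounds the input dimensionality.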

Model

  • We use an LSTM because it is far less prone to the vanishing gradient problem than a standard RNN.
  • It mitigates the problem with a cell state, which is updated additively through gates, so gradients can flow across many time steps.
  • To reduce overfitting, we use dropout.
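To illustrate the cell state and dropout, here is a from-scratch sketch of one step of an LSTM cell with hidden size 1, plus inverted dropout. It is not the repository's model (in PyTorch these pieces would come from `nn.LSTM` and `nn.Dropout`); the scalar weights are hypothetical values chosen so the forget gate is nearly 1 and the input gate nearly 0, which shows the cell state being carried across many steps almost unchanged.

```python
import math
import random
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float,
              w: Dict[str, float]) -> Tuple[float, float]:
    """One time step of a toy LSTM cell with hidden size 1.

    `w` holds hypothetical scalar weights: input weights (w*), recurrent
    weights (u*) and biases (b*) for each gate.
    """
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    # Additive update: along this path d(c)/d(c_prev) = f, so when the
    # forget gate is close to 1 the gradient is carried almost unchanged.
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def dropout(values: List[float], p: float, training: bool,
            rng: random.Random) -> List[float]:
    """Inverted dropout: zero each value with probability p during training
    and scale the survivors by 1 / (1 - p); do nothing at evaluation time."""
    if not training or p == 0.0:
        return list(values)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]


# Hypothetical weights: forget gate saturated open, input gate saturated shut.
weights = {name: 0.0 for name in
           ("wf", "uf", "bf", "wi", "ui", "bi", "wg", "ug", "bg", "wo", "uo", "bo")}
weights["bf"] = 10.0
weights["bi"] = -10.0

h, c = 0.0, 1.0
for x in [0.5] * 50:
    h, c = lstm_step(x, h, c, weights)
print(f"cell state after 50 steps: {c:.4f}")  # stays close to its initial 1.0

rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, training=True, rng=rng))
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, training=False, rng=rng))
```

Scaling the surviving activations during training keeps their expected value unchanged, so no rescaling is needed at evaluation time.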

Optimizer

  • We train with the Adam optimizer.
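As a sketch of what the optimizer does, the standard Adam update (running averages of the gradient and squared gradient, with bias correction) can be written for a single parameter as follows. In PyTorch the optimizer would be created with `torch.optim.Adam(model.parameters())`; the toy objective (x - 3)^2 and the `adam_minimize` helper are hypothetical, and the learning rate here is chosen for the toy problem, not taken from the repository.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], x0: float, lr: float = 0.1,
                  beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
                  steps: int = 500) -> float:
    """Minimize a 1-D function with Adam, given its gradient (hypothetical helper)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
print(x_min)
```

Dividing by the square root of the squared-gradient average gives each parameter its own step size, which is the main reason Adam usually needs little learning-rate tuning.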
