reutregev/sentiment-analysis

Text Sentiment Analysis

This project uses a pre-trained language model (PyTorch) for the downstream task of text sentiment classification.

The trained classifier is served via a REST API built with Flask.

How to use

Training

The model is trained on the Twitter US Airline Sentiment dataset, which contains tweets about US airlines labeled with their sentiment (positive, negative, or neutral).
The data is split into training and validation sets, and each set is preprocessed with the tokenizer that corresponds to the chosen language model.
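
As a rough illustration of the split step, the rows could be divided with a simple seeded random hold-out. This is a sketch under stated assumptions: the README does not say how the split is done or which ratio is used, so the function name `split_rows`, the 80/20 ratio, and the seed are all hypothetical. Tokenization with the model's tokenizer would then be applied to each subset.

```python
import random


def split_rows(rows, val_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, validation) lists.

    val_fraction and seed are illustrative defaults, not the project's values.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical (text, label) pairs standing in for rows of the .csv file.
rows = [(f"tweet {i}", "positive" if i % 2 else "negative") for i in range(10)]
train_rows, val_rows = split_rows(rows)
print(len(train_rows), len(val_rows))  # prints: 8 2
```

A fixed seed keeps the validation set stable across runs, which makes results from different language models easier to compare.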

To build the classifier, pass the type of language model as an input; if none is given, base BERT is used by default.
The model files included in the repository (as Git LFS artifacts) are a trained classifier based on the pre-trained bert-base-uncased model.

Run the training as follows:

python ./src/main.py [ARGUMENTS] [OPTIONS]

Arguments:
    --data_path     Path to a .csv file containing tweets (and, optionally, their sentiment labels)
    --label_col     Name of label column in the file
    --text_col      Name of text column in the file
    --model_dir     Path to directory in which the trained model should be saved

Options:
    --lang_model    Type of language model to use for pretrained model and tokenization
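
For readers who want to see how such a command line maps onto code, here is a minimal `argparse` sketch of the interface listed above. It is not taken from `src/main.py`: the `required=True` settings, the `bert-base-uncased` default string, and the example paths and column names are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the arguments and options listed above."""
    parser = argparse.ArgumentParser(description="Train a sentiment classifier")
    # Assumption: the four arguments are mandatory in the real script.
    parser.add_argument("--data_path", required=True,
                        help="Path to the .csv file containing tweets")
    parser.add_argument("--label_col", required=True,
                        help="Name of the label column in the file")
    parser.add_argument("--text_col", required=True,
                        help="Name of the text column in the file")
    parser.add_argument("--model_dir", required=True,
                        help="Directory in which the trained model is saved")
    # Assumption: the default corresponds to the base BERT model named above.
    parser.add_argument("--lang_model", default="bert-base-uncased",
                        help="Language model used for the classifier and tokenizer")
    return parser


# Hypothetical paths and column names, for illustration only.
args = build_parser().parse_args([
    "--data_path", "data/Tweets.csv",
    "--label_col", "airline_sentiment",
    "--text_col", "text",
    "--model_dir", "models/",
])
print(args.lang_model)  # prints: bert-base-uncased
```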

Inference

For inference, run either the Flask app or the Docker container:

  • Flask:
python ./app/app.py [OPTIONS]

Options:
    --model_dir     Path to directory where the trained model is saved
    --lang_model    Type of language model to use for tokenization. Must correspond to the one used in the training phase
  • Docker:
    • Build the docker image:
      docker build -t <docker_name> -f ./docker/Dockerfile .
      
    • Once the image is built, run the container:
      docker run -d -p 9980:9980 <docker_name>
      

Once the service is running, send the following request to get sentiment predictions:

    curl --location --request POST 'http://127.0.0.1:9980/predict' \
         --header 'Content-Type: application/json' \
         --data-raw '{
                        "text1": "Had a terrible experience flying with you!",
                        "text2": "The best airline company ever!!!"
                     }'

Here, --data-raw contains the texts to classify, in JSON format.
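
The same request can also be sent from Python. The sketch below uses only the standard library and reuses the endpoint and payload from the curl example. The helper names `build_request` and `predict` are hypothetical, and because the response schema is not documented here, the sketch assumes a JSON response and returns it unmodified.

```python
import json
import urllib.request

# Same endpoint and payload as the curl example above.
URL = "http://127.0.0.1:9980/predict"
payload = {
    "text1": "Had a terrible experience flying with you!",
    "text2": "The best airline company ever!!!",
}


def build_request(url, texts):
    """Return a POST request carrying the texts as a JSON body."""
    body = json.dumps(texts).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url, texts, timeout=10):
    """Send the request and return the decoded response.

    Assumption: the service answers with JSON; its schema is not documented.
    """
    with urllib.request.urlopen(build_request(url, texts), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_request(URL, payload)
# Requires the service to be running on port 9980:
# print(predict(URL, payload))
```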
