Using pretrained T5 model for abstractive summarization of books

saarthdeshpande/book-summarizer

Book-Summarizer

An NLP-based book summarizer that summarizes a book chapter by chapter.
If the book does not contain chapters, the entire book is summarized as a whole.
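
The chapter-wise behaviour with a whole-book fallback could be sketched as follows. This is only an illustration, not the repository's actual code: the function name `split_into_chapters` and the heading pattern (a line starting with "Chapter" followed by a number or word) are assumptions, and real books may need a different detection rule.

```python
import re
from typing import List

# Assumed heading format: a line such as "Chapter 1" or "CHAPTER One: Title".
CHAPTER_HEADING = re.compile(r"^[ \t]*chapter[ \t]+\w+.*$", re.IGNORECASE | re.MULTILINE)


def split_into_chapters(book_text: str) -> List[str]:
    """Return one string per chapter, or the whole book if no headings are found."""
    matches = list(CHAPTER_HEADING.finditer(book_text))
    if not matches:
        # Fallback: no chapter headings detected, so treat the book as one unit.
        return [book_text.strip()]
    starts = [m.start() for m in matches]
    ends = starts[1:] + [len(book_text)]
    # Note: any text before the first heading (front matter) is discarded here.
    return [book_text[s:e].strip() for s, e in zip(starts, ends)]
```

Each returned string can then be summarized independently, which keeps the output aligned with the book's own structure.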

Why summarize a book?

  • The goal of summarizing an article, a single chapter, or a whole book is to convey the full sense of the original as accurately as possible, but in a more condensed form.
  • A summary restates the author's main point, purpose, intent and supporting details in your own words.

How does the summarizer work?

  • The summarizer is built on the pretrained T5-small model from HuggingFace Transformers.
  • Each chapter is split into chunks.
  • The chunks are tokenized using T5Tokenizer.
  • The tokenized text is passed to the T5ForConditionalGeneration model, which generates the summary IDs.
  • The summary IDs are decoded back to text using the decode() method of T5Tokenizer.
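
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the helper names `chunk_text` and `summarize_chapter`, the word-based chunk size of 300, and the generation settings (`max_length`, `min_length`, `num_beams`) are all hypothetical choices. Running `summarize_chapter` requires `transformers`, `torch`, and `sentencepiece`, and downloads the `t5-small` weights on first use.

```python
from typing import List


def chunk_text(text: str, max_words: int = 300) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_chapter(chapter: str, model_name: str = "t5-small") -> str:
    """Summarize one chapter chunk by chunk and join the partial summaries."""
    # Imported lazily so chunk_text stays usable without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    summaries = []
    for chunk in chunk_text(chapter):
        # T5 is a text-to-text model and expects a task prefix for summarization.
        input_ids = tokenizer.encode(
            "summarize: " + chunk,
            return_tensors="pt",
            max_length=512,
            truncation=True,
        )
        # Assumed generation settings; tune these for summary length and quality.
        summary_ids = model.generate(
            input_ids,
            max_length=150,
            min_length=30,
            num_beams=4,
            early_stopping=True,
        )
        summaries.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
    return " ".join(summaries)
```

Chunking is needed because T5-small accepts a limited input length (512 tokens here), so a full chapter usually cannot be passed to the model in one call.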

How to run the book summarizer:

  1. Clone the repository:
       git clone https://github.com/saarthdeshpande/book-summarizer.git

  2. Install all the dependencies listed in requirements.txt:
       pip install -r requirements.txt

  3. To run via the CLI:
       python3 bsCLI.py --path <path-to-PDF-file>

  4. To run on the Flask server with frontend and mail:
    1. Update the values of sender_address and sender_pass in mail.py.
    2. Run views.py:
         python3 views.py

Screenshots

Home Page

Mail on Successful Processing