Using extractive summarization to summarize Medium posts

anitaokoh/Medium_Summarizer

Medium Summarizer App

This is a web app that scrapes a Medium post given the post's URL, summarizes it into 5-10 sentences using extractive summarization, extracts 5 keywords from the article ranked by importance (not raw frequency) using TF-IDF, and then outputs the summary and the keywords.
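The keyword step can be illustrated with a small standard-library sketch. This is not the app's code: it assumes each sentence of the article is treated as one document, uses a smoothed IDF, and relies on a deliberately tiny stop-word list. The names `STOP_WORDS`, `tokenize`, and `top_keywords` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real app would use a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in",
              "it", "that", "this", "for", "on", "with", "as", "by"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_keywords(sentences, k=5):
    """Rank words by summed TF-IDF, treating each sentence as one document."""
    docs = [tokenize(s) for s in sentences]
    n_docs = len(docs)
    # Document frequency: the number of sentences each word appears in.
    df = Counter(word for doc in docs for word in set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1  # smoothed IDF
            scores[word] += tf * idf
    # Sort by score (descending), breaking ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


print(top_keywords(["apple apple banana", "banana cherry"], k=3))
```

In this toy input, `apple` and `banana` each occur twice, but `apple` ranks first because it is concentrated in a single sentence. That is the sense in which TF-IDF rewards importance rather than raw frequency.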

Some of the tools used

  • App framework and hosting tools like Streamlit (first attempt) and Heroku
  • Extractive summarization tool like Sumy (LSA model)
  • Web scraping libraries like BeautifulSoup and Requests
  • URL verification library like tldextract
  • Text preprocessing libraries like spaCy (heavy duty) and Sumy
  • Text vectorization tools like TfidfVectorizer and CountVectorizer in scikit-learn
  • Others like HTML
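As a rough illustration of the URL verification step, here is a minimal sketch that checks whether an input looks like a Medium link before scraping. It swaps in the standard library's `urllib.parse` instead of tldextract, so it is a stand-in rather than the app's actual check, and the function name `looks_like_medium_url` is hypothetical.

```python
from urllib.parse import urlparse


def looks_like_medium_url(url):
    """Return True if the URL is http(s) and its host is medium.com or a subdomain of it."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    # Accept medium.com itself and subdomains such as someuser.medium.com.
    return host == "medium.com" or host.endswith(".medium.com")


for candidate in ("https://medium.com/@user/some-post",
                  "https://notmedium.com/some-post"):
    print(candidate, looks_like_medium_url(candidate))
```

A check like this rejects Medium publications served from custom domains, so a real app would likely need a more permissive rule or a fallback.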

The most difficult parts were understanding the algorithm behind the summarization model and using Streamlit to wrap all the components together.

Here is a demo of the app

You can find the link to the app here

Further improvements to be made:

  • Moving from an extractive summarizer to an abstractive summarizer
  • Improving on building a good summarizer from scratch
  • Improving the scraping step as well as the interface

Feel free to go through my draft_work and jupyter_notebooks
