Text Summariser

Install NLTK module

pip install nltk

Importing required libraries

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

Two NLTK modules are needed to build the feedback summariser.

Corpus: A corpus is a collection of text. It can be any data set containing text, such as the poems of a particular poet or the collected works of an author. Here we use a predefined data set of stop words.
Tokenizers: A tokenizer divides a text into a series of tokens. NLTK includes word, sentence, and regex tokenizers, among others. We use only the word and sentence tokenizers.

Download stopwords and punkt

nltk.download('stopwords')
nltk.download('punkt')

Frequency table: A Python dictionary that records how many times each word appears in the feedback after the stop words are removed. We can check every sentence against this dictionary to find which sentences carry the most relevant content in the overall text.

# text is the input string to summarise
stopWords = set(stopwords.words("english"))
words = word_tokenize(text)
freqTable = dict()  # will hold word -> count, stop words excluded
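
The snippet above only creates the empty table. A minimal sketch of how it might be filled is shown here; it is not necessarily what app.py does. The function name build_freq_table is hypothetical, the token list and stop word set are hand-written stand-ins for word_tokenize(text) and stopwords.words("english"), and skipping punctuation tokens is an added assumption.

```python
def build_freq_table(words, stop_words):
    """Count how often each non-stop word occurs (case-insensitive)."""
    freq_table = {}
    for word in words:
        word = word.lower()
        # Assumption: skip stop words and pure punctuation tokens such as ".".
        if word in stop_words or not any(ch.isalnum() for ch in word):
            continue
        freq_table[word] = freq_table.get(word, 0) + 1
    return freq_table


# Hand-written stand-ins for word_tokenize(text) and stopwords.words("english").
words = ["The", "app", "is", "fast", ".", "The", "app", "crashes", "."]
stop_words = {"the", "is"}
print(build_freq_table(words, stop_words))  # {'app': 2, 'fast': 1, 'crashes': 1}
```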

Assign a score to each sentence based on the words it contains and the frequency table.

sentences = sent_tokenize(text) 
sentenceValue = dict()
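
One possible way to fill the sentence scores is sketched here: each sentence gets the sum of the frequency-table counts of its words. The function name score_sentences is hypothetical, and a simple regex split stands in for word_tokenize so that the example runs without NLTK data; the scoring in app.py may differ.

```python
import re


def score_sentences(sentences, freq_table):
    """Map each sentence to the sum of the frequencies of its words."""
    sentence_value = {}
    for sentence in sentences:
        # Assumption: a simple regex split stands in for word_tokenize here.
        tokens = re.findall(r"\w+", sentence.lower())
        sentence_value[sentence] = sum(freq_table.get(t, 0) for t in tokens)
    return sentence_value


freq_table = {"app": 2, "fast": 1, "crashes": 1}
sentences = ["The app is fast.", "The app crashes.", "Thanks."]
scores = score_sentences(sentences, freq_table)
print(scores)
# {'The app is fast.': 3, 'The app crashes.': 3, 'Thanks.': 0}
```

Because the score is a plain sum, longer sentences tend to score higher. Dividing by the number of words in the sentence is one way to offset that, if it matters for your feedback data.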

To decide which sentences go into the summary, we need a threshold to compare the sentence scores against. A simple approach is to compute the average sentence score and use that average as the threshold.

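# Average sentence score, used as the selection threshold.
# sentenceValue must be non-empty, otherwise the division fails.
sumValues = 0
for sentence in sentenceValue:
    sumValues += sentenceValue[sentence]
average = int(sumValues / len(sentenceValue))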

Apply the threshold value and store sentences in order into the summary.
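
As a rough sketch of this last step, the sentences can be filtered in their original order and joined. The function name build_summary and the sample scores are hypothetical, and keeping only sentences that score strictly above the threshold is an assumption; app.py may apply the threshold differently.

```python
def build_summary(sentences, sentence_value, threshold):
    """Join, in original order, the sentences that score above the threshold."""
    selected = [s for s in sentences if sentence_value.get(s, 0) > threshold]
    return " ".join(selected)


# Hypothetical sentences and scores, standing in for the earlier steps.
sentences = ["The app is fast.", "Thanks.", "The app crashes on login."]
sentence_value = {sentences[0]: 3, sentences[1]: 0, sentences[2]: 4}
average = int(sum(sentence_value.values()) / len(sentence_value))  # int(7 / 3) == 2
summary = build_summary(sentences, sentence_value, average)
print(summary)  # The app is fast. The app crashes on login.
```

With a strict comparison, a text whose sentences all have the same score produces an empty summary, so it may be worth falling back to the full text in that case.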

See app.py for the complete code.

Peace !

kamaravichow/text-summariser-python

Folders and files

Latest commit

History

Repository files navigation

Text Summariser

About

Topics

Resources

License

Stars

Watchers

Forks

Languages