
A simple text-processing program that crawls IMDb and extracts keywords with the TextRank algorithm, and that crawls Digikala special offers, extracts some of their features, and shows them on the web using the Django framework.


shokoofa-ghods/Web-Crawling_Text-Proccessing


A team project with v-nafise.

IMDb-storyline-TextProccessing

A Python program for web crawling and text processing of the storylines of the top 250 IMDb movies.

First, the app extracts the storyline text of each movie via web crawling, using:

  • BeautifulSoup
  • RegEx
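The RegEx half of this step can be sketched with the standard library alone. This is a minimal sketch, not the project's code: the `<div data-testid="storyline-plot-summary">` container and the sample page are hypothetical (IMDb's real markup differs and changes over time), and `extract_storyline` is an illustrative helper name.

```python
import html
import re

# Hypothetical markup for illustration only; IMDb's real page structure differs.
SAMPLE_PAGE = """
<html><body>
  <div data-testid="storyline-plot-summary">
    <span>A retired detective takes one last case &amp;
    uncovers a family secret.</span>
  </div>
</body></html>
"""

# Non-greedy capture of everything inside the (assumed) storyline container.
STORYLINE_RE = re.compile(
    r'<div data-testid="storyline-plot-summary">(.*?)</div>', re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")


def extract_storyline(page_html: str) -> str:
    """Return the storyline as plain text, or "" if the container is missing."""
    match = STORYLINE_RE.search(page_html)
    if match is None:
        return ""
    text = TAG_RE.sub(" ", match.group(1))  # drop nested tags such as <span>
    text = html.unescape(text)              # decode entities such as &amp;
    return " ".join(text.split())           # collapse newlines and repeated spaces


print(extract_storyline(SAMPLE_PAGE))
```

With BeautifulSoup, the container would more typically be located with `soup.find("div", attrs={...})` and flattened with `.get_text(" ", strip=True)`; regexes are then better reserved for pulling exact details out of that text.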

Next, each movie's storyline text is preprocessed (tokenizing, removing stopwords, and lemmatizing), and its keywords are extracted with the TextRank algorithm.
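Assuming the classic TextRank formulation (an undirected word co-occurrence graph scored with PageRank), this step might look like the sketch below. The tiny stopword list and the suffix-stripping "lemmatizer" are hypothetical stand-ins for proper tools such as NLTK's stopword corpus and `WordNetLemmatizer`; `preprocess`, `textrank_keywords`, and their parameters (`window`, `damping`, `iterations`, `top_k`) are illustrative names, not the project's own.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption); a real pipeline would use a full one.
STOPWORDS = {
    "a", "an", "and", "as", "at", "by", "for", "his", "her", "in",
    "is", "of", "on", "the", "their", "to", "was", "with",
}


def preprocess(text: str) -> list[str]:
    """Tokenize, lowercase, drop stopwords, and crudely lemmatize."""
    tokens = re.findall(r"[a-z]+", text.lower())
    result = []
    for tok in tokens:
        if tok in STOPWORDS or len(tok) < 3:
            continue
        # Hypothetical stand-in for a real lemmatizer: strip a plural "s".
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]
        result.append(tok)
    return result


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by running PageRank on an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Connect each word to the next `window - 1` words (sliding window of size `window`).
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # PageRank update: each neighbor shares its score equally among its links.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


story = (
    "A retired detective takes one last case and uncovers a family secret. "
    "The case leads the detective back to his family."
)
print(textrank_keywords(preprocess(story)))
```

Words that co-occur with many different words (graph hubs) accumulate the highest scores, which is why TextRank needs no training data.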

After that, the program finds the keywords that each pair of movies' storylines have in common and shows the result as a weighted graph whose nodes are movies and whose edge weights are the number of common keywords, using:

  • NetworkX
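The pairing logic can be sketched in plain Python, assuming each movie's keywords are already available as a set. The movie titles and keywords below are made-up sample data, and `common_keyword_edges` is a hypothetical helper.

```python
from itertools import combinations


def common_keyword_edges(keywords_by_movie: dict[str, set[str]]) -> list[tuple[str, str, int]]:
    """Return (movie_a, movie_b, weight) for every pair sharing at least one keyword."""
    edges = []
    # Sorting by title makes the edge order deterministic.
    for (movie_a, kw_a), (movie_b, kw_b) in combinations(sorted(keywords_by_movie.items()), 2):
        weight = len(kw_a & kw_b)  # number of shared keywords
        if weight > 0:
            edges.append((movie_a, movie_b, weight))
    return edges


keywords = {
    "Movie A": {"detective", "case", "family"},
    "Movie B": {"family", "war", "case"},
    "Movie C": {"space", "war"},
}
for edge in common_keyword_edges(keywords):
    print(edge)
```

In NetworkX, these tuples could then be loaded with `G = nx.Graph()` followed by `G.add_weighted_edges_from(edges)`, which stores each count under the `weight` edge attribute.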

Finally, the weighted graph is stored in a CSV file.
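A minimal sketch of this step with the standard `csv` module, assuming the graph is saved as an edge list. The column names `source`, `target`, and `weight` and the helper `write_edges_csv` are assumptions, not the project's actual file layout.

```python
import csv
import io


def write_edges_csv(edges, stream) -> None:
    """Write (source, target, weight) rows, preceded by a header row."""
    writer = csv.writer(stream)
    writer.writerow(["source", "target", "weight"])
    writer.writerows(edges)


# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_edges_csv([("Movie A", "Movie B", 2), ("Movie B", "Movie C", 1)], buffer)
print(buffer.getvalue())
```

For a real file, the stream would be opened with `open(path, "w", newline="", encoding="utf-8")` so the `csv` module controls line endings itself.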

Example output, showing the common-keyword graph computed for the 250 movies:

[Figure_1: common-keyword graph]


Digikala-Offers Scraping

A Python program that scrapes Digikala's special product offers and shows the results on a website built with the Django framework.

Technologies used in this app:

  • scraping with the BeautifulSoup library

  • regex for extracting exact details

  • saving the scraped data to JSON and CSV files

  • Django fixtures for populating the database with the data produced by the previous steps
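The regex and fixture steps could fit together as in this minimal sketch. It assumes each offer card's text (for example, from BeautifulSoup's `get_text()`) contains a price followed by "تومان" (Toman) and a discount with a percent sign; that card format, the `parse_offer` and `to_fixture` helpers, and the `offers.offer` model label are all hypothetical. Persian digits are normalized to ASCII before parsing.

```python
import json
import re

# Persian digits U+06F0..U+06F9 and the Arabic thousands separator U+066C -> ASCII.
PERSIAN_TO_ASCII = str.maketrans(
    "".join(chr(0x06F0 + i) for i in range(10)) + "\u066c",
    "0123456789,",
)
# Assumed card format: a price followed by "تومان" and a discount with "%" or "٪".
PRICE_RE = re.compile(r"(\d[\d,]*)\s*تومان")
DISCOUNT_RE = re.compile(r"(\d+)\s*[%\u066a]")


def parse_offer(title: str, card_text: str) -> dict:
    """Extract an integer price and discount percentage from an offer card's text."""
    text = card_text.translate(PERSIAN_TO_ASCII)
    price = PRICE_RE.search(text)
    discount = DISCOUNT_RE.search(text)
    return {
        "title": title,
        "price": int(price.group(1).replace(",", "")) if price else None,
        "discount": int(discount.group(1)) if discount else None,
    }


def to_fixture(offers: list[dict], model: str = "offers.offer") -> str:
    """Serialize parsed offers as a Django JSON fixture (model, pk, fields)."""
    records = [
        {"model": model, "pk": pk, "fields": offer}
        for pk, offer in enumerate(offers, start=1)
    ]
    return json.dumps(records, ensure_ascii=False, indent=2)


offers = [parse_offer("Sample phone", "۱۲٪ ۱۲,۵۰۰,۰۰۰ تومان")]
print(to_fixture(offers))
```

Saved under an app's `fixtures/` directory (for example as `offers.json`), such output could then be loaded with `python manage.py loaddata offers`, provided a matching model with `title`, `price`, and `discount` fields exists.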
