
This repository will assist you in scraping data from multiple websites. It identifies, downloads, and classifies the latest PDF files published on a website according to the user's requirements. This can be used to automate various operations involved in market research.


Erdos1729/webscrapping-identify-download-classify-published-pdfs-from-multiple-urls


Web scraping to identify and download the latest PDF documents, then classify them into pre-defined categories.

  • This repository will assist you in scraping data from multiple websites. It downloads the latest PDF files published on a website into a specific folder, according to the user's requirements. This can be used to automate various operations involved in market research.
  • Once the PDFs are downloaded, they are classified into oil / no_oil / foreign_language categories based on a string-based rule.
  • You can customize these classification rules to suit your needs.
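A string-based rule of this kind can be sketched as a small keyword check over the extracted text. The keyword set, the ASCII-ratio heuristic for spotting foreign-language documents, and the function name `classify` below are all illustrative assumptions, not the repository's actual rules:

```python
# Illustrative keyword list -- an assumption, replace with your own rules.
OIL_KEYWORDS = {"crude", "petroleum", "barrel", "refinery", "opec"}

def classify(text: str) -> str:
    """Classify extracted PDF text into oil / no_oil / foreign_language."""
    words = text.lower().split()
    if not words:
        return "foreign_language"
    # Heuristic assumption: a document whose words are mostly non-ASCII
    # is treated as foreign language.
    ascii_ratio = sum(w.isascii() for w in words) / len(words)
    if ascii_ratio < 0.5:
        return "foreign_language"
    # Any hit from the keyword list routes the document to "oil".
    if any(keyword in words for keyword in OIL_KEYWORDS):
        return "oil"
    return "no_oil"
```

Swapping the keyword list or adding per-category lists is enough to customize the rules without touching the download logic.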

Instructions

  • `pip install -r requirements`
  • Run `radar_automation.py`

Reference

I devised the solution using the following documentation:

  • [urllib], a package that collects several modules for working with URLs
  • [beautifulsoup4], to scrape information from web pages
  • [pdfminer], a text extraction tool for PDF documents
  • [NLTK], for natural language processing
  • Keyword-based search in the extracted text for rule-based classification
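As a rough illustration of how the first two pieces fit together, the sketch below finds PDF links on a page with BeautifulSoup and downloads them with urllib. The helper names `find_pdf_links` and `download_pdfs` are hypothetical, it assumes `beautifulsoup4` is installed per the requirements file, and the URL and folder are placeholders:

```python
import os
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup  # assumes beautifulsoup4 is installed

def find_pdf_links(html: str, base_url: str) -> list:
    """Return absolute URLs of every .pdf link found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"])
            for a in soup.find_all("a", href=True)
            if a["href"].lower().endswith(".pdf")]

def download_pdfs(page_url: str, folder: str) -> list:
    """Download every PDF linked from page_url into folder."""
    os.makedirs(folder, exist_ok=True)
    html = urlopen(page_url).read()
    saved = []
    for pdf_url in find_pdf_links(html, page_url):
        path = os.path.join(folder, os.path.basename(pdf_url))
        urlretrieve(pdf_url, path)
        saved.append(path)
    return saved
```

Text extraction from the downloaded files (via pdfminer) would then feed the keyword-based classifier.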
