Python scripts that harvest data from yellow pages; includes source-code and data of Nepali schools and colleges

hbvj99/scrape_yellowpage

Web Scraping Yellow Pages

A collection of Python scripts that scrape data from yellow pages using Beautiful Soup, a Python package for parsing HTML and XML documents. The data covers schools and colleges in Nepal.

You can learn how to parse HTML documents with Beautiful Soup from its documentation. I suggest reading articles from Dataquest and DataCamp to learn how data is extracted from an HTML document.
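As a rough illustration of the parsing step, here is a minimal sketch of pulling fields out of listing markup with Beautiful Soup. The HTML snippet, the class names (`listing`, `name`, `city`, `phone`) and the `parse_listings` helper are assumptions made up for this example; the real yellow-pages markup and the scripts in this repository may look quite different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from any real site.
SAMPLE_HTML = """
<div class="listing">
  <h2 class="name">Example College</h2>
  <span class="city">Lalitpur</span>
  <span class="phone">5550000</span>
</div>
<div class="listing">
  <h2 class="name">Example School</h2>
  <span class="city">Kathmandu</span>
</div>
"""


def parse_listings(html):
    """Return one dict per listing block, skipping fields that are absent."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="listing"):
        record = {}
        for field in ("name", "city", "phone"):
            tag = block.find(class_=field)
            if tag is not None:
                # get_text(strip=True) drops surrounding whitespace.
                record[field.capitalize()] = tag.get_text(strip=True)
        records.append(record)
    return records


listings = parse_listings(SAMPLE_HTML)
```

Fields missing from a listing are simply left out of its dict, which matches the way the JSON output omits keys that a page did not provide.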

Webpages

Requirements

Install the dependencies using:

pip install -r requirements.txt

How to run scripts

Open a terminal and run the Python scripts:

python ypnepal_school.py
python yellowpagesnepal_college.py
python yellowpagesnepal_school.py

Tested with Beautiful Soup 4.4.0 and Python 3.7.0.


How to use data

The scripts dump the data into CSV and JSON files. You can click here to find more datasets like this. Please refer to the license below for how you may use the data. Each script collects the following fields from the yellow pages:

- Name
- Address
- City
- Country
- Phone
- P.O.Box
- Email
- Mobile
- Webpage
- Fax
- Updatedon
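The fields above can be written out in both formats with the standard library alone. The following is a minimal sketch, not the repository's actual export code: the `to_csv` and `to_json` helpers and the example record are hypothetical, and the CSV is built in memory here rather than written to a file.

```python
import csv
import io
import json

# Column order taken from the field list above.
FIELDS = ["Name", "Address", "City", "Country", "Phone", "P.O.Box",
          "Email", "Mobile", "Webpage", "Fax", "Updatedon"]


def to_csv(records):
    """Serialize records to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records to JSON text, keeping only the keys each record has."""
    return json.dumps(records, ensure_ascii=False)


# Made-up record for illustration.
records = [{"Name": "Example School", "City": "Kathmandu",
            "Country": "Nepal", "Phone": "0000000,1111111"}]
csv_text = to_csv(records)
json_text = to_json(records)
```

Because a phone cell can hold several comma-separated numbers, the `csv` module quotes it automatically, as in the CSV sample in this README.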

CSV Dataset sample

Name,Address,City,Country,Phone,P.O.Box,Email,Mobile,Webpage,Fax,Updatedon
Aayaam International College,Kumaripati,Lalitpur,Nepal,"5550778,5537674",,[email protected],,www.aayaamcollege.edu.np,	5552785,2015-10-08
Asian Institute of Technology & Management (AITM),Khumaltar,Lalitpur,Nepal,"5541179,5552376",,[email protected],,www.aitm.edu.np,	5548772,2015-01-07
B & C Medical College & Teaching Hospital & Research Center,Birtamod-5,Jhapa,Nepal,"023-545566,542242",,[email protected],,www.bnchospital.edu.np,,2018-06-13
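A short sketch of reading this CSV layout back in with `csv.DictReader` follows. It assumes the header row shown above; the `read_rows` helper is hypothetical, and the email cell is left empty because the address is not recoverable from the sample. Some Fax values carry a leading tab, so every cell is stripped.

```python
import csv
import io

# One row in the layout of the sample above, with the email cell left empty.
SAMPLE_CSV = (
    "Name,Address,City,Country,Phone,P.O.Box,Email,Mobile,Webpage,Fax,Updatedon\n"
    'Aayaam International College,Kumaripati,Lalitpur,Nepal,"5550778,5537674",,,,'
    "www.aayaamcollege.edu.np,\t5552785,2015-10-08\n"
)


def read_rows(text):
    """Parse CSV text into dicts, trimming whitespace and splitting phones."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        cleaned = {key: value.strip() for key, value in row.items()}
        # A phone cell may hold several comma-separated numbers.
        cleaned["Phone"] = [p for p in cleaned["Phone"].split(",") if p]
        rows.append(cleaned)
    return rows


rows = read_rows(SAMPLE_CSV)
```

To read one of the generated files instead, the same loop could be run over `open(path, newline="", encoding="utf-8")`, where `path` is whichever CSV file a script produced.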

JSON Sample

[{"Name": "Adarsha Kanya Niketan Higher Secondary School", "Address": "MangalBazar", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-5521488", "Email": "[email protected]", "Updatedon": "2009-11-07"}, {"Name": "Alok Vidyashram", "Address": "Gyaneshwor", "P.O.Box": "806,Ktm", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-4415912,016219909", "Email": "[email protected]", "Webpage": "www.alokvidyashram.edu.np", "Updatedon": "2017-02-13"}, {"Name": "Bhanubhakta Memorial Higher Secondary School", "Address": "Panipokhari", "P.O.Box": "10597,Ktm", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-4415538,4413586", "Fax": "\t00977-1-4428931", "Email": "[email protected]", "Webpage": "www.bhanu.edu.np", "Updatedon": "2009-11-07"}, {"Name": "Cambridge International Boarding Higher Secondary School", "Address": "Kalanki-14", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-5219858,5218003", "Email": "[email protected]", "Webpage": "www.cambridgecollegekalanki.edu.np", "Updatedon": "2009-11-07"}, {"Name": "Galaxy Public School", "Address": "Gyaneshwor", "P.O.Box": "4901,Ktm", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-4410076,4411362", "Fax": "\t00977-1-4416989", "Email": "[email protected]", "Webpage": "www.galaxy.edu.np", "Updatedon": "2009-11-07"}]
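Unlike the CSV, the JSON records only include the keys a listing actually had, so records differ in shape. One possible way to normalize them is sketched below; the `normalize` helper is hypothetical, and the Email key is left out of the sample record because the address is not recoverable from the sample above.

```python
import json

FIELDS = ["Name", "Address", "City", "Country", "Phone", "P.O.Box",
          "Email", "Mobile", "Webpage", "Fax", "Updatedon"]

# One record in the shape of the sample above (raw string keeps the \t escape).
SAMPLE_JSON = r'''[{"Name": "Galaxy Public School", "Address": "Gyaneshwor",
"P.O.Box": "4901,Ktm", "City": "Kathmandu", "Country": "Nepal",
"Phone": "00977-1-4410076,4411362", "Fax": "\t00977-1-4416989",
"Webpage": "www.galaxy.edu.np", "Updatedon": "2009-11-07"}]'''


def normalize(record):
    """Give every record the full field set and trim stray whitespace."""
    return {field: record.get(field, "").strip() for field in FIELDS}


normalized = [normalize(item) for item in json.loads(SAMPLE_JSON)]
```

After this step every record has the same eleven keys in the same order, which makes the JSON data easier to compare with the CSV output.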


Contributions

You may modify the content, optimize the code, or even use the dataset commercially as you like. You can credit me by mentioning this repository if you wish. Pull requests are welcome.

SUPPORT ❤️ OPEN-SOURCE!


License
