A collection of Python scripts that scrape data from yellow pages websites using Beautiful Soup, a Python package for parsing HTML and XML documents. The data covers schools and colleges in Nepal.
You can learn how to parse HTML documents with Beautiful Soup from its documentation. I also suggest reading articles from Dataquest and DataCamp to learn how data is extracted from an HTML document.
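As a rough illustration of the kind of extraction Beautiful Soup makes possible, here is a minimal sketch. The HTML snippet, the class names (`listing`, `name`, `address`, `city`) and the `parse_listings` helper are all hypothetical and do not reflect the real markup of the yellow pages sites or the code in this repository.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real pages are structured differently.
SAMPLE_HTML = """
<div class="listing">
  <h3 class="name">Example College</h3>
  <span class="address">Example Street</span>
  <span class="city">Lalitpur</span>
</div>
<div class="listing">
  <h3 class="name">Example School</h3>
  <span class="address">Example Road</span>
  <span class="city">Kathmandu</span>
</div>
"""


def parse_listings(html):
    """Return one dict per listing block found in the given HTML string."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="listing"):
        records.append({
            "Name": block.find("h3", class_="name").get_text(strip=True),
            "Address": block.find("span", class_="address").get_text(strip=True),
            "City": block.find("span", class_="city").get_text(strip=True),
        })
    return records


records = parse_listings(SAMPLE_HTML)
for record in records:
    print(record)
```

In a real scraper the HTML would come from an HTTP response rather than a string, and the selectors would have to match whatever tags the target page actually uses.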
- http://yellowpagesnepal.com/index.php?cat=541&page=1
- http://yellowpagesnepal.com/index.php?cat=196
- http://www.ypnepal.com/index.php?cat=530&Schools-&-Higher-Secondary-Schools
- http://www.ypnepal.com/index.php?cat=196&Colleges
Install the dependencies using:
pip install -r requirements.txt
Open a terminal and run the Python scripts:
python ypnepal_school.py
python yellowpagesnepal_college.py
python yellowpagesnepal_school.py
Tested with Beautiful Soup 4.4.0 and Python 3.7.0.
The scripts dump the data to CSV and JSON files. You can click here to find more datasets like this. Please refer to the license below for how you may use the data. Each script extracts the following fields from the yellow pages listings:
- Name
- Address
- City
- Country
- Phone
- P.O.Box
- Email
- Mobile
- Webpage
- Fax
- Updatedon
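Not every listing has every field, so the CSV output needs a fixed column order with blanks for missing values. The following is a minimal sketch of one way to do that with the standard library's `csv.DictWriter`. The record is made up for illustration, and this is not necessarily how the scripts in this repository write their files.

```python
import csv
import io

# Column order taken from the field list above.
FIELDS = ["Name", "Address", "City", "Country", "Phone", "P.O.Box",
          "Email", "Mobile", "Webpage", "Fax", "Updatedon"]

# Made-up record that lacks most fields, as many real listings do.
sample_records = [
    {"Name": "Example School", "City": "Kathmandu",
     "Phone": "00977-1-0000000,0000001"},
]

# Write to an in-memory buffer so the sketch runs without touching the disk.
buffer = io.StringIO()
# restval="" fills any missing field with an empty cell.
writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
writer.writeheader()
writer.writerows(sample_records)

csv_text = buffer.getvalue()
print(csv_text)
```

Values that contain commas, such as multiple phone numbers, are quoted automatically by the `csv` module, so they stay in a single column.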
Name,Address,City,Country,Phone,P.O.Box,Email,Mobile,Webpage,Fax,Updatedon
Aayaam International College,Kumaripati,Lalitpur,Nepal,"5550778,5537674",,[email protected],,www.aayaamcollege.edu.np, 5552785,2015-10-08
Asian Institute of Technology & Management (AITM),Khumaltar,Lalitpur,Nepal,"5541179,5552376",,[email protected],,www.aitm.edu.np, 5548772,2015-01-07
B & C Medical College & Teaching Hospital & Research Center,Birtamod-5,Jhapa,Nepal,"023-545566,542242",,[email protected],,www.bnchospital.edu.np,,2018-06-13
[{"Name": "Adarsha Kanya Niketan Higher Secondary School", "Address": "MangalBazar", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-5521488", "Email": "[email protected]", "Updatedon": "2009-11-07"}, {"Name": "Alok Vidyashram", "Address": "Gyaneshwor", "P.O.Box": "806,Ktm", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-4415912,016219909", "Email": "[email protected]", "Webpage": "www.alokvidyashram.edu.np", "Updatedon": "2017-02-13"}, {"Name": "Bhanubhakta Memorial Higher Secondary School", "Address": "Panipokhari", "P.O.Box": "10597,Ktm", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-4415538,4413586", "Fax": "\t00977-1-4428931", "Email": "[email protected]", "Webpage": "www.bhanu.edu.np", "Updatedon": "2009-11-07"}, {"Name": "Cambridge International Boarding Higher Secondary School", "Address": "Kalanki-14", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-5219858,5218003", "Email": "[email protected]", "Webpage": "www.cambridgecollegekalanki.edu.np", "Updatedon": "2009-11-07"}, {"Name": "Galaxy Public School", "Address": "Gyaneshwor", "P.O.Box": "4901,Ktm", "City": "Kathmandu", "Country": "Nepal", "Phone": "00977-1-4410076,4411362", "Fax": "\t00977-1-4416989", "Email": "[email protected]", "Webpage": "www.galaxy.edu.np", "Updatedon": "2009-11-07"}]
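In the JSON sample above, records omit keys that have no value, and some values carry stray whitespace (for example the leading tab in `Fax`). If you want uniform records, a small sketch like the following may help. The `normalize` helper and the inline record are hypothetical and not part of the scripts.

```python
import json

FIELDS = ["Name", "Address", "City", "Country", "Phone", "P.O.Box",
          "Email", "Mobile", "Webpage", "Fax", "Updatedon"]

# Made-up JSON text shaped like the scripts' output; the raw string keeps
# the \t escape intact so the JSON decoder turns it into a tab character.
RAW_JSON = r'[{"Name": "Example School", "City": "Kathmandu", "Fax": "\t00977-1-0000000"}]'


def normalize(record):
    """Return a record with every field present and surrounding whitespace removed."""
    return {field: record.get(field, "").strip() for field in FIELDS}


rows = [normalize(record) for record in json.loads(RAW_JSON)]
for row in rows:
    print(row["Name"], "|", row["Fax"], "|", repr(row["Email"]))
```

To use it on a real output file, replace `json.loads(RAW_JSON)` with `json.load(f)` on the opened file.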
You can modify the content, optimize the code, or even use the dataset commercially. You can credit me by mentioning this repository if you wish. Pull requests are welcome.
SUPPORT ❤️ OPEN-SOURCE!
- MIT license
- Copyright 2019 © Vijay Pathak.