js_driven_scraping

Scraping JavaScript-Driven Websites Using Python

Run the js_scrape.py file and you should get fewer images than actually exist on the website. This is because some of the content is added by JavaScript after the initial HTML is delivered. Python requests only downloads that initial HTML and does not run scripts, so it cannot scrape the rest for us.
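The gap between the static HTML and what a browser shows can be sketched without any network access. The page below is made up for illustration, and the standard library's html.parser stands in for BeautifulSoup only to keep the sketch dependency-free. The parser sees one image, although a browser running the script would display three.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Counts <img> start tags in the HTML it is fed."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.count += 1


def count_images(html):
    """Return the number of <img> tags present in the raw HTML text."""
    parser = ImageCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical page: one image in the markup, two more added by a script.
STATIC_HTML = """
<html><body>
  <img src="a.png">
  <div id="gallery"></div>
  <script>
    // A browser would run this and add two more images to the page.
    for (const name of ["b.png", "c.png"]) {
      const img = document.createElement("img");
      img.src = name;
      document.getElementById("gallery").appendChild(img);
    }
  </script>
</body></html>
"""

print(count_images(STATIC_HTML))  # prints 1: the script is never executed
```

Anything that only parses the downloaded text behaves the same way, which is why a real browser is needed for pages like this.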

For scraping JavaScript-driven websites we need a more powerful Python package, which is selenium-python.

Using the instructions given in the selenium-python docs, install Selenium and the Firefox driver for Selenium.

On Windows, be sure to download the 32-bit or 64-bit build that matches your system, unzip the folder, and add the path of that folder to the system environment variables (PATH).
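A quick way to check that the driver folder really ended up on PATH is to look the executable up from Python. This is only a sketch: it assumes the Firefox driver executable is named geckodriver, and find_driver is a helper name made up for this example.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_driver()
print(path if path else "geckodriver not found on PATH")
```

If nothing is found, open a new terminal after editing the system variables, since terminals that were already open keep the old PATH.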

Create a new file named using_selenium.py

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

# Start a Firefox window controlled by Selenium
driver = webdriver.Firefox()

When you run the file, a new Firefox browser window should open.

To open a URL in the browser window, close the Firefox browser and add

driver.get('https://google.com')

at the end of the file. Then run the file again and the browser should open the Google website.
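Once a page is open, the HTML as the browser currently sees it is available through driver.page_source, and driver.quit() closes the browser from code. The sketch below is one possible way to wrap that up, not the repository's own code: fetch_rendered_html and main are names made up for this example, and Selenium is imported inside main so the helper can be tried without a browser installed.

```python
def fetch_rendered_html(driver, url):
    """Open url in the given WebDriver and return the HTML of the loaded page."""
    driver.get(url)            # waits for the page load to finish by default
    return driver.page_source  # HTML of the current DOM, not the raw response


def main():
    # Imported here (an illustrative choice) so the helper above has no
    # hard dependency on Selenium being installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        html = fetch_rendered_html(driver, "https://google.com")
        print(len(html), "characters of rendered HTML")
    finally:
        driver.quit()  # close the browser even if loading fails
```

Call main() to try it. The returned HTML can then be passed to BeautifulSoup just like the text of a requests response. Content that a page fetches some time after loading may still be missing at this point and may need an explicit wait.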
