# Concurrent Web Scraping with Selenium Grid and Docker Swarm

Want to learn how to build this project?

Check out the blog post.

Want to use this project?

1. Fork/Clone

1. Create and activate a virtual environment

1. Install the requirements

1. Sign up for Digital Ocean and generate an access token

1. Add the token to your environment:

   ```sh
   (env)$ export DIGITAL_OCEAN_ACCESS_TOKEN=[your_token]
   ```

1. Spin up four droplets and deploy Docker Swarm:

   ```sh
   (env)$ sh project/create.sh
   ```

1. Run the scraper:

   ```sh
   (env)$ docker-machine env node-1
   (env)$ eval $(docker-machine env node-1)
   (env)$ NODE=$(docker service ps --format "{{.Node}}" selenium_hub)
   (env)$ for i in {1..8}; do {
            python project/script.py ${i} $(docker-machine ip $NODE) &
          };
          done
   ```

1. Bring down the resources:

   ```sh
   (env)$ sh project/destroy.sh
   ```