Scrapy14

Scraping News Stories - Multiple Sources

'independent'
'guardian'
'express'

View the YouTube Playlist for the entire project (https://www.youtube.com/playlist?list=PLKMY3XNPiQ7u_ljiiDt1382T9T4xgLpRI)

Objective: multiple spiders share ONE items.py and write to a MySQL database, so the stored data is consistent across sources

Check all potential news sites in Scrapy shell first

Use the Scrapy shell's fetch(url, headers={}) to request a page with custom headers: https://youtu.be/UaqSo7hlX9g

You can also check a site with scrapy shell and curl
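
The same pre-flight check can be scripted outside the shell. This is a minimal sketch using only Python's standard library; the User-Agent string, the `SITES` mapping with its home-page URLs, and the helper names `build_request` and `check_site` are assumptions for illustration and are not part of this project.

```python
from urllib.request import Request, urlopen

# Assumed browser-like header; some sites reject the default client string.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

# Assumed home pages for the three sources named above.
SITES = {
    "independent": "https://www.independent.co.uk/",
    "guardian": "https://www.theguardian.com/",
    "express": "https://www.express.co.uk/",
}


def build_request(url, headers=None):
    """Create a GET request that carries the custom headers."""
    return Request(url, headers=headers or HEADERS)


def check_site(url, timeout=10):
    """Return the HTTP status code for url (performs a real network call)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.status


# Usage (needs network access):
#   for name, url in SITES.items():
#       print(name, check_site(url))
#
# Equivalent check inside `scrapy shell`:
#   fetch("https://www.independent.co.uk/", headers={"User-Agent": "Mozilla/5.0"})
#   response.status
```

A site that only responds once the headers are set will probably need the same headers in the spider's requests.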

~~### Plan the columns / fields for "items" to scrape~~

Also features a fix for the 'module not found' error raised when a spider imports items:

Add this to the imports in each spider

import sys
sys.path.insert(0,'..')
from items import NewzzItem
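
Note that `'..'` is resolved against the current working directory, so the fix above depends on where the spider is launched from. A possible variant, sketched here under the assumption that items.py sits one directory above the spider file, anchors the path to the spider's own location instead. The helper name `add_project_root` is hypothetical.

```python
import sys
from pathlib import Path


def add_project_root(spider_file):
    """Put the directory above the spider's folder at the front of sys.path.

    Assumes items.py lives one level above the spider file; adjust the
    number of .parent steps if the project is laid out differently.
    """
    root = str(Path(spider_file).resolve().parent.parent)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# In a spider module this would be used as:
#   add_project_root(__file__)
#   from items import NewzzItem
```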

Add new database to MySQL

sudo mysql -u root -p -h localhost

DROP DATABASE IF EXISTS newz;
CREATE DATABASE newz;

GRANT ALL PRIVILEGES ON newz.* TO 'pi'@'localhost';

FLUSH PRIVILEGES;

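Allow remote connections to the database (on MySQL 8.0 and later the account must already exist, because GRANT no longer creates users)

GRANT ALL PRIVILEGES ON newz.* TO 'user1'@'%';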
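
With the database in place, every spider can write through the same INSERT statement because they all share one item definition. The sketch below illustrates that idea and is not the project's pipelines.py: the `stories` table, the column names, and both helper functions are hypothetical, and `sqlite3` stands in for a MySQL driver so the example runs without a server. MySQL drivers typically use `%s` as the placeholder, whereas sqlite3 uses `?`.

```python
import sqlite3  # stand-in for a MySQL driver, so the sketch runs anywhere

# Hypothetical shared columns; the real fields are defined in items.py.
COLUMNS = ("source", "headline", "author", "url")


def build_insert(table, columns, placeholder="%s"):
    """Return a parameterised INSERT statement.

    table and columns are interpolated directly, so they must be trusted
    constants, never scraped text. Values always go through placeholders.
    """
    cols = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    return f"INSERT INTO {table} ({cols}) VALUES ({marks})"


def store_item(conn, table, item, columns=COLUMNS, placeholder="%s"):
    """Insert one item (a dict-like object); missing fields become NULL."""
    values = tuple(item.get(col) for col in columns)
    cursor = conn.cursor()
    cursor.execute(build_insert(table, columns, placeholder), values)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stories (source TEXT, headline TEXT, author TEXT, url TEXT)"
    )
    store_item(
        conn,
        "stories",
        {"source": "guardian", "headline": "Example"},
        placeholder="?",
    )
    print(conn.execute("SELECT source, headline, author FROM stories").fetchall())
    # [('guardian', 'Example', None)]
```

Keeping the column list in one place means a field added to the shared item only has to be added once for all three spiders.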

XPATH selectors - some more advanced examples

response.xpath('//*[@id="articleHeader"]//a[contains(@href,"/author/")]/text()')[0].get()
response.xpath('//a[@class="title"][not(contains(@href,"https://www.independent.co.uk/vouchercodes"))]/@href')


More to follow. Also, visit my web scraping and automation site: https://redandgreen.co.uk/

How to back up and restore a MySQL database: https://redandgreen.co.uk/mysql-backup-and-restore-for-scrapy-web-scraping/
