Scraping News Stories - Multiple Sources
- 'independent'
- 'guardian'
- 'express'
View the YouTube Playlist for the entire project (https://www.youtube.com/playlist?list=PLKMY3XNPiQ7u_ljiiDt1382T9T4xgLpRI)
In the Scrapy shell, use fetch(url, headers={}) to request a page with custom headers: https://youtu.be/UaqSo7hlX9g
You can also check the response with both the Scrapy shell and curl.
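To keep the Scrapy shell check and the curl check in step, the curl command can be built from the same URL and headers that are passed to fetch. This is a minimal sketch: the helper name `build_curl_command` and the User-Agent value are placeholders chosen for illustration, not part of the project.

```python
import shlex


def build_curl_command(url, headers):
    """Return a curl command line that sends the given request headers."""
    parts = ["curl", "-s"]
    for name, value in headers.items():
        # curl takes each request header as a separate -H "Name: value" argument
        parts += ["-H", f"{name}: {value}"]
    parts.append(url)
    # shlex.join quotes every argument so the line can be pasted into a shell
    return shlex.join(parts)


# Placeholder header value (assumption); use whatever the target site expects.
headers = {"User-Agent": "Mozilla/5.0"}
url = "https://www.independent.co.uk/"

print(build_curl_command(url, headers))
# The matching call inside the Scrapy shell would be: fetch(url, headers=headers)
```

Comparing the two responses side by side can show whether a site serves different content depending on the headers it receives.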
### Plan the columns / fields for "items" to scrape
import sys
sys.path.insert(0, '..')  # make the parent directory (where items.py lives) importable
from items import NewzzItem
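The field list itself is not shown here. As a starting point, the plan can be written down as a plain dataclass before it goes into NewzzItem. This is a sketch only: the class name `StoryFields` and every field name below are assumptions chosen for illustration, not the project's actual item definition.

```python
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class StoryFields:
    """Hypothetical plan of the columns to scrape for one news story."""
    source: str                    # e.g. 'independent', 'guardian' or 'express'
    url: str                       # link to the story
    headline: str                  # story title
    author: Optional[str] = None   # may be missing on some pages


story = StoryFields(
    source="independent",
    url="https://www.independent.co.uk/",  # placeholder URL
    headline="Example headline",            # placeholder text
)

# The field names double as a list of candidate database columns.
print([f.name for f in fields(story)])
print(asdict(story))
```

Writing the fields down once like this makes it easier to keep the item definition and the database columns aligned across all three sources.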
Set up the MySQL database and grant access to it:
sudo mysql -u root -p -h localhost
DROP DATABASE IF EXISTS newz;
CREATE DATABASE newz;
GRANT ALL PRIVILEGES ON newz.* TO 'pi'@'localhost';
FLUSH PRIVILEGES;
GRANT ALL PRIVILEGES ON newz.* TO 'user1'@'%';
response.xpath('//*[@id="articleHeader"]//a[contains(@href,"/author/")]/text()').get()
response.xpath('//a[@class="title"][not(contains(@href,"https://www.independent.co.uk/vouchercodes"))]/@href')
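To make explicit what the second selector keeps and drops, here is the same filter written with Python's standard-library `html.parser` instead of Scrapy's XPath engine. It is a sketch: the class name `TitleLinkParser` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

VOUCHER_URL = "https://www.independent.co.uk/vouchercodes"


class TitleLinkParser(HTMLParser):
    """Collect href values of <a class="title"> tags, skipping voucher links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # @class="title" in XPath is an exact match on the whole attribute value
        if attr.get("class") != "title" or href is None:
            return
        # not(contains(@href, ...)) drops any link containing the voucher URL
        if VOUCHER_URL not in href:
            self.links.append(href)


# Made-up sample markup (assumption), not taken from the real site.
sample = """
<a class="title" href="/news/uk/story-one.html">Story one</a>
<a class="title" href="https://www.independent.co.uk/vouchercodes/shop">Vouchers</a>
<a class="other" href="/news/uk/story-two.html">Story two</a>
"""

parser = TitleLinkParser()
parser.feed(sample)
print(parser.links)  # ['/news/uk/story-one.html']
```

Only the first link survives: the second is a voucher page and the third does not have the exact class value "title".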