one-line-scraper

A web scraper in a single line of shell commands.

No, it's not a full-fledged web scraper.

There is a lot more to a real scraper. This is just a simple HTML response processor for a specific website.

Why?

I like complex text processing and this is a stepping stone. Plus, it's cool to be able to build something that requires little to no overhead or environment setup. This script can be run on a fresh installation of pretty much any flavour of Linux.

What programs are used here?

curl and awk. That's all!
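
For illustration, here is a minimal sketch of the same idea. It is not the repository's actual one-liner; the URL and the href-matching logic are placeholders. curl fetches the page and awk prints every link it finds in the response:

    # Illustrative sketch only: fetch a page with curl and have awk
    # print the value of every href attribute in the response.
    curl -s 'https://example.com/' | awk -F'href="' '{ for (i = 2; i <= NF; i++) { split($i, a, "\""); print a[1] } }'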

How can this be made better?

Adding exception handling (blocking due to too many requests, etc.). Running it as a cron job. Appending to existing data. Removing anything older than the last n lines. Removing duplicates without changing order (sort -u changes order). But then it probably won't stay in a single line, or would it?
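
Most of these ideas map onto standard tools. A rough sketch, assuming the output lives in a file called data.txt and an hourly schedule, neither of which is specified here:

    # append new results instead of overwriting (URL and awk pattern are placeholders)
    curl -s 'https://example.com/' | awk -F'[<>]' '/<title>/ {print $3}' >> data.txt
    # drop duplicate lines without reordering them, unlike sort -u
    awk '!seen[$0]++' data.txt > data.tmp && mv data.tmp data.txt
    # keep only the last 1000 lines
    tail -n 1000 data.txt > data.tmp && mv data.tmp data.txt
    # crontab entry to run a wrapper script every hour
    0 * * * * /path/to/scrape.sh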

Can I contribute?

Yes, you can! Feel free to add new features or improve the existing code.