JLospinoso/abrade

Abrade

Abrade is a coroutine-based web scraper suitable for querying the existence (a HEAD request) or the contents (a GET request) of a web resource with a sequential, numerical pattern.

Check out the blog post at http://lospi.net for usage and examples.
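To make the HEAD-versus-GET distinction concrete, here is a minimal Python sketch of the same idea. It is not how Abrade is implemented: the names `method_for`, `sequential_paths`, and `probe` are hypothetical, and the `str.format` template used here is a stand-in, not Abrade's own pattern syntax.

```python
import http.client


def method_for(contents: bool) -> str:
    """HEAD only checks that a resource exists; GET also downloads its body."""
    return "GET" if contents else "HEAD"


def sequential_paths(template: str, start: int, stop: int) -> list[str]:
    """Expand a str.format template over the inclusive range start..stop.

    Assumption: this template is illustrative and unrelated to Abrade patterns.
    """
    return [template.format(n) for n in range(start, stop + 1)]


def probe(host: str, path: str, contents: bool = False) -> tuple[int, bytes]:
    """Send one request over TLS and return (status, body).

    For a HEAD request the body is always empty.
    """
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request(method_for(contents), path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Calling `probe(host, path)` for each entry of `sequential_paths("/item/{}", 1, 100)` would check one hundred numbered resources in turn. Abrade issues many such requests concurrently, which this sequential sketch does not attempt.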

> abrade -h
Usage: abrade host pattern:
  --host arg                            host name (eg example.com)
  --pattern arg (=/)                    format of URL (eg ?mynum={1:5}&myhex=0x
                                        {hhhh}). See documentation for
                                        formatting of patterns.
  --agent arg (=Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0)
                                        User-agent string (default: Firefox 47)
  --out arg                             output path. dir if contents enabled.
                                        (default: HOSTNAME)
  --err arg                             error path (file). (default:
                                        HOSTNAME-err.log)
  --proxy arg                           SOCKS5 proxy address:port. (default:
                                        none)
  --screen arg                          omits 200-level response if contents
                                        contains screen (default: none)
  -d [ --stdin ]                        read from stdin (default: no)
  -t [ --tls ]                          use tls/ssl (default: no)
  -s [ --sensitive ]                    complain about rude TCP teardowns
                                        (default: no)
  -o [ --tor ]                          use local proxy at 127.0.0.1:9050
                                        (default: no)
  -r [ --verify ]                       verify ssl (default: no)
  -l [ --leadzero ]                     output leading zeros in URL (default:
                                        no)
  -e [ --telescoping ]                  do not telescope the pattern (default:
                                        no)
  -f [ --found ]                        print when resource found (default:
                                        no). implied by verbose
  -v [ --verbose ]                      prints gratuitous output to console
                                        (default: no)
  -c [ --contents ]                     read full contents (default: no)
  --test                                no network requests, just write
                                        generated URIs to console (default: no)
  -p [ --optimize ]                     Optimize number of simultaneous
                                        requests (default: no)
  -i [ --init ] arg (=1000)             Initial number of simultaneous requests
  --min arg (=1)                        Minimum number of simultaneous requests
  --max arg (=25000)                    Maximum number of simultaneous requests
  --ssize arg (=50)                     Size of velocity sliding window
  --sint arg (=1000)                    Size of sampling interval
  -h [ --help ]                         produce help message

v0.2

You can now pipe URLs to Abrade via the --stdin option:

echo /anything/a/b/c?d=123 | abrade httpbin.org --stdin --contents --verbose

You must omit the pattern positional argument to pipe from stdin.
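Piping is useful when the URLs do not follow a sequential numeric pattern. The sketch below generates such paths from a list of identifiers. It assumes Abrade reads one path per line from stdin, as the echo example suggests; the function name `paths_for` and the sample identifiers are hypothetical.

```python
import sys
from urllib.parse import quote


def paths_for(ids: list[str], prefix: str = "/anything/") -> list[str]:
    """Build one URL path per identifier, percent-encoding unsafe characters."""
    return [prefix + quote(item, safe="") for item in ids]


def main() -> None:
    # Hypothetical identifiers that a numeric pattern could not express.
    for path in paths_for(["alpha", "beta gamma", "2024/01"]):
        sys.stdout.write(path + "\n")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as gen_paths.py, the script's output could be piped in place of the echo above: python gen_paths.py | abrade httpbin.org --stdin --contents --verbose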

You can also use the --screen option to detect error landing pages that still return 200 responses. If the contents of a 200-level response contain the screen string, the response is screened out and is not written to disk during a --contents scrape.
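The screening rule can be sketched in a few lines of Python. This is an illustration of the behavior described in the help text, not Abrade's code: the function name `is_screened` is hypothetical, and a plain case-sensitive substring match is an assumption.

```python
from typing import Optional


def is_screened(status: int, body: str, screen: Optional[str]) -> bool:
    """Return True when a 200-level response looks like a disguised error page.

    Assumption: the screen is matched as a case-sensitive substring of the body.
    """
    if screen is None:
        return False  # no screen configured, so nothing is screened out
    return 200 <= status < 300 and screen in body
```

For example, a site that answers every unknown ID with a 200 page containing "Page not found" could be scraped with that phrase as the screen, so that only genuine hits are kept.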

docker pull jlospinoso/abrade:v0.2.0

or

docker pull quay.io/jlospinoso/abrade:v0.2.0

v0.1

docker pull jlospinoso/abrade:v0.1.0

or

docker pull quay.io/jlospinoso/abrade:v0.1.0

Building Abrade

  1. Abrade uses cmake, so you'll need to install it.
  2. Clone abrade.
  3. Navigate to the checked out directory.
  4. Make a build subdirectory.
  5. Navigate to the build directory.
  6. Invoke cmake.
  7. Use make (*nix) or Visual Studio (Windows) to build the project.

For example, on *nix:

git clone git@github.com:JLospinoso/abrade.git
cd abrade
mkdir build
cd build
cmake ..
make

On Windows, open the abrade.sln file that cmake generates in the build directory and build it with Visual Studio.