franciscobmacedo/debtors-scraper

Debtors Scraper

The Portuguese Tax Authority publishes a list of all of its debtors, from individuals to collective entities, here:

https://static.portaldasfinancas.gov.pt/app/devedores_static/de-devedores.html

This information is presented in a set of PDFs, making it hard to search, analyse or actually find someone in it.

This scraper fetches that data, parses the PDF files, and joins the results into JSON files.

Every day, GitHub Actions runs the scraper and updates the main JSON file, which works as a JSON API for the frontend of this project:

https://debtors.fmacedo.com
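
Once the data is in JSON, a lookup that is painful across PDFs takes only a few lines. The sketch below is illustrative: it assumes each record is an object with hypothetical `name` and `category` fields, which may not match the schema the scraper actually produces, and the sample entries are made up.

```python
import json
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip accents so that 'João' matches 'joao'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def find_debtors(records: list[dict], query: str) -> list[dict]:
    """Return records whose (hypothetical) 'name' field contains the query."""
    needle = normalize(query)
    return [r for r in records if needle in normalize(r.get("name", ""))]


# Made-up sample data; the field names are assumptions, not the real schema.
sample = json.loads(
    '[{"name": "João Exemplo", "category": "singular"},'
    ' {"name": "Empresa Fictícia Lda", "category": "collective"}]'
)
matches = find_debtors(sample, "joao")
```

Accent-insensitive matching is a design choice for Portuguese names; drop `normalize` if exact matching is preferred.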

The code for the frontend can be found here.

The entire platform runs free of charge: GitHub Actions updates the data, GitHub serves the JSON files, and Cloudflare Pages hosts the frontend.

Contributing

Contributions are welcome. Feel free to open an issue or submit a pull request. If you're not sure where to start, mention me in the comments!

Installation

  1. Clone the repository:

    git clone https://github.com/franciscobmacedo/debtors-scraper.git
  2. Install the required dependencies:

    poetry install

Usage

  1. Navigate to the project directory:

    cd debtors-scraper
  2. Run the run.py script:

    python run.py

    This will execute the main() function, which sets up the configuration, fetches data, parses files, and joins them together.
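
The flow described above (configure, fetch, parse, join) can be sketched roughly as follows. This is not the project's actual code: `Config`, `run_pipeline`, and the `fetch` and `parse` callables are hypothetical names introduced for illustration, and the real `main()` may be organised differently.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable


@dataclass
class Config:
    # Hypothetical settings; the project's real configuration may differ.
    pdf_urls: list[str]
    output_file: Path


def run_pipeline(
    config: Config,
    fetch: Callable[[str], bytes],
    parse: Callable[[bytes], Iterable[dict]],
) -> list[dict]:
    """Fetch every PDF, parse it into records, and join them into one JSON file."""
    joined: list[dict] = []
    for url in config.pdf_urls:
        pdf_bytes = fetch(url)           # step 1: download one PDF
        joined.extend(parse(pdf_bytes))  # step 2: turn it into records
    # step 3: write the combined records as a single JSON document
    config.output_file.write_text(
        json.dumps(joined, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return joined
```

Passing `fetch` and `parse` in as arguments keeps the sketch independent of any particular HTTP or PDF library, and lets the joining step be exercised with stub functions instead of real downloads.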
