[Feature] PDF Capture of Gauteng Status #588

Open
shaze opened this issue Jul 22, 2020 · 3 comments

@shaze
Contributor

shaze commented Jul 22, 2020

Gauteng District Results Wednesday 22 July 2020 (1).pdf
Is your feature request related to a problem? Please describe.

Semi-automate the capture of daily data from the Gauteng Department of Health (GDOH) instead of transcribing it manually.

@SimonRosen173 has offered to work on this

Describe the solution you'd like

A program that, given the PDF, produces text output -- ideally two lines: headings on the first line and data on the next, with values comma-separated.

I have attached a sample PDF
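
A minimal sketch of the requested output shape, assuming the values have already been pulled out of the PDF into a mapping from heading to value. The function name `to_two_line_csv` and the sample figures are hypothetical and not taken from any GDOH release; the PDF parsing step itself is not shown.

```python
import csv
import io


def to_two_line_csv(record):
    """Return two comma-separated lines: headings first, then the data."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(record.keys())    # line 1: headings
    writer.writerow(record.values())  # line 2: values, in the same order
    return buffer.getvalue()


# Hypothetical values for illustration only; not real GDOH figures.
sample = {"YYYYMMDD": "20200722", "GP Cases": 100, "GP Recoveries": 40, "GP Deaths": 3}
print(to_two_line_csv(sample), end="")
```

Using the `csv` module rather than a plain `",".join(...)` means any heading or value that itself contains a comma is quoted correctly.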

Describe alternatives you've considered

We currently manually transcribe.

Additional context

This is what I currently capture

YYYYMMDD | date | GP Cases |   | GP Recoveries | GP Deaths |   | GP Hospitalisations | Johannesburg Cases |   | Johannesburg Recoveries | Johannesburg Deaths |   | Ekurhuleni Cases | Ekurhuleni Deaths | Ekurhuleni Recoveries | Tshwane Cases | Tshwane Deaths | Tshwane Recoveries | Sedibeng Cases | Sedibeng Deaths | Sedibeng Recoveries | West Rand Cases | West Rand Deaths | West Rand Recoveries | GP Unallocated Cases | Check |   | Johannesburg A Cases | Johannesburg A Recoveries | Johannesburg B Cases | Johannesburg B Recoveries | Johannesburg C Cases | Johannesburg C Recoveries | Johannesburg D Cases | Johannesburg D Recoveries | Johannesburg E Cases | Johannesburg E Recoveries | Johannesburg F Cases | Johannesburg F Recoveries | Johannesburg G Cases | Johannesburg G Recoveries | Johannesburg Unallocated Cases | Johannesburg Unallocated Recoveries |   |   | Tshwane 1 Cases | Tshwane 2 Cases | Tshwane 3 Cases | Tshwane 4 Cases | Tshwane 5 Cases | Tshwane 6 Cases | Tshwane 7 Cases | Tswhane Unallocated Cases |   | Ekurhuleni East 1 Cases | Ekurhuleni East 2 Cases | Ekurhuleni North 1 Cases | Ekurhuleni North 2 Cases | Ekurhuleni South 1 Cases | Ekurhuleni South 2 Cases | Ekurhuleni Unallocated Cases |   | Sedbeng Lesedi Cases | Sedibeng Emfuleni Cases | Sedibeng Midvaal Cases | Sedibeng Unallocated Cases |   | West Rand Mogale City Cases | West Rand Rand West City Cases | West Rand Merafong City Cases | West Rand Unallocated Cases |   | source | Comment | Tshwane 1 Recoveries | Tshwane 2 Recoveries | Tshwane 3 Recoveries | Tshwane 4 Recoveries | Tshwane 5 Recoveries | Tshwane 6 Recoveries | Tshwane 7 Recoveries | Tshwane Unallocated Recoveries | Ekurhuleni East 1 Recoveries | Ekurhuleni East 2 Recoveries | Ekurhuleni North 1 Recoveries | Ekurhuleni North 2 Recoveries | Ekurhuleni South 1 Recoveries | Ekurhuleni South 2 Recoveries | Ekurhuleni Unallocated Recoveries | Sedibeng Lesedi Recoveries | Sedibeng Emfuleni Recoveries | Sedibeng Midvaal Recoveries | West Rand Mogale City Recoveries | West Rand Rand West City Recoveries | West Rand Merafong City Recoveries

@shaze shaze added the enhancement New feature or request label Jul 22, 2020
@shaze shaze self-assigned this Jul 22, 2020
@SimonRosen173
Contributor

Many apologies, but I did not see this issue. I will start working on this now and will get something up ASAP.

SimonRosen173 added a commit to SimonRosen173/covid19za that referenced this issue Aug 20, 2020
Python file to automatically extract Gauteng Covid-19 data from Gauteng Health Department PDF media releases. Used as a simple Python module with a single method call to return relevant data as a delimited string. This is a solution to dsfsi#588.
@SimonRosen173
Contributor

Hi, I have made a pull request with a solution to this issue.

shaze added a commit that referenced this issue Aug 20, 2020
Created Python file to automatically extract data from Gauteng Health Department PDF media releases. Solution to issue #588.
@shaze
Contributor Author

shaze commented Sep 22, 2020

@SimonRosen173 The script has been working very well up to now, but they've changed the format and it's now breaking. If you have a moment, please could you check? I'll attach two examples below.

Many thanks

Scott

Gauteng District Results Monday 21 September 2020.pdf
Gauteng District Results Tuesday 22 September 2020.pdf
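
One way to make a format change like this fail loudly, rather than silently produce wrong numbers, is to check the extracted headings before writing the row. This is a minimal sketch, assuming the extractor returns a mapping from heading to value; `check_headings` and the short `EXPECTED` list are hypothetical and not part of the existing script.

```python
# Hypothetical subset of the captured headings, for illustration only.
EXPECTED = ["GP Cases", "GP Recoveries", "GP Deaths"]


def check_headings(record, expected=EXPECTED):
    """Raise ValueError if an expected heading is missing or its value is not a whole number."""
    missing = [h for h in expected if h not in record]
    non_numeric = [
        h for h in expected
        if h in record and not str(record[h]).strip().isdigit()
    ]
    if missing or non_numeric:
        raise ValueError(
            f"PDF format may have changed: missing={missing}, non_numeric={non_numeric}"
        )
    return True
```

Running a check like this on each day's PDF would flag a layout change on the first affected release instead of after bad values have been captured.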
