gis_nicd_scraper #840

Open
lrossouw opened this issue Jun 9, 2021 · 24 comments
Labels
enhancement New feature or request

Comments

@lrossouw
Collaborator

lrossouw commented Jun 9, 2021

Is your feature request related to a problem? Please describe.
The new way to share data is here:
https://sacoronavirus.co.za/live-counter/

Not via the NICD page so my scraper will need to be retired and redesigned.

@lrossouw lrossouw added the enhancement New feature or request label Jun 9, 2021
@lrossouw
Collaborator Author

And it's broken...

@vukosim
Member

vukosim commented Jun 10, 2021

We are unfortunately dealing with the fallout of Digital Vibes. Would NICD be better, @lrossouw?

@lrossouw
Collaborator Author

It looked like they stopped doing them. But I see the 9th is available again. They skipped the 8th though.

@lrossouw
Collaborator Author

I also see the NICD page layout has changed. I captured all data manually until 9 June. Going to wait for the dust to settle before I update my scripts, but of course this week appears to be key in terms of the stats that are coming out!

@lrossouw
Collaborator Author

lrossouw commented Jun 12, 2021

I've created something new that should (hopefully!) be more stable. It collects from various sources, so I'm no longer posting the exact URLs as the source. I'm flagging auto-scraped results as source = "gis_nicd_scraper", as most of the data is scraped from the public dashboard there. I don't have anything for vaccines yet.

@lrossouw lrossouw changed the title New Method for Sharing Data -> New Scraper gis_nicd_scraper Jun 12, 2021
@lrossouw
Collaborator Author

lrossouw commented Jun 16, 2021

I scan two or three different dashboards for the figures. I can typically get the cases and tests the evening they are released from one dashboard (but they do break the dash from time to time). I usually seem to pick up the deaths and recoveries the next day (and the cases from this if the other dashboard is broken). Haven't solved the vaccines yet.

But these are more stable than scraping the pages of the media releases, which change all the time.
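
A minimal Python sketch of the fallback idea described here: try one dashboard, fall back to the next when it is broken, and tag auto-scraped rows with the source label. The fetcher functions and the dictionary layout are hypothetical; the real scraper's sources and field names are not shown in this thread.

```python
from typing import Callable, Dict, Iterable, Optional

Record = Dict[str, object]
Fetcher = Callable[[], Record]

SOURCE_TAG = "gis_nicd_scraper"  # source label mentioned in this thread


def first_working(fetchers: Iterable[Fetcher]) -> Optional[Record]:
    """Return the first successful result, tagged with the source label.

    Each fetcher is a hypothetical zero-argument function that returns a
    dict of figures, or raises when its dashboard is broken.
    """
    for fetch in fetchers:
        try:
            record = dict(fetch())
        except Exception:
            # Dashboard is down or its layout changed; try the next one.
            continue
        record["source"] = SOURCE_TAG
        return record
    # Nothing worked; leave the row for manual capture.
    return None
```

Ordering the fetchers by how early each dashboard usually publishes would match the "evening of release first, next-day dashboard second" pattern above.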

@dmackie
Collaborator

dmackie commented Jun 30, 2021

@lrossouw It seems the scraper has stopped working again? Before I do a manual update, I just want to check on its status.

@lrossouw
Collaborator Author

Thanks, I did not notice with all the other COVID-19 news out. Will have a look.

@lrossouw
Collaborator Author

lrossouw commented Jun 30, 2021

Sometimes the Rt calculation is still running (someone else maintains that?) and it commits back and creates a conflict with my process. So my bot keeps updating my local repo but can't push until I manually resolve the conflict.

Not sure how to fix that.

Anyway it's resolved now.

@lrossouw
Collaborator Author

lrossouw commented Jun 30, 2021

It might be this:
f8bfa83#diff-0e6e5c3c2330a562992a4157e9afb54fdea1938025dd074fec10a03e4e655aed

Can we make it pull before the push here? My bot might have made changes while this bot was running, so that way it seems less likely to get into conflicts. @vukosim do you maintain that code?

@vukosim
Member

vukosim commented Jun 30, 2021

I will check late this evening. It runs after a change to the file.

@lrossouw
Collaborator Author

My bot posts new case data and the Rt bot starts running. While Rt is still running, new data comes in and my bot updates again. That creates a technical merge conflict, but the Rt bot uses --force, so it overwrites. Perhaps do a pull just before the push to bring the latest changes in, so you don't effectively reverse my or other people's changes. The Rt bot has also reversed other data I captured manually before, e.g. I capture vaccine data while it's running and then it kind of reverses it.
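
The pull-before-push suggestion could look roughly like the Python sketch below. It is not the Rt bot's actual code, which is not shown in this thread. The function names, the retry count, and the injectable `run` parameter are assumptions for illustration; only the `git pull --rebase`, `git rebase --abort`, and `git push` commands are standard git.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Run a git command in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def sync_and_push(run: Runner = run_git, attempts: int = 3) -> bool:
    """Rebase local commits onto the remote branch, then push without --force."""
    for _ in range(attempts):
        if run(["pull", "--rebase"]) != 0:
            # Likely a real content conflict, which needs a human. Abort any
            # rebase in progress (exit code ignored) and give up.
            run(["rebase", "--abort"])
            return False
        if run(["push"]) == 0:
            return True
        # Push rejected: another bot pushed in between, so pull again.
    return False
```

Because the push is never forced, a commit made by the other bot while this one was running is kept rather than overwritten; the worst case is that the push is skipped and retried on the next run.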

@dmackie
Collaborator

dmackie commented Jul 1, 2021

I did a manual update of Deaths and Recoveries today as they had not updated by midday.

#854

@lrossouw
Collaborator Author

lrossouw commented Jul 4, 2021

Sorry just noticed now. Will sort it out.

@lrossouw
Collaborator Author

lrossouw commented Jul 4, 2021

Fixed. @vukosim did you manage to update the bot?

@janvdl
Contributor

janvdl commented Jul 15, 2021

Good morning. The provincial data for confirmed cases for 2021-07-05 is missing. Should I add it manually or would you prefer to have the scraper make another pass and add it instead?

@lrossouw
Collaborator Author

@janvdl
Contributor

janvdl commented Jul 15, 2021

Seems to be there (line 487):

https://github.com/dsfsi/covid19za/blob/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv#L487

Louis, apologies if I missed something, will check my import again tonight and get back to you.

@janvdl
Contributor

janvdl commented Jul 15, 2021

Seems to be there (line 487):

https://github.com/dsfsi/covid19za/blob/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv#L487

My mistake, sorry. Data for 05 July 2021 is indeed available in the confirmed cases file, but is missing from cumulative deaths and cumulative recoveries.
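
A gap like this (a date present in one timeline file but absent from another) can be spotted with a short Python check. This is only a sketch: it assumes each CSV has a header row with a column named `date`, which is not confirmed in this thread, and the function names are made up for illustration.

```python
import csv
from typing import List, Set, TextIO


def dates_in(handle: TextIO, column: str = "date") -> Set[str]:
    """Collect the non-empty values of the date column from an open CSV file."""
    return {row[column] for row in csv.DictReader(handle) if row.get(column)}


def missing_dates(reference: TextIO, other: TextIO, column: str = "date") -> List[str]:
    """Dates present in the reference file but absent from the other file."""
    # Sorted as plain strings, which is not chronological for DD-MM-YYYY dates.
    return sorted(dates_in(reference, column) - dates_in(other, column))
```

Passing the confirmed-cases file as `reference` and the deaths or recoveries file as `other` would list the dates that still need to be captured.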

@lrossouw
Collaborator Author

Ah, 5 July had an issue, I believe: https://www.nicd.ac.za/latest-confirmed-cases-of-covid-19-in-south-africa-05-july-2021/

They did not release provincial figures for deaths and recoveries on the NICD site, but I see now they are available here:
https://sacoronavirus.co.za/2021/07/05/update-on-covid-19-05th-july-2021/

Feel free to capture.

@lrossouw
Collaborator Author

My data source stopped providing deaths/recoveries in machine-readable form on 30 July or so. What sources are people using?

@shaze
Contributor

shaze commented Sep 18, 2021

@lrossouw Does this problem apply to testing too (covid19za_timeline_testing.csv)? I don't know how I missed that this has not been updating since the end of July.

@vukosim
Member

vukosim commented Sep 20, 2021

Thanks @shaze, yeah, that needs to be updated.

@vukosim
Member

vukosim commented Sep 20, 2021

Also looping in @krokkie, as it seems we have a few more failures. So we might again need to sync between you and @lrossouw.

5 participants