File xlsx "COVID-19 ISS open data" #1127

Open
Zambo61 opened this issue Apr 5, 2021 · 3 comments

Comments

Zambo61 commented Apr 5, 2021

Is it possible to get the historical data for the "Stato_Clinico" sheet of the "COVID-19 ISS open data" file?
I need it to check how the clinical status of currently positive patients has been changing over the last few months relative to the percentage of people vaccinated, by age.
Thanks in advance, Andrea

@floatingpurr

Hi, it's not perfect, but here you go: https://github.com/floatingpurr/covid-19_sorveglianza_integrata_italia

Zambo61 commented Apr 8, 2021

Thanks

@alessiosavi

You can download the dump from 2022 up to today using this code:

The script performs the following operations:

#!/usr/bin/env python
# coding: utf-8

# In[1]:


import pandas as pd
from tqdm.notebook import tqdm
import requests
import os
import zipfile
import glob


# In[2]:


base_path = "data/"
csv_path = "csv/"
base_url = "https://www.epicentro.iss.it/coronavirus/open-data/OPENDATA-{}.zip"
sheets_name = [
    "casi_prelievo_diagnosi",
    "casi_inizio_sintomi",
    "casi_inizio_sintomi_sint",
    "casi_regioni",
    "casi_provincie",
    "ricoveri",
    "decessi",
    "sesso_eta",
    "stato_clinico",
]
dfs = {}


# In[9]:


# Create the working directories; makedirs also creates base_path itself
for folder in ["extracted", "raw_extracted", csv_path]:
    os.makedirs(os.path.join(base_path, folder), exist_ok=True)


# In[4]:


# Download the yearly archives from ISS
for year in ["2020", "2021", "2022"]:
    r = requests.get(base_url.format(year), allow_redirects=True)
    r.raise_for_status()  # stop here if the download failed
    with open(os.path.join(base_path, "{}.zip".format(year)), "wb") as fh:
        fh.write(r.content)


# In[5]:


for year in tqdm(["2020", "2021", "2022"]):
    with zipfile.ZipFile("{}/{}.zip".format(base_path, year)) as zf:
        zf.extractall(os.path.join(base_path, "raw_extracted"))


# In[6]:


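# Move every extracted workbook into one flat folder; search only raw_extracted
for file in glob.glob(os.path.join(base_path, "raw_extracted", "**", "*.xlsx"), recursive=True):
    os.replace(file, os.path.join(base_path, "extracted", os.path.basename(file)))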


# In[7]:


for f in tqdm(os.listdir(os.path.join(base_path, "extracted"))):
    xls = pd.ExcelFile(os.path.join(base_path, "extracted", f))
    for sheet_name in sheets_name:
        if sheet_name not in dfs:
            dfs[sheet_name] = pd.DataFrame()
        # Read this sheet from the workbook
        temp = pd.read_excel(xls, sheet_name=sheet_name)
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        dfs[sheet_name] = pd.concat([dfs[sheet_name], temp])


# In[15]:


for k in tqdm(dfs):
    dfs[k]["iss_date"] = pd.to_datetime(dfs[k]["iss_date"])
    dfs[k].sort_values("iss_date", inplace=True)
    dfs[k].to_csv("{}.csv".format(os.path.join(base_path, csv_path, k)), index=False)


# In[16]:


for k in dfs:
    display(dfs[k].head())
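
For the original question (how clinical status shifts over time), one possible next step is to turn the concatenated `stato_clinico` table into per-date percentage shares. The following is only a minimal sketch: `share_by_date` is a hypothetical helper, and the column names `stato` and `casi` are placeholders, since the thread only confirms that an `iss_date` column exists. Check the real headers in `data/csv/stato_clinico.csv` before adapting it.

```python
import pandas as pd


def share_by_date(df, category_col, value_col, date_col="iss_date"):
    """Return a date x category table of percentage shares (hypothetical helper)."""
    # Sum the counts for every (date, category) pair, one column per category.
    counts = df.pivot_table(
        index=date_col,
        columns=category_col,
        values=value_col,
        aggfunc="sum",
        fill_value=0,
    )
    # Divide each row by its own total so that every date adds up to 100.
    return counts.div(counts.sum(axis=1), axis=0) * 100


# Synthetic rows standing in for data/csv/stato_clinico.csv.
# "stato" and "casi" are assumed names, not the real ISS column headers.
sample = pd.DataFrame(
    {
        "iss_date": pd.to_datetime(
            ["2021-03-01", "2021-03-01", "2021-04-01", "2021-04-01"]
        ),
        "stato": ["lieve", "severo", "lieve", "severo"],
        "casi": [80, 20, 90, 10],
    }
)

shares = share_by_date(sample, category_col="stato", value_col="casi")
print(shares)
```

With the real data, `sample` would be replaced by `pd.read_csv("data/csv/stato_clinico.csv", parse_dates=["iss_date"])` and the two column arguments by whatever the sheet actually uses.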
