alimz758/Covid19-Prediction-Model-----UCLA-CS145-----Intro-to-Data-Mining

Covid19-Prediction-Model-----UCLA-CS145-----Intro-to-Data-Mining

Course project for UCLA CS145, Introduction to Data Mining

Running the Model

The main driver script is run.py. It takes a single argument, the model type, which must be one of: NN, PR, AR, ARIMA, ARMA, MA, SARIMA.

Models used for prediction:

PR: Polynomial Regression
NN: Neural Network
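AR: Autoregression
MA: Moving Average
ARIMA: Autoregressive Integrated Moving Average
ARMA: Autoregressive Moving Average
SARIMA: Seasonal Autoregressive Integrated Moving Average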

Example:

python run.py NN

This generates a result CSV file in the Kaggle submission format. To change the configuration, edit the constants declared in run.py, polynomial_regression.py, neural_network.py, or prediction_model.py (the superclass of all prediction models).
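As a rough illustration of how a driver script can accept and validate the single model-type argument, here is a minimal sketch using argparse. The function name `parse_model_type` and the constant `MODEL_TYPES` are hypothetical; the actual run.py may parse its argument differently.

```python
import argparse

# Model codes listed in this README. How run.py maps them to model
# classes is not shown here; this sketch only validates the argument.
MODEL_TYPES = ["NN", "PR", "AR", "ARIMA", "ARMA", "MA", "SARIMA"]


def parse_model_type(argv):
    """Return the validated model type from a list of command-line arguments."""
    parser = argparse.ArgumentParser(description="Run a Covid-19 prediction model.")
    parser.add_argument("model", choices=MODEL_TYPES,
                        help="model type to train and predict with")
    return parser.parse_args(argv).model


# Equivalent to invoking the script with the single argument NN.
print(parse_model_type(["NN"]))
```

With `choices`, argparse rejects an unknown model type and prints the accepted values, so a typo fails before any training starts.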

Initializing Input Data

Partitioning daily report data by state

To transform input data, run:

python transform_input.py

This creates one CSV file per state, each containing that state's daily reports. Miscellaneous states in the input data set are ignored.

NOTE: Each time this script is run, all the <state>.csv files are truncated and refilled from the daily report files.
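The partitioning step described above can be sketched with the standard library alone. This is an assumption-laden illustration, not the contents of transform_input.py: the function name `partition_by_state` and the directory arguments are hypothetical, and the filtering of miscellaneous states is omitted.

```python
import csv
from collections import defaultdict
from pathlib import Path


def partition_by_state(report_dir, out_dir, state_column="Province_State"):
    """Group rows of every daily report by state and write one CSV per state."""
    rows_by_state = defaultdict(list)
    fieldnames = []
    # Sorted only to make the processing order deterministic.
    for report in sorted(Path(report_dir).glob("*.csv")):
        with report.open(newline="") as f:
            reader = csv.DictReader(f)
            # Collect the union of column names, since reports may differ.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                rows_by_state[row[state_column]].append(row)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for state, rows in rows_by_state.items():
        # Mode "w" truncates an existing file, so each run rebuilds it from scratch.
        with (out / f"{state}.csv").open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(rows)
    return sorted(rows_by_state)
```

Opening each output file in write mode mirrors the truncate-and-refill behavior noted above: rerunning the sketch never appends duplicate rows.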

USA daily state reports (csse_covid_19_daily_reports_us)

This table contains aggregated state-level data for each US state.

Create the Test.csv

To create the test.csv file, run:

python create_test_csv.py

Get MAPE

To compute the mean absolute percentage error (MAPE) of the predictions against the ground-truth data, run:

python mape.py
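For reference, MAPE averages the absolute error of each prediction relative to the true value and is usually reported as a percentage. The sketch below is one common formulation, not necessarily what mape.py computes; in particular, skipping rows whose true value is zero is an assumption made here to avoid division by zero.

```python
def mape(truth, predicted):
    """Mean absolute percentage error, in percent."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    # Assumption: rows with a true value of zero are skipped,
    # because the relative error is undefined for them.
    pairs = [(t, p) for t, p in zip(truth, predicted) if t != 0]
    if not pairs:
        raise ValueError("no non-zero truth values to evaluate")
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


# Each prediction below is off by 10% of its true value.
print(mape([100, 200], [110, 180]))
```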

File naming convention

Daily report files are named MM-DD-YYYY.csv, with the date in UTC.
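Because the month comes first in this naming scheme, sorting the filenames as plain strings does not give chronological order across a year boundary. A small helper such as the hypothetical `report_date` below parses the date out of the filename so it can be used as a sort key.

```python
from datetime import date, datetime
from pathlib import Path


def report_date(filename):
    """Parse the date encoded in a MM-DD-YYYY.csv report filename."""
    return datetime.strptime(Path(filename).stem, "%m-%d-%Y").date()


files = ["01-01-2021.csv", "12-31-2020.csv"]
# Sort chronologically rather than alphabetically.
print(sorted(files, key=report_date))
```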

Field description

  • Province_State - The name of the State within the USA.
  • Country_Region - The name of the Country (US).
  • Last_Update - The most recent date the file was pushed.
  • Lat - Latitude.
  • Long_ - Longitude.
  • Confirmed - Aggregated case count for the state.
  • Deaths - Aggregated death toll for the state.
  • Recovered - Aggregated Recovered case count for the state.
  • Active - Aggregated confirmed cases that have not been resolved (Active cases = total cases - total recovered - total deaths).
  • FIPS - Federal Information Processing Standards code that uniquely identifies counties within the USA.
  • Incident_Rate - cases per 100,000 persons.
  • People_Tested - Total number of people who have been tested.
  • People_Hospitalized - Total number of people hospitalized. (Nullified on Aug 31, see Issue #3083)
  • Mortality_Rate - Number of recorded deaths * 100 / Number of confirmed cases.
  • UID - Unique Identifier for each row entry.
  • ISO3 - Officially assigned country code identifiers.
  • Testing_Rate - Total test results per 100,000 persons. The "total test results" are equal to "Total test results (Positive + Negative)" from COVID Tracking Project.
  • Hospitalization_Rate - US Hospitalization Rate (%): = Total number hospitalized / Number of cases. The "Total number hospitalized" is the "Hospitalized – Cumulative" count from COVID Tracking Project. The "hospitalization rate" and "Total number hospitalized" are only presented for those states which provide cumulative hospital data. (Nullified on Aug 31, see Issue #3083)
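Several of the fields above are derived from others by simple arithmetic. The following sketch restates the formulas for Active, Mortality_Rate, and the per-100,000 rates as code. The function names are hypothetical, and the `population` argument is an assumption, since population is not one of the listed fields.

```python
def active_cases(confirmed, recovered, deaths):
    """Active = total cases - total recovered - total deaths."""
    return confirmed - recovered - deaths


def mortality_rate(confirmed, deaths):
    """Recorded deaths * 100 / confirmed cases; None when there are no cases."""
    return deaths * 100 / confirmed if confirmed else None


def per_100k(count, population):
    """Rate per 100,000 persons, as used by Incident_Rate and Testing_Rate."""
    return count * 100_000 / population


print(active_cases(1000, 200, 50))   # 1000 - 200 - 50
print(mortality_rate(1000, 50))      # 50 * 100 / 1000
```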

Neural Network Model

For details of the neural network model, refer to neural_network.py.

This class trains a neural network and uses grid search to find the best hyperparameters.

You can add or remove parameters and candidate values to explore the optimal NN settings. Please modify only the following in neural_network.py:

self.parameters = {
    'hidden_layer_sizes': [(80, 80), (70, 70), (60, 60)],
    'activation': ['relu'],
    'solver': ['adam'],
    'learning_rate': ['adaptive'],
    'learning_rate_init': [0.0001, 0.001, 0.005, 0.0005]
} 
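To see what a grid search over these settings involves, the sketch below expands the dictionary into every combination of values using only the standard library. The helper `expand_grid` is hypothetical and stands in for whatever search utility neural_network.py actually uses; the parameter names resemble those of scikit-learn's MLP estimators, but that link is an assumption.

```python
from itertools import product

# Same grid as above, copied here so the sketch is self-contained.
parameters = {
    'hidden_layer_sizes': [(80, 80), (70, 70), (60, 60)],
    'activation': ['relu'],
    'solver': ['adam'],
    'learning_rate': ['adaptive'],
    'learning_rate_init': [0.0001, 0.001, 0.005, 0.0005],
}


def expand_grid(grid):
    """Return one dict per combination of candidate values in the grid."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]


candidates = expand_grid(parameters)
print(len(candidates))  # 3 * 1 * 1 * 1 * 4 = 12 configurations to train
```

Every value added to a list multiplies the number of configurations, and each configuration is typically trained more than once under cross-validation, so the grid is best grown a little at a time.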

About

Covid-19 data processing, training, prediction and validation
