Investigation of the value of commercial sales data on respiratory death predictions using Model Class Reliance

nhsx/commercial-data-healthcare-predictions

Value of Commercial Product Sales Data in Healthcare Prediction

NHSX Analytics Unit - PhD Internship Project

About the Project

This repository holds code for the NHSX Analytics Unit PhD internship project by Elizabeth Dolan, which investigates the use of model class reliance to identify the value of including commercial sales data in respiratory death predictions.

Project Description - Value of Commercial Product Sales Data in Healthcare Prediction

Note: No data, public or private, are shared in this repository.

Project Structure

  • The main code is found in the root of the repository (see Usage below for more information)
  • The accompanying report is also available in the reports folder. Results and Discussion can be read as a full pre-print via https://www.researchsquare.com/article/rs-2226531/v1.
  • The required Python libraries are listed in the requirements document. Please note that the MCR (Model Class Reliance) package must be installed from https://github.com/gavin-s-smith/mcrforest. You may need to install numpy and Cython before mcrforest will install. You will also need scikit-learn version 0.24.2 to run the code "from sklearn.model_selection import TimeSeriesSplit". This version of TimeSeriesSplit has the parameters needed to ensure no data leakage in the time series cross-validation.
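The time series cross-validation mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual configuration: the 12 time-ordered observations and the values chosen for n_splits, test_size and gap are assumptions made for the example. It shows how every training window ends before its test window starts, with a gap between them.

```python
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical toy data: 12 time-ordered observations with one feature each.
X = [[month] for month in range(12)]

GAP = 1  # assumed value: observations skipped between train and test windows

# n_splits and test_size are illustrative values, not the project's settings.
tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=GAP)
splits = list(tscv.split(X))

for fold, (train_idx, test_idx) in enumerate(splits):
    # Training indices always precede test indices, so no future data leaks in.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Each fold trains on an expanding window of earlier observations and tests on later ones, which is what prevents future information from leaking into training.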

Built With

Python v3.8

Getting Started

Installation

To get a local copy up and running follow these simple steps.

To clone the repo:

git clone https://github.com/nhsx/commercial-data-healthcare-predictions.git

To create a suitable environment:

  • python -m venv .venv or virtualenv -p /path/to/required/python/version .venv
  • source .venv/bin/activate
  • (if needed) pip install numpy Cython
  • pip install git+https://github.com/gavin-s-smith/mcrforest
  • pip install -r requirements.txt
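Because the code depends on a specific scikit-learn version, it can help to confirm what was actually installed once the steps above are done. The sketch below is a hypothetical helper, not part of this repository. It uses only the Python standard library (importlib.metadata, available from Python 3.8).

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_SKLEARN = "0.24.2"  # the scikit-learn version named in this README


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_required(package, required):
    """True only if `package` is installed at exactly the `required` version."""
    return installed_version(package) == required


if __name__ == "__main__":
    found = installed_version("scikit-learn")
    ok = matches_required("scikit-learn", REQUIRED_SKLEARN)
    print(f"scikit-learn installed: {found}, matches {REQUIRED_SKLEARN}: {ok}")
```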

You may need to install psycopg2 (https://www.psycopg.org/docs/install.html), which in turn can require gcc and additions to your PATH (https://stackoverflow.com/questions/5420789/how-to-install-psycopg2-with-pip-on-python).

Caveat for Apple MacBook Pro M1 users: scikit-learn will not install using the usual methods; the installation fails with an error citing a build dependencies issue. Use this line to install it: pip3 install -U --no-use-pep517 scikit-learn==0.24.2. For more information, see scikit-learn/scikit-learn#19137.

Usage

Note: In its current form this repository has been shared with fake data to allow the code to run. This data is randomly sampled from the same metadata features as the real data but bears no resemblance to the ground truth data.

Run Create_op_rf_for_mcr.py to create a set of models to predict registered deaths from respiratory disease. These models use commercial sales data and a wide range of other variables that have shown associations with deaths from respiratory disease.

Run MCR_for_op_rf.py to create explanations for the models by identifying the impact that each input variable, including commercial sales data, has on the models' predictions. This code implements the novel variable importance tool MCR for the random forest regressor.
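To give a feel for what MCR reports, here is a small pure-Python sketch of the general idea. It is not the mcrforest algorithm and does not use this project's data. It assumes the usual definition of model reliance as the ratio of a model's loss after a feature is switched between observations to its original loss. MCR is then the range of that ratio across all models that predict almost equally well. The toy rows, the four candidate models and the tolerance EPSILON are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy rows (x1, x2, y); x1 and x2 carry the same information.
ROWS = [(1, 1, 1.1), (2, 2, 1.9), (3, 3, 3.2), (4, 4, 3.8)]


def mse(model, rows):
    """Mean squared error of `model` on the original rows."""
    return sum((model(x1, x2) - y) ** 2 for x1, x2, y in rows) / len(rows)


def switched_mse(model, rows):
    """Mean squared error when x1 is taken from every other row in turn."""
    pairs = list(permutations(rows, 2))  # ordered (row, donor) pairs
    return sum(
        (model(donor[0], row[1]) - row[2]) ** 2 for row, donor in pairs
    ) / len(pairs)


def model_reliance(model, rows):
    """Ratio of switched loss to original loss for feature x1."""
    return switched_mse(model, rows) / mse(model, rows)


# Assumed candidate models standing in for a model class.
CANDIDATES = {
    "x1_only": lambda x1, x2: x1,
    "x2_only": lambda x1, x2: x2,
    "average": lambda x1, x2: 0.5 * x1 + 0.5 * x2,
    "constant": lambda x1, x2: 0.0,
}

EPSILON = 0.01  # assumed tolerance defining "almost equally good" models

losses = {name: mse(f, ROWS) for name, f in CANDIDATES.items()}
best = min(losses.values())
rashomon = {name for name, loss in losses.items() if loss <= best + EPSILON}

reliance = {name: model_reliance(CANDIDATES[name], ROWS) for name in rashomon}
mcr_minus, mcr_plus = min(reliance.values()), max(reliance.values())
print(f"well-performing models: {sorted(rashomon)}")
print(f"MCR- = {mcr_minus:.3f}, MCR+ = {mcr_plus:.3f}")
```

In this toy case one well-performing model ignores x1 entirely, so the lower bound sits at about 1 (no reliance), while another depends on x1 alone, giving a much larger upper bound. The gap between the two bounds is the kind of information MCR provides beyond the importance score of any single model.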

Dataset

Experiments are run against the:

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidance.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

To find out more about the Analytics Unit visit our project website or get in touch at [email protected].

Acknowledgements
