This repository contains code for predicting blood donation likelihood using machine learning models. The dataset used in this project (`transfusion.data`) contains information about individuals' blood donation history.
- Loaded the dataset from a CSV file (`transfusion.data`).
- Renamed the target column to "target".
- Checked the data types and basic information about the dataset.
- Split the dataset into training and testing sets using `train_test_split()` from `sklearn.model_selection`.
- Trained the `TPOTClassifier` model to find the best pipeline for predicting blood donation likelihood.
- Evaluated the model's performance on the testing data using ROC AUC score.
- Normalized the "Monetary (c.c. blood)" column using a log transformation.
- Checked the variance of the normalized data.
- Trained a logistic regression model using the normalized training data.
- Evaluated the logistic regression model's performance on the testing data using ROC AUC score.
- Compared the performance of the TPOT model and logistic regression model based on their AUC scores.
- Serialized the trained logistic regression model using pickle and saved it to a file (`logistic_regression_model.pkl`).
- Demonstrated loading the saved model from the file for future use.
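
The preprocessing, logistic regression, and serialization steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it builds a small synthetic table instead of reading `transfusion.data`, and the feature column names other than "Monetary (c.c. blood)", the original label column name, the split ratio, `random_state`, `stratify`, and `max_iter` are all assumptions. The TPOT search is omitted to keep the example quick to run.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (all values are made up).
# With the real file this would be something like:
#   df = pd.read_csv("transfusion.data")
rng = np.random.default_rng(42)
n = 400
frequency = rng.integers(1, 30, size=n)
label_col = "whether he/she donated blood in March 2007"  # assumed name
df = pd.DataFrame({
    "Recency (months)": rng.integers(0, 40, size=n),    # assumed name
    "Frequency (times)": frequency,                     # assumed name
    "Monetary (c.c. blood)": frequency * 250,
    "Time (months)": rng.integers(2, 100, size=n),      # assumed name
    # Label loosely tied to donation frequency so there is signal to learn.
    label_col: rng.binomial(1, 1 / (1 + np.exp(-(frequency - 15) / 5))),
})

# Rename the label column to "target" and inspect the dtypes.
df = df.rename(columns={label_col: "target"})
print(df.dtypes)

# Split into training and testing sets (ratio and stratification assumed).
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Log-transform the high-variance column on copies of both splits.
col = "Monetary (c.c. blood)"
X_train_norm, X_test_norm = X_train.copy(), X_test.copy()
for part in (X_train_norm, X_test_norm):
    part[col] = np.log(part[col])

# Compare variances before and after the transformation.
print(X_train.var().round(3))
print(X_train_norm.var().round(3))

# Fit logistic regression and score it with ROC AUC on the test split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_norm, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test_norm)[:, 1])
print(f"Logistic regression AUC: {auc:.4f}")

# Serialize the model with pickle, then load it back.
# A temporary directory is used here so the sketch leaves no files behind.
model_path = Path(tempfile.mkdtemp()) / "logistic_regression_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
```

The reloaded model should produce the same probabilities as the original, which is a quick sanity check after deserialization. Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code.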
- Python 3
- numpy
- pandas
- streamlit
- scikit-learn
- tpot