Skip to content

The project aims to provide a comprehensive understanding of the dataset related to heart disease and build predictive models to classify whether a patient is likely to have heart disease based on various attributes.

Notifications You must be signed in to change notification settings

rosyteran/Advanced-EDA-Heart-Disease-Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Advanced EDA-Heart Disease Prediction

This repository contains an advanced exploratory data analysis (EDA) and heart disease prediction project. The project aims to provide a comprehensive understanding of the dataset related to heart disease and build predictive models to classify whether a patient is likely to have heart disease based on various attributes.

Dataset

The dataset used in this project is sourced from https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset , containing a collection of clinical parameters for individuals, including demographic information, medical test results, and lifestyle factors. It comprises both numerical and categorical features, which are crucial in understanding the factors influencing heart disease.

Project Structure

The project is structured as follows:

  • data: This directory contains the dataset used for analysis and modeling.
  • notebooks: This directory contains Jupyter notebooks outlining the exploratory data analysis process, feature engineering, model development, and evaluation.
  • src: This directory contains any source code used in the project, such as utility functions or custom modules.
  • reports: This directory contains any reports generated from the analysis, including visualizations, summary statistics, and model evaluation metrics.

Exploratory Data Analysis (EDA)

The EDA process involves thorough examination and visualization of the dataset to gain insights into the relationships between variables, identify patterns, anomalies, and potential predictive features related to heart disease. The notebooks in the notebooks directory detail the steps taken in the EDA process.

Key components of the EDA process include:

  • Descriptive statistics
  • Data cleaning and preprocessing
  • Univariate and multivariate analysis
  • Visualization of distributions, correlations, and trends
  • Identification of outliers and missing values
  • Feature engineering and selection

Heart Disease Prediction

Following the EDA, predictive models are developed to classify whether an individual is likely to have heart disease based on the available features. Various machine learning algorithms such as logistic regression, decision trees, random forests, support vector machines, or neural networks may be explored and evaluated for their predictive performance.

The prediction task involves:

  • Data splitting into training and testing sets
  • Model training using different algorithms
  • Hyperparameter tuning for optimizing model performance
  • Model evaluation using appropriate metrics such as accuracy, precision, recall, F1-score, and ROC-AUC curve

Getting Started

To get started with this project, follow these steps:

  1. Clone this repository to your local machine.
  2. Navigate to the notebooks directory to explore the Jupyter notebooks containing the analysis and modeling process.
  3. Ensure you have the necessary dependencies installed. You can install them via pip using the requirements.txt file.
pip install -r requirements.txt
  1. Run the notebooks sequentially to replicate the analysis and understand the steps involved.

Requirements

The project requires Python 3.11 along with various libraries such as pandas, numpy, matplotlib, seaborn, scikit-learn, etc. These dependencies are listed in the requirements.txt file.

Contributing

Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.


Feel free to customize this README according to your project's specifics and preferences.

About

The project aims to provide a comprehensive understanding of the dataset related to heart disease and build predictive models to classify whether a patient is likely to have heart disease based on various attributes.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published