Advanced EDA-Heart Disease Prediction

This repository contains an advanced exploratory data analysis (EDA) and heart disease prediction project. The project aims to provide a comprehensive understanding of the dataset related to heart disease and build predictive models to classify whether a patient is likely to have heart disease based on various attributes.

Dataset

The dataset used in this project is sourced from https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset , containing a collection of clinical parameters for individuals, including demographic information, medical test results, and lifestyle factors. It comprises both numerical and categorical features, which are crucial in understanding the factors influencing heart disease.

Project Structure

The project is structured as follows:

data: This directory contains the dataset used for analysis and modeling.
notebooks: This directory contains Jupyter notebooks outlining the exploratory data analysis process, feature engineering, model development, and evaluation.
src: This directory contains any source code used in the project, such as utility functions or custom modules.
reports: This directory contains any reports generated from the analysis, including visualizations, summary statistics, and model evaluation metrics.

Exploratory Data Analysis (EDA)

The EDA process involves thorough examination and visualization of the dataset to gain insights into the relationships between variables, identify patterns, anomalies, and potential predictive features related to heart disease. The notebooks in the notebooks directory detail the steps taken in the EDA process.

Key components of the EDA process include:

Descriptive statistics
Data cleaning and preprocessing
Univariate and multivariate analysis
Visualization of distributions, correlations, and trends
Identification of outliers and missing values
Feature engineering and selection

Heart Disease Prediction

Following the EDA, predictive models are developed to classify whether an individual is likely to have heart disease based on the available features. Various machine learning algorithms such as logistic regression, decision trees, random forests, support vector machines, or neural networks may be explored and evaluated for their predictive performance.

The prediction task involves:

Data splitting into training and testing sets
Model training using different algorithms
Hyperparameter tuning for optimizing model performance
Model evaluation using appropriate metrics such as accuracy, precision, recall, F1-score, and ROC-AUC curve

Getting Started

To get started with this project, follow these steps:

Clone this repository to your local machine.
Navigate to the notebooks directory to explore the Jupyter notebooks containing the analysis and modeling process.
Ensure you have the necessary dependencies installed. You can install them via pip using the requirements.txt file.

pip install -r requirements.txt

Run the notebooks sequentially to replicate the analysis and understand the steps involved.

Requirements

The project requires Python 3.11 along with various libraries such as pandas, numpy, matplotlib, seaborn, scikit-learn, etc. These dependencies are listed in the requirements.txt file.

Contributing

Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.

Feel free to customize this README according to your project's specifics and preferences.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Heart_Disease_Prediction_Part_1_EDA.ipynb		Heart_Disease_Prediction_Part_1_EDA.ipynb
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Advanced EDA-Heart Disease Prediction

Dataset

Project Structure

Exploratory Data Analysis (EDA)

Heart Disease Prediction

Getting Started

Requirements

Contributing

About

Releases

Packages

Languages

rosyteran/Advanced-EDA-Heart-Disease-Prediction

Folders and files

Latest commit

History

Repository files navigation

Advanced EDA-Heart Disease Prediction

Dataset

Project Structure

Exploratory Data Analysis (EDA)

Heart Disease Prediction

Getting Started

Requirements

Contributing

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages