
An analysis of heart failure cases using SQL and Power BI


colbystout/Heart-Failure-Analysis


Heart-Failure-Analysis

An Analysis of Heart Attack Causes and Trends

The Dataset

The dataset was clinically and professionally collected by the party below. The original hosting link and heart failure study is available after the author credit. I obtained the dataset through Kaggle.com, and it can be found here.

Davide Chicco, Giuseppe Jurman: Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making 20, 16 (2020).

This dataset was built and structured with the main focus being on machine learning. The .csv for the original data as well as the transformed data + change log can be found in the "Data Set and Changelogs" folder.

License: CC BY 4.0

Methodology

The original dataset was cleaned in Excel by adding a Patient ID key to each record, as none previously existed, and by replacing the 1/0 values in the sex column with M/F. The purpose of this notebook is to run exploratory analysis on the heart failure dataset to find trends and build summary tables. After the exploratory analysis is done, I will move the data to Power BI for detailed visualization. Even though the dataset was designed for machine learning, there are still many insights to be gained through descriptive analysis.
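
The cleaning itself was done by hand in a spreadsheet, so the following is only a minimal Python sketch of the same two steps. The column names, the sample rows, and the mapping of 1 to M and 0 to F are assumptions made for illustration; check them against the change log before relying on them.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumed, not taken from the repository.
RAW_CSV = """age,sex,death_event
75,1,1
55,0,0
65,1,0
"""


def clean_rows(raw_text, sex_labels=None):
    """Add a sequential patient_id and relabel the 1/0 sex codes."""
    # Assumption: 1 = male and 0 = female.
    sex_labels = sex_labels or {"1": "M", "0": "F"}
    cleaned = []
    reader = csv.DictReader(io.StringIO(raw_text))
    for patient_id, row in enumerate(reader, start=1):
        new_row = {"patient_id": patient_id}
        new_row.update(row)
        new_row["sex"] = sex_labels[row["sex"]]
        cleaned.append(new_row)
    return cleaned


cleaned_rows = clean_rows(RAW_CSV)
for row in cleaned_rows:
    print(row)
```

Scripting the step this way would make the transformation repeatable, at the cost of the quick visual check a spreadsheet gives.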

Note: The most important piece of this dataset is also the hardest to understand. The "time" attribute denotes the last status of the patient observed by the scientists who collected the data, recorded when the patient either died or contact was lost. I treat lost contact as a "survival", denoted as a "0". Any 1's in this attribute are officially counted as deaths at the follow-up point by the scientists. This is a small inconsistency in the dataset from this longitudinal study for my analysis, but it still offers enough accuracy for insights.

The Questions That Need to Be Answered

  1. Which age groups are at the highest risk of heart disease?
  2. Which attribute contained in the dataset is the highest indicator of heart disease?
  3. Are certain attributes in the dataset more likely to affect men than women, or vice versa?
  4. Are there any key indicators that deceased patients show that might help us learn how to decrease the mortality rate?

Using SQL

The SQL notebook, "Heart Failure Analysis.ipynb," contained within the repository, outlines my use of SQL to find exploratory trends before moving into Power BI for more complex analysis. I used Azure Data Studio to build the notebook.

Notebook Steps

  1. In the early part of the notebook, I queried simple breakdowns to get an idea of how the attributes are distributed across each sex.

Example: Screen Shot 2022-10-08 at 6 13 47 PM

  2. I aggregated a histogram of ages to find the age groups most likely to have a heart attack, and generated averages of key attributes for each age group.

Example: Screen Shot 2022-10-08 at 6 16 08 PM

  3. I further segmented ages into 10 bins to increase the n for outlier ages. This allowed for more trustworthy results across most age bins.

Example: Notebook Screenshot

Example: Screen Shot 2022-10-08 at 6 18 39 PM
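
The notebook queries appear only as screenshots, so the following is a rough sketch of what the steps above might look like, not the notebook's actual SQL. It runs the queries through Python's built-in `sqlite3` module so that it is self-contained. The table name `heart_failure`, the column names, and the sample rows are hypothetical, and ten-year-wide bins are assumed. The notebook was built in Azure Data Studio, so its SQL dialect may differ in details.

```python
import sqlite3

# Hypothetical sample rows: (patient_id, age, sex, serum_creatinine, death_event).
SAMPLE = [
    (1, 45, "M", 1.1, 0),
    (2, 52, "F", 0.9, 0),
    (3, 63, "M", 1.9, 1),
    (4, 67, "F", 1.3, 0),
    (5, 68, "M", 2.1, 1),
    (6, 81, "M", 1.8, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE heart_failure ("
    "patient_id INTEGER PRIMARY KEY, age INTEGER, sex TEXT, "
    "serum_creatinine REAL, death_event INTEGER)"
)
conn.executemany("INSERT INTO heart_failure VALUES (?, ?, ?, ?, ?)", SAMPLE)

# Step 1: simple breakdown of patients and deaths by sex.
by_sex = conn.execute(
    "SELECT sex, COUNT(*) AS patients, SUM(death_event) AS deaths "
    "FROM heart_failure GROUP BY sex ORDER BY sex"
).fetchall()

# Steps 2-3: ten-year age bins with a patient count and an average per bin.
# Integer division floors each age to the start of its decade.
by_age_bin = conn.execute(
    "SELECT (age / 10) * 10 AS age_bin, COUNT(*) AS patients, "
    "AVG(serum_creatinine) AS avg_creatinine "
    "FROM heart_failure GROUP BY age_bin ORDER BY age_bin"
).fetchall()

print(by_sex)
print(by_age_bin)
```

Wider bins trade age resolution for a larger n per bin, which is the motivation given in step 3.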

Analyzing in Power BI

A PDF export of my Power BI pages is contained in the "Power BI" folder.

PAGES:

  1. Dashboard
  2. Metrics
  3. Time Trends
  4. Level Comparisons by Gender
  5. Mortality Rates
  6. Age Bin Analysis
  7. Time Quadrant Analysis

Building the Dashboard

  1. I imported the data through my Google Drive.
  2. No Power Query transformations were needed because the raw data had already been edited in Google Sheets.
  3. I used DAX to write a Mortality Rate metric.
  4. I created a series of pages with exploratory visuals in order to discover the most insightful trends.
  5. I finished with the "Dashboard" page to compile the most compelling visuals generated throughout each page.

Example: Power BI Dashboard Screenshot
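
The DAX behind the Mortality Rate metric is shown only as an image, so the sketch below is a Python equivalent of the idea rather than the measure itself. It assumes the metric is deaths divided by total patients; the function name and the sample flags are hypothetical.

```python
def mortality_rate(death_events):
    """Return deaths divided by total patients, or None for an empty group."""
    events = list(death_events)
    if not events:
        return None  # avoid dividing by zero when a filter leaves no patients
    return sum(events) / len(events)


# Hypothetical death flags (1 = died, 0 = survived or lost to follow-up).
sample_events = [1, 0, 0, 1, 0, 0]
sample_rate = mortality_rate(sample_events)
print(f"{sample_rate:.1%}")
```

Returning `None` for an empty group mirrors the way a report measure typically shows a blank, instead of raising an error, when a slicer filters out every row.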

Results

Analysis Key Findings

  1. Men are more likely to experience heart failure. This is already known but important to establish once again.
  2. People in their 60s are most likely to experience heart failure.
  3. Mortality rate increases as age increases.
  4. The majority of deaths occurred within 75 days of the final follow-up time recorded by the scientists. The majority of patients would "survive" the heart attack if they stayed alive at least 75 days after it.
  5. Hypertension and anaemia are higher risk factors for death than smoking and diabetes are.
  6. The first quadrant of follow-up times, which also had the most deaths, had far higher average serum creatinine levels than the other three.
  7. CPK levels are much more likely to spike for men than for women when serum sodium levels are 130 or higher.
  8. Hypertension is more likely for men when serum sodium reaches 130 or higher.
  9. Almost 1/3 of heart attacks resulted in death. (The dataset does not have a large enough sample size, or give demographic/location data, to make further statements on this statistic.)
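
The follow-up-time findings above rest on splitting patients into four groups by follow-up time. How the quadrants were actually defined in Power BI is not stated, so the sketch below assumes quartiles of follow-up days, which is only one possible reading. The records and names are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical (follow_up_days, death_event) pairs; not taken from the dataset.
records = [(10, 1), (20, 1), (30, 1), (60, 0), (90, 0), (120, 1), (200, 0), (250, 0)]


def deaths_by_time_quartile(rows):
    """Count patients and deaths in each quartile of follow-up time."""
    # Three cut points split the follow-up times into four groups.
    cuts = quantiles([days for days, _ in rows], n=4)
    summary = [{"patients": 0, "deaths": 0} for _ in range(4)]
    for days, died in rows:
        group = bisect_right(cuts, days)  # 0 = shortest follow-up times
        summary[group]["patients"] += 1
        summary[group]["deaths"] += died
    return summary


quartile_summary = deaths_by_time_quartile(records)
for index, group in enumerate(quartile_summary, start=1):
    print(index, group)
```

Equal-width bins over the range of follow-up days would be an equally plausible definition and could shift which patients land in the first group.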

Recommendations

  1. Monitoring serum sodium levels and keeping them under 130 will greatly decrease the risk of heart failure.
  2. Patients with the highest serum creatinine levels should be monitored most closely during the first 3 months following a heart attack.
  3. Patients with anaemia or hypertension, and especially those with both, should be treated with priority; there is almost a 40% mortality rate for these patients.
  • According to the National Nutrition Council, severe anaemia can cause hypertension because the body works harder to deliver enough oxygen.

Data Requests

  1. A larger sample size.
  2. Race/ethnicity demographic data.
  3. Patient geographical location.

Collecting data on the above three criteria would greatly improve the effectiveness and accuracy of the analysis.

Please feel free to message me further insights or recommendations; thank you for reading!