This repository contains Python code for performing Exploratory Data Analysis (EDA) on the Iris dataset. The analysis includes generating summary statistics, creating histograms, scatter plots, and a correlation heatmap for the numeric variables.
The Iris dataset, available from the UCI Machine Learning Repository, contains 150 samples of iris flowers with four features: sepal length, sepal width, petal length, and petal width. Each sample belongs to one of three species: Iris setosa, Iris versicolor, or Iris virginica.
The approach to the analysis is as follows:
- Data Fetching: The dataset is fetched using the ucimlrepo package.
- DataFrame Creation: The data is converted into a Pandas DataFrame.
- Summary Statistics: Basic summary statistics for each numeric variable are computed and saved to a text file. They give an overview of the central tendency, dispersion, and shape of each variable's distribution.
- Histograms: A histogram is created for each numeric variable to show its distribution and the frequency of its values.
- Scatter Plots: Scatter plots are generated for each pair of numeric variables to help identify potential relationships and correlations between them.
- Correlation Heatmap: A heatmap of the correlation matrix is created to show correlations between variables.
──> iris_EDA_G00305450.py
──> output_files1/
    ──> summary_statistics.txt
    ──> sepal_length_histogram.png
    ──> sepal_width_histogram.png
    ──> petal_length_histogram.png
    ──> petal_width_histogram.png
    ──> scatter_plots.png
    ──> correlation_matrix.png
──> README.md
──> requirements.txt
iris_EDA_G00305450.py: Main script that performs the EDA.

output_files1/: Directory where the output files (summary statistics, histograms, scatter plots, correlation heatmap) are saved.
- Summary Statistics: Stored in output_files1/summary_statistics.txt.
- Histograms: Stored as PNG files in output_files1/.
- Scatter Plots: Stored as scatter_plots.png in output_files1/.
- Correlation Heatmap: Stored as correlation_matrix.png in output_files1/.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. Download the pands-project zip folder and extract it.
You will also need VS Code, which you can download from the VS Code website.
Once you have downloaded the installer, click on it and it will automatically guide you through the installation.
To run the code in this repository, you need Python 3.x and the following Python packages:
- ucimlrepo
- pandas
- seaborn
- matplotlib

You can install the required packages using pip:

pip install ucimlrepo pandas seaborn matplotlib
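
Before running the script, it can help to confirm that the packages are importable. The snippet below is a small sketch using only the standard library. The helper name `missing_packages` is hypothetical and is not part of this repository.

```python
from importlib.util import find_spec

# Packages listed in this README as requirements
REQUIRED = ["ucimlrepo", "pandas", "seaborn", "matplotlib"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found by the import system."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```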
In VS Code, open the File menu, choose Open, and select the iris_EDA_G00305450.py file from the file explorer.
Then run iris_EDA_G00305450.py.
Sometimes the script fails at the line iris = fetch_ucirepo(id=53) with the error:
ConnectionError: Error connecting to server
If you get this error, just run the script again.
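
Instead of re-running by hand, the fetch could be wrapped in a small retry loop. The sketch below is one possible approach, not something the script currently does. The helper `fetch_with_retry` and its `attempts` and `delay` parameters are hypothetical, and it assumes the error raised is Python's built-in `ConnectionError`, as the message above suggests.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=2.0):
    """Call fetch() and retry on ConnectionError, up to `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(delay)  # short pause before trying again


# Example use inside the script (requires ucimlrepo and network access):
#   from ucimlrepo import fetch_ucirepo
#   iris = fetch_with_retry(lambda: fetch_ucirepo(id=53))
```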
Visual Studio Code -> VS Code
Version 1
This project is licensed under the MIT License. See the LICENSE.md file for details.