DataScienceCourseProject

This repository contains all the code written for the final project of the Data Science course in the MSc program in Computer Science and Engineering at IST (2019/2020).

The goal of the project was to apply data science techniques to discover information in two distinct problems (datasets). We were expected to explore the datasets and to select and train models suited to the data. Additionally, in a succinct final report, we were expected to critique the results achieved, hypothesize causes for the limited performance of certain models, and identify opportunities to improve the mining process.

Project Collaborators:

-André Patrício - https://github.com/Andrempp

-Bernardo Santos - https://github.com/BSantosCoding

-Diogo Viegas

The datasets used are:

-Parkinson's Disease (pd_speech_features.csv). Source data and description: https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification

-Covertype (covtype.info + covtype.data). Source data and description: https://archive.ics.uci.edu/ml/datasets/Covertype
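As a quick first look at either dataset, the class distribution can be counted straight from the comma-separated file. The following is only a minimal sketch using the Python standard library, not the code used in the project: the helper names `read_rows` and `class_distribution`, the `skip_rows` parameter, and the assumption that the class label is the last column are all illustrative, and the in-memory sample stands in for a real data file.

```python
import csv
import io
from collections import Counter


def read_rows(handle, skip_rows=0):
    """Yield parsed CSV rows from an open text handle, skipping leading header rows."""
    reader = csv.reader(handle)
    for _ in range(skip_rows):
        next(reader, None)
    for row in reader:
        if row:  # ignore blank lines
            yield row


def class_distribution(rows, label_index=-1):
    """Count how often each class label occurs (label assumed to be in the last column)."""
    return Counter(row[label_index] for row in rows)


# Tiny in-memory stand-in for a data file; the real files have many more columns.
sample = io.StringIO("2596,51,3,5\n2590,56,2,5\n2804,139,9,2\n")
distribution = class_distribution(read_rows(sample))
```

For a real file, the same functions could be called on a handle opened with `open(path, newline="")`, where `path` is wherever the file lives in the data folder. If a file starts with header rows, `skip_rows` would need to be set after inspecting it.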

The structure of the project is the following:

final_report.pdf - the final report delivered with all the requested analysis.

20190921.EnunciadoProjecto 2019.pdf - project description

course_labs - contains auxiliary notebooks written throughout the semester to learn how to implement certain data science techniques.

data - contains the two datasets analyzed throughout this project.

course_project - contains the code implemented for the project, detailed below

course_project

aux_libs - Auxiliary libraries provided by the faculty members and further modified by the students.

clf_tunning - Set of files where we test and plot the best hyperparameters for each classifier used in the solution. The numbers "1" and "2" correspond to "pd_speech_features" and "covtype", respectively. The plots generated by these files are stored in the imgs folder.

imgs - Set of subfolders with images corresponding to the plots drawn by the various files of the project.

pattern_mining - Python files where we explore which preprocessing techniques are adequate to apply.

statistical_analysis - Statistical analysis of both datasets.

results - The final results for all classifiers and clustering techniques used for each dataset.
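To illustrate the kind of hyperparameter testing described for clf_tunning, here is a minimal sketch of a sweep over one hyperparameter. It is an assumption-laden toy, not the project's code: this README does not say which classifiers or libraries were used, so a hand-written k-nearest-neighbours classifier stands in, the names `knn_predict`, `accuracy` and `sweep_k` are hypothetical, the data is made up, and plotting is left out.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Predict the majority label among the k training points closest to query."""
    # train is a list of (features, label) pairs; distance is squared Euclidean.
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of held-out points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def sweep_k(train, test, candidates):
    """Evaluate each candidate k and return a {k: accuracy} mapping."""
    return {k: accuracy(train, test, k) for k in candidates}


# Made-up toy data: two well-separated classes.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]
test = [((0.05, 0.05), "a"), ((1.05, 1.0), "b")]

scores = sweep_k(train, test, [1, 3, 5])
best_k = max(scores, key=scores.get)  # first candidate with the highest accuracy
```

The resulting `scores` mapping is the sort of data that could then be plotted, one curve per classifier and dataset, to compare hyperparameter values.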

diogoViegas/Data-Science-Course-Project

Folders and files

Latest commit

History

Repository files navigation

DataScienceCourseProject

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages