This repository contains models that predict baseball statistics from MLB Statcast Metrics.
Goals
- Using MLB Statcast Metrics, summarize and examine baseball statistics.
Classification
- Build and train models to predict home runs and extra-base hits using the following approaches:
  - Logistic Regression
  - k-Nearest Neighbors Classification
  - Decision Trees Classification
  - Random Forests Classification
  - Support Vector Machines Classification
  - XGBoost Classification
  - Neural Networks Classification
- Apply over-sampling to address class imbalance and improve how well the models generalize.
- Apply regularization and cross-validation techniques for model evaluation, selection, and optimization.
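
The classification steps above can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy (this README does not name the libraries used), substitutes synthetic data for the Statcast features, and uses a hypothetical helper `random_oversample`. Over-sampling is applied only to the training part of each fold so that duplicated rows never appear in the validation data, and the regularization strength `C` of a logistic regression is chosen by cross-validated F1 score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def random_oversample(X, y, rng):
    """Duplicate minority-class rows (with replacement) until all classes match the largest one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    return X[idx], y[idx]


# Synthetic stand-in for batted-ball data (assumption): about 5% positives,
# mimicking a rare outcome such as a home run.
X, y = make_classification(
    n_samples=2000, n_features=6, n_informative=4, weights=[0.95], random_state=0
)

rng = np.random.default_rng(0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def cv_f1(C):
    """Mean F1 over the folds; only the training part of each fold is over-sampled."""
    scores = []
    for train_idx, val_idx in cv.split(X, y):
        X_tr, y_tr = random_oversample(X[train_idx], y[train_idx], rng)
        model = LogisticRegression(C=C, max_iter=1000)  # smaller C = stronger L2 penalty
        model.fit(X_tr, y_tr)
        scores.append(f1_score(y[val_idx], model.predict(X[val_idx])))
    return float(np.mean(scores))


# Compare a few regularization strengths and keep the best one.
results = {C: cv_f1(C) for C in (0.01, 0.1, 1.0, 10.0)}
best_C = max(results, key=results.get)
print("F1 by C:", results, "best C:", best_C)
```

The same loop works for any of the classifiers listed above by swapping the estimator; F1 is used here only because accuracy is uninformative when one class dominates.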
Regression
- Build and train models to predict hit distance using the following approaches:
  - Linear Regression
  - Decision Trees Regression
  - Random Forests Regression
- Apply regularization (Ridge, Lasso, Elastic Net) and cross-validation (k-fold) techniques for model evaluation, selection, and optimization.
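
As a rough illustration of the regression workflow above, the sketch below compares plain linear regression with Ridge, Lasso, and Elastic Net using 5-fold cross-validation. It assumes scikit-learn (not confirmed by this README), uses synthetic data in place of Statcast features and hit distance, and the `alpha` and `l1_ratio` values are arbitrary placeholders rather than tuned settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Statcast features and hit distance (assumption).
X, y = make_regression(
    n_samples=500, n_features=8, n_informative=4, noise=15.0, random_state=0
)

# Penalty strengths are illustrative placeholders, not tuned values.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),                           # L2 penalty
    "lasso": Lasso(alpha=1.0),                           # L1 penalty
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),  # mix of L1 and L2
}

kfold = KFold(n_splits=5, shuffle=True, random_state=0)

rmse = {}
for name, estimator in models.items():
    # Scale inside the pipeline so each fold is standardized on its own training data.
    pipeline = make_pipeline(StandardScaler(), estimator)
    neg_mse = cross_val_score(
        pipeline, X, y, cv=kfold, scoring="neg_mean_squared_error"
    )
    rmse[name] = float(np.sqrt(-neg_mse).mean())

# Report models from lowest to highest cross-validated RMSE.
for name, score in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} RMSE = {score:.2f}")
```

Putting the scaler inside the pipeline matters because the penalties depend on feature scale, and fitting it per fold avoids leaking validation statistics into training.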