Predicting Baseball Statistics: Classification and Regression Applications in Python Using scikit-learn and TensorFlow-Keras

tweichle/Predicting-Baseball-Statistics

Predicting-Baseball-Statistics

Classification and Regression Applications in Python Using scikit-learn and TensorFlow-Keras

This repository contains code for predicting baseball statistics from MLB Statcast metrics.

Goals

  • Using MLB Statcast Metrics, summarize and examine baseball statistics.
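As a rough illustration of the summarizing step, the sketch below computes descriptive statistics on a small hand-made table with pandas. The rows are invented for the example, and the column names (`launch_speed`, `launch_angle`, `hit_distance`, `event`) are assumptions that only resemble typical Statcast fields; the repository's actual data layout may differ.

```python
import pandas as pd

# Hypothetical batted-ball records. Values and column names are assumptions
# made for this sketch, not data taken from the repository.
batted_balls = pd.DataFrame(
    {
        "launch_speed": [102.3, 88.1, 95.4, 110.2, 76.5, 99.8],      # exit velocity, mph
        "launch_angle": [27.0, 12.0, 35.0, 24.0, -5.0, 18.0],        # degrees
        "hit_distance": [405.0, 210.0, 330.0, 430.0, 45.0, 290.0],   # feet
        "event": ["home_run", "single", "field_out", "home_run", "field_out", "double"],
    }
)

# Overall descriptive statistics (count, mean, std, quartiles) for the numeric columns.
summary = batted_balls.describe()

# Mean of each metric grouped by batted-ball outcome.
metrics = ["launch_speed", "launch_angle", "hit_distance"]
by_event = batted_balls.groupby("event")[metrics].mean()

print(summary.round(1))
print(by_event.round(1))
```

Grouping by outcome is one simple way to see how exit velocity and launch angle differ between home runs and other batted balls before any model is trained.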

Classification

  • Build and train models to predict home runs and extra-base hits using the following approaches:

    • Logistic Regression
    • k-Nearest Neighbors Classification
    • Decision Trees Classification
    • Random Forests Classification
    • Support Vector Machines Classification
    • XGBoost Classification
    • Neural Networks Classification
  • Apply over-sampling to the imbalanced data to improve the quality of the predictive models (i.e., their generalizability).

  • Apply regularization and cross-validation techniques for model evaluation, selection, and optimization.
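The classification workflow above can be sketched roughly as follows, assuming scikit-learn and a synthetic imbalanced dataset in place of the Statcast data. Random over-sampling stands in for whatever over-sampling method the repository actually uses, and the helper names `oversample_minority` and `cross_validated_f1` are hypothetical. Over-sampling is applied only to the training folds so that duplicated rows never leak into the evaluation fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample


def oversample_minority(X, y, random_state=0):
    """Randomly duplicate minority-class rows until both classes have equal size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = int(counts.max() - counts.min())
    if n_extra == 0:
        return X, y
    X_extra = resample(
        X[y == minority], replace=True, n_samples=n_extra, random_state=random_state
    )
    X_bal = np.vstack([X, X_extra])
    y_bal = np.concatenate([y, np.full(n_extra, minority)])
    return X_bal, y_bal


def cross_validated_f1(X, y, C, n_splits=5):
    """Mean F1 over stratified folds, over-sampling the training split only."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        X_train, y_train = oversample_minority(X[train_idx], y[train_idx])
        # C is the inverse of the L2 regularization strength.
        model = LogisticRegression(C=C, max_iter=1000)
        model.fit(X_train, y_train)
        scores.append(f1_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores))


# Synthetic stand-in for a rare positive class such as home runs (about 10%).
X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)

# Compare a few regularization strengths by cross-validated F1.
results = {C: cross_validated_f1(X, y, C) for C in (0.01, 0.1, 1.0, 10.0)}
best_C = max(results, key=results.get)
print(results, best_C)
```

F1 is used here instead of accuracy because accuracy can look high on imbalanced data even when the minority class is never predicted. The same loop would work with any of the other listed classifiers in place of `LogisticRegression`.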

Regression

  • Build and train models to predict hit distance using the following approaches:

    • Linear Regression
    • Decision Trees Regression
    • Random Forests Regression
  • Apply regularization (Ridge, Lasso, Elastic Net) and cross-validation (k-fold) techniques for model evaluation, selection, and optimization.
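A minimal sketch of the regularized regression comparison, assuming scikit-learn and synthetic data in place of the Statcast features and hit distances. The `alpha` and `l1_ratio` values are arbitrary placeholders, not the repository's tuned settings. Features are standardized inside a pipeline so that the penalty treats every coefficient on the same scale and the scaler is refit on each training fold.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for predictors (e.g. exit velocity, launch angle) and hit distance.
X, y = make_regression(
    n_samples=300, n_features=5, n_informative=3, noise=10.0, random_state=0
)

# Placeholder hyperparameters chosen only for illustration.
models = {
    "ridge": Ridge(alpha=1.0),                           # L2 penalty
    "lasso": Lasso(alpha=1.0),                           # L1 penalty
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),  # mix of L1 and L2
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
rmse = {}
for name, estimator in models.items():
    pipeline = make_pipeline(StandardScaler(), estimator)
    # scikit-learn reports negated RMSE so that larger is better; flip the sign back.
    scores = cross_val_score(
        pipeline, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    rmse[name] = float(-scores.mean())

best = min(rmse, key=rmse.get)
print(rmse, best)
```

In practice each penalty strength would itself be tuned, for example by repeating the loop over a grid of `alpha` values and keeping the setting with the lowest cross-validated RMSE.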