Predicting housing prices in Iowa using Python/Pandas/linear regression within SKLearn.

gilaniasher/kaggle-house-regression-challenge

Kaggle House Regression Challenge

This is the "Pandas Express" submission for the Kaggle House Prices: Advanced Regression Techniques challenge, completed as part of the BMGT438A data science class. The code in this repository relies on data provided by Kaggle, which has been removed from the repository's history. Please visit Kaggle to obtain the dataset.

Final Video Presentation

Final presentation including analysis can be viewed here:

BMGT438A Final Video Presentation

Final Poster

Methodology

  1. Clean up the data by converting all of the categorical columns (e.g. Neighborhood) into a numeric format an ML model can read, using pd.get_dummies()
  2. Fit an initial OLS (Ordinary Least Squares) model to check its R^2 value and to see whether there are any other problems with the data
  3. Use scikit-learn's automated feature selection tools, specifically RFECV (Recursive Feature Elimination with Cross-Validation), to determine how many features to use in the final model and which features those are
  4. Explore scikit-learn's univariate feature selection to see whether it performs better than RFECV
  5. Build the final model and analyze the residuals to look for outliers and for patterns in the model's inaccuracies
  6. Run the final model on the test dataset to predict the prices needed for the final Kaggle submission
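
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the Kaggle data is not included here, so the sketch builds a small synthetic DataFrame whose column names and values are stand-ins. Standardizing the features before RFECV is also an assumption of this sketch (RFE ranks features by coefficient magnitude, which depends on scale), as are the choices of cv=5, R^2 scoring, and the 3-standard-deviation outlier threshold.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFECV, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Kaggle training data (hypothetical values).
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n),
    "LotArea": rng.normal(10000, 2500, n),
    "Noise": rng.normal(0, 1, n),  # deliberately uninformative column
    "Neighborhood": rng.choice(["NAmes", "CollgCr", "OldTown"], n),
})
train["SalePrice"] = (
    100 * train["GrLivArea"]
    + 2 * train["LotArea"]
    + 20000 * (train["Neighborhood"] == "CollgCr")
    + rng.normal(0, 5000, n)
)

# Step 1: one-hot encode the categorical columns.
X_raw = pd.get_dummies(train.drop(columns="SalePrice"), dtype=float)
y = train["SalePrice"]

# Assumption of this sketch: standardize with training statistics so that
# coefficient magnitudes are comparable across features.
mu, sigma = X_raw.mean(), X_raw.std()
X = (X_raw - mu) / sigma

# Step 2: baseline OLS fit and its in-sample R^2.
baseline_r2 = LinearRegression().fit(X, y).score(X, y)

# Step 3: recursive feature elimination with cross-validation.
selector = RFECV(LinearRegression(), step=1, cv=5, scoring="r2").fit(X, y)
selected = X.columns[selector.support_].tolist()

# Step 4: univariate selection with the same number of features, compared
# against the RFECV subset by cross-validated R^2.
kbest = SelectKBest(f_regression, k=int(selector.n_features_)).fit(X, y)
kbest_cols = X.columns[kbest.get_support()].tolist()
rfecv_cv = cross_val_score(LinearRegression(), X[selected], y, cv=5, scoring="r2").mean()
kbest_cv = cross_val_score(LinearRegression(), X[kbest_cols], y, cv=5, scoring="r2").mean()

# Step 5: final model and residual analysis (flag residuals beyond 3 std).
final = LinearRegression().fit(X[selected], y)
residuals = y - final.predict(X[selected])
outliers = residuals[np.abs(residuals) > 3 * residuals.std()]

# Step 6: predict on test data. reindex() aligns the dummy columns with the
# training columns, filling categories that are absent from the test set with 0.
test = pd.DataFrame({
    "GrLivArea": [1200.0, 2000.0],
    "LotArea": [8000.0, 12000.0],
    "Noise": [0.0, 0.0],
    "Neighborhood": ["NAmes", "CollgCr"],
})
X_test_raw = pd.get_dummies(test, dtype=float).reindex(columns=X_raw.columns, fill_value=0.0)
X_test = (X_test_raw - mu) / sigma
preds = final.predict(X_test[selected])

print("baseline R^2:", round(baseline_r2, 3))
print("RFECV features:", selected)
print("CV R^2 (RFECV vs. univariate):", round(rfecv_cv, 3), round(kbest_cv, 3))
print("outliers flagged:", len(outliers))
```

With the real data, `train` and `test` would instead be loaded from the CSV files downloaded from Kaggle. The reindex() call in step 6 matters there because a category that appears only in the training set would otherwise leave the test matrix with fewer columns than the model expects.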
