Pet Adoption Speed Predictor

A Data Science capstone project completed in fulfilment of the Coursera course - IBM Advanced Data Science Capstone - the final course in the IBM Advanced Data Science Specialization.

Dataset

Kaggle dataset - PetAdoption has been used. Dataset contains tabular, image and text (descriptive) data, which makes it quite challenging and interesting to work with. In this project, only the tabular and image data has been used to train and validate the model. More information about the dataset, its attributes and image data can be found in the above link.

Notebook

The major(primary) tasks present within the notebook are:

ETL (Extract Transform Load)
Data Quality Assessment
Data Exploration
Data Visualization
Feature Engineering
Model Definition
Model training
Model Evaluation

Proposed Model

The proposed model comprises of 3 Neural networks, one of which is a pretrained network (on the imagenet dataset). This 3-net model has been approached due to the presence of categorical + continuous data (including images).

The pretrained network (DenseNet169 has been used to obtain image features (feature_vec1) from the pet images.
The 1st network (NN-1) is trained on the tabular/relational dataset (all the attributes after feature engineering are categorical) with the actual labels.
Trained NN-1 output is then used to prepare the input for the 2nd network (NN-2). An 1-D feature vector (feature_vec2) is extracted from a Dense layer of the trained NN-1.
Both feature vectors (feature_vec1 and feature_vec2 - both 1D) are then concatenated to form a new 1-D feature vector which forms the input for training NN-2
The architecture can be imagined to be something similar to the following (except the pretrained network is missing which takes in the continous data):

Image taken from StackExchange

Technology

Keras(latest version) with Tensorflow 2.2 has been used in the project. Python 3.7.x has been used to programme the solution. Should work with any python version >= 3.5. Training the 1st network (NN-1) can be done on CPU, whereas the NN-2 training require GPU computation (TPU recommended), as batches of data (tabular + image) are generated dynamically (preprocessing of images done on the generated batch before feeding into the net) during training.

Additional

The ADD (Architectural Decisions Document) can be found in the repo along with the notebook
A gist of the notebook - notebook gist
Project Presentation (with a demo of the notebook) can be found here

Citation (if referred)

Cite the work as:
Abhilash Neog, Pet Adoption-Speed predictor (July 2020), GitHub repository, https://github.com/abhilash97/Pet-Adoption-Speed-Predictor

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Adoption-Speed Predictor.ipynb		Adoption-Speed Predictor.ipynb
Architectural Decisions Document.pdf		Architectural Decisions Document.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adoption-Speed Predictor.ipynb

Adoption-Speed Predictor.ipynb

Architectural Decisions Document.pdf

Architectural Decisions Document.pdf

README.md

README.md

Repository files navigation

Pet Adoption Speed Predictor

Dataset

Notebook

Proposed Model

Technology

Additional

Citation (if referred)

About

Releases

Packages

Languages

abhilash-neog/Pet-Adoption-Speed-Predictor

Folders and files

Latest commit

History

Repository files navigation

Pet Adoption Speed Predictor

Dataset

Notebook

Proposed Model

Technology

Additional

Citation (if referred)

About

Topics

Resources

Stars

Watchers

Forks

Languages