Object detection in an Urban Environment

This repository contains the details of the Computer Vision starter module for the Udacity Self Driving Car Nanodegree.

Data

For this project, we will be using data from the Waymo Open dataset. The files can be downloaded directly from the website as tar files or from the Google Cloud Bucket as individual tf records.

Structure

The data in the classroom workspace will be organized as follows:

/home/backups/
    - raw: contains the tf records in the Waymo Open format. (NOTE: this folder only contains temporary files and should be empty after running the download and process script)

/home/workspace/data/
    - processed: contains the tf records in the Tf Object Detection API format. (NOTE: this folder should be empty after creating the splits)
    - test: contains the test data
    - train: contains the train data
    - val: contains the val data

The experiments folder will be organized as follows:

experiments/
    - exporter_main_v2.py: to create an inference model
    - model_main_tf2.py: to launch training
    - experiment0/....
    - experiment1/....
    - experiment2/...
    - pretrained-models/: contains the checkpoints of the pretrained models.

Prerequisites

Local Setup

For local setup if you have your own Nvidia GPU, you can use the provided Dockerfile and requirements in the build directory.

Follow the README therein to create a docker container and install all prerequisites.

Classroom Workspace

In the classroom workspace, every library and package should already be installed in your environment. However, you will need to login to Google Cloud using the following command:

gcloud auth login

This command will display a link that you need to copy and paste into your web browser. Follow the instructions. You can check if you are logged in correctly by running:

gsutil ls gs://waymo_open_dataset_v_1_2_0_individual_files/

It should display the content of the bucket.

Instructions

Download and process the data

The first goal of this project is to download the data from Waymo's Google Cloud bucket to your local machine. For this project, we only need a subset of the data provided (for example, we do not need to use the Lidar data). Therefore, we are going to download each file and immediately trim it. In download_process.py, you will need to implement the create_tf_example function. This function takes the components of a Waymo Tf record and saves them in the Tf Object Detection API format. An example of such a function is described here. We are already providing the label_map.pbtxt file.
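
A minimal sketch of the idea is shown below. It is not the starter solution: the parameter names (filename, encoded_jpeg, annotations, height, width), the annotation layout and the class-id mapping are hypothetical placeholders for whatever components download_process.py actually hands to the function, and should be adapted accordingly.

import tensorflow as tf
from object_detection.utils import dataset_util


def create_tf_example(filename, encoded_jpeg, annotations, height, width):
    """Hypothetical sketch: build a tf.train.Example in the Tf Object Detection API format.

    `annotations` is assumed to be a list of dicts with normalized box corners and an
    integer class id; the real components extracted from the Waymo frame may differ.
    """
    # Class ids assumed to match label_map.pbtxt (vehicle, pedestrian, cyclist).
    class_text = {1: b'vehicle', 2: b'pedestrian', 4: b'cyclist'}

    xmins, xmaxs, ymins, ymaxs, classes, classes_text = [], [], [], [], [], []
    for ann in annotations:
        xmins.append(ann['xmin'])
        xmaxs.append(ann['xmax'])
        ymins.append(ann['ymin'])
        ymaxs.append(ann['ymax'])
        classes.append(ann['class_id'])
        classes_text.append(class_text[ann['class_id']])

    feature = {
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename.encode('utf8')),
        'image/source_id': dataset_util.bytes_feature(filename.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpeg),
        'image/format': dataset_util.bytes_feature(b'jpg'),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))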

Once you have coded the function, you can run the script using

python download_process.py --data_dir /home/workspace/data/ --temp_dir /home/backups/

You are downloading XX files so be patient! Once the script is done, you can look inside the /home/workspace/data/processed folder to see if the files have been downloaded and processed correctly.

Exploratory Data Analysis

Now that you have downloaded and processed the data, you should explore the dataset! This is the most important task of any machine learning project. To do so, open the Exploratory Data Analysis notebook. In this notebook, your first task will be to implement a display_instances function to display images and annotations using matplotlib. This should be very similar to the function you created during the course. Once you are done, feel free to spend more time exploring the data and report your findings. Report anything relevant about the dataset in the writeup.
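
For illustration only, a display_instances along these lines could look like the sketch below. The argument layout (an image array plus normalized [ymin, xmin, ymax, xmax] boxes and integer class ids) is an assumption, and the colors follow the convention used later in this write-up (red for vehicles, blue for pedestrians, green for cyclists).

import matplotlib.pyplot as plt
import matplotlib.patches as patches


def display_instances(image, boxes, classes):
    """Hypothetical sketch: draw ground-truth boxes over an image with matplotlib."""
    colormap = {1: 'red', 2: 'blue', 4: 'green'}  # vehicle, pedestrian, cyclist (assumed ids)
    height, width, _ = image.shape

    _, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(image)
    for box, cls in zip(boxes, classes):
        ymin, xmin, ymax, xmax = box
        rect = patches.Rectangle((xmin * width, ymin * height),
                                 (xmax - xmin) * width, (ymax - ymin) * height,
                                 linewidth=1,
                                 edgecolor=colormap.get(cls, 'white'),
                                 facecolor='none')
        ax.add_patch(rect)
    ax.axis('off')
    plt.show()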

Keep in mind that you should refer to this analysis to create the different splits (training, testing and validation).

Create the splits

Now you have become one with the data! Congratulations! How will you use this knowledge to create the different splits: training, validation and testing? There is no single answer to this question, but you will need to justify your choice in your submission. You will need to implement the split_data function in the create_splits.py file. Once you have implemented this function, run it using:

python create_splits.py --data_dir /home/workspace/data/

NOTE: Keep in mind that your storage is limited. The files should be moved and not copied.
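
One possible shape of split_data is sketched below. It assumes the processed tfrecords live in data_dir/processed and uses the 75% / 15% / 10% proportions described later in this write-up; files are moved rather than copied to save storage.

import glob
import os
import shutil

import numpy as np


def split_data(data_dir):
    """Hypothetical sketch: shuffle the processed tfrecords and move them into splits."""
    # Adjust the glob pattern to match the processed file names.
    files = np.array(glob.glob(os.path.join(data_dir, 'processed', '*.tfrecord')))
    np.random.shuffle(files)

    # 75% train, 15% val, 10% test (boundaries at 75% and 90% of the file list).
    train, val, test = np.split(files, [int(0.75 * len(files)), int(0.9 * len(files))])

    for split_name, split_files in zip(('train', 'val', 'test'), (train, val, test)):
        split_dir = os.path.join(data_dir, split_name)
        os.makedirs(split_dir, exist_ok=True)
        for f in split_files:
            shutil.move(str(f), split_dir)  # move, not copy, because storage is limited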

Edit the config file

Now you are ready for training. As explained during the course, the Tf Object Detection API relies on config files. The config that we will use for this project is pipeline.config, which is the config for an SSD ResNet 50 640x640 model. You can learn more about the Single Shot Detector here.

First, let's download the pretrained model and move it to training/pretrained-models/.

Now we need to edit the config file to change the location of the training and validation files, as well as the location of the label_map file and the pretrained weights. We also need to adjust the batch size. To do so, run the following:

python edit_config.py --train_dir /home/workspace/data/train/ --eval_dir /home/workspace/data/val/ --batch_size 4 --checkpoint ./training/pretrained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0 --label_map label_map.pbtxt

A new config file has been created, pipeline_new.config.
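
If you want to sanity-check the generated file, the Tf Object Detection API's config_util helper can load it back. A small sketch, assuming pipeline_new.config sits in the current directory:

from object_detection.utils import config_util

# Load the freshly generated config and print the fields edit_config.py is expected to touch.
configs = config_util.get_configs_from_pipeline_file('pipeline_new.config')
print(configs['train_config'].batch_size)
print(configs['train_config'].fine_tune_checkpoint)
print(configs['train_input_config'].tf_record_input_reader.input_path)
print(configs['train_input_config'].label_map_path)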

Training

You will now launch your very first experiment with the Tensorflow object detection API. Create a folder training/reference. Move the pipeline_new.config to this folder. You will now have to launch two processes:

  • a training process:
python model_main_tf2.py --model_dir=training/reference/ --pipeline_config_path=training/reference/pipeline_new.config
  • an evaluation process:
python model_main_tf2.py --model_dir=training/reference/ --pipeline_config_path=training/reference/pipeline_new.config --checkpoint_dir=training/reference/

NOTE: both processes will display some Tensorflow warnings.

To monitor the training, you can launch a tensorboard instance by running tensorboard --logdir=training. You will report your findings in the writeup.

Improve the performances

Most likely, this initial experiment did not yield optimal results. However, you can make multiple changes to the config file to improve this model. One obvious change consists in improving the data augmentation strategy. The preprocessor.proto file contains the different data augmentation methods available in the Tf Object Detection API. To help you visualize these augmentations, we are providing a notebook: Explore augmentations.ipynb. Using this notebook, try different data augmentation combinations and select the one you think is optimal for our dataset. Justify your choices in the writeup.
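
Outside the notebook, you can also get a rough feel for these perturbations with plain tf.image operations. The sketch below only previews grayscale/brightness/contrast-style changes on a single image (the values are arbitrary examples); it is not how the API applies augmentations during training, which is driven by the data_augmentation_options entries in the config.

import tensorflow as tf


def preview_augmentation(image):
    """Rough preview of grayscale / brightness / contrast perturbations with tf.image."""
    image = tf.image.convert_image_dtype(image, tf.float32)
    if tf.random.uniform(()) < 0.2:  # occasionally convert to grayscale
        image = tf.image.grayscale_to_rgb(tf.image.rgb_to_grayscale(image))
    image = tf.image.random_brightness(image, max_delta=0.3)
    image = tf.image.random_contrast(image, lower=0.6, upper=1.0)
    return tf.clip_by_value(image, 0.0, 1.0)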

Keep in mind that the following are also available:

  • experiment with the optimizer: type of optimizer, learning rate, scheduler etc
  • experiment with the architecture. The Tf Object Detection API model zoo offers many architectures. Keep in mind that the pipeline.config file is unique for each architecture and you will have to edit it.

Creating an animation

Export the trained model

Modify the arguments of the following command to adjust it to your model:

python ./exporter_main_v2.py --input_type image_tensor --pipeline_config_path training/experiment0/pipeline.config --trained_checkpoint_dir training/experiment0/ckpt-50 --output_directory training/experiment0/exported_model/

Finally, you can create a video of your model's inferences for any tf record file. To do so, run the following command (modify it to your files):

python inference_video.py --labelmap_path label_map.pbtxt --model_path training/experiment0/exported_model/saved_model --tf_record_path /home/workspace/data/test/tf.record --config_path training/experiment0/pipeline_new.config --output_path animation.mp4

Submission Template

Project overview

This repository contains the details of the Udacity Self Driving Car project in which we use the Tf Object Detection API to detect objects such as cars, pedestrians and cyclists. It covers how to download the tfrecord sample files from Cloud Storage and how to split them for training with the object detection API. The dataset used is Waymo, which can be downloaded from the [Google Cloud Storage Bucket](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0_individual_files/). In this case, we use tfrecord files, which are converted into tf.train.Example for the object detection API format. We also split the dataset into training, validation and testing sets using np.split in the "create_splits.py" script.

Set up

As mentioned in the project rubric, a GPU-compatible system is required for this project. In my case, I used an Nvidia GeForce RTX 2060.

  • First the project files should be downloaded through git clone from this repository
  • Navigate to the root directory of the project and use the docker file and requirements.txt from the "build" directory
  • The following command should be run from inside the "build" directory:
docker build -t project-dev -f Dockerfile.gpu .
  • Then we create a docker container to run the created image.
docker run -v D:\nd013-c1-vision-starter\:/app/project/ -p 8888:8888 -p 6006:6006 --shm-size=16gb -ti project-dev bash
  • Inside the container, install the Google Cloud SDK so that gsutil can be used to download the tfrecords from cloud storage:
curl https://sdk.cloud.google.com | bash

  • Authentication can be done using

gcloud auth login

  • The following libraries can be installed
pip install tensorflow-gpu==2.3.0
pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn

Dataset

In the dataset, we have to fit rectangular bounding boxes on the images of objects, which include pedestrians, cyclists and cars. Images are taken from different places, in different weather conditions and at different times of the day (day/night). The image set is diverse: some images are blurry, some clear, some bright and some dark. A sample image with a dark and foggy background is provided below.

Dataset analysis

In the dataset, most of the labels are associated with cars and pedestrians, while the sample size of cyclists is very small. The proportional counts of the different labels are in the "Exploratory Data Analysis.ipynb" file. There is a class imbalance, which can be handled with oversampling techniques. A chart of the proportional counts of the labels (cars, pedestrians, cyclists) is shown below:

Images are taken in different environments (subway/highway/city), in different weather conditions (foggy/sunny) and at different times of the day (day/night). The bounding boxes are red for vehicles, green for cyclists and blue for pedestrians.

Sample images with bounding boxes

Further analysis of the dataset shows that most images contain vehicles and pedestrians (majority vehicles), and very few sample images have cyclists in them. The chart below shows a bar plot for the distribution of classes (cars, pedestrians, and cyclists), over a collection of 20000 random images in the dataset. The analysis is updated in the "Exploratory Data Analysis.ipynb" notebook.
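
The counts behind these charts can be gathered with a short pass over the records. A sketch, assuming the records use the Tf Object Detection API feature keys and that the training split lives in /home/workspace/data/train:

import glob
from collections import Counter

import tensorflow as tf

# Count how often each class id appears across the tfrecords (adjust the glob pattern as needed).
counts = Counter()
for path in glob.glob('/home/workspace/data/train/*.tfrecord'):
    for record in tf.data.TFRecordDataset(path):
        example = tf.train.Example()
        example.ParseFromString(record.numpy())
        labels = example.features.feature['image/object/class/label'].int64_list.value
        counts.update(labels)

print(counts)  # class id -> number of boxes (1: vehicle, 2: pedestrian, 4: cyclist)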

Below are histograms of the distribution of vehicles/cars, pedestrians and cyclists across the 20000 images.

Distribution of Cars

car distribution

Here we observed that, out of the 20000 images, more than 15000 images have at least 10 vehicles present in them, and the maximum number of vehicles present in an image is about 67.

Distribution of Pedestrians

pedestrians distribution

Here we observed that, out of the 20000 images, about 5000 images have at least 10 pedestrians present in them, and over 10000 images have at least 1 pedestrian.

Distribution of Cyclists

cyclists distribution

Here we observed that there are very few cyclists present in the images. The maximum number of cyclists present in an image is just 6, and only about 2400 images have at least 1 cyclist present in them.

Cross validation

We are using 100 tfrecord files. We first shuffle the data randomly and then split it into training, testing and validation sets. The reason for random shuffling is to reduce the class imbalance in each sample; shuffling ensures an approximately equal distribution of samples across the training, testing and validation datasets.

In this case, we use 0.75 : 0.15 as the proportion of training to validation data since we are using only 100 tfrecord samples. This ensures that we have sufficient data for both training and validation. We use 10% (0.1) of the samples as the test set to check the error rate and whether the model is overfitting. Ideally, overfitting should not be an issue since 75% of the data is used for training and 10% for testing.

Training

Reference experiment

The model loss for the residual network (ResNet) model without augmentation is shown below:

Initially the model was overfitting, as the training loss was diverging from the validation loss. The training loss is indicated in orange and the validation loss in blue. This divergence indicates a significant error rate during model validation, an indication that the model is overfitting. The precision and recall curves indicate that the performance of the model slowly improves, as both precision and recall start to increase. A high recall alone is often not sufficient, and the model performance is not that great. Precision:

Recall:

Improve on the reference

To improve the model performance, the first step was to augment the images by converting them to grayscale with a probability of 0.2. After this, we clamped the contrast values between 0.6 and 1.0 so that more lighting datapoints are available for classification. A large part of the images were somewhat dark, and increasing the brightness to 0.3 provided more evenly lit datapoints that the model could classify better. The pipeline changes are in pipeline_new.config; a scriptable sketch of these options follows the list below.

Augmentations applied:

  • 0.02 probability of grayscale conversion
  • brightness adjusted to 0.3
  • contrast values between 0.6 and 1.0
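
For reference, the same options can also be appended programmatically with config_util instead of editing the file by hand. The sketch below assumes the field names defined in preprocessor.proto (random_rgb_to_gray, random_adjust_brightness, random_adjust_contrast) and mirrors the values listed above; verify the names against your installed version of the API. Note that save_pipeline_config writes a file named pipeline.config into the given directory.

from object_detection.utils import config_util

configs = config_util.get_configs_from_pipeline_file('pipeline_new.config')
train_config = configs['train_config']

# Grayscale conversion.
step = train_config.data_augmentation_options.add()
step.random_rgb_to_gray.probability = 0.02

# Brightness adjustment.
step = train_config.data_augmentation_options.add()
step.random_adjust_brightness.max_delta = 0.3

# Contrast clamped between 0.6 and 1.0.
step = train_config.data_augmentation_options.add()
step.random_adjust_contrast.min_delta = 0.6
step.random_adjust_contrast.max_delta = 1.0

# Write the updated config back out as pipeline.config in the current directory.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, '.')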

Grayscale images:

Night(Darker) Images:

Contrast Images:

The details of the run can be found here: "Explore augmentations.ipynb"

The model loss with augmentation:

Precision with Augmentation:

Recall with Augmentation:

The loss is lower than in the previous (un-augmented) model, which is an indication of better performance. There should be more augmented datapoints, such as combining the contrast adjustments with grayscale conversion. Brightness could also be varied within a range instead of being fixed at 0.3. However, the most important point is to add more samples of cyclists and pedestrians, which are present in low quantity in the dataset. This is an inherent requirement since model biases play an important role in the loss curves: the less diverse the training samples, the lower the accuracy.

We have reduced overfitting to an extent with augmentation; however, better classification results would come from a more balanced dataset.