Semantic Segmentation

A project to semantically segment images with SegNet-like architecture

The reference paper [1] can be found at

The architecture is SegNet-like rather than SegNet proper because the encoder's max-pooling indices are not shared with the decoder.
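To make the distinction concrete, here is a minimal pure-Python sketch that contrasts unpooling with stored max-pooling indices (as described in the SegNet paper [1]) with an index-free upsampling. The function names and the 4x4 `feature_map` are illustrative assumptions, not code from this repository, and nearest-neighbour repetition stands in for whatever index-free upsampling the decoder actually uses.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2.

    Returns the pooled values and, for each one, the (row, col)
    position of the maximum in the input.
    """
    pooled, indices = [], []
    for r in range(0, len(x), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(x[0]), 2):
            window = [(x[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_with_indices(pooled, indices, height, width):
    """SegNet-style unpooling: place each value back at its recorded position."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


def upsample_nearest(pooled):
    """Index-free upsampling: repeat each value over a 2x2 block."""
    out = []
    for pooled_row in pooled:
        wide = [value for value in pooled_row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


# Hypothetical 4x4 feature map, used only for illustration.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, indices = max_pool_2x2(feature_map)
sparse = unpool_with_indices(pooled, indices, 4, 4)  # needs encoder indices
dense = upsample_nearest(pooled)                     # needs no indices
```

With shared indices, each maximum returns to its original location and the remaining positions stay zero. Without them, the decoder has to learn where boundaries lie from the upsampled features alone.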

Use the KITTI semantic segmentation dataset and create the following folder structure:

|    |__train/
|    |   |_images/
|    |   |_labels/
|    |__test/
|        |_images/
|        |_labels/
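The folder layout above could be created with a short helper like the one below. This is a sketch, not a script shipped with the repository: `create_dataset_dirs` and its `root` argument are hypothetical names, and the dataset root directory is left for the caller to choose.

```python
from pathlib import Path


def create_dataset_dirs(root):
    """Create train/ and test/ under `root`, each with images/ and labels/.

    `root` is a caller-supplied path (an assumption of this sketch).
    Returns the list of directories that now exist.
    """
    created = []
    for split in ("train", "test"):
        for kind in ("images", "labels"):
            path = Path(root) / split / kind
            # exist_ok=True makes the helper safe to re-run.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling `create_dataset_dirs("some/dataset/root")` yields four directories. The KITTI images and label maps then need to be copied into the matching `images/` and `labels/` folders.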


$ pip install -r requirements.txt
$ python
$ tensorboard --logdir=graph/


This semantic segmentation neural network architecture follows an encoder-decoder pattern. The image is convolved and max-pooled in the encoder, then transpose-convolved and upsampled in the decoder. The encoder network performs image recognition and the decoder performs image segmentation. Using transfer learning [2], the encoder network was replaced with a pre-trained VGG16 [3] CNN to reduce training time and improve accuracy. This made the network small and simple enough to be trained on a GeForce 940M.
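As a rough illustration of the encoder-decoder pattern, the sketch below traces only the spatial size of a feature map through five 2x2 max-pooling stages (the number VGG16 has) and five mirrored 2x upsampling stages. The helper `trace_shapes`, the stage names, and the 224x224 input are assumptions made for this example. The repository's actual layer configuration and input size are not stated here.

```python
def trace_shapes(height, width, num_stages=5):
    """Trace (name, height, width) through an encoder-decoder.

    Each encoder stage halves the spatial size (2x2 max pooling, stride 2);
    each decoder stage doubles it (2x upsampling or stride-2 transpose conv).
    Convolutions are assumed to preserve spatial size ("same" padding).
    """
    shapes = [("input", height, width)]
    h, w = height, width
    for stage in range(1, num_stages + 1):
        h, w = h // 2, w // 2
        shapes.append((f"encoder_pool{stage}", h, w))
    for stage in range(1, num_stages + 1):
        h, w = h * 2, w * 2
        shapes.append((f"decoder_up{stage}", h, w))
    return shapes


# Hypothetical input size, chosen because it is divisible by 2**5 = 32.
for name, h, w in trace_shapes(224, 224):
    print(f"{name:>14}: {h} x {w}")
```

In this sketch a 224x224 input shrinks to 7x7 at the bottleneck and returns to 224x224. An input whose sides are not multiples of 32 does not round-trip (100x100 comes back as 96x96), so images would need resizing or padding first.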

Training ran for 200 epochs with 8 images per batch.
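To estimate how many weight updates this schedule implies, the arithmetic can be sketched as below. The dataset size is not stated in this README, so `num_images=100` is a placeholder assumption, and `training_steps` is a hypothetical helper.

```python
import math


def training_steps(num_images, batch_size=8, epochs=200):
    """Return (steps_per_epoch, total_steps).

    Assumes the last, possibly smaller, batch of each epoch is kept.
    """
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Placeholder dataset size; replace with the real number of training images.
steps_per_epoch, total_steps = training_steps(num_images=100)
print(steps_per_epoch, total_steps)
```

With 100 images, the last batch of each epoch holds 4 images, giving 13 steps per epoch and 2600 steps over 200 epochs.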


[1] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation", arXiv:1511.00561, 2015.

[2] A. Karpathy, "CS231n Convolutional Neural Networks for Visual Recognition", Available:

[3] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", ICLR 2015.