
English | 简体中文

Semantic segmentation toolkit based on Visual Transformers

Semantic segmentation aims to classify each pixel in an image into a specified semantic category, covering both objects (e.g., bicycle, car, people) and stuff (e.g., road, bench, sky).
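As a minimal illustration of per-pixel classification (not code from this toolkit), the sketch below turns a stack of per-class score maps into a label map by picking the highest-scoring class at every pixel. The nested-list layout `scores[class][y][x]` and the function name `scores_to_label_map` are assumptions made for the example.

```python
from typing import List

# scores[c][y][x]: score of class c at pixel (y, x), e.g. the output of a segmentation head
Scores = List[List[List[float]]]


def scores_to_label_map(scores: Scores) -> List[List[int]]:
    """Assign each pixel the index of its highest-scoring class (per-pixel argmax)."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Hypothetical 3 classes (0 = road, 1 = car, 2 = sky) on a 2x2 image
    scores = [
        [[0.9, 0.1], [0.2, 0.7]],   # class 0
        [[0.05, 0.8], [0.1, 0.2]],  # class 1
        [[0.05, 0.1], [0.7, 0.1]],  # class 2
    ]
    print(scores_to_label_map(scores))  # [[0, 1], [2, 0]]
```

Every pixel receives exactly one label, which is what distinguishes semantic segmentation from image-level classification.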

Environment

This code is developed under the following configurations:

Hardware: 1/2/4/8 GPUs for training and testing
Software: CentOS 6.10, CUDA 10.2, Python 3.8, Paddle 2.1.0

Installation

Create a conda virtual environment and activate it.

conda create -n paddlevit python=3.8
conda activate paddlevit

Install PaddlePaddle following the official instructions, e.g.,

conda install paddlepaddle-gpu==2.1.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/

Install PaddleViT

git clone https://github.com/BR-IDL/PaddleViT.git
cd PaddleViT/semantic_segmentation
pip3 install -r requirements.txt
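A quick way to compare your environment with the configuration listed above is sketched below. It is a hypothetical helper, not part of PaddleViT: it reads installed package metadata with the standard-library `importlib.metadata` (Python 3.8+) and assumes PaddlePaddle is registered under the distribution name `paddlepaddle-gpu` or `paddlepaddle`. If no such metadata is found, it reports `None` for the Paddle version.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not found."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_environment(expected_python: Tuple[int, int] = (3, 8),
                      expected_paddle: str = "2.1.0") -> Dict[str, object]:
    """Compare the running interpreter and installed Paddle with the expected versions."""
    # Assumption: Paddle is installed under one of these two distribution names
    paddle_version = installed_version("paddlepaddle-gpu") or installed_version("paddlepaddle")
    return {
        "python_version": sys.version_info[:2],
        "python_ok": sys.version_info[:2] == expected_python,
        "paddle_version": paddle_version,
        # startswith tolerates build suffixes appended after the release number
        "paddle_ok": paddle_version is not None and paddle_version.startswith(expected_paddle),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```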

Demo

We provide a demo script demo.py. This script performs inference on single images. You can put the input images in ./demo/img.

cd demo
CUDA_VISIBLE_DEVICES=0 python3 demo.py \
    --config ${CONFIG_FILE} \
    --model_path ${MODEL_PATH} \
    --pretrained_backbone ${PRETRAINED_BACKBONE} \
    --img_dir ${IMAGE_DIRECTORY} \
    --results_dir ${RESULT_DIRECTORY}

Example:

cd demo
CUDA_VISIBLE_DEVICES=0 python3 demo.py \
    --config ../configs/setr/SETR_PUP_Large_768x768_80k_cityscapes_bs_8.yaml \
    --model_path ../pretrain_models/setr/SETR_PUP_cityscapes_b8_80k.pdparams \
    --pretrained_backbone ../pretrain_models/backbones/vit_large_patch16_224.pdparams \
    --img_dir ./img/ \
    --results_dir ./results/

Quick start: training and testing models

1. Preparing data

Pascal-Context dataset

Download the Pascal-Context dataset: the PASCAL VOC2010 images from http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar and the annotation file from https://codalabuser.blob.core.windows.net/public/trainval_merged.json. Then generate "pascal_context/SegmentationClassContext" by running the script voc2010_to_pascalcontext.py. The dataset should have this basic structure:

pascal_context
|-- Annotations
|-- ImageSets
|-- JPEGImages
|-- SegmentationClass
|-- SegmentationClassContext
|-- SegmentationObject
|-- trainval_merged.json
|-- voc2010_to_pascalcontext.py
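Before training, it can help to confirm that the dataset directory matches the tree above. The helper below is a hypothetical sketch (not a PaddleViT tool) that lists which expected top-level entries are missing under a dataset root. The entry list mirrors the Pascal-Context tree, leaving out voc2010_to_pascalcontext.py since that script is only needed to generate SegmentationClassContext.

```python
from pathlib import Path
from typing import Iterable, List

# Top-level entries from the Pascal-Context tree above
PASCAL_CONTEXT_ENTRIES = [
    "Annotations",
    "ImageSets",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationClassContext",
    "SegmentationObject",
    "trainval_merged.json",
]


def missing_entries(root: str, expected: Iterable[str]) -> List[str]:
    """Return the expected files/directories that do not exist under root."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("pascal_context", PASCAL_CONTEXT_ENTRIES)
    print("missing:", missing if missing else "none")
```

The same function can check the other datasets below by passing their top-level entries instead.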

ADE20K dataset

Download the ADE20K dataset from http://sceneparsing.csail.mit.edu/. It should have this basic structure:

ADEChallengeData2016
|-- annotations
|   |-- training
|   `-- validation
|-- images
|   |-- training
|   `-- validation
|-- objectInfo150.txt
`-- sceneCategories.txt
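In ADEChallengeData2016, each image under images/<split>/ has an annotation with the same file stem and a .png extension under annotations/<split>/ (for example, images/training/ADE_train_00000001.jpg and annotations/training/ADE_train_00000001.png). The sketch below is a hypothetical helper, not part of this repo, that derives the annotation path from an image path under that naming assumption.

```python
from pathlib import PurePosixPath


def ade20k_annotation_path(image_path: str) -> str:
    """Map ADEChallengeData2016/images/<split>/<stem>.jpg to annotations/<split>/<stem>.png."""
    p = PurePosixPath(image_path)
    split_dir, images_dir = p.parent, p.parent.parent
    if images_dir.name != "images":
        raise ValueError(f"expected a path under images/<split>/, got {image_path}")
    # Same stem and split, .png extension, "images" swapped for "annotations"
    return str(images_dir.parent / "annotations" / split_dir.name / (p.stem + ".png"))


if __name__ == "__main__":
    print(ade20k_annotation_path(
        "ADEChallengeData2016/images/training/ADE_train_00000001.jpg"))
    # ADEChallengeData2016/annotations/training/ADE_train_00000001.png
```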

Cityscapes dataset

Download the Cityscapes dataset from https://www.cityscapes-dataset.com/. The *labelTrainIds.png files used for Cityscapes training are generated by the script convert_cityscapes.py. The dataset should have this basic structure:

cityscapes
|-- gtFine
|   |-- test
|   |-- train
|   `-- val
|-- leftImg8bit
|   |-- test
|   |-- train
|   `-- val
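In the official Cityscapes release, each split folder contains one subfolder per city (not shown in the tree above), and an image leftImg8bit/<split>/<city>/<name>_leftImg8bit.png has its fine labels in gtFine/<split>/<city>/<name>_gtFine_*.png. Assuming convert_cityscapes.py follows the cityscapesScripts naming and writes <name>_gtFine_labelTrainIds.png alongside the other gtFine files, the hypothetical helper below maps an image to its training label.

```python
from pathlib import PurePosixPath

IMG_SUFFIX = "_leftImg8bit.png"
LABEL_SUFFIX = "_gtFine_labelTrainIds.png"  # assumed output naming of convert_cityscapes.py


def cityscapes_label_path(image_path: str) -> str:
    """Map cityscapes/leftImg8bit/<split>/<city>/<name>_leftImg8bit.png to its labelTrainIds file."""
    p = PurePosixPath(image_path)
    city_dir = p.parent
    split_dir = city_dir.parent
    img_root = split_dir.parent
    if img_root.name != "leftImg8bit" or not p.name.endswith(IMG_SUFFIX):
        raise ValueError(f"not a leftImg8bit image path: {image_path}")
    # Replace the image suffix with the label suffix and move to gtFine/<split>/<city>/
    label_name = p.name[: -len(IMG_SUFFIX)] + LABEL_SUFFIX
    return str(img_root.parent / "gtFine" / split_dir.name / city_dir.name / label_name)


if __name__ == "__main__":
    print(cityscapes_label_path(
        "cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png"))
    # cityscapes/gtFine/train/aachen/aachen_000000_000019_gtFine_labelTrainIds.png
```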

Trans10kV2 dataset

Download the Trans10kV2 dataset from Google Drive or Baidu Drive (access code: oqms). It should have this basic structure:

Trans10K_cls12
|-- test
|   |-- images
|   `-- masks_12
|-- train
|   |-- images
|   `-- masks_12
|-- validation
|   |-- images
|   `-- masks_12

2. Testing

Single-scale testing on single GPU

CUDA_VISIBLE_DEVICES=0 python3  val.py  \
    --config ./configs/setr/SETR_MLA_Large_480x480_80k_pascal_context_bs_8.yaml \
    --model_path ./pretrain_models/setr/SETR_MLA_pascal_context_b8_80k.pdparams

Multi-scale testing on single GPU

CUDA_VISIBLE_DEVICES=0 python3 val.py \
    --config ./configs/setr/SETR_MLA_Large_480x480_80k_pascal_context_bs_8.yaml \
    --model_path ./pretrain_models/setr/SETR_MLA_pascal_context_b8_80k.pdparams \
    --multi_scales True
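Multi-scale testing generally runs the model on several resized copies of each image, resizes the class probabilities back to the original resolution, and fuses them before taking the per-pixel argmax. The sketch below shows only the fusion step, averaging probabilities across scales in plain Python. It is a generic illustration: which scales val.py uses and how it fuses them (for example, probabilities vs. logits, or additional flipping) are determined by the repo's code and config, not by this sketch.

```python
from typing import List, Sequence

# prob_map[c][y][x]: probability of class c at pixel (y, x), already resized
# back to the original image resolution for that scale
ProbMap = List[List[List[float]]]


def fuse_multi_scale(prob_maps: Sequence[ProbMap]) -> List[List[int]]:
    """Average per-class probabilities over scales, then take the per-pixel argmax."""
    num_scales = len(prob_maps)
    num_classes = len(prob_maps[0])
    height, width = len(prob_maps[0][0]), len(prob_maps[0][0][0])
    avg = [
        [
            [sum(m[c][y][x] for m in prob_maps) / num_scales for x in range(width)]
            for y in range(height)
        ]
        for c in range(num_classes)
    ]
    return [
        [max(range(num_classes), key=lambda c: avg[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # 2 classes on a 1x2 image, predicted at two hypothetical scales
    scale_a = [[[0.6, 0.4]], [[0.4, 0.6]]]
    scale_b = [[[0.3, 0.45]], [[0.7, 0.55]]]
    print(fuse_multi_scale([scale_a]))           # [[0, 1]]
    print(fuse_multi_scale([scale_a, scale_b]))  # [[1, 1]]
```

In this toy example the second scale flips the first pixel's label, which is the kind of correction multi-scale fusion is meant to provide.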

Single-scale testing on multi GPU

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch val.py \
    --config ./configs/setr/SETR_MLA_Large_480x480_80k_pascal_context_bs_8.yaml \
    --model_path ./pretrain_models/setr/SETR_MLA_pascal_context_b8_80k.pdparams

Multi-scale testing on multi GPU

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch val.py \
    --config ./configs/setr/SETR_MLA_Large_480x480_80k_pascal_context_bs_8.yaml \
    --model_path ./pretrain_models/setr/SETR_MLA_pascal_context_b8_80k.pdparams \
    --multi_scales True
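The four testing variants above differ only in how many GPUs are visible, whether paddle.distributed.launch wraps val.py, and whether --multi_scales True is passed. The hypothetical helper below (not part of the repo) assembles the same commands from those two choices, using only the flags shown above. This can be handy when scripting evaluations over several checkpoints.

```python
import os
import subprocess
from typing import Dict, List


def build_val_cmd(config: str, model_path: str, num_gpus: int = 1,
                  multi_scales: bool = False) -> List[str]:
    """Build the val.py command line for the single/multi-GPU, single/multi-scale variants."""
    cmd = ["python3"]
    if num_gpus > 1:
        # Multi-GPU runs go through Paddle's distributed launcher
        cmd += ["-u", "-m", "paddle.distributed.launch"]
    cmd += ["val.py", "--config", config, "--model_path", model_path]
    if multi_scales:
        cmd += ["--multi_scales", "True"]
    return cmd


def gpu_env(num_gpus: int) -> Dict[str, str]:
    """Copy of the current environment exposing GPUs 0..num_gpus-1 via CUDA_VISIBLE_DEVICES."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in range(num_gpus))
    return env


def run_val(cmd: List[str], num_gpus: int) -> int:
    """Run the command (from the semantic_segmentation directory) and return its exit code."""
    return subprocess.run(cmd, env=gpu_env(num_gpus), check=False).returncode


if __name__ == "__main__":
    cmd = build_val_cmd(
        "./configs/setr/SETR_MLA_Large_480x480_80k_pascal_context_bs_8.yaml",
        "./pretrain_models/setr/SETR_MLA_pascal_context_b8_80k.pdparams",
        num_gpus=4, multi_scales=True)
    print(" ".join(cmd))
```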

Note:

The --model_path option takes the path to the pretrained weights file of the segmentation model (e.g., SETR).

3. Training

Training on single GPU

CUDA_VISIBLE_DEVICES=0 python3  train.py \
    --config ./configs/setr/SETR_MLA_Large_480x480_80k_pascal_context_bs_8.yaml

Note:

Training options such as the learning rate (lr), image size, and model layers can be changed in the .yaml file passed via --config. All available settings are listed in ./config.py.
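The note above describes a common pattern: a tree of defaults (here, ./config.py) that a .yaml file selectively overrides. The sketch below illustrates such a merge with plain dictionaries and rejects keys that have no default, as yacs-style configs do. It is not PaddleViT's config.py; the keys TRAIN.BASE_LR, TRAIN.ITERS, DATA.BATCH_SIZE and DATA.CROP_SIZE are hypothetical placeholders, and loading the .yaml file itself (e.g., with PyYAML) is left out.

```python
import copy
from typing import Any, Dict

# Hypothetical defaults standing in for ./config.py
DEFAULTS: Dict[str, Any] = {
    "TRAIN": {"BASE_LR": 0.001, "ITERS": 80000},
    "DATA": {"BATCH_SIZE": 8, "CROP_SIZE": [480, 480]},
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any],
                 prefix: str = "") -> Dict[str, Any]:
    """Return a copy of defaults with overrides applied; unknown keys raise KeyError."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        full_key = prefix + key
        if key not in merged:
            raise KeyError(f"unknown config key: {full_key}")
        if isinstance(merged[key], dict) and isinstance(value, dict):
            # Recurse into nested sections such as TRAIN or DATA
            merged[key] = merge_config(merged[key], value, full_key + ".")
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Overrides as they might look after parsing a .yaml file
    cfg = merge_config(DEFAULTS, {"TRAIN": {"BASE_LR": 0.01}})
    print(cfg["TRAIN"])  # {'BASE_LR': 0.01, 'ITERS': 80000}
```

Rejecting unknown keys means a typo in the .yaml file fails loudly instead of being silently ignored.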

Training on multi GPU

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch train.py \
    --config ./configs/setr/SETR_MLA_Large_480x480_80k_pascal_context_bs_8.yaml


Contact

If you have any questions regarding this repo, please create an issue.
