Official PyTorch implementation of the method OLIVINE. More details can be found in the paper:
Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data (under review), by xxx authors.
Please install the required packages. Some libraries used in this project, including MinkowskiEngine and PyTorch Lightning, are known to behave differently across versions; please use the exact versions specified in requirements.txt.
The code is compatible with nuScenes and SemanticKITTI. Put the datasets you intend to use in the "datasets" folder (a symbolic link is accepted):
datasets/
├── nuscenes
│   ├── camseg (semantic labels inferred by Grounded-SAM)
│   ├── lidarseg (decompress nuScenes-lidarseg-all-v1.0.tar)
│   ├── maps
│   ├── samples
│   ├── sweeps
│   ├── v1.0-mini
│   ├── v1.0-test
│   ├── v1.0-trainval
│   └── zip_files
└── semantic_kitti
    ├── dataset
    ├── poses
    └── sequences
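Before launching any script, the layout above can be checked with a few lines of Python. This is a hypothetical helper (`missing_dataset_folders` and `EXPECTED_LAYOUT` are not part of the repository), and the folder names are simply copied from the tree; adjust them if your copy is organized differently.

```python
from pathlib import Path

# Hypothetical mapping copied from the folder tree above (not read from the
# repository); edit it if your datasets are organized differently.
EXPECTED_LAYOUT = {
    "nuscenes": [
        "camseg", "lidarseg", "maps", "samples", "sweeps",
        "v1.0-mini", "v1.0-test", "v1.0-trainval", "zip_files",
    ],
    "semantic_kitti": ["dataset", "poses", "sequences"],
}


def missing_dataset_folders(root="datasets", dataset="nuscenes"):
    """Return the expected sub-folders of `dataset` that are missing under `root`."""
    base = Path(root) / dataset
    # Path.is_dir() follows symbolic links, so a linked "datasets" folder works.
    return [name for name in EXPECTED_LAYOUT[dataset] if not (base / name).is_dir()]
```

For example, `missing_dataset_folders(dataset="semantic_kitti")` returns an empty list once every expected folder is in place.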
First, we use Grounded-SAM to obtain weak semantic labels for the RGB images (the camseg folder above). The tools can be found at this link. We will expand the instructions for this step after the deadline.
To launch pre-training of the Minkowski SR-UNet (minkunet) on nuScenes:
python pretrain.py --cfg config/olivine_minkunet.yaml
You can alternatively replace minkunet with voxelnet to pre-train a PV-RCNN backbone.
The pre-trained weights are saved in the output folder and can be reused for downstream tasks.
If you wish to use multiple GPUs, please scale the learning rate and batch size accordingly.
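A minimal sketch of that adjustment, assuming the linear scaling rule (learning rate proportional to the total batch size) and assuming the provided configs were tuned for a single GPU; neither assumption is stated in this README, and `scale_for_gpus` is a hypothetical helper, not part of the code base.

```python
def scale_for_gpus(base_lr, base_batch_size, num_gpus, base_num_gpus=1):
    """Scale learning rate and batch size linearly with the number of GPUs.

    Assumption (not stated in the README): the config values were tuned for
    `base_num_gpus` GPUs, and the linear scaling rule applies, i.e. the
    learning rate grows in proportion to the total batch size.
    """
    factor = num_gpus / base_num_gpus
    return base_lr * factor, int(round(base_batch_size * factor))
```

For example, `scale_for_gpus(0.5, 16, num_gpus=4)` returns `(2.0, 64)`; these numbers are placeholders, not the values from the provided configs. Whether the config's batch size is per GPU or global depends on how the training script distributes it, so check that before scaling it.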
To launch a semantic segmentation experiment, use the following command:
python downstream.py --cfg_file="config/semseg_nuscenes.yaml" --pretraining_path="output/pretrain/[...]/model.pt"
where output/pretrain/[...]/model.pt points to the previously obtained weights; any config file can be used. The default config performs fine-tuning on 1% of nuScenes' training set, with learning rates optimized for the provided pre-training.
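If you run several pre-trainings, the hypothetical helpers below (not part of the repository) pick the most recently written `model.pt` under `output/pretrain` and assemble the downstream command from it. The folder structure between `output/pretrain` and `model.pt` is not specified here, so the search is recursive.

```python
from pathlib import Path


def latest_checkpoint(output_dir="output/pretrain"):
    """Return the most recently modified model.pt below `output_dir`, or None."""
    candidates = list(Path(output_dir).glob("**/model.pt"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def downstream_command(pretraining_path, cfg_file="config/semseg_nuscenes.yaml"):
    """Build the downstream.py command line for a given checkpoint.

    Arguments are passed as a list (no shell), so no extra quoting is needed.
    """
    return [
        "python", "downstream.py",
        f"--cfg_file={cfg_file}",
        f"--pretraining_path={pretraining_path}",
    ]
```

Usage: `ckpt = latest_checkpoint()`, then, if it is not `None`, `subprocess.run(downstream_command(ckpt), check=True)`.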
To re-evaluate the score of any downstream network, run:
python evaluate.py --resume_path="output/downstream/[...]/model.pt" --dataset="nuscenes"
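The semantic segmentation scores reported below are mIoU values, the standard metric on nuScenes and SemanticKITTI. For reference, here is a minimal, dependency-free sketch of mIoU computed from a confusion matrix; it illustrates the metric and is not the code used by `evaluate.py`. It assumes points carrying an ignore label were filtered out beforehand.

```python
def confusion_matrix(labels, preds, num_classes):
    """Square matrix: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, preds):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Mean intersection-over-union over classes seen in labels or predictions."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

For instance, labels `[0, 0, 0, 1, 1, 1]` and predictions `[0, 0, 1, 1, 1, 1]` give per-class IoUs of 2/3 and 3/4.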
If you wish to re-evaluate linear probing, note that the experiments in the paper were obtained with lr=0.05, lr_head=null and freeze_layers=True.
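One way to apply these settings is to patch a loaded downstream config and save it as a new file. The sketch below assumes `lr`, `lr_head` and `freeze_layers` are top-level keys of the YAML config (YAML `null` loads as Python `None`); `linear_probing_config` is a hypothetical helper, not part of the repository.

```python
import copy

# Linear-probing values reported above; assumed to be top-level config keys.
LINEAR_PROBING_OVERRIDES = {"lr": 0.05, "lr_head": None, "freeze_layers": True}


def linear_probing_config(cfg):
    """Return a copy of a downstream config dict with the linear-probing settings."""
    new_cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
    new_cfg.update(LINEAR_PROBING_OVERRIDES)
    return new_cfg
```

With PyYAML, for example, load config/semseg_nuscenes.yaml with `yaml.safe_load`, write `linear_probing_config(cfg)` to a new file with `yaml.safe_dump`, and pass that file through `--cfg_file`.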
All experiments for object detection have been done using OpenPCDet.
All results are obtained with weights pre-trained on nuScenes.
Method | nuScenes lin. probing | nuScenes fine-tuning with 1% data | KITTI fine-tuning with 1% data |
---|---|---|---|
Random init. | 8.1 | 30.3 | 39.5 |
PointContrast | 21.9 | 32.5 | 41.1 |
DepthContrast | 22.1 | 31.7 | 41.5 |
PPKT | 36.4 | 37.8 | 43.9 |
SLidR | 38.8 | 38.3 | 44.6 |
OLIVINE | 47.3 | 46.1 | 47.3 |
Fine-tuning on nuScenes with different fractions of the labeled training data:
Method | 1% | 5% | 10% | 25% | 100% |
---|---|---|---|---|---|
Random init. | 30.3 | 47.7 | 56.6 | 64.8 | 74.2 |
SLidR | 39.0 | 52.2 | 58.8 | 66.2 | 74.6 |
OLIVINE | 46.1 | 57.5 | 63.0 | 69.3 | 76.1 |
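The benefit of pre-training shrinks as more labels become available. The short sketch below, using only the numbers from this table, computes the absolute gain of each pre-training over random initialization at every data fraction.

```python
fractions = ["1%", "5%", "10%", "25%", "100%"]
# Scores copied from the table above.
random_init = [30.3, 47.7, 56.6, 64.8, 74.2]
slidr = [39.0, 52.2, 58.8, 66.2, 74.6]
olivine = [46.1, 57.5, 63.0, 69.3, 76.1]


def gains(method, baseline):
    """Absolute improvement (in points) of `method` over `baseline`, per fraction."""
    return [round(m - b, 1) for m, b in zip(method, baseline)]


for name, scores in [("SLidR", slidr), ("OLIVINE", olivine)]:
    print(name, dict(zip(fractions, gains(scores, random_init))))
```

For OLIVINE this gives 15.8, 9.8, 6.4, 4.5 and 1.9 points from 1% to 100% of the data.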
Unless stated otherwise, all object detection results are obtained with a pre-training on nuScenes.
Results on the KITTI validation set using PV-RCNN:
Method | Car | Pedestrian | Cyclist | mAP@40 |
---|---|---|---|---|
Random init. | 84.5 | 57.9 | 71.3 | 71.3 |
STRL* | 84.7 | 57.8 | 71.9 | 71.5 |
PPKT | 83.2 | 55.5 | 73.8 | 70.8 |
SLidR | 84.4 | 57.3 | 74.2 | 71.9 |
OLIVINE | 84.8 | 59.3 | 74.2 | 72.8 |
*STRL has been pre-trained on KITTI, while SLidR and PPKT were pre-trained on nuScenes.
Results on the KITTI validation set using SECOND:
Method | Car | Pedestrian | Cyclist | mAP@40 |
---|---|---|---|---|
Random init. | 81.5 | 50.9 | 66.5 | 66.3 |
DeepCluster* | | | | 66.1 |
SLidR | 81.9 | 51.6 | 68.5 | 67.3 |
OLIVINE | 82.0 | 53.2 | 69.8 | 68.3 |
*As reimplemented in ONCE.
We implement the method based on SLidR. Part of the codebase has been adapted from PointContrast. The computation of the Lovász loss used in semantic segmentation follows the code of PolarNet.
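For readers unfamiliar with it, the Lovász loss here refers to the Lovász-Softmax loss of Berman et al. (CVPR 2018), a convex surrogate of the per-class Jaccard (IoU) loss. The following is a minimal pure-Python sketch of its "present classes" variant for illustration only; the repository's version, adapted from PolarNet, operates on PyTorch tensors and may differ in details such as ignore labels and class averaging.

```python
def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss.

    `gt_sorted` holds 0/1 foreground indicators sorted by decreasing error.
    """
    gts = sum(gt_sorted)
    grad, prev_jaccard = [], 0.0
    cum_fg = cum_bg = 0
    for g in gt_sorted:
        cum_fg += g
        cum_bg += 1 - g
        intersection = gts - cum_fg
        union = gts + cum_bg  # never zero for a non-empty input
        jaccard = 1.0 - intersection / union
        grad.append(jaccard - prev_jaccard)  # discrete derivative
        prev_jaccard = jaccard
    return grad


def lovasz_softmax(probas, labels, num_classes):
    """Lovász-Softmax loss on flattened points.

    probas: per-point lists of class probabilities (softmax outputs).
    labels: ground-truth class index per point.
    Only classes present in `labels` contribute to the average.
    """
    losses = []
    for c in range(num_classes):
        fg = [1 if y == c else 0 for y in labels]
        if sum(fg) == 0:
            continue
        errors = [abs(f - p[c]) for f, p in zip(fg, probas)]
        order = sorted(range(len(errors)), key=errors.__getitem__, reverse=True)
        errors_sorted = [errors[i] for i in order]
        fg_sorted = [fg[i] for i in order]
        grad = lovasz_grad(fg_sorted)
        losses.append(sum(e * w for e, w in zip(errors_sorted, grad)))
    return sum(losses) / len(losses) if losses else 0.0
```

On hard (0/1) predictions the loss equals 1 minus the IoU: two points of class 0 with only one predicted correctly give a loss of 0.5.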
OLIVINE is released under the Apache 2.0 license.