
ENet on ScanNet

This repo contains helper tools for extracting ENet features from ScanNet video frames and projecting them onto 3D point clouds, producing multi-view per-point features that can serve as model input (e.g., for ScanRefer, 3DVG-Transformer, 3D-SPS, 3DJCG, D3Net, M3DRef-CLIP, etc.).

Setup

Conda (recommended)

We recommend using miniconda to manage system dependencies.

# create and activate the conda environment
conda create -n enet python=3.10
conda activate enet

# install PyTorch 2.0.1
conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia

# install packages
pip install -r requirements.txt

Pip

# create and activate the virtual environment
virtualenv env
source env/bin/activate

# install PyTorch 2.0.1
pip install torch torchvision

# install packages
pip install -r requirements.txt
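
Either way, a quick sanity check before moving on (it only assumes the PyTorch install above succeeded):

# check the PyTorch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"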

Data Preparation

ScanNet v2 dataset

  1. Download the ScanNet v2 dataset (train/val/test). The raw dataset files should be organized as follows:
    enet-scannet # project root
    ├── dataset
    │   ├── scannetv2
    │   │   ├── scans
    │   │   │   ├── [scene_id]
    │   │   │   │   ├── [scene_id].sens
    │   │   │   │   ├── [scene_id]_vh_clean_2.ply
  2. Pre-process the data; this extracts the video frames (color, depth, and camera poses) from the .sens files:
    python dataset/scannetv2/preprocess_data.py +workers={cpu_count}
    The output files should have the following format (a sketch for loading the extracted pose files follows this list):
    enet-scannet # project root
     ├── dataset
     │   ├── scannetv2
     │   │   ├── video_frames
     │   │   │   ├── [scene_id]
     │   │   │   │   ├── color
     │   │   │   │   │   ├── *.jpg
     │   │   │   │   ├── depth
     │   │   │   │   │   ├── *.png
     │   │   │   │   ├── pose
     │   │   │   │   │   ├── *.txt
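
Each extracted pose file stores a 4x4 camera-to-world matrix as whitespace-separated text. A minimal loading sketch (the scene id and frame name below are placeholders):

import numpy as np

# load one camera pose: a 4x4 camera-to-world matrix
pose = np.loadtxt("dataset/scannetv2/video_frames/scene0000_00/pose/0.txt")
assert pose.shape == (4, 4)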

Pre-trained ENet weights

  1. Download the pre-trained ENet weights. The file should be placed as follows (a quick verification sketch follows the listing):
     enet-scannet # project root
     ├── checkpoints
     │   ├── scannetv2_enet.pth
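
To verify the download, you can inspect the checkpoint. A minimal sketch, assuming the .pth file loads with torch.load (the exact layout of the stored object may differ):

import torch

# load the checkpoint on CPU and peek at its contents
ckpt = torch.load("checkpoints/scannetv2_enet.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # e.g. weight tensor names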

Run

  1. Extract ENet features for the video frames:
python extract_enet_features.py

This outputs the output/video_frame_features.h5 file with the following format (a reading sketch follows the listing):

 video_frame_features.h5 # the output file
 ├── [scene_id] # dataset
 │   ├── (frames, 128, 32, 41) # (#frames, #feature_channel, image_height, image_width)
 ...
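The per-frame features can be read back with h5py. A minimal sketch (the scene id is a placeholder):

import h5py

# read the per-frame ENet features for one scene
with h5py.File("output/video_frame_features.h5", "r") as f:
    feats = f["scene0000_00"][()]  # (#frames, 128, 32, 41)
    print(feats.shape)
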
  2. Project ENet features onto the point cloud:
python project_features_to_points.py

This outputs the output/multiview_features.h5 file with the following format (a reading sketch follows):

output/multiview_features.h5 # the output file
 ├── [scene_id] # dataset
 │   ├── (points, 128) # (#points, #feature_channel)
 ...
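
These per-point features can then be fed to the downstream models listed above. A minimal h5py reading sketch (the scene id is a placeholder):

import h5py

# read the aggregated multi-view features for one scene
with h5py.File("output/multiview_features.h5", "r") as f:
    feats = f["scene0000_00"][()]  # (#points, 128)
    print(feats.shape)

The projection follows the 3DMV-style back-projection (see the citation below): each 3D point is mapped into the frames where it is visible, using the extracted depth maps and camera poses, and the sampled 2D features are aggregated into one 128-dimensional vector per point.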

Citation

If you use this code to extract ENet-based multi-view features, please cite the following:

@inproceedings{dai20183dmv,
  title={{3DMV}: Joint {3D}-multi-view prediction for {3D} semantic scene segmentation},
  author={Dai, Angela and Nie{\ss}ner, Matthias},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={452--468},
  year={2018}
}
@article{paszke2016enet,
  title={{ENet}: A deep neural network architecture for real-time semantic segmentation},
  author={Paszke, Adam and Chaurasia, Abhishek and Kim, Sangpil and Culurciello, Eugenio},
  journal={arXiv preprint arXiv:1606.02147},
  year={2016}
}

If you use the ScanNet data, please cite:

@inproceedings{dai2017scannet,
  title={{ScanNet}: Richly-annotated {3D} reconstructions of indoor scenes},
  author={Dai, Angela and Chang, Angel X and Savva, Manolis and Halber, Maciej and Funkhouser, Thomas and Nie{\ss}ner, Matthias},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={5828--5839},
  year={2017}
}

Acknowledgement

This repo is built upon 3DMV, ScanRefer, and ScanNet.
