
Is this project still a work in progress? #4

Open
jagdishbhanushali opened this issue Aug 26, 2019 · 5 comments

Comments

@jagdishbhanushali

Hi,
Are you still working on this, or is it finished?
I was inspired by your paper and would like to see the results.

Thanks,
Jagdish

@stratomaster31

I'm working on this model. I've coded the CVAE and I get good results in the training phase, but not in the test phase...
What are the inputs to decoder1? They are not specified in the paper...
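The paper doesn't spell this out, but one common cause of a CVAE working in training and failing at test time is the latent sampling: during training z is drawn from the recognition posterior q(z | x, y), while at test time the future y is unknown and z must be drawn from the prior N(0, I). The decoder input is then typically the past-trajectory encoding concatenated with z. A minimal PyTorch sketch, with hypothetical names (decoder_input, recog_net):

```python
import torch

def decoder_input(h_past, z_dim, recog_net=None, h_future=None):
    # h_past: (B, D) encoding of the observed past trajectory (the condition)
    # h_future: (B, D) encoding of the ground-truth future, available only in training
    if h_future is not None:
        # training: sample z from the recognition posterior q(z | x, y)
        mu, logvar = recog_net(torch.cat([h_past, h_future], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation trick
    else:
        # test: the future is unknown, so sample z from the prior N(0, I)
        z = torch.randn(h_past.size(0), z_dim, device=h_past.device)
    return torch.cat([h_past, z], -1)  # decoder conditions on the past encoding plus z
```

At test time you would call this several times to draw multiple z samples and decode a diverse set of future trajectories.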

@i-chaochen

I think this work can only handle the Stanford Drone Dataset? Do you know how to process the KITTI dataset?

In the original paper, they describe it as follows:

As the dataset does not provide semantic labels for 3D points (which we need for scene context), we first perform semantic segmentations of images and project Velodyne laser scans onto the image plane using the provided camera matrix to label 3D points. The semantically labeled 3D points are then registered into the world coordinates using GPS-IMU tags. Finally we create top-down view feature maps I of size H × W × C.

If I understood correctly, they did the following:

  1. First perform semantic segmentation on all images to get the masks.

  2. Project the laser data onto the 2D image and apply the masks to this 2D image. // i.e., use OpenCV's projectPoints() for the projection?

    2.1 Since KITTI stores the scans in .bin format, we need to convert them to PCD first.

    2.2 Register all PCD files to fuse them into a global frame; then we can use the camera matrix (provided by KITTI) and the extrinsic matrix (computed from the GPS-IMU) to convert it into a 2D image, and we also project the segmentation masks from step 1 onto this projected 2D image.

Can anyone correct me if I'm wrong? Thanks in advance!
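For step 2, here is a minimal sketch of projecting Velodyne points into the camera image and labeling them from a segmentation mask. It assumes the KITTI calibration matrices have already been parsed and padded to homogeneous form; project_and_label and its arguments are illustrative names, not from the paper:

```python
import numpy as np

def project_and_label(points, seg_mask, Tr_velo_to_cam, R0_rect, P2):
    # points: (N, 3) xyz in the Velodyne frame; seg_mask: (H, W) class ids
    # Tr_velo_to_cam, R0_rect: (4, 4) padded KITTI calib; P2: (3, 4) projection
    H, W = seg_mask.shape
    pts_h = np.hstack([points, np.ones((len(points), 1))]).T   # (4, N) homogeneous
    cam = R0_rect @ Tr_velo_to_cam @ pts_h                     # rectified camera frame
    labels = np.full(len(points), -1, dtype=np.int64)          # -1 = not visible in image
    front = cam[2] > 0.1                                       # keep points ahead of camera
    uv = P2 @ cam[:, front]                                    # (3, M) projective coords
    u = np.round(uv[0] / uv[2]).astype(np.int64)               # perspective divide
    v = np.round(uv[1] / uv[2]).astype(np.int64)
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)               # inside the image
    idx = np.flatnonzero(front)[ok]                            # map back to point indices
    labels[idx] = seg_mask[v[ok], u[ok]]
    return labels                                              # (N,) per-point class id
```

cv2.projectPoints would also work for a single pinhole camera, but KITTI's rectified setup is usually handled with these matrix products directly.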

@sujithvemi

sujithvemi commented Oct 31, 2019

@i-chaochen

I don't work with LiDAR data, so I can't comment on the bin and PCD formats, etc. But the approach you are taking sounds fine to me. To summarize, this is my understanding:

  • Project the Velodyne 3D laser scan onto the 2D image plane
  • All 3D points that fall on the same pixel in the 2D image plane get the label recognised there by semantic segmentation
  • The points are then converted to the world coordinate frame
  • Build a BEV 3D matrix whose third dimension is a one-hot vector for the class from semantic segmentation (this feature map can be cropped before it is built)

Feel free to comment if I am wrong in any sense, so we can understand it better. Thanks in advance.
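If that summary is right, building the final H × W × C map is a scatter of one-hot class vectors into a top-down grid. A minimal sketch under those assumptions (build_bev_map and its parameters are hypothetical, and it reuses the -1 = unlabeled convention from the projection sketch above):

```python
import numpy as np

def build_bev_map(points_world, labels, num_classes, x_range, y_range, res):
    # points_world: (N, 3) xyz in the world frame; labels: (N,) class ids, -1 = unlabeled
    # x_range, y_range: (min, max) extents in metres; res: metres per grid cell
    W = int(round((x_range[1] - x_range[0]) / res))
    H = int(round((y_range[1] - y_range[0]) / res))
    bev = np.zeros((H, W, num_classes), dtype=np.float32)      # the H x W x C feature map
    u = np.floor((points_world[:, 0] - x_range[0]) / res).astype(np.int64)
    v = np.floor((points_world[:, 1] - y_range[0]) / res).astype(np.int64)
    keep = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (labels >= 0)
    bev[v[keep], u[keep], labels[keep]] = 1.0                  # one-hot class presence
    return bev
```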

@i-chaochen

i-chaochen commented Nov 8, 2019


@sujithvemi Thanks for the feedback. I am not sure I fully understand what the original paper means by projecting Velodyne laser scans onto the image plane. What does this image plane look like? Does it look like this one?

[Screenshot attached: 2019-11-08 00:48:51]

Also, since they already project the 3D scans onto the 2D image plane, why do they need to register the 3D scans to world coordinates using the GPS-IMU tags? The 2D image coordinates could be used for prediction anyway.

If they want to register to world coordinates, I think they need the intrinsic and extrinsic matrices (can the extrinsics be derived from the GPS-IMU?) rather than the GPS-IMU tags alone.
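My (unconfirmed) reading: the GPS-IMU tags give each frame's pose in a global frame, which is exactly the extrinsic needed to fuse scans from many timestamps into one static map; the camera matrix is only needed for the image-plane labeling step. A sketch of the registration, assuming the OXTS records have already been converted into a 4×4 world pose T_w_imu (e.g., pykitti exposes oxts[i].T_w_imu) and T_imu_velo is the Velodyne-to-IMU extrinsic derived from the calibration files:

```python
import numpy as np

def velo_to_world(points_velo, T_w_imu, T_imu_velo):
    # points_velo: (N, 3) xyz in the Velodyne frame of one scan
    # T_w_imu: (4, 4) IMU pose in the world frame, built from the GPS-IMU (OXTS) tags
    # T_imu_velo: (4, 4) Velodyne-to-IMU extrinsic from the KITTI calib files
    pts_h = np.hstack([points_velo, np.ones((len(points_velo), 1))]).T  # (4, N)
    world = T_w_imu @ T_imu_velo @ pts_h       # chain the two rigid transforms
    return world[:3].T                         # (N, 3) points in world coordinates
```

Registering into a fixed world frame is what lets scans and trajectories from different timestamps be expressed in one common coordinate system for prediction.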

@sujithvemi

@i-chaochen I really wish I could help you here, but I don't know much about LiDAR and was not able to fully understand what the paper said.

You can check the supplemental material provided here; it might help you.
