
Torch matrix inversion error #20

Open
koktavy opened this issue Mar 12, 2024 · 10 comments
Comments

@koktavy

koktavy commented Mar 12, 2024

I've (almost) gotten the repo working on Windows with the help of issue #16.

When I run on the sample data (even using --eval) I hit this issue:
torch._C._LinAlgError: torch.linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.

python slam.py --config configs/mono/tum/fr3_office.yaml --eval

MonoGS: Running MonoGS in Evaluation Mode
MonoGS: Following config will be overriden
MonoGS:         save_results=True
MonoGS:         use_gui=False
MonoGS:         eval_rendering=True
MonoGS:         use_wandb=True
MonoGS: saving results in results\datasets_tum\2024-03-12-13-18-30
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.16.4
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
MonoGS: Resetting the system
MonoGS: Initialized map
Process Process-3:
Traceback (most recent call last):
  File "C:\Users\Tavius\miniconda3\envs\MonoGS\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\Tavius\miniconda3\envs\MonoGS\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "X:\Projects\_2024\MonoGS\utils\slam_backend.py", line 417, in run
    self.add_next_kf(cur_frame_idx, viewpoint, depth_map=depth_map)
  File "X:\Projects\_2024\MonoGS\utils\slam_backend.py", line 69, in add_next_kf
    viewpoint, kf_id=frame_idx, init=init, scale=scale, depthmap=depth_map
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 239, in extend_from_pcd_seq
    self.create_pcd_from_image(cam_info, init, scale=scale, depthmap=depthmap)
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 131, in create_pcd_from_image
    return self.create_pcd_from_image_and_depth(cam, rgb, depth, init)
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 150, in create_pcd_from_image_and_depth
    W2C = getWorld2View2(cam.R, cam.T).cpu().numpy()
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\utils\graphics_utils.py", line 41, in getWorld2View2
    C2W = torch.linalg.inv(Rt)
torch._C._LinAlgError: torch.linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

This is a fresh install using --recursive and only incorporating the change noted above.

@koktavy
Author

koktavy commented Mar 12, 2024

Also, here's the batch script I used to download the data on Windows instead of Linux:

IF NOT EXIST "datasets\tum" mkdir "datasets\tum"
cd datasets\tum
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg1/rgbd_dataset_freiburg1_desk.tgz
tar -xvzf rgbd_dataset_freiburg1_desk.tgz
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg2/rgbd_dataset_freiburg2_xyz.tgz
tar -xvzf rgbd_dataset_freiburg2_xyz.tgz
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz
tar -xvzf rgbd_dataset_freiburg3_long_office_household.tgz
cd ../..

Run it from the repo root in PowerShell as scripts\download_tum.bat

@zmf2022

zmf2022 commented Mar 13, 2024

me too

@rmurai0610
Collaborator

Hi, thank you for your interest!

Can you print out the variables R, t, and Rt in getWorld2View2 so we can check if the matrix is singular?

I suspect R and t are all zeros due to this bug, but I could be wrong:
https://discuss.pytorch.org/t/pytorch-multiprocessing-with-cuda-sets-tensors-to-0/179117
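For reference, here is a minimal sketch of the kind of check being asked for. It assumes getWorld2View2 assembles a 4x4 homogeneous world-to-view matrix from a 3x3 rotation R and a translation t; the function name matches the traceback, but the body below is a hypothetical simplification for debugging, not MonoGS's exact code:

```python
import torch

def getWorld2View2_debug(R, t):
    # Hypothetical simplification of getWorld2View2: build the 4x4
    # homogeneous transform [R | t; 0 0 0 1], print the inputs, then invert.
    Rt = torch.zeros((4, 4), dtype=R.dtype, device=R.device)
    Rt[:3, :3] = R
    Rt[:3, 3] = t
    Rt[3, 3] = 1.0
    print("R:", R)
    print("t:", t)
    print("Rt:", Rt)
    # If R and t arrive zeroed (as suspected above), Rt is singular and
    # this call raises the LinAlgError shown in the traceback.
    return torch.linalg.inv(Rt)
```

With a valid pose the inversion succeeds; with all-zero R and t it reproduces the `torch.linalg.inv ... singular` error from the report.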

@zmf2022

zmf2022 commented Mar 13, 2024 via email

@yanyan-li

yanyan-li commented Mar 22, 2024

> Hi, thank you for your interest!
>
> Can you print out the variables R, t, and Rt in getWorld2View2 so we can check if the matrix is singular?
>
> I suspect R and t are all zeros due to this bug, but I could be wrong: https://discuss.pytorch.org/t/pytorch-multiprocessing-with-cuda-sets-tensors-to-0/179117

It is true. Sometimes the orientation and translation are all zeros. I printed the inputs of getWorld2View2(R, t, ...):

w2v tensor([[-0.8280,  0.5254, -0.1956],
            [ 0.4139,  0.3374, -0.8455],
            [-0.3782, -0.7811, -0.4969]], device='cuda:0') tensor([-2.2574, 0.3327, 1.9227], device='cuda:0')

w2v tensor([[0., 0., 0.],
            [0., 0., 0.],
            [0., 0., 0.]], device='cuda:0') tensor([0., 0., 0.], device='cuda:0')

w2v tensor([[-0.8280,  0.5254, -0.1956],
            [ 0.4139,  0.3374, -0.8455],
            [-0.3782, -0.7811, -0.4969]], device='cuda:0') tensor([-2.2574, 0.3327, 1.9227], device='cuda:0')

w2v tensor([[0., 0., 0.],
            [0., 0., 0.],
            [0., 0., 0.]], device='cuda:0') tensor([0., 0., 0.], device='cuda:0')
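Given that failure mode, one pragmatic guard (a hypothetical sketch, not code that exists in MonoGS) is to detect the zeroed pose before the inversion is attempted, so the failure is reported as the tensor-sharing bug rather than a generic singular-matrix error:

```python
import torch

def check_pose_not_zeroed(R, t):
    # Hypothetical guard: a real rotation matrix is orthonormal, so it can
    # never be all zeros. An all-zero R (and t) therefore means the CUDA
    # tensor was zeroed in transit between processes, as suspected above.
    if not bool(torch.any(R)) and not bool(torch.any(t)):
        raise RuntimeError(
            "Camera pose is all zeros; the shared CUDA tensor was likely "
            "zeroed during multiprocessing transfer"
        )
```

Calling this right before the torch.linalg.inv call would turn the cryptic LinAlgError into an actionable message.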

@muskie82
Owner

From a quick look, some people have reported the same issue in the PyTorch repo, but no one seems to have found a solution. It appears to happen only with PyTorch multiprocessing on Windows.

Would appreciate it if you could share the solution when you find it!
The last resort would be to set up an Ubuntu environment in Docker and run MonoGS there.

@foreverlong

Have you solved this error? I'm hitting the same error on Win10.

@zmf2022

zmf2022 commented Apr 7, 2024

Add this line to disable inter-op multithreading:
torch.set_num_interop_threads(1)
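For anyone unsure where the line goes: it has to run before any inter-op parallel work starts, so the natural place would be at the top of the entry script (slam.py here), right after importing torch. A minimal sketch:

```python
# Place at the very top of the entry script (e.g. slam.py), immediately
# after the torch import and before any model or multiprocessing setup.
import torch

# Limit PyTorch's inter-op thread pool to a single thread. This call
# fails if inter-op parallel work has already started, hence the placement.
torch.set_num_interop_threads(1)

print(torch.get_num_interop_threads())  # 1
```

Note this caps PyTorch's internal inter-op threading; whether it fully works around the Windows multiprocessing bug is what's being tested in this thread.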

@hnglp

hnglp commented Apr 9, 2024

> add this script to disable multithreads! torch.set_num_interop_threads(1)

Is this the solution? Can you be more specific?

@hnglp

hnglp commented May 15, 2024

> add this script to disable multithreads! torch.set_num_interop_threads(1)

Hi, I'm not sure I understand what you mean. Can you tell me where I should add this line?
