
Torch matrix inversion error #20

Open
koktavy opened this issue Mar 12, 2024 · 10 comments
Comments

@koktavy

koktavy commented Mar 12, 2024

I've (almost) gotten the repo working on Windows with the help of issue #16.

When I run on the sample data (even using --eval) I hit this issue:
torch._C._LinAlgError: torch.linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.

python slam.py --config configs/mono/tum/fr3_office.yaml --eval

MonoGS: Running MonoGS in Evaluation Mode
MonoGS: Following config will be overriden
MonoGS:         save_results=True
MonoGS:         use_gui=False
MonoGS:         eval_rendering=True
MonoGS:         use_wandb=True
MonoGS: saving results in results\datasets_tum\2024-03-12-13-18-30
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.16.4
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
MonoGS: Resetting the system
MonoGS: Initialized map
Process Process-3:
Traceback (most recent call last):
  File "C:\Users\Tavius\miniconda3\envs\MonoGS\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\Tavius\miniconda3\envs\MonoGS\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "X:\Projects\_2024\MonoGS\utils\slam_backend.py", line 417, in run
    self.add_next_kf(cur_frame_idx, viewpoint, depth_map=depth_map)
  File "X:\Projects\_2024\MonoGS\utils\slam_backend.py", line 69, in add_next_kf
    viewpoint, kf_id=frame_idx, init=init, scale=scale, depthmap=depth_map
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 239, in extend_from_pcd_seq
    self.create_pcd_from_image(cam_info, init, scale=scale, depthmap=depthmap)
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 131, in create_pcd_from_image
    return self.create_pcd_from_image_and_depth(cam, rgb, depth, init)
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 150, in create_pcd_from_image_and_depth
    W2C = getWorld2View2(cam.R, cam.T).cpu().numpy()
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\utils\graphics_utils.py", line 41, in getWorld2View2
    C2W = torch.linalg.inv(Rt)
torch._C._LinAlgError: torch.linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

This is a fresh install using --recursive and only incorporating the change noted above.

@koktavy
Author

koktavy commented Mar 12, 2024

Also, here's the batch script I used to download the data on Windows instead of Linux:

IF NOT EXIST "datasets\tum" mkdir "datasets\tum"
cd datasets\tum
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg1/rgbd_dataset_freiburg1_desk.tgz
tar -xvzf rgbd_dataset_freiburg1_desk.tgz
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg2/rgbd_dataset_freiburg2_xyz.tgz
tar -xvzf rgbd_dataset_freiburg2_xyz.tgz
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz
tar -xvzf rgbd_dataset_freiburg3_long_office_household.tgz
cd ../..

Run it from the repo root in PowerShell as scripts\download_tum.bat

@zmf2022

zmf2022 commented Mar 13, 2024

me too

@rmurai0610
Collaborator

Hi, thank you for your interest!

Can you print out the variables R, t, and Rt in getWorld2View2 so we can check if the matrix is singular?

I suspect R and t are all zeros due to this bug, but I could be wrong:
https://discuss.pytorch.org/t/pytorch-multiprocessing-with-cuda-sets-tensors-to-0/179117
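For reference, here is a minimal sketch of the kind of check being asked for. It assumes getWorld2View2 assembles a 4x4 homogeneous world-to-view matrix from a 3x3 rotation R and a translation t; the function name matches the traceback, but the body below is a hypothetical simplification for debugging, not MonoGS's exact code:

```python
import torch

def getWorld2View2_debug(R, t):
    # Hypothetical simplification of getWorld2View2: build the 4x4
    # homogeneous transform [R | t; 0 0 0 1], print the inputs, then invert.
    Rt = torch.zeros((4, 4), dtype=R.dtype, device=R.device)
    Rt[:3, :3] = R
    Rt[:3, 3] = t
    Rt[3, 3] = 1.0
    print("R:", R)
    print("t:", t)
    print("Rt:", Rt)
    # If R and t arrive zeroed (as suspected above), Rt is singular and
    # this call raises the LinAlgError shown in the traceback.
    return torch.linalg.inv(Rt)
```

With a valid pose the inversion succeeds; with all-zero R and t it reproduces the `torch.linalg.inv ... singular` error from the report.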

@zmf2022

zmf2022 commented Mar 13, 2024 via email

@yanyan-li

yanyan-li commented Mar 22, 2024

> Hi, thank you for your interest!
>
> Can you print out the variables R, t, and Rt in getWorld2View2 so we can check if the matrix is singular?
>
> I suspect R and t are all zeros due to this bug, but I could be wrong: https://discuss.pytorch.org/t/pytorch-multiprocessing-with-cuda-sets-tensors-to-0/179117

It is true. Sometimes the orientation and translation are all zeros. I printed the inputs of getWorld2View2(R, t, ...):

w2v tensor([[-0.8280,  0.5254, -0.1956],
            [ 0.4139,  0.3374, -0.8455],
            [-0.3782, -0.7811, -0.4969]], device='cuda:0') tensor([-2.2574, 0.3327, 1.9227], device='cuda:0')

w2v tensor([[0., 0., 0.],
            [0., 0., 0.],
            [0., 0., 0.]], device='cuda:0') tensor([0., 0., 0.], device='cuda:0')

w2v tensor([[-0.8280,  0.5254, -0.1956],
            [ 0.4139,  0.3374, -0.8455],
            [-0.3782, -0.7811, -0.4969]], device='cuda:0') tensor([-2.2574, 0.3327, 1.9227], device='cuda:0')

w2v tensor([[0., 0., 0.],
            [0., 0., 0.],
            [0., 0., 0.]], device='cuda:0') tensor([0., 0., 0.], device='cuda:0')
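Given that failure mode, one pragmatic guard (a hypothetical sketch, not code that exists in MonoGS) is to detect the zeroed pose before the inversion is attempted, so the failure is reported as the tensor-sharing bug rather than a generic singular-matrix error:

```python
import torch

def check_pose_not_zeroed(R, t):
    # Hypothetical guard: a real rotation matrix is orthonormal, so it can
    # never be all zeros. An all-zero R (and t) therefore means the CUDA
    # tensor was zeroed in transit between processes, as suspected above.
    if not bool(torch.any(R)) and not bool(torch.any(t)):
        raise RuntimeError(
            "Camera pose is all zeros; the shared CUDA tensor was likely "
            "zeroed during multiprocessing transfer"
        )
```

Calling this right before the torch.linalg.inv call would turn the cryptic LinAlgError into an actionable message.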

@muskie82
Owner

From a quick look, some people have reported the same issue in the PyTorch repo, but no one seems to have found a solution. It appears to happen only with PyTorch multiprocessing on Windows.

Would appreciate it if you could share the solution when you find it!
The last resort would be to set up an Ubuntu environment in Docker and run MonoGS there.

@foreverlong

Have you solved this error? I'm hitting the same error on Win10.

@zmf2022

zmf2022 commented Apr 7, 2024

Add this line to disable inter-op multithreading:
torch.set_num_interop_threads(1)
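For anyone unsure where the line goes: it has to run before any inter-op parallel work starts, so the natural place would be at the top of the entry script (slam.py here), right after importing torch. A minimal sketch:

```python
# Place at the very top of the entry script (e.g. slam.py), immediately
# after the torch import and before any model or multiprocessing setup.
import torch

# Limit PyTorch's inter-op thread pool to a single thread. This call
# fails if inter-op parallel work has already started, hence the placement.
torch.set_num_interop_threads(1)

print(torch.get_num_interop_threads())  # 1
```

Note this caps PyTorch's internal inter-op threading; whether it fully works around the Windows multiprocessing bug is what's being tested in this thread.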

@hnglp

hnglp commented Apr 9, 2024

> add this script to disable multithreads! torch.set_num_interop_threads(1)

Is this the solution? Can you be more specific?

@hnglp

hnglp commented May 15, 2024

> add this script to disable multithreads! torch.set_num_interop_threads(1)

Hi, I'm not sure I understand what you mean. Can you tell me where I should add this line?
