
[BUG] rollout with non-uniform tensors no longer supported #2128

Closed · dtsaras opened this issue Apr 29, 2024 · 7 comments
Labels: bug Something isn't working
dtsaras commented Apr 29, 2024

Describe the bug

There seems to be a bug when performing a rollout in which the produced tensordicts do not all have the same shape. The rollout used to produce a LazyStackedTensorDict, but it currently fails to stack them. The issue can be worked around with the set_lazy_legacy(True) context manager (see the repro below).

It last worked for me with the versions below:
TorchRL:
git+ssh://[email protected]/pytorch/rl.git@7cd6a181020ba2d0ec420d7c3a92c8d689be6bd1#egg=torchrl
TensorDict:
tensordict-nightly==2024.3.20

To Reproduce

import tensordict
import torchrl

env = make_env()  # some env that produces sequential data

# this raises a stacking error (it works in older versions)
env.rollout()

# this works
with tensordict.utils.set_lazy_legacy(True):
    env.rollout()

Stack trace:
Traceback (most recent call last):
  File "/Users/dtsaras/Documents/CS/circuit_gen/src/rl/mcts_policy.py", line 502, in simulate
    rollout = self.env.rollout(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/c_gen/lib/python3.10/site-packages/torchrl/envs/common.py", line 2567, in rollout
    out_td = torch.stack(tensordicts, len(batch_size), out=out)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/c_gen/lib/python3.10/site-packages/tensordict/base.py", line 388, in __torch_function__
    return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/c_gen/lib/python3.10/site-packages/tensordict/_torch_func.py", line 490, in _stack
    raise RuntimeError(
RuntimeError: The shapes of the tensors to stack is incompatible.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Expected behavior

env.rollout() is expected to return a LazyStackedTensorDict instead of raising.

System info

Describe the characteristics of your environment:

  • How the library was installed: pip (see the version pins above)
  • Python version: 3.10
  • Versions of other relevant libraries: see the command output in the comment below

import torchrl, numpy, sys
print(torchrl.__version__, numpy.__version__, sys.version, sys.platform)


Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
  • [x] I have read the documentation (required)
  • [ ] I have provided a minimal working example to reproduce the bug (required) [there is no such environment to provide one]
dtsaras added the bug label Apr 29, 2024
dtsaras commented Apr 29, 2024

I forgot to include the versions:
0.4.0 1.26.2 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ] darwin

vmoens commented Apr 29, 2024

Yes, unfortunately we deprecated that behaviour of stack because it required loads of checks and also produced some surprising behaviours from time to time!
You can also change this by passing the contiguous=False argument to rollout.

Note to self: we should make it so that contiguous=False calls LazyStackedTensorDict.maybe_dense_stack to get the "best" stack possible
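
For illustration, a minimal sketch of what maybe_dense_stack does, assuming the classmethod signature LazyStackedTensorDict.maybe_dense_stack(items, dim) from recent tensordict versions:

import torch
from tensordict import TensorDict, LazyStackedTensorDict

# Two steps whose "obs" entries have different shapes; batch_size=[] permits this.
td0 = TensorDict({"obs": torch.zeros(3)}, batch_size=[])
td1 = TensorDict({"obs": torch.zeros(5)}, batch_size=[])

# torch.stack would raise here; maybe_dense_stack stacks densely when shapes
# allow it and falls back to a LazyStackedTensorDict otherwise.
out = LazyStackedTensorDict.maybe_dense_stack([td0, td1], dim=0)
print(type(out))  # LazyStackedTensorDict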

dtsaras commented Apr 29, 2024

Ok thanks, that's helpful! Also, I wonder if we could create dense stacks of the non-uniformly shaped tensors by batching them as nested tensors instead?

vmoens commented Apr 29, 2024

Yes, we can do that; I don't think we have the util to stack things as nested tensors yet, though.
The main issue we'll be facing is that nested tensors can only be stacked along dim 0, while rollout stacks along the last+1 dim, so it will only work for non-batched envs. See the sketch below.
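
As a minimal illustration of that constraint (torch.nested is a prototype API; the shapes here are made up):

import torch

a = torch.randn(3, 5)
b = torch.randn(7, 5)  # ragged along dim 0

# Constructing a nested tensor batches the ragged tensors, but the batch
# dimension is necessarily dim 0; there is no way to stack along a later dim.
nt = torch.nested.nested_tensor([a, b])
print(nt.is_nested)  # True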

dtsaras commented Apr 29, 2024

I wonder if, when we encounter such data, we could automatically pad it and return a padding mask along the dynamic dimensions. This could be the fallback when the contained tensors are nested tensors (see the sketch below). If you think that's an option, I can look into implementing something like this after the NeurIPS deadline.
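
A hedged sketch of the padding idea using plain PyTorch (the mask convention here is an assumption, not an existing torchrl API):

import torch
from torch.nn.utils.rnn import pad_sequence

steps = [torch.randn(3, 5), torch.randn(7, 5)]  # ragged along the time dim

# Pad to the longest sequence and keep a boolean mask of the valid entries.
padded = pad_sequence(steps, batch_first=True)               # shape (2, 7, 5)
lengths = torch.tensor([s.shape[0] for s in steps])
mask = torch.arange(padded.shape[1]) < lengths.unsqueeze(1)  # shape (2, 7)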

vmoens commented Apr 29, 2024

Another option is to create the nested tensor along dim 0 but "lie" with tensordict and present it as a transposed version.
We'll need to patch get, set, etc.

vmoens commented Jun 11, 2024

Closing this, as contiguous=False now makes a best-intention attempt to stack what can be stacked densely and returns non-contiguous (lazy) stacks everywhere else. A sketch of the resulting usage is below.
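
For reference, the usage this resolves to (make_env is the reporter's placeholder from the repro above; I'm assuming the rollout keyword is spelled return_contiguous, as in current torchrl):

env = make_env()  # some env that produces sequential data

# With return_contiguous=False, steps whose shapes match are stacked densely;
# anything else comes back as a LazyStackedTensorDict.
td = env.rollout(100, return_contiguous=False)  # 100 = arbitrary max_steps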

@vmoens vmoens closed this as completed Jun 11, 2024