[QUESTION] Using LSTMs with vectorized envs without ParallelEnv #1493

Open
ADebor opened this issue Sep 5, 2023 · 6 comments

@ADebor

ADebor commented Sep 5, 2023

Hi there,

As mentioned here, I'm trying to use torchrl with NVIDIA Orbit to train an agent in parallel robot environments. I took inspiration from your recent IsaacGymEnv class to create a simple OrbitEnv class, inheriting directly from GymEnv since Orbit environments are registered in gym (this made sense to me, but I may be wrong). I can thus create a torchrl environment and add transforms, but I run into trouble when trying to use ("parallel") RNNs and hence hidden states.

Since Orbit environments are vectorized, I create a single torchrl environment wrapping the Orbit one and set the batch_size equal to the number of sub-environments in the Orbit vectorized env. As I want to use LSTMs, I add the make_tensordict_primer() transform to my environment. If I then try to reset the environment, I get the following error:

RuntimeError: batch dimension mismatch, got self.batch_size=torch.Size([2]) and value.shape[:self.batch_dims]=torch.Size([1]) with value tensor([[0., 0., 0., 0., 0.]])

where 2 is the number of parallel environments and the value tensor is a hidden state (of length 5). I don't know exactly what the problem is, but it looks like the batch size is not taken into account for the hidden states. With a basic gym environment, I wrap the environment in a ParallelEnv instance and have no issue; here, however, ParallelEnv causes a problem since (as I understand it) torchrl tries to create multiple separate parallel environments, which conflicts with Orbit, where all sub-environments live in the same Isaac Sim scene.

Can you see a way to properly use LSTMs in torchrl with a vectorized environment from Orbit? Don't hesitate to tell me if the question or context isn't clear. I apologize if I'm out of line here; I may be missing something important about how torchrl is supposed to be used. A minimal sketch of my setup follows.
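
For reference, a simplified sketch of what I'm doing (OrbitEnv is my own GymEnv subclass wrapping the gym-registered Orbit task; the layer sizes are arbitrary):

# Simplified sketch of the failing setup; OrbitEnv is my own wrapper,
# not a torchrl class, and sizes are arbitrary.
from torchrl.envs import (
    TransformedEnv, Compose, InitTracker, DoubleToFloat, StepCounter, ObservationNorm,
)
from torchrl.modules import LSTMModule

env = OrbitEnv("Isaac-Cartpole-v0")  # vectorized env: batch_size == torch.Size([2])
env = TransformedEnv(
    env,
    Compose(
        InitTracker(),
        DoubleToFloat(in_keys=["observation"]),
        StepCounter(),
        # identity stats, just for the sketch (normally set via init_stats)
        ObservationNorm(loc=0.0, scale=1.0, in_keys=["observation"]),
    ),
)
lstm = LSTMModule(input_size=4, hidden_size=5, in_key="observation", out_key="embed")
env.append_transform(lstm.make_tensordict_primer())
td = env.reset()  # raises the batch-dimension mismatch above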


Here is the complete stack trace:

Traceback (most recent call last):
  File "test.py", line 61, in <module>
    td = env.reset()
  File "/home/adebor/isaacsim_ws/rl/torchrl/envs/common.py", line 949, in reset
    tensordict_reset = self._reset(tensordict, **kwargs)
  File "/home/adebor/isaacsim_ws/rl/torchrl/envs/transforms/transforms.py", line 651, in _reset
    out_tensordict = self.transform.reset(out_tensordict)
  File "/home/adebor/isaacsim_ws/rl/torchrl/envs/transforms/transforms.py", line 889, in reset
    tensordict = t.reset(tensordict)
  File "/home/adebor/isaacsim_ws/rl/torchrl/envs/transforms/transforms.py", line 3016, in reset
    tensordict.set(key, value)
  File "/home/adebor/isaacsim_ws/tensordict/tensordict/tensordict.py", line 761, in set
    return self._set_tuple(key, item, inplace=inplace, validated=False)
  File "/home/adebor/isaacsim_ws/tensordict/tensordict/tensordict.py", line 4122, in _set_tuple
    return self._set_str(key[0], value, inplace=inplace, validated=validated)
  File "/home/adebor/isaacsim_ws/tensordict/tensordict/tensordict.py", line 4093, in _set_str
    value = self._validate_value(value, check_shape=True)
  File "/home/adebor/isaacsim_ws/tensordict/tensordict/tensordict.py", line 1732, in _validate_value
    f"batch dimension mismatch, got self.batch_size"
RuntimeError: batch dimension mismatch, got self.batch_size=torch.Size([2]) and value.shape[:self.batch_dims]=torch.Size([1]) with value tensor([[0., 0., 0., 0., 0.]])

Versions:
torchrl==0.1.1
torch==1.13.1

@vmoens vmoens pinned this issue Sep 7, 2023
@btx0424
Contributor

btx0424 commented Sep 8, 2023

It would be clearer if you could provide the specs of your OrbitEnv before and after the transformation.

Also, it seems that the LSTMModule cannot work properly with vectorized environments in general, because the current implementation of TensorDictPrimer.reset always resets all the hidden states.
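
Schematically, the behavior I mean is something like this (illustrative pseudocode, not the actual torchrl source):

def reset(self, tensordict):
    # no per-sub-env "_reset" mask is consulted here, so a partial reset of
    # one sub-env would also wipe the hidden states of all the others
    for key, spec in self.primers.items():
        tensordict.set(key, spec.zero().expand(*tensordict.batch_size, *spec.shape))
    return tensordict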

@ADebor
Author

ADebor commented Sep 18, 2023

I'm not sure these are the specs you mention, but here's what I get:

  • before any transformation
OrbitEnv(env=Isaac-Cartpole-v0, batch_size=torch.Size([2]), device=cuda:0)
  • after all the transforms except the make_tensordict_primer one
TransformedEnv(
    env=OrbitEnv(env=Isaac-Cartpole-v0, batch_size=torch.Size([2]), device=cuda:0),
    transform=Compose(
            InitTracker(keys=[]),
            DoubleToFloat(in_keys=['observation'], out_keys=['observation'], in_keys_inv=[], out_keys_inv=[]),
            StepCounter(keys=[]),
            ObservationNorm(keys=['observation'])))
  • after the make_tensordict_primer one
TransformedEnv(
    env=OrbitEnv(env=Isaac-Cartpole-v0, batch_size=torch.Size([2]), device=cuda:0),
    transform=Compose(
            InitTracker(keys=[]),
            DoubleToFloat(in_keys=['observation'], out_keys=['observation'], in_keys_inv=[], out_keys_inv=[]),
            StepCounter(keys=[]),
            ObservationNorm(keys=['observation']),
            TensorDictPrimer(primers={('recurrent_state_h',): UnboundedContinuousTensorSpec(
                shape=torch.Size([1, 5]),
                space=None,
                device=cpu,
                dtype=torch.float32,
                domain=continuous), ('recurrent_state_c',): UnboundedContinuousTensorSpec(
                shape=torch.Size([1, 5]),
                space=None,
                device=cpu,
                dtype=torch.float32,
                domain=continuous)}, default_value=0.0, random=False)))

Concerning the LSTMModule/vectorized-envs issue, I thought this had been fixed here.
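
Side note: in the printed specs above, the primer entries have shape torch.Size([1, 5]) (i.e., (num_layers, hidden_size)) with no batch dimension, while the env batch_size is torch.Size([2]), which matches the mismatch in the original error. As an untested workaround sketch (assuming the torchrl 0.1.x spec API), one could try declaring the primer by hand with batch-expanded specs:

# Untested workaround sketch: build the primer manually with specs that
# carry the env batch dimension (2 sub-envs).
import torch
from torchrl.data import UnboundedContinuousTensorSpec
from torchrl.envs.transforms import TensorDictPrimer

n_envs, num_layers, hidden_size = 2, 1, 5
primer = TensorDictPrimer(
    recurrent_state_h=UnboundedContinuousTensorSpec(
        shape=torch.Size([n_envs, num_layers, hidden_size])
    ),
    recurrent_state_c=UnboundedContinuousTensorSpec(
        shape=torch.Size([n_envs, num_layers, hidden_size])
    ),
)
env.append_transform(primer)  # in place of lstm.make_tensordict_primer()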

@vmoens
Contributor

vmoens commented Dec 23, 2023

I would be interested in seeing what the OrbitEnv looks like, @ADebor. It would be cool to integrate it into torchrl!
(and easier to debug :p)

@btx0424
Contributor

btx0424 commented Dec 23, 2023

Hi there, I'd be glad to give it a try or share some experience if you like. What I've personally been doing is very similar to GymWrapper, but some tricky differences prevent me from treating it directly as a vectorized Gym environment. Also, Orbit's behavior upon reset is like that of early Gym versions, i.e., no terminal observations are returned, which could be a problem in some cases. I'll try reaching out to the authors shortly to see what they plan to do about it.
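
For contrast, newer gymnasium-style vector environments expose the terminal observation through the info dict (illustrative; Orbit's exact API may differ):

# gymnasium-style auto-reset: the last observation of a finished episode
# is still available via info
obs, reward, terminated, truncated, info = vec_env.step(action)
final_obs = info.get("final_observation")
# Early-Gym-style auto-reset (and Orbit, as described above) instead returns
# the first observation of the new episode directly, so the terminal
# observation is lost unless the env exposes it some other way.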

@ADebor
Author

ADebor commented Jan 12, 2024

Hi there, sorry for the late reply. I had to work on other things for a while, and still do. I'll get back to you as soon as I can look at the code again. Since Orbit just merged their dev branch, I'm not sure my previous attempts are still compatible with the new version. Also, my OrbitEnv was a very basic and naive class, and I only ran small tests on my side. I'd be thrilled to see what you did, @btx0424; thanks for the help.

@Cadene
Contributor

Cadene commented Jan 15, 2024

@btx0424 @ADebor Thanks for your efforts on porting Orbit. Is there a PR or an issue somewhere to track progress? I'd be happy to see how you did it and maybe contribute :)
