Remove the need for a dummy environment when instantiating a MultiSyncDataCollector. #1998

Open
AechPro opened this issue Mar 6, 2024 · 0 comments
Labels: enhancement (New feature or request)

Motivation

Currently we must create an instance of the environment we wish to solve before creating a MultiSyncDataCollector object. This is because we can't create a policy without knowing the environment's action and observation specs, and the MultiSyncDataCollector requires a policy in its constructor. Creating and discarding a dummy environment is wasteful in general, and it may become a tangible problem for environments that are particularly large or slow to instantiate.

Ideally the MultiSyncDataCollector would allow us to access the observation and action specs from one of its sub-processes before we provide it a policy.

Solution

Construct the collector and query the environment specs before constructing a policy, like so:

collector = MultiSyncDataCollector(...)
action_spec = collector.get_env_action_spec()
obs_spec = collector.get_env_obs_spec()

Then instantiate a policy and pass that to the collector:

policy = MyPolicy(obs_spec, action_spec)
collector.set_policy(policy)
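
To make the intended flow concrete, here is a minimal, self-contained sketch of a collector that accepts its policy after construction. It does not use TorchRL: `Spec`, `ToyEnv`, `DeferredPolicyCollector`, and the toy `MyPolicy` are hypothetical stand-ins, and the envs are built in-process rather than in sub-processes to keep the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Spec:
    """Toy stand-in for an environment spec; only a shape is recorded."""
    shape: Tuple[int, ...]


class ToyEnv:
    """Hypothetical environment that counts how often it is constructed."""
    instances = 0

    def __init__(self) -> None:
        ToyEnv.instances += 1
        self.observation_spec = Spec(shape=(4,))
        self.action_spec = Spec(shape=(2,))


class MyPolicy:
    """Toy policy that only remembers the specs it was built from."""

    def __init__(self, obs_spec: Spec, action_spec: Spec) -> None:
        self.obs_spec = obs_spec
        self.action_spec = action_spec


class DeferredPolicyCollector:
    """Hypothetical collector whose policy is supplied after construction."""

    def __init__(self, create_env_fn: Callable[[], ToyEnv], num_workers: int = 2) -> None:
        # A real multi-process collector would build these inside its workers.
        self._envs: List[ToyEnv] = [create_env_fn() for _ in range(num_workers)]
        self._policy: Optional[MyPolicy] = None

    def get_env_action_spec(self) -> Spec:
        # Read the spec from an env the collector already owns.
        return self._envs[0].action_spec

    def get_env_obs_spec(self) -> Spec:
        return self._envs[0].observation_spec

    def set_policy(self, policy: MyPolicy) -> None:
        self._policy = policy

    @property
    def policy(self) -> MyPolicy:
        if self._policy is None:
            raise RuntimeError("set_policy() must be called before collecting")
        return self._policy


collector = DeferredPolicyCollector(ToyEnv, num_workers=2)
policy = MyPolicy(collector.get_env_obs_spec(), collector.get_env_action_spec())
collector.set_policy(policy)
```

In this sketch only the worker envs are ever constructed, so no extra dummy environment is needed to size the policy.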

Alternatives

Pass the policy constructor (a callable) to the collector, let the collector build the policy once the specs are known, then retrieve the policy object later:

collector = MultiSyncDataCollector(policy_callable=MyPolicy, ...)
policy = collector.get_policy()
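
A rough sketch of how this alternative could behave, again without TorchRL. `FactoryCollector`, `make_toy_env`, and the toy `MyPolicy` are hypothetical names, and the sketch assumes the policy callable takes the observation spec and action spec in that order, matching the `MyPolicy(obs_spec, action_spec)` call used elsewhere in this issue.

```python
from types import SimpleNamespace
from typing import Any, Callable


def make_toy_env() -> SimpleNamespace:
    # Hypothetical env exposing only the two specs the policy needs.
    return SimpleNamespace(observation_spec=(4,), action_spec=(2,))


class MyPolicy:
    """Toy policy that only remembers the specs it was built from."""

    def __init__(self, obs_spec: Any, action_spec: Any) -> None:
        self.obs_spec = obs_spec
        self.action_spec = action_spec


class FactoryCollector:
    """Hypothetical collector that builds the policy from a callable."""

    def __init__(
        self,
        create_env_fn: Callable[[], Any],
        policy_callable: Callable[[Any, Any], Any],
    ) -> None:
        self._env = create_env_fn()
        # The collector owns the env, so it can supply the specs itself.
        self._policy = policy_callable(
            self._env.observation_spec, self._env.action_spec
        )

    def get_policy(self) -> Any:
        return self._policy


collector = FactoryCollector(make_toy_env, policy_callable=MyPolicy)
policy = collector.get_policy()
```

The trade-off in this design is that the collector dictates the policy constructor's signature, which may be restrictive for policies that need extra arguments.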

Or we might prefer some sort of lookup table (LUT) mapping environment names to specs, which does not actually instantiate the environment (this would require one instantiation when the env is registered, but never again afterwards):

action_spec = GymEnv.get_action_spec("CartPole-v1")
obs_spec = GymEnv.get_obs_spec("CartPole-v1")
policy = MyPolicy(obs_spec, action_spec)

collector = MultiSyncDataCollector(...)
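
The lookup-table idea could be sketched roughly as below. This is not an existing TorchRL or Gym feature: `register_env`, `get_obs_spec`, `get_action_spec`, and `ToyCartPole` are hypothetical, and the specs are plain tuples for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical table: env name -> (observation spec, action spec).
_SPEC_TABLE: Dict[str, Tuple[Any, Any]] = {}


class ToyCartPole:
    """Hypothetical env that counts how often it is constructed."""
    constructed = 0

    def __init__(self) -> None:
        ToyCartPole.constructed += 1
        self.observation_spec = (4,)
        self.action_spec = (2,)

    def close(self) -> None:
        pass


def register_env(name: str, create_env_fn: Callable[[], Any]) -> None:
    # Instantiate once at registration time, record the specs, then discard.
    env = create_env_fn()
    try:
        _SPEC_TABLE[name] = (env.observation_spec, env.action_spec)
    finally:
        env.close()


def get_obs_spec(name: str) -> Any:
    return _SPEC_TABLE[name][0]


def get_action_spec(name: str) -> Any:
    return _SPEC_TABLE[name][1]


register_env("ToyCartPole-v0", ToyCartPole)
obs_spec = get_obs_spec("ToyCartPole-v0")
action_spec = get_action_spec("ToyCartPole-v0")
```

Later lookups read from the table only, so the environment is constructed exactly once no matter how many policies are sized from it.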

Additional context

This isn't a huge problem, but it would be nice at some point.

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)