[Algorithm] CrossQ #2033
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2033
Note: Links to docs will display an error until the docs builds have been completed.
❌ 12 New Failures, 2 Unrelated Failures as of commit 75d4cee with merge base f613eef.
NEW FAILURES - The following jobs have failed.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
# self.qvalue_network_params,
# ).get(self.tensor_keys.state_action_value)

combined = torch.cat(
This solves the previous issue, but the sequential forward pass over current-state and next-state values is still faster than the combined one. The cat and split operations might be slowing down the computation.
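For context, a minimal sketch of the two evaluation patterns being compared; the critic architecture, names, and sizes here are illustrative, not TorchRL's actual API:

```python
import torch
from torch import nn

# Hypothetical critic taking a concatenated (obs, action) input.
qnet = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))

obs_act = torch.randn(256, 6)       # (obs, action) at the current step
next_obs_act = torch.randn(256, 6)  # (next_obs, next_action)

# Sequential variant: two separate forward passes.
q_seq = qnet(obs_act)
next_q_seq = qnet(next_obs_act)

# Combined variant (CrossQ-style): one pass over the concatenated batch,
# then split the result back into current / next values.
combined = torch.cat([obs_act, next_obs_act], dim=0)
q_comb, next_q_comb = qnet(combined).chunk(2, dim=0)
```

Without normalization layers the two variants are numerically equivalent; the combined pass only matters once BatchNorm sits inside the critic, since the current and next samples then share batch statistics, which is the core CrossQ trick.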
Let's keep the commented vmaps; we can make them run faster eventually.
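For reference, the vmap pattern in question can be sketched with `torch.func`; the two-critic ensemble and layer sizes below are made up for illustration:

```python
import torch
from torch.func import functional_call, stack_module_state, vmap

# Hypothetical two-critic ensemble; vmap evaluates both in one batched call.
critics = [torch.nn.Linear(4, 1) for _ in range(2)]
params, buffers = stack_module_state(critics)  # dicts with a leading ensemble dim
base = critics[0]  # serves as the stateless "template" module

def q_value(p, b, x):
    # Run the template with one slice of the stacked parameters.
    return functional_call(base, (p, b), (x,))

x = torch.randn(8, 4)
# Map over the ensemble dim of params/buffers, broadcast the input.
qvals = vmap(q_value, in_dims=(0, 0, None))(params, buffers, x)
# qvals has shape (2, 8, 1): one value per critic per sample.
```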
# Conflicts:
#	.github/unittest/linux_examples/scripts/run_test.sh
Thanks for this!
There are just a couple of things to fix before merging.
sampled_tensordict = sampled_tensordict.clone()

# Compute loss
q_loss, *_ = loss_module._qvalue_loss(sampled_tensordict)
We should not use private attributes in examples. Let's make `qvalue_loss` a public method if it is needed here.
    sampled_tensordict
)
actor_loss = actor_loss.mean()
alpha_loss = loss_module._alpha_loss(
ditto
)
actor_loss = actor_loss.mean()
alpha_loss = loss_module._alpha_loss(
    log_prob=metadata_actor["log_prob"]
The fact that the example requires that much knowledge about the way the loss works is a bit worrying - the script should be immediate.
Is there a version of this where alpha_loss just takes the metadata dict?
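One possible shape for that, sketched with hypothetical names (this is not the current loss signature; the actor loss is assumed to stash its detached log-prob in a metadata dict):

```python
import torch

def alpha_loss(log_alpha: torch.Tensor, target_entropy: float, metadata: dict) -> torch.Tensor:
    # The actor loss stores its detached log-prob under "log_prob",
    # so the training script never unpacks loss internals.
    log_prob = metadata["log_prob"]
    # Standard SAC temperature objective.
    return -(log_alpha * (log_prob + target_entropy).detach()).mean()

log_alpha = torch.zeros((), requires_grad=True)
loss = alpha_loss(log_alpha, target_entropy=-1.0, metadata={"log_prob": torch.randn(64)})
```

With this, the script would only ever pass around the metadata dict returned by the actor loss, keeping the internals opaque.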
"num_cells": cfg.network.actor_hidden_sizes,
"out_features": 2 * action_spec.shape[-1],
"activation_class": get_activation(cfg.network.actor_activation),
"norm_class": nn.BatchNorm1d,  # Should be BRN (https://arxiv.org/abs/1702.03275) not sure if added to torch
Suggested change:
- "norm_class": nn.BatchNorm1d, # Should be BRN (https://arxiv.org/abs/1702.03275) not sure if added to torch
+ "norm_class": nn.BatchNorm1d, # Should be BRN (https://arxiv.org/abs/1702.03275)
"num_cells": cfg.network.critic_hidden_sizes,
"out_features": 1,
"activation_class": get_activation(cfg.network.critic_activation),
"norm_class": nn.BatchNorm1d,  # Should be BRN (https://arxiv.org/abs/1702.03275) not sure if added to torch
Suggested change:
- "norm_class": nn.BatchNorm1d, # Should be BRN (https://arxiv.org/abs/1702.03275) not sure if added to torch
+ "norm_class": nn.BatchNorm1d, # Should be BRN (https://arxiv.org/abs/1702.03275)
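Since Batch Renormalization does not ship with core torch, here is a minimal sketch of what a drop-in `norm_class` could look like. The module name, defaults, and fixed clamping bounds are illustrative; the paper ramps `r_max`/`d_max` up over training rather than fixing them:

```python
import torch
from torch import nn

class BatchRenorm1d(nn.Module):
    """Minimal Batch Renormalization (Ioffe, 2017) sketch, not torch's API."""

    def __init__(self, num_features, momentum=0.01, eps=1e-5, r_max=3.0, d_max=5.0):
        super().__init__()
        self.momentum, self.eps = momentum, eps
        self.r_max, self.d_max = r_max, d_max
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_std", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            mean = x.mean(0)
            std = x.std(0, unbiased=False) + self.eps
            # r and d pull the batch statistics toward the running ones;
            # no gradients flow through them.
            r = (std.detach() / self.running_std).clamp(1 / self.r_max, self.r_max)
            d = ((mean.detach() - self.running_mean) / self.running_std).clamp(
                -self.d_max, self.d_max
            )
            x_hat = (x - mean) / std * r + d
            with torch.no_grad():
                self.running_mean += self.momentum * (mean - self.running_mean)
                self.running_std += self.momentum * (std - self.running_std)
        else:
            x_hat = (x - self.running_mean) / (self.running_std + self.eps)
        return self.weight * x_hat + self.bias

brn = BatchRenorm1d(8)
y = brn(torch.randn(32, 8))
```

Anything with this interface (a module constructed from `num_features` and mapping `(batch, features)` to the same shape) should slot into the `norm_class` argument above.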
qvalue_network (TensorDictModule): Q(s, a) parametric model.
    This module typically outputs a ``"state_action_value"`` entry.

num_qvalue_nets (integer, optional): number of Q-Value networks used.
Suggested change:
- num_qvalue_nets (integer, optional): number of Q-Value networks used.
+ Keyword Args:
+     num_qvalue_nets (integer, optional): number of Q-Value networks used.
action: NestedKey = "action"
state_action_value: NestedKey = "state_action_value"
log_prob: NestedKey = "_log_prob"
not used I think
def _cached_detached_qvalue_params(self):
    return self.qvalue_network_params.detach()

def _actor_loss(
make public
return self._alpha * log_prob - min_q_logprob, {"log_prob": log_prob.detach()}

def _qvalue_loss(
make public
@BY571 we should also add it to the sota benchmarks
Description
Adding CrossQ.

Motivation and Context
Why is this change required? What problem does it solve? If it fixes an open issue, please link to the issue here. You can use the syntax `close #15213` if this solves the issue #15213.

Types of changes
What types of changes does your code introduce? Remove all that do not apply:

Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!