In the Doc “RECURRENT DQN: TRAINING RECURRENT POLICIES” #2083

Open
Zhaohhya opened this issue Apr 17, 2024 · 1 comment
Comments

@Zhaohhya

I managed to run the code, but while it was running I noticed that the maximum step count for each batch is only 50:
steps: 50, loss_val: 0.1930, action_spread: tensor([26, 24], device='cuda:0'): 18%|█▊ | 181450/1000000 [1:54:35<9:08:14, 24.88it/s]
I tried printing the step counts:
print(data["step_count"])

tensor([[ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [10],
        [11],
        [12],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9]], device='cuda:0')

The next output is:

tensor([[10],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [10],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0]], device='cuda:0')

I've tried many times and the pattern is always the same: the step count starts again after each batch. I don't know why.
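To make the pattern in the two printouts easier to read, here is a minimal sketch that locates the points where the counter returns to 0 and measures the length of each run in between. The helper names `episode_starts` and `segment_lengths` are hypothetical (they are not part of the tutorial or of any library), and the plain lists stand in for something like `data["step_count"].flatten().tolist()`; the values are copied by hand from the two outputs above.

```python
def episode_starts(step_counts):
    """Indices where the counter reads 0, i.e. where a new episode begins."""
    return [i for i, count in enumerate(step_counts) if count == 0]


def segment_lengths(step_counts):
    """Lengths of the consecutive runs between episode starts in one batch."""
    starts = episode_starts(step_counts)
    # A batch may open in the middle of an episode; count that leading piece too.
    if not starts or starts[0] != 0:
        starts = [0] + starts
    bounds = starts + [len(step_counts)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# The two printouts above, flattened by hand (hypothetical variable names).
first_batch = [*range(9), *range(13), *range(9), *range(9), *range(10)]
second_batch = [10, *range(10), *range(9), *range(11), *range(9), *range(9), 0]

print(episode_starts(first_batch), segment_lengths(first_batch))
print(episode_starts(second_batch), segment_lengths(second_batch))
```

On these two samples the counter never goes above 12, and the second batch opens at 10 right after the first one ended at 9. That would be consistent with the count carrying over the batch boundary and restarting only when a short episode ends, though two batches are not enough to be sure.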

@Zhaohhya Zhaohhya added the enhancement New feature or request label Apr 17, 2024
@Zhaohhya

Using the 'is_init' key, I found that the env is always reset on the second step of a batch.
