In the Doc “RECURRENT DQN: TRAINING RECURRENT POLICIES” #2083

Zhaohhya · 2024-04-17T06:33:10Z

I managed to run the code, but while it was running I noticed that the maximum step count for each batch is only 50:
steps: 50, loss_val: 0.1930, action_spread: tensor([26, 24], device='cuda:0'): 18%|█▊ | 181450/1000000 [1:54:35<9:08:14, 24.88it/s]
I printed the step counter with print(data["step_count"]) and got:

tensor([[ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [10],
        [11],
        [12],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9]], device='cuda:0')

The next batch gives:

tensor([[10],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [10],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 0]], device='cuda:0')

I've tried this many times and the pattern is always the same: the step counter starts over again after each batch. I don't know why.
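
To make the two printouts easier to compare, the flattened step_count column can be split into runs wherever the counter does not continue from the previous value. This is a minimal plain-Python sketch, assuming the tensor has been converted with data["step_count"].flatten().tolist(); the helpers split_episodes and continues_previous are hypothetical and not part of TorchRL. The two lists below are rebuilt from the values printed above.

```python
def split_episodes(step_counts):
    """Split a flat list of step counts into runs of consecutive counts.

    A new run starts whenever a value does not equal the previous value + 1
    (e.g. when the counter drops back to 0 after an env reset).
    """
    runs = []
    for count in step_counts:
        if runs and count == runs[-1][-1] + 1:
            runs[-1].append(count)
        else:
            runs.append([count])
    return runs


def continues_previous(prev_batch, next_batch):
    """True if next_batch's first count picks up where prev_batch ended."""
    return bool(prev_batch) and bool(next_batch) and next_batch[0] == prev_batch[-1] + 1


# Values copied from the two printouts above (50 frames each)
batch_1 = (list(range(9)) + list(range(13)) + list(range(9))
           + list(range(9)) + list(range(10)))
batch_2 = ([10] + list(range(10)) + list(range(9)) + list(range(11))
           + list(range(9)) + list(range(9)) + [0])

print([len(run) for run in split_episodes(batch_1)])  # [9, 13, 9, 9, 10]
print(continues_previous(batch_1, batch_2))           # True
```

With the real collector, the same check can be run on consecutive batches inside the training loop by keeping the previous batch's list around.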

Zhaohhya · 2024-04-17T07:15:11Z

Using the 'is_init' key, I found that the env is always reset at the second step of each batch.
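
The reset positions can be listed with a small helper. This is a sketch in plain Python, assuming 'is_init' is flattened the same way as step_count (data["is_init"].flatten().tolist()) and assuming it is True exactly where step_count == 0; both helper names are hypothetical. Since the real 'is_init' values were not posted, the demo derives the flags from the second batch's step counts under that assumption.

```python
def reset_positions(flags):
    """Indices at which a per-step boolean flag (e.g. 'is_init') is True."""
    return [i for i, flag in enumerate(flags) if flag]


def flags_match_step_count(is_init, step_counts):
    """Check the assumed invariant: is_init is True exactly where step_count == 0."""
    return len(is_init) == len(step_counts) and all(
        bool(flag) == (count == 0) for flag, count in zip(is_init, step_counts)
    )


# Step counts copied from the second printout above
batch_2 = ([10] + list(range(10)) + list(range(9)) + list(range(11))
           + list(range(9)) + list(range(9)) + [0])
# Illustrative flags only: derived from step_count under the assumption above
is_init_2 = [count == 0 for count in batch_2]

print(reset_positions(is_init_2))  # [1, 11, 20, 31, 40, 49]
```

Running flags_match_step_count on the real 'is_init' and step_count lists would show whether the two keys agree; the first reset at index 1 corresponds to the second step of the batch.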

Zhaohhya added the "enhancement" (New feature or request) label on Apr 17, 2024

Zhaohhya assigned vmoens Apr 17, 2024