Seq2seq model with ppo_trainer samples strange output! #1633

sajastu · 2024-05-08T20:25:43Z

Hi,

I'm using PPO with BART, with some slight changes I made to ppo_trainer to adapt it for seq2seq modeling. The general approach I followed:

1. Sample outputs from both the policy model and the reference model: y and y_b, respectively.
2. Use y and y_b (along with the gold data) to compute a reward signal.
3. Pass y to ppo_trainer's step function, which feeds the sampled response through the model and the reference model to get log_probs; the rest of the computation (e.g., the KL divergence) is left unchanged.
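The post does not say which reward was used, so the sketch below is only a hypothetical illustration of steps 2 and 3 in plain Python (no torch). Assumptions: `unigram_f1` is a stand-in metric, and `baseline_score` (policy score minus reference score against gold) is one possible way to combine y, y_b and gold. `shaped_rewards` roughly mirrors, as I understand it, how trl's PPOTrainer shapes rewards with its default "kl" penalty: a per-token penalty of `-kl_coef * (logprob - ref_logprob)`, with the scalar score added only on the last response token.

```python
from collections import Counter
from typing import List


def unigram_f1(candidate: str, reference: str) -> float:
    """Hypothetical stand-in metric: unigram F1 between whitespace-split strings."""
    cand, ref = candidate.split(), reference.split()
    overlap = sum((Counter(cand) & Counter(ref)).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def baseline_score(y: str, y_b: str, gold: str) -> float:
    """Hypothetical reward: how much the policy sample y beats the reference sample y_b on gold."""
    return unigram_f1(y, gold) - unigram_f1(y_b, gold)


def shaped_rewards(
    score: float, logprobs: List[float], ref_logprobs: List[float], kl_coef: float
) -> List[float]:
    """Per-token rewards for one response: a KL penalty on every token,
    with the scalar score added only on the last response token."""
    rewards = [-kl_coef * (lp - rlp) for lp, rlp in zip(logprobs, ref_logprobs)]
    rewards[-1] += score
    return rewards
```

For example, `shaped_rewards(1.0, [-1.0, -2.0], [-1.5, -2.0], kl_coef=0.1)` gives `[-0.05, 1.0]`: the first token is penalized for drifting from the reference, and the whole score lands on the final token.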

Problem: the model (BART-based) starts generating weird, nonsensical text after only a few training samples have been processed.

Based on suggestions in other issues, I settled on a fixed set of generation kwargs. Here are the generation kwargs used for sampling:

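```python
generation_kwargs = {
    "min_length": -1,    # no minimum length constraint
    "top_k": 0.0,        # 0 disables top-k filtering
    "top_p": 1.0,        # 1.0 disables nucleus (top-p) filtering
    "do_sample": True,   # sample instead of greedy/beam decoding
    "max_length": 128,   # maximum length of the generated sequence
    "eos_token_id": -1   # matches no real token id, so generation never stops at EOS
}
```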
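To make the effect of these kwargs concrete, here is a toy pure-Python sampling loop, assuming a fake next-token distribution in place of BART. It is illustrative only: `filter_probs` and `toy_generate` are hypothetical names, this is not how transformers' `generate` is implemented, and it ignores `min_length` (which -1 effectively disables).

```python
import random
from typing import Callable, Dict, List


def filter_probs(probs: Dict[int, float], top_k: float = 0, top_p: float = 1.0) -> Dict[int, float]:
    """Toy top-k / top-p filtering over a {token_id: prob} dict, renormalised.
    top_k=0 and top_p=1.0 keep every token, i.e. plain sampling from the full distribution."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[: int(top_k)]
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for tok, p in items:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:  # keep the smallest prefix reaching top_p
                break
        items = kept
    total = sum(p for _, p in items)
    return {tok: p / total for tok, p in items}


def toy_generate(
    next_probs: Callable[[List[int]], Dict[int, float]],
    max_length: int,
    eos_token_id: int,
    rng: random.Random,
    top_k: float = 0,
    top_p: float = 1.0,
) -> List[int]:
    """Sample token ids until EOS is produced or max_length is reached."""
    tokens: List[int] = []
    while len(tokens) < max_length:
        dist = filter_probs(next_probs(tokens), top_k, top_p)
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(tok)
        if tok == eos_token_id:  # can never match when eos_token_id == -1
            break
    return tokens
```

With `eos_token_id=-1`, every sample runs to `max_length` even when the model emits its own EOS id early. For an encoder-decoder model such as BART, that would mean the tokens sampled after the model's EOS are also part of the response that is scored and passed to step(), which may be worth ruling out when diagnosing the degenerate text.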

Here's the sample screenshot:

Any ideas on how to fix this?

github-actions · 2024-06-08T15:05:08Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
