Ability to store generation logits and vals for training #1535

ejmejm · 2024-04-13T04:23:58Z

When using the PPO trainer, a user will generally generate responses with PPOTrainer.generate(). The input queries and resulting responses are then passed to PPOTrainer.step() to train. At the start of the training step, the initial logits and values of these input sequences are calculated again in PPOTrainer.step() (line 1129 in this branch). Most of this computation repeats what was already computed during the call to PPOTrainer.generate(), which leaves an opportunity to save the cost of one forward pass per training batch. This optimization was previously requested in ticket #848.

This change adds the ability to return the values and logits during generation, so that they can be fed back to PPOTrainer.step(), and reused to save on computation. Example usage:

response_tensors, values_and_logits = ppo_trainer.generate(
    query_tensors,
    return_prompt=False,
    return_values_and_logits=True,
    **generation_kwargs,
)

ppo_trainer.step(query_tensors, response_tensors, reward, values_and_logits=values_and_logits)

This is an easy change to adopt for anyone who wants to save compute. The return_values_and_logits argument of generate() and the values_and_logits argument of step() are both optional, so existing code that calls these functions without them keeps working unchanged.
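To make the saving concrete, here is a minimal, self-contained sketch of the pattern using a toy model instead of the real trainer. ToyPolicy, its forward-call counter, the toy generate() and step() functions, and the (values, logits) tuple order are all assumptions made for illustration; they are not the code in this branch. The sketch only shows that passing the cached outputs back skips the second forward pass.

```python
from typing import List, Optional, Tuple

ValuesAndLogits = Tuple[List[float], List[float]]


class ToyPolicy:
    """Hypothetical stand-in for a policy with a value head; counts forward passes."""

    def __init__(self) -> None:
        self.forward_calls = 0

    def forward(self, tokens: List[int]) -> ValuesAndLogits:
        # Pretend values and logits are simple functions of the tokens.
        self.forward_calls += 1
        values = [0.5 * t for t in tokens]
        logits = [0.1 * t for t in tokens]
        return values, logits


def generate(policy: ToyPolicy, query: List[int], return_values_and_logits: bool = False):
    # Toy "generation": the model is run either way; the flag only
    # controls whether its outputs are handed back to the caller.
    response = [t + 1 for t in query]
    values_and_logits = policy.forward(query + response)
    if return_values_and_logits:
        return response, values_and_logits
    return response


def step(
    policy: ToyPolicy,
    query: List[int],
    response: List[int],
    reward: float,
    values_and_logits: Optional[ValuesAndLogits] = None,
) -> float:
    # Recompute only when the caller did not supply cached outputs.
    if values_and_logits is None:
        values_and_logits = policy.forward(query + response)
    values, _logits = values_and_logits
    # Placeholder for the real PPO update: a one-step advantage estimate.
    return reward - values[-1]


query = [1, 2]

# Default path: generate() and step() each run the model.
baseline = ToyPolicy()
response = generate(baseline, query)
baseline_adv = step(baseline, query, response, reward=1.0)

# Cached path: step() reuses what generate() already computed.
cached = ToyPolicy()
response, values_and_logits = generate(cached, query, return_values_and_logits=True)
cached_adv = step(cached, query, response, reward=1.0, values_and_logits=values_and_logits)
```

In this toy setup the default path runs the model twice and the cached path runs it once, while both produce the same result from step(). Defaulting the new arguments to False and None is what keeps the existing call sites valid.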

vwxyzjn · 2024-04-17T13:57:43Z

@ejmejm I believe https://github.com/vwxyzjn/trl/blob/61e39010bd660fbddb9cff6a1f50d347ae375f9e/trl/trainer/ppov2_bandit_rloo_trainer.py#L461-L481 does what you have in mind? Ah, I also just realized that the latest transformers release now supports output_logits, where previously I had to use output_scores.

github-actions · 2024-05-13T15:05:51Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Ability to store generation logits and vals for training

c4d69e7

ejmejm mentioned this pull request Apr 13, 2024

PPO Performance improvement by reducing the number of model calls #848

Closed

github-actions bot closed this May 22, 2024
