
Adds Online DPO #1605

Closed
wants to merge 14 commits into from

Conversation

edbeeching
Collaborator

@edbeeching edbeeching commented Apr 30, 2024

WIP, ignore for now.

Usage:

accelerate launch --config_file deepspeed_zero3.yaml examples/scripts/dpo_online.py --model_name_or_path=HuggingFaceH4/mistral-7b-ift --model_revision=v25.2 --output_dir=data/mistral-7b-odpo --dataset_name=HuggingFaceH4/ultrafeedback_binarized --dataset_train_split=train_gen --dataset_test_split=test_gen --gradient_accumulation_steps=1 --bf16=True --attn_implementation=flash_attention_2 --per_device_train_batch_size=2
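For context, the core idea of online DPO is that preference pairs are generated on-policy during training rather than read from a fixed dataset: sample several completions for a prompt, rank them with a judge or reward signal, and take the best and worst as the chosen/rejected pair. A minimal, self-contained sketch of that pair-construction step (not the PR's actual code; `generate_completions` and `score` are stand-ins for policy sampling and a reward model):

```python
def generate_completions(prompt, num_samples=4):
    # Stand-in for on-policy sampling (e.g. model.generate); returns dummy
    # completions of varying length so the ranking below is deterministic.
    return [f"{prompt} answer {'x' * (i + 1)}" for i in range(num_samples)]

def score(completion):
    # Stand-in for a reward model or judge; here, longer answers score higher.
    return len(completion)

def build_preference_pair(prompt):
    """Sample completions online and rank them into a (chosen, rejected) pair."""
    completions = generate_completions(prompt)
    ranked = sorted(completions, key=score, reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

pair = build_preference_pair("What is DPO?")
```

The resulting dict matches the `prompt`/`chosen`/`rejected` layout that DPO-style trainers expect, so freshly generated pairs can feed directly into the usual loss computation.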

@olgavrou

olgavrou commented May 2, 2024

This is cool. I was doing the same, but by extending the `training_step` of the existing DPO trainer and generating the new pairs there before calling `super().training_step`. This looks like a more complete solution.
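The subclassing approach described above can be sketched as follows. This is an illustrative mock, not code from either implementation: the `DPOTrainer` stub and `generate_online_pairs` helper are hypothetical stand-ins for trl's trainer and for on-policy sampling plus ranking.

```python
class DPOTrainer:
    # Stand-in for the existing trainer: computes the DPO loss from the
    # chosen/rejected pairs already present in `inputs`.
    def training_step(self, model, inputs):
        return {"loss": 0.0, "num_pairs": len(inputs["chosen"])}

def generate_online_pairs(model, prompts):
    # Stand-in for sampling completions on-policy and ranking them.
    return {
        "prompt": prompts,
        "chosen": [f"{p} (best)" for p in prompts],
        "rejected": [f"{p} (worst)" for p in prompts],
    }

class OnlineDPOTrainer(DPOTrainer):
    def training_step(self, model, inputs):
        # Replace the static pairs with pairs generated on-policy this step,
        # then delegate to the unchanged DPO training_step.
        inputs = generate_online_pairs(model, inputs["prompt"])
        return super().training_step(model, inputs)

step = OnlineDPOTrainer().training_step(None, {"prompt": ["q1", "q2"]})
```

The appeal of this pattern is that all of the DPO loss machinery stays untouched; only the data fed into each step changes.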

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this Jun 8, 2024