Do we need to consider the chat template when doing DPO/KTO training? #1640

Open

ZeroYuHuang opened this issue May 11, 2024 · 3 comments

Labels: DPO (Question related to DPO and DPOTrainer), KTO

@ZeroYuHuang
Hi! I closely checked the data processing in the KTOTrainer and the DPOTrainer, and found that the chat template is not applied when the prompt is combined with its completion.

For DPOTrainer: full_tokenized = self.tokenizer(prompt + answer, add_special_tokens=False)
And for KTOTrainer: prompt_and_completion = [prompt + completion for prompt, completion in zip(batch["prompt"], batch["completion"])]

Do we need to apply the chat template when combining them, something like [INST] + prompt + [/INST] + completion?
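
One way to handle this (not something the trainers are shown to do in the snippets above) is to apply the tokenizer's chat template to the prompt while preprocessing the dataset, so that the string the trainer later concatenates with the answer already carries the [INST] ... [/INST] markers. A minimal sketch, assuming a tokenizer that ships a chat template; the model name, toy data, and column names are placeholders, not part of the original report:

```python
from datasets import Dataset
from transformers import AutoTokenizer

# Example tokenizer with a built-in [INST]-style chat template (placeholder choice).
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Toy preference data with the prompt/chosen/rejected columns DPOTrainer expects.
raw = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["I don't know."],
})

def add_chat_template(example):
    # Wrap only the prompt in the chat template; the completion is
    # concatenated onto it later by the trainer (prompt + answer).
    example["prompt"] = tokenizer.apply_chat_template(
        [{"role": "user", "content": example["prompt"]}],
        tokenize=False,
        add_generation_prompt=True,  # ends the string at the assistant turn, e.g. after "[/INST]"
    )
    return example

dataset = raw.map(add_chat_template)
```

With this preprocessing, prompt + answer inside the trainer already follows the model's chat format, at the cost of the special tokens being baked into the dataset rather than added by the trainer itself.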

@younesbelkada (Collaborator)

cc @kashif @kawine

younesbelkada added the DPO (Question related to DPO and DPOTrainer) and KTO labels on May 23, 2024
@kashif (Collaborator) commented May 23, 2024

let me check! Thanks for the report!

@yaoxiao1999

Quoting @kashif: "let me check! Thanks for the report!"

Hi, didn't mean to chase you up, but do you have any updates on this question? Thanks!
