Do we need to consider the chat template when doing DPO/KTO training? #1640

Open

ZeroYuHuang opened this issue May 11, 2024 · 3 comments

Labels: DPO (Question related to DPO and DPOTrainer), KTO

@ZeroYuHuang
Hi! I closely checked the data processing in the KTOTrainer and the DPOTrainer, and found that the chat template is not applied when the prompt is combined with its completion.

For DPOTrainer: full_tokenized = self.tokenizer(prompt + answer, add_special_tokens=False)
And for KTOTrainer: prompt_and_completion = [prompt + completion for prompt, completion in zip(batch["prompt"], batch["completion"])]

Do we need to apply the chat template when combining them, something like [INST] + prompt + [/INST] + completion?
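
One way to handle this (not something the trainers are shown to do in the snippets above) is to apply the tokenizer's chat template to the prompt while preprocessing the dataset, so that the string the trainer later concatenates with the answer already carries the [INST] ... [/INST] markers. A minimal sketch, assuming a tokenizer that ships a chat template; the model name, toy data, and column names are placeholders, not part of the original report:

```python
from datasets import Dataset
from transformers import AutoTokenizer

# Example tokenizer with a built-in [INST]-style chat template (placeholder choice).
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Toy preference data with the prompt/chosen/rejected columns DPOTrainer expects.
raw = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["I don't know."],
})

def add_chat_template(example):
    # Wrap only the prompt in the chat template; the completion is
    # concatenated onto it later by the trainer (prompt + answer).
    example["prompt"] = tokenizer.apply_chat_template(
        [{"role": "user", "content": example["prompt"]}],
        tokenize=False,
        add_generation_prompt=True,  # ends the string at the assistant turn, e.g. after "[/INST]"
    )
    return example

dataset = raw.map(add_chat_template)
```

With this preprocessing, prompt + answer inside the trainer already follows the model's chat format, at the cost of the special tokens being baked into the dataset rather than added by the trainer itself.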

@younesbelkada (Collaborator)

cc @kashif @kawine

younesbelkada added the DPO (Question related to DPO and DPOTrainer) and KTO labels on May 23, 2024
@kashif (Collaborator) commented May 23, 2024

let me check! Thanks for the report!

@yaoxiao1999

Quoting @kashif: "let me check! Thanks for the report!"

Hi, didn't mean to chase you up, but do you have any updates on this question? Thanks!
