forked from NVIDIA/Megatron-LM
Question about how labels and logits are aligned in LLaMAModel._causal_lm_process #53
Comments
If you are using the native Megatron-LM dataset, no shift is needed. It depends on how the labels correspond to the samples.
In that case, can we skip self._causal_lm_process() and still compute the loss with post_language_model_processing()?
Yes, but we found that this implementation produces results that differ slightly from the HuggingFace implementation, so you will need to evaluate it yourself.
# [invalid] Shift so that tokens < n predict n
# Do not need to shift here
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., :-1].contiguous()
Is it still necessary to drop the last position here, or can the whole sequence be used directly?
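To make the alignment question concrete, here is a minimal sketch (not the repository's actual code) of the two conventions with toy tensors. In the HuggingFace style the labels equal the input tokens, so logits and labels are both shifted at loss time; in the Megatron style the dataset already supplies labels shifted by one position, so no shift is needed, but the final position must still be dropped or masked because the last token has no next-token target. The tensor shapes and the -100 ignore index are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 10
logits = torch.randn(1, 4, vocab)          # (batch, seq_len, vocab)
tokens = torch.randint(0, vocab, (1, 4))   # (batch, seq_len)

# HuggingFace-style: labels == input tokens, shift at loss time.
# logits at position t predict tokens at position t+1.
hf_loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab),
    tokens[:, 1:].reshape(-1),
)

# Megatron-style: labels are pre-shifted by the dataset
# (labels[t] = tokens[t+1]); the last position has no target,
# so it is masked with the ignore index instead of shifted.
labels = torch.full_like(tokens, -100)     # -100 = ignored by cross_entropy
labels[:, :-1] = tokens[:, 1:]             # pre-shifted labels
meg_loss = F.cross_entropy(
    logits.reshape(-1, vocab),
    labels.reshape(-1),
    ignore_index=-100,
)

# Both conventions pair the same (logit, target) positions,
# so the losses agree.
assert torch.allclose(hf_loss, meg_loss)
```

This also answers the question above: you cannot train on the full sequence as-is, because the last position has no label; it must be either truncated (as in the snippet being discussed) or masked out.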