forked from NVIDIA/Megatron-LM
Question about how labels and logits are aligned in LLaMAModel._causal_lm_process #53
Comments
If you are using the native Megatron-LM dataset, no shift is needed. It depends on how the labels correspond to the samples.
In that case, can we skip self._causal_lm_process() and still compute the loss with post_language_model_processing()?
Yes, but we found that this implementation produces results that differ slightly from the HuggingFace implementation, so you will need to evaluate it yourself.
# [invalid] Shift so that tokens < n predict n
# Do not need to shift here
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., :-1].contiguous()
Is it still necessary to drop the last position here, or can the whole sequence be used directly?
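To make the alignment question concrete, here is a minimal sketch (not the repository's actual code) of the two conventions with toy tensors. In the HuggingFace style the labels equal the input tokens, so logits and labels are both shifted at loss time; in the Megatron style the dataset already supplies labels shifted by one position, so no shift is needed, but the final position must still be dropped or masked because the last token has no next-token target. The tensor shapes and the -100 ignore index are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 10
logits = torch.randn(1, 4, vocab)          # (batch, seq_len, vocab)
tokens = torch.randint(0, vocab, (1, 4))   # (batch, seq_len)

# HuggingFace-style: labels == input tokens, shift at loss time.
# logits at position t predict tokens at position t+1.
hf_loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab),
    tokens[:, 1:].reshape(-1),
)

# Megatron-style: labels are pre-shifted by the dataset
# (labels[t] = tokens[t+1]); the last position has no target,
# so it is masked with the ignore index instead of shifted.
labels = torch.full_like(tokens, -100)     # -100 = ignored by cross_entropy
labels[:, :-1] = tokens[:, 1:]             # pre-shifted labels
meg_loss = F.cross_entropy(
    logits.reshape(-1, vocab),
    labels.reshape(-1),
    ignore_index=-100,
)

# Both conventions pair the same (logit, target) positions,
# so the losses agree.
assert torch.allclose(hf_loss, meg_loss)
```

This also answers the question above: you cannot train on the full sequence as-is, because the last position has no label; it must be either truncated (as in the snippet being discussed) or masked out.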