Hi, a question about the special tokens used in data construction #208
Comments
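Either one or two works; the second one just reinforces the end-of-sequence marker.
Thanks! What about my second question, then? Would it be better not to use a newline character?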
I noticed that when TokenTruncation.process() builds input_ids, it concatenates a and b and then appends two end-of-sequence tokens at the end.
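A minimal sketch of the construction described above, assuming a and b are already lists of token IDs. EOS_ID, build_input_ids, num_eos, and the "trim the longer segment first" rule are hypothetical illustrations, not the project's actual TokenTruncation code.

```python
from typing import List, Sequence

# Hypothetical end-of-sequence token ID; the real value depends on the tokenizer.
EOS_ID = 2


def build_input_ids(a: Sequence[int], b: Sequence[int], max_len: int, num_eos: int = 2) -> List[int]:
    """Concatenate the token IDs of a and b, then append num_eos end tokens.

    The space left for a + b is max_len minus the trailing end tokens.
    If the pair is too long, tokens are dropped from the end of the
    longer segment first (an assumed strategy, for illustration only).
    """
    if max_len < num_eos:
        raise ValueError("max_len must leave room for the end tokens")
    budget = max_len - num_eos
    a, b = list(a), list(b)
    # Trim one token at a time from whichever segment is currently longer.
    while len(a) + len(b) > budget:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    return a + b + [EOS_ID] * num_eos


print(build_input_ids([10, 11, 12], [20, 21], max_len=6))             # two end tokens
print(build_input_ids([10, 11, 12], [20, 21], max_len=6, num_eos=1))  # one end token
```

With num_eos=2 the sketch spends one extra position of the length budget on the repeated end token; with num_eos=1 that position goes to content instead.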
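My questions:
1. Why are two needed? What would happen with just one?
2. If I need a special token inside sentence a to separate its first and second halves, which one would be better? As far as I can see, the ChatGLM tokenizer only has these special tokens: <eop> <pad> <sop> <unk> and [MASK]
Thanks 🙏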
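For question 2, a minimal sketch of splicing a separator into segment a before it is passed to the construction step. The separator is a parameter because which token to use is exactly the open question; join_with_separator and EOS_ID are hypothetical names. The check against the end token reflects a plausible concern, not a confirmed recommendation: reusing the end-of-sequence token mid-sentence might teach the model to stop after the first half.

```python
from typing import List, Sequence

# Hypothetical end-of-sequence token ID; the real value depends on the tokenizer.
EOS_ID = 2


def join_with_separator(first: Sequence[int], second: Sequence[int], sep_ids: Sequence[int]) -> List[int]:
    """Join the two halves of segment a with a separator.

    sep_ids is a sequence so the same helper covers a single special token
    as well as a multi-token separator such as a tokenized newline.
    """
    sep = list(sep_ids)
    # Refuse to reuse the end token as an internal separator.
    if EOS_ID in sep:
        raise ValueError("separator should not reuse the end-of-sequence token")
    return list(first) + sep + list(second)


print(join_with_separator([1, 2], [3], [99]))   # single-token separator
print(join_with_separator([1], [3], [7, 8]))    # multi-token separator
```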