How does IDEA-CCNL/Taiyi-CLIP-Roberta-102M-Chinese (Taiyi, 太乙) support both Chinese and English during training? #455

Open
Klaus-Chow opened this issue Mar 13, 2024 · 0 comments

@Klaus-Chow

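I looked at the tokenizer: the Chinese portion has 20,000+ tokens, and the full set of Chinese and English tokens (special tokens included) is 70,000+. But when I load it with `BertTokenizer` from transformers, the embedding layer is `nn.Embedding(20000, 768)`. I'm curious how English is supported at the same time. Could someone explain?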
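A minimal sketch of one possible explanation, assuming the text encoder uses a standard BERT-style WordPiece tokenizer (as the `BertTokenizer` usage above suggests). In that scheme every token, Chinese or English, maps to an id in one shared vocabulary, so a single embedding table with one row per vocabulary entry covers both languages. Chinese text is split into single characters, while English words are matched greedily (longest match first) against subword pieces prefixed with `##`, falling back to `[UNK]` when nothing matches. The toy vocabulary and the helper names (`pre_tokenize`, `wordpiece`, `encode`, `embed`) are hypothetical; a real vocabulary file is far larger, and this does not confirm how Taiyi itself was trained.

```python
# Hypothetical toy vocabulary; a real BERT vocab.txt has tens of thousands of entries.
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "猫", "狗", "一", "只",
         "a", "photo", "of", "ca", "##t", "##s"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

EMBED_DIM = 4
# One row per vocabulary entry, analogous to nn.Embedding(len(VOCAB), EMBED_DIM).
# Zeros stand in for learned weights.
EMBEDDING = [[0.0] * EMBED_DIM for _ in VOCAB]


def is_cjk(ch: str) -> bool:
    # Simplified check: only the CJK Unified Ideographs block (BERT checks more ranges).
    return 0x4E00 <= ord(ch) <= 0x9FFF


def pre_tokenize(text: str) -> list[str]:
    # Lowercase (an assumption), surround each CJK character with spaces, split on whitespace.
    # Punctuation and accent handling from BERT's basic tokenizer are omitted.
    spaced = "".join(f" {ch} " if is_cjk(ch) else ch for ch in text.lower())
    return spaced.split()


def wordpiece(word: str) -> list[str]:
    # Greedy longest-match-first subword split; non-initial pieces carry a "##" prefix.
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            sub = word[start:end] if start == 0 else "##" + word[start:end]
            if sub in TOKEN_TO_ID:
                match = sub
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # the whole word becomes unknown, as in BERT
        pieces.append(match)
        start = end
    return pieces


def encode(text: str) -> tuple[list[str], list[int]]:
    # [CLS]/[SEP] are not added here, to keep the example short.
    tokens = [p for w in pre_tokenize(text) for p in wordpiece(w)]
    return tokens, [TOKEN_TO_ID[t] for t in tokens]


def embed(text: str) -> list[list[float]]:
    # Every id, Chinese or English, indexes the same table.
    _, ids = encode(text)
    return [EMBEDDING[i] for i in ids]


if __name__ == "__main__":
    for text in ["一只猫", "a photo of a cat", "一只cat", "dog"]:
        print(text, encode(text))
```

Under these assumptions the embedding's row count equals the vocabulary size, and English is covered only as far as the shared vocabulary contains English subword pieces; words with no matching pieces end up as `[UNK]`.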
