Sentence encoding in the classification example: I don't understand how the mask is handled #158
Comments
cwqJim2023
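changed the title from "Sentence encoding in the classification example: I can't see the maskbufe" to "Sentence encoding in the classification example: I don't understand how the mask is handled"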
Nov 14, 2023
When asking a question, please provide as much of the following information as possible:
Basic information
Core code
# Paste your core code here
def collate_fn(batch):
    batch_token_ids, batch_segment_ids, batch_labels = [], [], []
    for text, label in batch:
        token_ids, segment_ids = tokenizer.encode(text, maxlen=maxlen)
        batch_token_ids.append(token_ids)
        batch_segment_ids.append(segment_ids)
        batch_labels.append([label])

# Load the datasets
train_dataloader = DataLoader(MyDataset(['E:/data/corpus/sentence_classification/sentiment/sentiment.train.data']), batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
valid_dataloader = DataLoader(MyDataset(['E:/data/corpus/sentence_classification/sentiment/sentiment.valid.data']), batch_size=batch_size, collate_fn=collate_fn)
test_dataloader = DataLoader(MyDataset(['E:/data/corpus/sentence_classification/sentiment/sentiment.test.data']), batch_size=batch_size, collate_fn=collate_fn)
My question: in token_ids, segment_ids = tokenizer.encode(text, maxlen=maxlen), how is the mask handled?
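For context, the call in question returns only token IDs and segment IDs, so any mask has to be built somewhere else. One common pattern, sketched here as an assumption and not as this library's confirmed behaviour, is to pad each batch with a reserved padding ID and then derive the attention mask from the padded token IDs. The helper names `pad_sequences` and `attention_mask_from_ids` and the value `PAD_ID = 0` are hypothetical and chosen only for illustration.

```python
# Assumption: padding uses token ID 0, as is typical for BERT-style vocabularies.
PAD_ID = 0


def pad_sequences(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]


def attention_mask_from_ids(padded_ids, pad_id=PAD_ID):
    """Return 1 for real tokens and 0 for padding positions."""
    return [[int(tok != pad_id) for tok in row] for row in padded_ids]


# Two already-encoded sentences of different lengths (illustrative IDs only).
batch_token_ids = [[101, 2769, 102], [101, 2769, 4263, 872, 102]]

padded = pad_sequences(batch_token_ids)
mask = attention_mask_from_ids(padded)
```

If the model works this way, no explicit mask needs to be returned from collate_fn: as long as the padded positions hold the padding ID, the mask can be recomputed from the token IDs inside the model.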
Output
# Paste your debug output here
What I tried
Describe what you have tried here