[BUG/Help] <Out of range: piece id is out of range.> #438

Closed
1 task done
LiuChen19960902 opened this issue Apr 7, 2023 · 22 comments

Comments

@LiuChen19960902

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

This morning I updated the model and several .py files to match the latest update on HF, and this error started appearing right after.

Expected Behavior

This morning I updated the model and several .py files to match the latest update on HF, and this error started appearing right after.

Steps To Reproduce

  • OS:
  • Python:
  • Transformers:
  • PyTorch:
  • CUDA Support (python -c "import torch; print(torch.cuda.is_available())") :

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

@natureLanguageQing

[image]
I ran into this problem too.

@duzx16
Member

duzx16 commented Apr 7, 2023

If you hit this problem in ptuning, please pull this repository's code again.

@FeiWard

FeiWard commented Apr 8, 2023

I still get the same error after pulling again.

@zy86603465

I ran into the same problem. How can it be solved?

@genius0182

@zy86603465

@genius0182
I changed all of them to
"bos_token_id": 130004,
"eos_token_id": 130005,
"mask_token_id": 130000,
"gmask_token_id": 130001,
and tried again, but I still get the original error.

@Data2Me

Data2Me commented Apr 11, 2023

I ran into this problem too.

@zy86603465

I solved this: updating all of the model files, the configuration, and the code fixed it.

@duzx16
Member

duzx16 commented Apr 12, 2023

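I noticed that the values here in https://huggingface.co/THUDM/chatglm-6b/blob/main/configuration_chatglm.py [image] and in https://huggingface.co/THUDM/chatglm-6b/blob/main/config.json [image] are different. Could this be the cause? @duzx16

The first image shows the default values and the second shows the values that are actually set. They do not conflict.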

@kingtigerc

I pulled the code and the model fresh today and hit this problem as well. What causes it?

@kingtigerc

Pulling the code again, downloading the model again, and re-running pip install -r requirements.txt solved it.

@lrx1213

lrx1213 commented Apr 14, 2023

I ran into the same problem.

@JimXiongGM

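I ran into the same problem. It looks like a bug in the tokenizer's decode step, but when I generate again it sometimes does not occur.

You can try wrapping the call in a try block to work around it:

  # Assumes `model`, `tokenizer`, and `query` are already defined.
  retry_cnt = 0
  last_error = None
  while retry_cnt < 5:
      try:
          response, history = model.chat(
              tokenizer,
              query,
              history=[],
              max_length=512,
              num_beams=5,
              do_sample=True,
              top_p=0.7,
              temperature=0.95,
          )
          break  # generation and decoding succeeded
      except Exception as e1:
          # Sampling is random, so another attempt may produce decodable ids.
          last_error = e1
          retry_cnt += 1
  else:
      # All five attempts failed; surface the last error instead of hiding it.
      raise RuntimeError("model.chat failed after 5 attempts") from last_error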

@zhangyuanscall

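The decode source code is what's broken. With the same version of the chatglm weights, I tried several sets of decoding parameters: some work fine and others trigger the error.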

@ysanimals

I'm using fine-tuning code from a different repository. How should I fix this decode problem?

@Doufanfan

+1. How can this problem in the decode source code be fixed?

@godcrying

So why was this issue closed?

@diaojunxian

So why was this issue closed?

+1, I ran into this too.

@yecphaha

See https://huggingface.co/THUDM/chatglm3-6b/commit/ea563876364622a0a5c24e6b71db0b93a9861ba0#d2h-069285
and add two lines of code in tokenization_chatglm.py.
[image: Snipaste_2023-11-21_17-42-58]

@ayrnb

ayrnb commented Nov 30, 2023

+1

@yecphaha

yecphaha commented Dec 1, 2023

+1

You can refer to https://huggingface.co/THUDM/chatglm3-6b/commit/ea563876364622a0a5c24e6b71db0b93a9861ba0#d2h-069285
and add two lines of code in tokenization_chatglm.py.

@greyovo

greyovo commented Dec 20, 2023

See https://huggingface.co/THUDM/chatglm3-6b/commit/ea563876364622a0a5c24e6b71db0b93a9861ba0#d2h-069285 and add two lines of code in tokenization_chatglm.py. [image: Snipaste_2023-11-21_17-42-58]

This works. To be more specific, for ChatGLM-6B the change goes in tokenization_chatglm.py:

import sentencepiece as spm

class TextTokenizer:
    def __init__(self, model_path):
        self.sp = spm.SentencePieceProcessor()
        self.sp.Load(model_path)
        self.num_tokens = self.sp.vocab_size()
    # ... other methods omitted

    def convert_id_to_token(self, idx):
        # Added check: valid ids are 0 .. num_tokens - 1, so compare with >= (not >).
        if idx >= self.num_tokens:
            return ""
        return self.sp.IdToPiece(idx)
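
The guard above can be tried without loading a real model. The following is only a minimal sketch: `FakeSentencePiece` and `GuardedTextTokenizer` are hypothetical stand-ins invented for illustration, not part of ChatGLM-6B or sentencepiece, and the exception type raised by the stub is an assumption. It also treats negative ids and an id equal to the vocabulary size as out of range, since valid piece ids run from 0 to vocab_size - 1.

```python
class FakeSentencePiece:
    """Hypothetical stand-in for spm.SentencePieceProcessor (illustration only)."""

    def __init__(self, pieces):
        self._pieces = list(pieces)

    def vocab_size(self):
        return len(self._pieces)

    def IdToPiece(self, idx):
        # Mimic the error message from this issue; the exception type is assumed.
        if not 0 <= idx < len(self._pieces):
            raise IndexError("Out of range: piece id is out of range.")
        return self._pieces[idx]


class GuardedTextTokenizer:
    """Toy tokenizer that applies the bounds check before calling IdToPiece."""

    def __init__(self, sp):
        self.sp = sp
        self.num_tokens = sp.vocab_size()

    def convert_id_to_token(self, idx):
        # Ids outside 0 .. num_tokens - 1 map to an empty piece instead of raising.
        if idx < 0 or idx >= self.num_tokens:
            return ""
        return self.sp.IdToPiece(idx)


tokenizer = GuardedTextTokenizer(FakeSentencePiece(["<unk>", "▁hello", "▁world"]))
ids = [1, 2, 3, 130004]  # 3 and 130004 fall outside the 3-piece toy vocabulary
pieces = [tokenizer.convert_id_to_token(i) for i in ids]
print(pieces)
```

Returning an empty string silently drops the unknown ids from the decoded text, which hides the symptom but does not explain why the model produced those ids in the first place.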
