
Data loading problem: pad_sequence(): argument 'padding_value' (position 3) must be float, not NoneType #405

Closed
3 tasks done
yanxp opened this issue Nov 13, 2023 · 10 comments

@yanxp

yanxp commented Nov 13, 2023

The following items must be checked before submitting

  • Please make sure you are using the latest code from the repository (git pull); some issues have already been resolved and fixed.
  • I have read the FAQ section of the project documentation and searched existing issues for this problem, and found no similar issue or solution.
  • Third-party plugin issues: e.g. llama.cpp, LangChain, text-generation-webui, etc.; it is recommended to also look for solutions in the corresponding projects.

Issue type

Other issue

Base model

Chinese-Alpaca-2 (7B/13B)

Operating system

Linux

Detailed description of the issue

lr=1e-4
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=/mnt/wx_feature/home/chopinyan/codework/Chinese-LLaMA-Alpaca-2/scripts/chinese-alpaca-2-7b/
chinese_tokenizer_path=/mnt/wx_feature/home/chopinyan/codework/Chinese-LLaMA-Alpaca-2/scripts/chinese-alpaca-2-7b/
dataset_dir=/mnt/wx_feature/home/chopinyan/codework/Chinese-LLaMA-Alpaca-2/scripts/firefly/
per_device_train_batch_size=1
per_device_eval_batch_size=1
gradient_accumulation_steps=1
max_seq_length=512
output_dir=output_dir
validation_file=/mnt/wx_feature/home/chopinyan/codework/Chinese-LLaMA-Alpaca-2/scripts/firefly/alpaca_data-0-3252.json

deepspeed_config_file=ds_zero2_no_offload.json

torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 100 \
    --save_steps 200 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length ${max_seq_length} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --modules_to_save ${modules_to_save} \
    --torch_dtype float16 \
    --validation_file ${validation_file} \
    --load_in_kbits 16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False

Dependencies (must be provided for code-related issues)

Dependencies are normal

Run logs or screenshots

[INFO|trainer.py:386] 2023-11-13 20:05:21,910 >> You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set to `True` to avoid any unexpected behavior such as device placement mismatching.
Traceback (most recent call last):
  File "run_clm_sft_with_peft.py", line 531, in <module>
    main()
  File "run_clm_sft_with_peft.py", line 504, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/python/lib/python3.8/site-packages/transformers/trainer.py", line 1544, in train
    return inner_training_loop(
  File "/usr/local/python/lib/python3.8/site-packages/transformers/trainer.py", line 1558, in _inner_training_loop
    train_dataloader = self.get_train_dataloader()
  File "/usr/local/python/lib/python3.8/site-packages/transformers/trainer.py", line 856, in get_train_dataloader
    for step, inputs in enumerate(train_dataloader):
  File "/usr/local/python/lib/python3.8/site-packages/accelerate/data_loader.py", line 451, in __iter__
    current_batch = next(dataloader_iter)
  File "/usr/local/python/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/python/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/python/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/mnt/wx_feature/home/chopinyan/codework/Chinese-LLaMA-Alpaca-2/scripts/training/build_dataset.py", line 96, in
 __call__
    input_ids = torch.nn.utils.rnn.pad_sequence(
  File "/usr/local/python/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 399, in pad_sequence
    return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
TypeError: pad_sequence(): argument 'padding_value' (position 3) must be float, not NoneType
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 129986) of binary: /usr/local/python/bin/python3.8
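
For context, the error itself is easy to reproduce in isolation: pad_sequence requires a numeric padding_value, and the collator presumably passes tokenizer.pad_token_id, which is None when the tokenizer defines no pad token. A minimal sketch (not from the original run; the tensors below are made up):

import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]

# Works: explicit numeric padding value
pad_sequence(seqs, batch_first=True, padding_value=0)

# Raises the same TypeError as above: padding_value must be a float, not None,
# which is what happens when tokenizer.pad_token_id is None
pad_sequence(seqs, batch_first=True, padding_value=None)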
@iMountTai
Collaborator

This looks like a problem with the generated cache. Try deleting the cache and regenerating it.

@yanxp
Author

yanxp commented Nov 15, 2023

This looks like a problem with the generated cache. Try deleting the cache and regenerating it.

I deleted it and retried, but still got the same error.

@iMountTai
Collaborator

I'm not entirely sure where the problem is. Cache generation can fail when the program runs out of memory, and rerunning usually fixes that, but I haven't seen a failure show up during training after the cache has already been generated.

@iMountTai
Collaborator

The tokenizer you are actually passing in is not the one we released, and your tokenizer.model does not contain a pad_token, which is why this error occurs.
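
A quick way to confirm this with a Hugging Face tokenizer (a sketch; the path below is a placeholder for whatever is passed via --tokenizer_name_or_path):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/path/to/your/tokenizer")  # placeholder path

# Prints None when the tokenizer lacks a pad token, which is exactly
# the value that pad_sequence rejects in the traceback above.
print(tokenizer.pad_token)
print(tokenizer.pad_token_id)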

@yusufcakmakk

I think we can add the following code block to the SFT trainer.

DEFAULT_PAD_TOKEN = "<pad>"
if tokenizer.pad_token is None:
    print(f"Adding pad token {DEFAULT_PAD_TOKEN}")
    tokenizer.add_special_tokens(dict(pad_token=DEFAULT_PAD_TOKEN))
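
One caveat: adding a new special token can grow the vocabulary, so the model's embedding matrix may also need to be resized to match. A hedged sketch of the same idea, assuming a standard Hugging Face model object named model:

DEFAULT_PAD_TOKEN = "<pad>"  # same hypothetical default as above

if tokenizer.pad_token is None:
    num_added = tokenizer.add_special_tokens({"pad_token": DEFAULT_PAD_TOKEN})
    if num_added > 0:
        # keep embed_tokens / lm_head sizes in sync with the enlarged vocabulary
        model.resize_token_embeddings(len(tokenizer))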

@yusufcakmakk

I have created a PR for this issue.


github-actions bot commented Dec 6, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

@github-actions github-actions bot added the stale label Dec 6, 2023

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

@github-actions github-actions bot closed this as not planned Dec 14, 2023
@Shajiu

Shajiu commented Apr 12, 2024

The tokenizer you are actually passing in is not the one we released, and your tokenizer.model does not contain a pad_token, which is why this error occurs.

When training my own vocabulary, how do I add this pad_token?
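
For reference, if the vocabulary is trained with SentencePiece, a pad piece can be reserved at training time; a sketch under that assumption (file names and sizes below are placeholders):

import sentencepiece as spm

# pad_id is -1 (disabled) by default; setting it reserves a pad piece in the model
spm.SentencePieceTrainer.train(
    input="corpus.txt",           # placeholder corpus path
    model_prefix="my_tokenizer",  # placeholder output prefix
    vocab_size=32000,             # placeholder size
    pad_id=3,
    pad_piece="<pad>",
)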

@Shajiu

Shajiu commented Apr 12, 2024

The tokenizer you are actually passing in is not the one we released, and your tokenizer.model does not contain a pad_token, which is why this error occurs.

After printing, mine shows the following:

Generate config GenerationConfig {
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.9
}

But the problem still occurred.
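
If it helps, the pad_token_id in that GenerationConfig only applies to generation; the data collator that raises this error appears to read the tokenizer's own pad token, so that is the value worth checking (a sketch; the path is a placeholder):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/path/to/your/model")  # placeholder path

# These can still be None even when generation_config.pad_token_id is 0
print(tokenizer.pad_token, tokenizer.pad_token_id)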
