
After LoRA fine-tuning of the chatglm model is finished, how do I load the new model? #72

Open
waynetest2024 opened this issue Apr 7, 2024 · 8 comments
Assignees

Comments

@waynetest2024

After LoRA fine-tuning of the chatglm model is finished, how do I load the new model?
The approach in the "Model Inference" subsection of the example does produce results, but I would like to run inference on the new model directly via curl or some other method. I tried the example given in the "Reloading" subsection, but the checkpoint-1000 directory does not exist locally. It would help if a note about this were added after the LoRA fine-tuning .py file. Thanks!
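Since the name of the checkpoint directory depends on how often the Trainer saves, a small helper can locate whatever `checkpoint-<step>` folder actually exists under the Trainer's `output_dir` instead of hard-coding `checkpoint-1000`. This is a sketch under the assumption that checkpoints land directly in `output_dir`; the `PeftModel` lines in the closing comment show how the found path would then be loaded (paths there are placeholders).

```python
import os
import re

def latest_checkpoint(output_dir: str):
    """Return the path of the newest `checkpoint-<step>` subdirectory, or None."""
    steps = []
    for name in os.listdir(output_dir):
        m = re.fullmatch(r"checkpoint-(\d+)", name)
        if m and os.path.isdir(os.path.join(output_dir, name)):
            steps.append(int(m.group(1)))
    if not steps:
        return None  # nothing was saved -- check save_strategy / save_steps
    return os.path.join(output_dir, f"checkpoint-{max(steps)}")

# The found path would then be loaded on top of the base model, e.g.:
#   from peft import PeftModel
#   model = PeftModel.from_pretrained(base_model, latest_checkpoint("./output"))
```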

@Hongru0306
Contributor

You can first set save_strategy=5 and check where the output path is. By curl, do you mean loading the model with your own LoRA applied? For that you need to merge the model first and push the relevant parts to ModelScope or HF.
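One note on `save_strategy=5`: in `transformers`, `TrainingArguments.save_strategy` only accepts the strings `"no"`, `"steps"`, or `"epoch"`, and the numeric interval goes in `save_steps`. A hedged sketch of the checkpointing-related arguments (the `output_dir` value is just an example):

```python
# Checkpointing-related TrainingArguments fields (names match transformers).
# Passing an int to save_strategy raises a validation error; the interval
# belongs in save_steps instead.
save_kwargs = dict(
    output_dir="./output",   # checkpoints appear as ./output/checkpoint-<step>
    save_strategy="steps",   # one of "no" | "steps" | "epoch"
    save_steps=5,            # save every 5 optimizer steps for a quick test
)
# These would be splatted into TrainingArguments(**save_kwargs, ...).
```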

@waynetest2024
Author

OK, I'll try again. Earlier I tried calling save_pretrained() and it kept failing with an error saying something is not in JSON format.
By curl I mean the approach introduced in the "ChatGLM3-6B FastApi deployment and invocation" chapter, but that shouldn't be a big problem; the main blocker is the previous step.
Thanks for the reply!
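For the curl / FastApi route, the merge step mentioned above could look like the following sketch (nothing here is executed at import time; it assumes `transformers` and `peft` are installed, and all paths are placeholders):

```python
def merge_lora(base_model_path: str, lora_path: str, out_dir: str) -> str:
    """Fold a trained LoRA adapter into the base weights so the merged model
    can be served like an ordinary checkpoint (e.g. by the FastApi demo)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy deps, imported lazily
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_model_path, trust_remote_code=True)
    model = PeftModel.from_pretrained(base, lora_path)
    merged = model.merge_and_unload()            # applies the LoRA deltas to the base weights
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True).save_pretrained(out_dir)
    return out_dir
```

The merged directory can then be pointed at by the FastApi demo (or pushed to ModelScope/HF) like any plain model checkpoint.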

@waynetest2024
Author

You can first set save_strategy=5 and check where the output path is. By curl, do you mean loading the model with your own LoRA applied? For that you need to merge the model first and push the relevant parts to ModelScope or HF.

This parameter is now save_strategy='epoch', but after adding it I still get an error; the message is below. It looks like a version problem, although I installed the Python packages at exactly the versions given in the project's examples.
Traceback (most recent call last):
  File "train.py", line 79, in <module>
    trainer.train()
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1944, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2302, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2378, in _save_checkpoint
    self.save_model(staging_output_dir, _internal_call=True)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2886, in save_model
    self._save(output_dir)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2958, in _save
    self.model.save_pretrained(
  File "/root/miniconda3/lib/python3.8/site-packages/peft/peft_model.py", line 201, in save_pretrained
    peft_config.save_pretrained(output_dir, auto_mapping_dict=auto_mapping_dict)
  File "/root/miniconda3/lib/python3.8/site-packages/peft/utils/config.py", line 92, in save_pretrained
    writer.write(json.dumps(output_dict, indent=2, sort_keys=True))
  File "/root/miniconda3/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable
100%|██████████| 466/466 [07:32<00:00, 1.03it/s]
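The `TypeError` above is the JSON encoder choking on a `set` inside the PEFT config when the Trainer saves a checkpoint; this usually points to mismatched `peft`/`transformers` versions, and aligning or upgrading `peft` is the usual fix. A minimal reproduction plus a generic workaround, assuming the offending field is a set such as `target_modules` (the field name here is an assumption for illustration):

```python
import json

# Minimal reproduction: a set-valued config field breaks json.dumps,
# which is exactly the call peft makes in save_pretrained().
config = {"target_modules": {"query_key_value"}, "r": 8}
failed = False
try:
    json.dumps(config, indent=2, sort_keys=True)
except TypeError:
    failed = True  # "Object of type set is not JSON serializable"

# Generic workaround: coerce sets to sorted lists before serializing.
safe = {k: sorted(v) if isinstance(v, set) else v for k, v in config.items()}
serialized = json.dumps(safe, indent=2, sort_keys=True)
```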

@liyunhan

@waynetest2024 I'd like to know how large a model you are fine-tuning, and roughly how long it took on how much data. With 10k training samples, LoRA fine-tuning qwen1.5-32b-chat on an A6000 is painfully slow for me... I set the batch size to 16, and a single batch takes close to a minute.

@waynetest2024
Author

@waynetest2024 I'd like to know how large a model you are fine-tuning, and roughly how long it took on how much data. With 10k training samples, LoRA fine-tuning qwen1.5-32b-chat on an A6000 is painfully slow for me... I set the batch size to 16, and a single batch takes close to a minute.

Just the model and data from the demo: chatglm3-6b and huanhuan.json. A run takes a few minutes on a 4090. I'm only getting familiar with the basic workflow, so my requirements are low.

@Hongru0306
Contributor

This parameter is now save_strategy='epoch', but after adding it I still get an error; the message is below. It looks like a version problem, although I installed the Python packages at exactly the versions given in the project's examples.

Hi, for testing you don't need to set it to epoch. Set it to save by iteration instead, saving every 5 iterations, and see whether that works. If it really can't be resolved, I'll later build a known-good environment, push it to AutoDL, and attach the link to the repo once it's updated.

@Hongru0306
Contributor

You can first set save_strategy=5 and check where the output path is. By curl, do you mean loading the model with your own LoRA applied? For that you need to merge the model first and push the relevant parts to ModelScope or HF.

This parameter is now save_strategy='epoch', but after adding it I still get an error; it looks like a version problem, although the Python packages were installed at the versions given in the project's examples. (Same traceback as posted in full above, ending in: TypeError: Object of type set is not JSON serializable)

One more idea: train with the provided ipynb, then manually save the model weights after training finishes and check where the save path ends up.
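The manual save suggested here could be a small cell at the end of the notebook. A sketch, assuming `model` and `tokenizer` are the objects used during training (the output path and helper name are just examples):

```python
import os

def save_adapter(model, tokenizer, out_dir: str = "./output/lora_adapter") -> str:
    """Save the (PEFT-wrapped) model's weights and tokenizer to an explicit path."""
    os.makedirs(out_dir, exist_ok=True)
    model.save_pretrained(out_dir)      # for a PeftModel this writes adapter_config.json etc.
    tokenizer.save_pretrained(out_dir)
    return os.path.abspath(out_dir)     # print this so the save location is unambiguous
```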

@waynetest2024
Author

This parameter is now save_strategy='epoch', but after adding it I still get an error; the message is below. It looks like a version problem, although I installed the Python packages at exactly the versions given in the project's examples.

Hi, for testing you don't need to set it to epoch. Set it to save by iteration instead, saving every 5 iterations, and see whether that works. If it really can't be resolved, I'll later build a known-good environment, push it to AutoDL, and attach the link to the repo once it's updated.

I see, but setting save_strategy=5 errors out immediately.

3 participants