
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select) #89

Open
daiyizheng opened this issue Aug 11, 2023 · 2 comments

Comments

@daiyizheng

When I run inference with llama 65b on multiple A100 GPUs via infer.py, I get the error below; finetune.py runs without problems.

@daiyizheng (Author)

/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama did not contain libcudart.so as expected! Searching further paths...
warn(msg)
/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/run/user/80104/vscode-git-4b808d81bf.sock')}
warn(msg)
/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/run/user/80104/vscode-ipc-0b328df5-e364-494a-b230-9f7e99271b5b.sock')}
warn(msg)
/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('() { eval /usr/bin/modulecmd bash $*\n}')}
warn(msg)
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.

Loading checkpoint shards:   0%|          | 0/81 [00:00<?, ?it/s]
...
Loading checkpoint shards: 100%|██████████| 81/81 [01:15<00:00,  1.07it/s]
Traceback (most recent call last):
  File "/slurm/home/yrd/shaolab/daiyizheng/nlp/Huatuo-Llama-Med-Chinese/infer.py", line 132, in <module>
    fire.Fire(main)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/slurm/home/yrd/shaolab/daiyizheng/nlp/Huatuo-Llama-Med-Chinese/infer.py", line 118, in main
    infer_from_json(instruct_dir)
  File "/slurm/home/yrd/shaolab/daiyizheng/nlp/Huatuo-Llama-Med-Chinese/infer.py", line 105, in infer_from_json
    model_output = evaluate(instruction)
  File "/slurm/home/yrd/shaolab/daiyizheng/nlp/Huatuo-Llama-Med-Chinese/infer.py", line 87, in evaluate
    generation_output = model.generate(
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/peft/peft_model.py", line 731, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/transformers/generation/utils.py", line 1611, in generate
    return self.beam_search(
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/transformers/generation/utils.py", line 2982, in beam_search
    model_kwargs["past_key_values"] = self._reorder_cache(model_kwargs["past_key_values"], beam_idx)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 762, in _reorder_cache
    reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 762, in <genexpr>
    reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
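
For context on the failure: loading the 65B model sharded across GPUs (e.g. with device_map="auto") puts the beam-search past_key_values cache on both cuda:0 and cuda:1, while beam_idx lives on a single device, so the index_select inside _reorder_cache mixes devices. Later transformers releases fix exactly this by moving beam_idx onto each cached tensor's device before indexing. A minimal stopgap sketch, not the author's posted workaround, assuming a transformers version whose LlamaForCausalLM still carries the unpatched _reorder_cache shown in the traceback:

# Stopgap sketch (assumption: your installed transformers still ships the
# unpatched _reorder_cache from the traceback above). Mirrors the fix later
# merged upstream: move beam_idx onto whatever device each cache shard is on.
from transformers.models.llama.modeling_llama import LlamaForCausalLM

def _reorder_cache_multi_gpu(past_key_values, beam_idx):
    reordered_past = ()
    for layer_past in past_key_values:
        reordered_past += (
            tuple(
                # beam_idx follows the cache tensor's device (cuda:0 or cuda:1)
                past_state.index_select(0, beam_idx.to(past_state.device))
                for past_state in layer_past
            ),
        )
    return reordered_past

# _reorder_cache is a staticmethod on LlamaForCausalLM in this era of
# transformers; patch it on the class before calling model.generate(...).
LlamaForCausalLM._reorder_cache = staticmethod(_reorder_cache_multi_gpu)

Decoding with num_beams=1 (greedy or sampling) also avoids the crash, since _reorder_cache is only invoked during beam search.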

@daiyizheng (Author) commented Aug 11, 2023

My non-standard (hacky) workaround:
