[2024-05-02 06:46:54,788] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-05-02 06:46:56,799] torch.distributed.run: [WARNING]
[2024-05-02 06:46:56,799] torch.distributed.run: [WARNING] *****************************************
[2024-05-02 06:46:56,799] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-05-02 06:46:56,799] torch.distributed.run: [WARNING] *****************************************
[2024-05-02 06:47:02,228] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-05-02 06:47:02,234] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-05-02 06:47:02,239] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-05-02 06:47:02,241] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-05-02 06:47:02,967] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-05-02 06:47:02,967] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-05-02 06:47:02,972] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-05-02 06:47:02,991] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-05-02 06:47:02,992] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
`low_cpu_mem_usage` was None, now set to True since model is quantized.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|██████████████████████████████| 30/30 [01:25<00:00, 2.86s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████████████████████████| 30/30 [01:27<00:00, 2.91s/it]
Loading checkpoint shards: 100%|██████████████████████████████| 30/30 [01:27<00:00, 2.93s/it]
Loading checkpoint shards: 100%|██████████████████████████████| 30/30 [01:27<00:00, 2.93s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Size of the train set: 10000. Size of the validation set: 2000
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Size of the train set: 10000. Size of the validation set: 2000
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Size of the train set: 10000. Size of the validation set: 2000
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Size of the train set: 10000. Size of the validation set: 2000
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Using auto half precision backend
PeftModelForCausalLM(
(base_model): LoraModel(
(model): LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(128264, 8192)
(layers): ModuleList(
(0-79): 80 x LlamaDecoderLayer(
(self_attn): LlamaFlashAttention2(
(q_proj): lora.Linear4bit(
(base_layer): Linear4bit(in_features=8192, out_features=8192, bias=False)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=8192, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=8192, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(k_proj): lora.Linear4bit(
(base_layer): Linear4bit(in_features=8192, out_features=1024, bias=False)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=8192, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=1024, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(v_proj): lora.Linear4bit(
(base_layer): Linear4bit(in_features=8192, out_features=1024, bias=False)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=8192, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=1024, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(o_proj): lora.Linear4bit(
(base_layer): Linear4bit(in_features=8192, out_features=8192, bias=False)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=8192, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=8192, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): lora.Linear4bit(
(base_layer): Linear4bit(in_features=8192, out_features=28672, bias=False)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=8192, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=28672, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(up_proj): lora.Linear4bit(
(base_layer): Linear4bit(in_features=8192, out_features=28672, bias=False)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=8192, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=28672, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(down_proj): lora.Linear4bit(
(base_layer): Linear4bit(in_features=28672, out_features=8192, bias=False)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=28672, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=8192, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(act_fn): SiLU()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
)
)
(norm): LlamaRMSNorm()
)
(lm_head): Linear(in_features=8192, out_features=128264, bias=False)
)
)
)
trainable params: 103,546,880 || all params: 70,657,384,448 || trainable%: 0.1465478531493122
[2024-05-02 06:48:45,278] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.2, git-hash=unknown, git-branch=unknown
trainable params: 103,546,880 || all params: 70,657,384,448 || trainable%: 0.1465478531493122
trainable params: 103,546,880 || all params: 70,657,384,448 || trainable%: 0.1465478531493122
trainable params: 103,546,880 || all params: 70,657,384,448 || trainable%: 0.1465478531493122
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1914, in broadcast
work = group.broadcast([tensor], opts)
torch.distributed.DistBackendError: NCCL error in: /opt/conda/conda-bld/pytorch_1711403380909/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1691, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.19.3
ncclUnhandledCudaError: Call to CUDA function failed.
Last error:
Cuda failure 2 'out of memory'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/hmohapa/peft/examples/sft/train.py", line 162, in <module>
main(model_args, data_args, training_args)
File "/root/hmohapa/peft/examples/sft/train.py", line 146, in main
trainer.train(resume_from_checkpoint=checkpoint)
File "/opt/conda/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
output = super().train(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2012, in _inner_training_loop
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1266, in prepare
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1914, in broadcast
result = self._prepare_deepspeed(*args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1652, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)work = group.broadcast([tensor], opts)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
torch.distributed.DistBackendError: NCCL error in: /opt/conda/conda-bld/pytorch_1711403380909/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1691, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.19.3
ncclUnhandledCudaError: Call to CUDA function failed.
Last error:
Cuda failure 2 'out of memory'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/hmohapa/peft/examples/sft/train.py", line 162, in <module>
engine = DeepSpeedEngine(args=args,
File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
self._configure_distributed_model(model)
main(model_args, data_args, training_args) File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1157, in _configure_distributed_model
File "/root/hmohapa/peft/examples/sft/train.py", line 146, in main
trainer.train(resume_from_checkpoint=checkpoint)
File "/opt/conda/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
self._broadcast_model()
File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1077, in _broadcast_model
output = super().train(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
dist.broadcast(p.data, groups._get_broadcast_src_rank(), group=self.seq_data_parallel_group)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 224, in broadcast
return cdb.broadcast(tensor=tensor, src=src, group=group, async_op=async_op)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 199, in broadcast
return torch.distributed.broadcast(tensor=tensor, src=src, group=group, async_op=async_op)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
msg_dict = _get_msg_dict(func.__name__, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 49, in _get_msg_dict
"args": f"{args}, {kwargs}",return inner_training_loop(
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 461, in __repr__
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2012, in _inner_training_loop
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py", line 677, in _str
return _str_intern(self, tensor_contents=tensor_contents)
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py", line 597, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py", line 331, in _tensor_str
self = self.float()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.92 GiB. GPU 2 has a total capacity of 39.39 GiB of which 2.00 MiB is free. Process 112078 has 39.38 GiB memory in use. Of the allocated memory 37.36 GiB is allocated by PyTorch, and 76.66 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1266, in prepare
result = self._prepare_deepspeed(*args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1652, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/__init__.py", line 181, in initialize
engine = DeepSpeedEngine(args=args,
File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
self._configure_distributed_model(model)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1157, in _configure_distributed_model
self._broadcast_model()
File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1077, in _broadcast_model
dist.broadcast(p.data, groups._get_broadcast_src_rank(), group=self.seq_data_parallel_group)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 224, in broadcast
return cdb.broadcast(tensor=tensor, src=src, group=group, async_op=async_op)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 199, in broadcast
return torch.distributed.broadcast(tensor=tensor, src=src, group=group, async_op=async_op)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
msg_dict = _get_msg_dict(func.__name__, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 49, in _get_msg_dict
"args": f"{args}, {kwargs}",
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 461, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py", line 677, in _str
return _str_intern(self, tensor_contents=tensor_contents)
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py", line 597, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py", line 331, in _tensor_str
self = self.float()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.92 GiB. GPU 1 has a total capacity of 39.39 GiB of which 2.00 MiB is free. Process 112077 has 39.38 GiB memory in use. Of the allocated memory 37.36 GiB is allocated by PyTorch, and 76.66 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2024-05-02 06:49:11,955] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 322 closing signal SIGTERM
[2024-05-02 06:49:11,955] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 325 closing signal SIGTERM
[2024-05-02 06:49:13,221] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 323) of binary: /opt/conda/bin/python
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1060, in launch_command
deepspeed_launcher(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 764, in deepspeed_launcher
distrib_run.run(args)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-05-02_06:49:11
host : d34443f89434
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 324)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-05-02_06:49:11
host : d34443f89434
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 323)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
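A rough sanity check on the "Tried to allocate 3.92 GiB" figure in the OOM messages above. Both tracebacks end in `_tensor_str`, at `self = self.float()`, which is reached from `_get_msg_dict` in `c10d_logger.py` while it formats the arguments of the failed broadcast. The sketch below assumes (this is not confirmed by the log) that the tensor being formatted is a 128264 x 8192 weight such as `embed_tokens`, and that the CUDA caching allocator rounds large requests up to a multiple of 2 MiB. The helper names are made up for illustration.

```python
GIB = 2**30

# Shape taken from the "Embedding(128264, 8192)" line in the printed model.
VOCAB, HIDDEN = 128264, 8192


def fp32_copy_bytes(rows: int, cols: int) -> int:
    """Bytes needed for a float32 copy of a rows x cols tensor."""
    return rows * cols * 4


def round_up(nbytes: int, block: int = 2 * 2**20) -> int:
    """Round up to a multiple of `block` (2 MiB is an assumption about the allocator)."""
    return -(-nbytes // block) * block


raw = fp32_copy_bytes(VOCAB, HIDDEN)
rounded = round_up(raw)

print(f"raw fp32 copy:     {raw / GIB:.2f} GiB")
print(f"rounded to 2 MiB:  {rounded / GIB:.2f} GiB")
```

If those assumptions hold, the second number lines up with the 3.92 GiB in the message, which would mean the final `OutOfMemoryError` comes from building the error message for the NCCL failure (upcasting the tensor to print it), not from the training step itself. The first failure in each traceback is still the NCCL `Cuda failure 2 'out of memory'` inside `_broadcast_model`.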
Expected behavior
LoRA adapters should fine-tune successfully.
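For reference, the trainable parameter count in the log can be reproduced from the printed model alone, which suggests the adapters were attached as intended before the failure. This is a minimal sketch: the shapes and the rank of 8 are read off the `lora_A` / `lora_B` lines in the model print, and `lora_params` is a hypothetical helper, not a PEFT function.

```python
R = 8        # LoRA rank: out_features of every lora_A in the model print
LAYERS = 80  # "(0-79): 80 x LlamaDecoderLayer"

# (in_features, out_features) of each adapted projection, from the model print.
PROJECTIONS = {
    "q_proj": (8192, 8192),
    "k_proj": (8192, 1024),
    "v_proj": (8192, 1024),
    "o_proj": (8192, 8192),
    "gate_proj": (8192, 28672),
    "up_proj": (8192, 28672),
    "down_proj": (28672, 8192),
}


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """Parameters in one adapter: lora_A is (in x r), lora_B is (r x out), no bias."""
    return r * in_features + r * out_features


per_layer = sum(lora_params(i, o) for i, o in PROJECTIONS.values())
trainable = per_layer * LAYERS

# "all params" as reported in the log.
total = 70_657_384_448

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / total}")
```

The count matches the `trainable params: 103,546,880` line, so the LoRA setup itself looks consistent; the problem appears later, when DeepSpeed broadcasts the model in `_broadcast_model`.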
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
System Info
Who can help?
@pacman100
Information
Tasks
examples folder
Reproduction
deepspeed_config_z3_qlora_4g.yaml
run_peft_qlora_deepspeed_stage3_llama3_70b.sh
Using the official train.py example from https://github.com/huggingface/peft/blob/main/examples/sft/train.py
Log output