Running on an RTX 2060 SUPER fails with the error "Feature 'cvt with .f32.bf16' requires .target sm_80 or higher". #434

Open
yangcecode opened this issue May 7, 2024 · 2 comments

Comments

@yangcecode

==((====))== Unsloth: Fast Llama patching release 2024.4
\ /| GPU: NVIDIA GeForce RTX 2060 SUPER. Max memory: 7.785 GB. Platform = Linux.
O^O/ _/ \ Pytorch: 2.3.0. CUDA = 7.5. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:09<00:00, 2.40s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/home/chuhaitong/yangce/Meta-Llama-3-8B-Instruct does not have a padding or unknown token!
Will use the EOS token of id 128001 as padding.
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.

True

Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
max_steps is given, it will override any value given in num_train_epochs
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\ /| Num examples = 1 | Num Epochs = 60
O^O/ _/ \ Batch size per device = 2 | Gradient Accumulation steps = 4
\ / Total batch size = 8 | Total steps = 60
"-____-" Number of trainable parameters = 41,943,040
0%| | 0/60 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/chuhaitong/yangce/app.py", line 114, in <module>
trainer_stats = trainer.train()
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
output = super().train(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "<string>", line 361, in _fast_inner_training_loop
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 882, in PeftModelForCausalLM_fast_forward
return self.base_model(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 813, in _CausalLM_fast_forward
outputs = self.model(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 650, in LlamaModel_fast_forward
hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/_utils.py", line 333, in forward
(output,) = forward_function(hidden_states, *args)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 432, in LlamaDecoderLayer_fast_forward
hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 190, in fast_rms_layernorm
out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 144, in forward
fx[(n_rows,)](
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
self.cache[device][key] = compile(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 193, in compile
next_module = compile_ir(module, metadata)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 201, in <lambda>
stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 194, in make_cubin
return compile_ptx_to_cubin(src, ptxas, capability, opt.enable_fp_fusion)
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 102; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 102; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 104; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 104; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 106; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 106; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 108; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 108; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 110; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 110; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 112; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 112; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 114; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 114; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 116; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 116; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 118; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 118; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 120; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 120; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 122; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 122; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 124; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 124; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 126; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 126; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 128; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 128; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 130; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 130; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 318; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 318; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 320; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 320; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 322; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 322; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 324; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 324; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 326; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 326; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 328; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 328; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 330; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 330; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 332; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 332; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 334; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 334; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 336; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 336; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 338; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 338; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 340; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 340; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 342; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 342; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 344; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 344; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 346; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 346; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors
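
The ptxas output above is long, but it repeats only three distinct messages, and every one of them says a bf16 instruction needs sm_80. A minimal sketch for condensing a log like this, assuming the line format shown above; `summarize_ptxas_errors` and its regular expression are illustrative helpers, not part of Triton or Unsloth:

```python
import re
from collections import Counter

# Matches lines such as:
#   ptxas /tmp/..., line 100; error : Feature '.bf16' requires .target sm_80 or higher
PTXAS_ERROR = re.compile(
    r"line (?P<line>\d+); error\s*: Feature '(?P<feature>[^']+)' "
    r"requires \.target (?P<target>sm_\d+) or higher"
)


def summarize_ptxas_errors(log_text):
    """Count ptxas 'Feature ... requires .target ...' errors per (feature, target)."""
    counts = Counter()
    for match in PTXAS_ERROR.finditer(log_text):
        counts[(match["feature"], match["target"])] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample = """\
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
"""

if __name__ == "__main__":
    for (feature, target), n in summarize_ptxas_errors(sample).most_common():
        print(f"{n:3d} x {feature!r} requires {target}")
```

Run against the full log, this reduces the roughly one hundred error lines to the three features involved, which makes it easier to see that the kernel was compiled with bf16 types for a GPU that reports compute capability 7.5.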

@seancarmod-y

seancarmod-y commented May 7, 2024

I get the same error when I run the script below on a V100. I thought setting bf16=False would solve this.

import os
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 1024
dataset_folder = "./datasets/train_dataset"
dataset = load_dataset(dataset_folder, split="train")

# Load Llama3 model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,  # None lets Unsloth auto-detect the dtype
    load_in_4bit=True,
)

# Model patching and add fast LoRA weights and training
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",  # Supports any, but = "none" is optimized
    use_gradient_checkpointing=True,
    random_state=3407,
    max_seq_length=max_seq_length,
    use_rslora=False,  # Rank stabilized LoRA
    loftq_config=None,  # LoftQ
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=150,
        learning_rate=2e-4,
        fp16=True,
        bf16=False,
        logging_steps=1,
        output_dir="outputs",
        optim="adamw_8bit",
        seed=3407,
    ),
)

# Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

trainer_stats = trainer.train()

# Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime'] / 60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

# Save the model
model.save_pretrained("llama3_lora_model")
model.save_pretrained_merged("outputs", tokenizer, save_method="merged_16bit")

# Save to 8bit Q8_0 and q4
model.save_pretrained_gguf("llama3_model_q8", tokenizer)
model.save_pretrained_gguf("llama3_model_q4", tokenizer, quantization_method="q4_k_m")
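
The startup banners in both logs report a compute capability below 8.0 (7.5 for the RTX 2060 SUPER, 7.0 for the V100) while also printing Bfloat16 = TRUE, and ptxas states that bf16 needs sm_80. A minimal sketch of the capability check one would expect to gate this, assuming the sm_80 threshold taken from the error text; `supports_bf16` and `pick_dtype_name` are hypothetical names for illustration, not Unsloth functions:

```python
from typing import Tuple

# Assumption: threshold taken from the ptxas message
# "Feature '.bf16' requires .target sm_80 or higher".
BF16_MIN_CAPABILITY = (8, 0)


def supports_bf16(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is sm_80 or newer."""
    return tuple(capability) >= BF16_MIN_CAPABILITY


def pick_dtype_name(capability: Tuple[int, int]) -> str:
    """Pick a 16-bit dtype name that the given GPU can compile kernels for."""
    return "bfloat16" if supports_bf16(capability) else "float16"


if __name__ == "__main__":
    # Capabilities as printed in the banners of this thread ("CUDA = 7.5", "CUDA = 7.0").
    gpus = {"RTX 2060 SUPER": (7, 5), "Tesla V100": (7, 0), "an sm_80 GPU": (8, 0)}
    for name, capability in gpus.items():
        print(f"{name}: {pick_dtype_name(capability)}")
```

With PyTorch, `torch.cuda.get_device_capability(0)` returns such a `(major, minor)` tuple. Passing `dtype=torch.float16` explicitly to `FastLanguageModel.from_pretrained` instead of `dtype=None` may be worth trying on these cards; nothing in this thread confirms that it avoids the Triton error on this release.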

Error output from the script:
(unsloth_env) root@sean:/home/sean# python unsloth_llama3_fine_tune.py
/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
==((====))== Unsloth: Fast Llama patching release 2024.4
\ /| GPU: Tesla V100-SXM2-16GB. Max memory: 15.773 GB. Platform = Linux.
O^O/ _/ \ Pytorch: 2.3.0. CUDA = 7.0. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unused kwargs: ['load_in_4bit', 'load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
max_steps is given, it will override any value given in num_train_epochs
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\ /| Num examples = 13,533 | Num Epochs = 1
O^O/ _/ \ Batch size per device = 2 | Gradient Accumulation steps = 4
\ / Total batch size = 8 | Total steps = 150
"-____-" Number of trainable parameters = 41,943,040
0%| | 0/150 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/sean/unsloth_llama3_fine_tune.py", line 65, in <module>
trainer_stats = trainer.train()
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
output = super().train(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "<string>", line 361, in _fast_inner_training_loop
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 882, in PeftModelForCausalLM_fast_forward
return self.base_model(
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 813, in _CausalLM_fast_forward
outputs = self.model(
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 668, in LlamaModel_fast_forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
return fn(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
outputs = run_function(*args)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 664, in custom_forward
return module(*inputs, past_key_value, output_attentions, padding_mask = padding_mask)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 432, in LlamaDecoderLayer_fast_forward
hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 190, in fast_rms_layernorm
out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 144, in forward
fx[(n_rows,)](
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
self.cache[device][key] = compile(
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 193, in compile
next_module = compile_ir(module, metadata)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 201, in <lambda>
stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 194, in make_cubin
return compile_ptx_to_cubin(src, ptxas, capability, opt.enable_fp_fusion)
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-3c809d, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 102; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 102; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 104; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 104; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 106; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 106; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 108; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 108; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 110; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 110; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 112; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 112; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 114; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 114; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 116; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 116; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 118; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 118; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 120; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 120; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 122; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 122; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 124; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 124; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 126; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 126; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 128; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 128; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 130; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 130; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 316; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 318; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 318; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 320; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 320; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 322; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 322; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 324; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 324; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 326; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 326; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 328; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 328; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 330; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 330; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 332; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 332; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 334; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 334; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 336; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 336; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 338; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 338; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 340; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 340; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 342; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 342; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 344; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 344; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 346; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 346; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors

ludekcizinsky commented May 15, 2024

I had the exact same problem using torch 2.3.0. As you said, even setting the bf16 flag to False did not work.

I resolved the issue by downgrading to torch 2.2.0 and installing unsloth with:

pip install --upgrade --force-reinstall --no-cache-dir torch==2.2.0 triton \
  --index-url https://download.pytorch.org/whl/cu121

pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git"
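
For anyone who wants to confirm what their card supports before training: the ptxas errors above say the bf16 instructions need sm_80 or higher, and the banner in the original post shows CUDA = 7.5 for the RTX 2060 SUPER while still reporting Bfloat16 = TRUE. Below is a minimal sketch that checks the compute capability directly instead of relying on a reported flag. The helper names `pick_dtype_name` and `detect_capability` are made up for this example and are not part of unsloth or torch; only `torch.cuda.get_device_capability` is a real torch call.

```python
def pick_dtype_name(capability):
    """Return "bfloat16" for sm_80+ devices, otherwise "float16".

    `capability` is a (major, minor) tuple such as (7, 5) for an RTX 2060.
    Hypothetical helper for illustration, not an unsloth or torch API.
    """
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"


def detect_capability(device_index=0):
    """Query the compute capability of a CUDA device.

    Needs torch and a visible GPU; torch is imported lazily so that
    pick_dtype_name can be used on machines without it.
    """
    import torch
    return torch.cuda.get_device_capability(device_index)
```

On a GPU machine, `pick_dtype_name(detect_capability())` gives the dtype name to aim for. This only tells you what the hardware can assemble; it does not by itself stop a library from emitting bf16 kernels, which is why the downgrade above was still needed in my case.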
