I used an RTX 2060 graphics card and got the error "Feature 'cvt with .f32.bf16' requires .target sm_80 or higher". #434

yangcecode · 2024-05-07T07:25:49Z

==((====))== Unsloth: Fast Llama patching release 2024.4
\ /| GPU: NVIDIA GeForce RTX 2060 SUPER. Max memory: 7.785 GB. Platform = Linux.
O^O/ _/ \ Pytorch: 2.3.0. CUDA = 7.5. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:09<00:00, 2.40s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/home/chuhaitong/yangce/Meta-Llama-3-8B-Instruct does not have a padding or unknown token!
Will use the EOS token of id 128001 as padding.
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.

True

Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
max_steps is given, it will override any value given in num_train_epochs
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\ /| Num examples = 1 | Num Epochs = 60
O^O/ _/ \ Batch size per device = 2 | Gradient Accumulation steps = 4
\ / Total batch size = 8 | Total steps = 60
"-____-" Number of trainable parameters = 41,943,040
0%| | 0/60 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/chuhaitong/yangce/app.py", line 114, in <module>
trainer_stats = trainer.train()
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
output = super().train(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "<string>", line 361, in _fast_inner_training_loop
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 882, in PeftModelForCausalLM_fast_forward
return self.base_model(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 813, in _CausalLM_fast_forward
outputs = self.model(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 650, in LlamaModel_fast_forward
hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/_utils.py", line 333, in forward
(output,) = forward_function(hidden_states, *args)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 432, in LlamaDecoderLayer_fast_forward
hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 190, in fast_rms_layernorm
out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 144, in forward
fx[(n_rows,)](
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
self.cache[device][key] = compile(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 193, in compile
next_module = compile_ir(module, metadata)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 201, in <lambda>
stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 194, in make_cubin
return compile_ptx_to_cubin(src, ptxas, capability, opt.enable_fp_fusion)
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 102; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 102; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 104; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 104; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 106; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 106; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 108; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 108; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 110; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 110; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 112; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 112; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 114; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 114; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 116; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 116; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 118; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 118; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 120; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 120; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 122; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 122; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 124; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 124; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 126; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 126; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 128; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 128; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 130; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 130; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 318; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 318; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 320; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 320; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 322; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 322; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 324; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 324; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 326; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 326; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 328; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 328; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 330; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 330; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 332; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 332; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 334; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 334; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 336; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 336; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 338; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 338; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 340; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 340; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 342; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 342; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 344; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 344; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 346; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 346; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors
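
The ptxas output above repeats the same few complaints many times. A minimal sketch for condensing such a log into one count per feature follows; the function name `summarize_ptxas_errors` and the regular expression are illustrative assumptions written against the line format shown in this traceback, not part of Triton or Unsloth.

```python
import re
from collections import Counter

# Matches lines such as:
#   ptxas /tmp/x, line 100; error : Feature '.bf16' requires .target sm_80 or higher
# The pattern is an assumption based on the log format in this issue.
PTXAS_ERROR = re.compile(
    r"line (?P<line>\d+); error\s*: "
    r"Feature '(?P<feature>[^']+)' requires \.target (?P<target>sm_\d+)"
)


def summarize_ptxas_errors(log_text):
    """Count ptxas 'Feature ... requires .target ...' errors per (feature, target)."""
    counts = Counter()
    for match in PTXAS_ERROR.finditer(log_text):
        counts[(match.group("feature"), match.group("target"))] += 1
    return counts


# Three lines copied from the traceback in this issue.
SAMPLE = """\
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
"""

if __name__ == "__main__":
    for (feature, target), n in summarize_ptxas_errors(SAMPLE).items():
        print(f"{n:3d} x {feature!r} needs {target}")
```

Every error in the log names the same minimum target, sm_80, while the banner reports CUDA capability 7.5 for this card, so the generated kernel is using bfloat16 instructions that the GPU cannot assemble.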

seancarmod-y · 2024-05-07T14:37:40Z

I get the same error when I run this on a V100. I thought setting bf16 to False would solve this.

import os
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 1024
dataset_folder = "./datasets/train_dataset"
dataset = load_dataset(dataset_folder, split="train")

# Load Llama3 model

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)

# Patch the model and add fast LoRA weights for training

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj",],
    lora_alpha=16,
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",  # Supports any, but = "none" is optimized
    use_gradient_checkpointing=True,
    random_state=3407,
    max_seq_length=max_seq_length,
    use_rslora=False,  # Rank stabilized LoRA
    loftq_config=None,  # LoftQ
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=150,
        learning_rate=2e-4,
        fp16=True,
        bf16=False,
        logging_steps=1,
        output_dir="outputs",
        optim="adamw_8bit",
        seed=3407,
    ),
)

# Show current memory stats

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

trainer_stats = trainer.train()

# Show final memory and time stats

used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

# Save the model

model.save_pretrained("llama3_lora_model")
model.save_pretrained_merged("outputs", tokenizer, save_method="merged_16bit",)

# Save to 8bit Q8_0 and q4

model.save_pretrained_gguf("llama3_model_q8", tokenizer,)
model.save_pretrained_gguf("llama3_model_q4", tokenizer, quantization_method="q4_k_m")
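
The script above sets `fp16=True, bf16=False` in `TrainingArguments` but leaves `dtype=None` when loading the model, so the model dtype is still auto-detected, and the banner for both GPUs in this thread prints `Bfloat16 = TRUE`. A minimal sketch of choosing the precision from the compute capability instead is shown below. The helper `precision_settings` is hypothetical, and whether passing an explicit `dtype` avoids the Triton error here is an assumption that this thread does not confirm.

```python
def precision_settings(capability):
    """Pick precision flags from a CUDA compute capability tuple (major, minor).

    ptxas reports that bf16 instructions need sm_80 or higher, so anything
    below (8, 0) falls back to float16.
    """
    major, minor = capability
    native_bf16 = (major, minor) >= (8, 0)
    return {
        "dtype_name": "bfloat16" if native_bf16 else "float16",
        "fp16": not native_bf16,
        "bf16": native_bf16,
    }


if __name__ == "__main__":
    try:
        import torch  # optional here: only needed to query the real device
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
    else:
        capability = (7, 5)  # value printed in the RTX 2060 SUPER banner

    settings = precision_settings(capability)
    print(capability, settings)
    if torch is not None:
        # Map the name to the torch dtype object, e.g. torch.float16.
        print(getattr(torch, settings["dtype_name"]))
```

In the script above, the resulting dtype would be passed as `dtype=torch.float16` in place of `dtype=None`, with `fp16` and `bf16` in `TrainingArguments` set from the same dictionary so the two settings cannot disagree.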

Error:
(unsloth_env) root@sean:/home/sean# python unsloth_llama3_fine_tune.py
/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
==((====))== Unsloth: Fast Llama patching release 2024.4
\ /| GPU: Tesla V100-SXM2-16GB. Max memory: 15.773 GB. Platform = Linux.
O^O/ _/ \ Pytorch: 2.3.0. CUDA = 7.0. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unused kwargs: ['load_in_4bit', 'load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
max_steps is given, it will override any value given in num_train_epochs
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\ /| Num examples = 13,533 | Num Epochs = 1
O^O/ _/ \ Batch size per device = 2 | Gradient Accumulation steps = 4
\ / Total batch size = 8 | Total steps = 150
"-____-" Number of trainable parameters = 41,943,040
0%| | 0/150 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/sean/unsloth_llama3_fine_tune.py", line 65, in <module>
trainer_stats = trainer.train()
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
output = super().train(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "<string>", line 361, in _fast_inner_training_loop
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 882, in PeftModelForCausalLM_fast_forward
return self.base_model(
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 813, in _CausalLM_fast_forward
outputs = self.model(
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 668, in LlamaModel_fast_forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
return fn(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
outputs = run_function(*args)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 664, in custom_forward
return module(*inputs, past_key_value, output_attentions, padding_mask = padding_mask)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 432, in LlamaDecoderLayer_fast_forward
hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 190, in fast_rms_layernorm
out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 144, in forward
fx[(n_rows,)](
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
self.cache[device][key] = compile(
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 193, in compile
next_module = compile_ir(module, metadata)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 201, in <lambda>
stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 194, in make_cubin
return compile_ptx_to_cubin(src, ptxas, capability, opt.enable_fp_fusion)
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-3c809d, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 102; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 102; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 104; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 104; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 106; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 106; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 108; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 108; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 110; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 110; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 112; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 112; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 114; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 114; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 116; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 116; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 118; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 118; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 120; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 120; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 122; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 122; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 124; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 124; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 126; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 126; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 128; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 128; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 130; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 130; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 316; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 318; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 318; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 320; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 320; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 322; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 322; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 324; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 324; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 326; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 326; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 328; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 328; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 330; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 330; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 332; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 332; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 334; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 334; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 336; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 336; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 338; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 338; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 340; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 340; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 342; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 342; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 344; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 344; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 346; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 346; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors

ludekcizinsky · 2024-05-15T12:10:47Z

I had the exact same problem using torch 2.3.0. As you said, even setting the bf16 flag to False did not work.

I resolved the issue by downgrading to torch 2.2.0 and installing unsloth with:

pip install --upgrade --force-reinstall --no-cache-dir torch==2.2.0 triton \
  --index-url https://download.pytorch.org/whl/cu121

pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git"
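
For context on the log above: ptxas rejects the `.bf16` instructions because they require compute capability 8.0 (sm_80) or higher, while the startup banner reports CUDA = 7.5 for the RTX 2060 SUPER. Below is a minimal diagnostic sketch, not part of unsloth; the helper names `needs_fp16_fallback` and `pick_dtype_name` are hypothetical. It only reports which dtype the hardware can run natively. As noted above, turning the bf16 flag off did not by itself avoid the error on torch 2.3.0, so this check does not replace the downgrade.

```python
# Diagnostic sketch: decide whether a GPU can run bf16 kernels natively.
# Helper names are hypothetical and are not part of unsloth or PyTorch.

def needs_fp16_fallback(capability):
    """Return True when the compute capability is below 8.0 (sm_80)."""
    major, minor = capability
    return (major, minor) < (8, 0)


def pick_dtype_name(capability):
    """Return the name of the half-precision dtype the GPU supports natively."""
    return "float16" if needs_fp16_fallback(capability) else "bfloat16"


if __name__ == "__main__":
    try:
        import torch  # optional: only needed to query the real device
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # get_device_capability() returns a (major, minor) tuple, e.g. (7, 5)
        cap = torch.cuda.get_device_capability()
        print(f"compute capability {cap}: use {pick_dtype_name(cap)}")
    else:
        print("No CUDA device visible; nothing to check.")
```

For the card in this issue, the capability is (7, 5), so the sketch would select float16.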
