
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 149, got 147 #7157

Open
YathenStianbase opened this issue May 9, 2024 · 0 comments

YathenStianbase commented May 9, 2024

Hello everyone,

I've encountered an issue while attempting to train a model from scratch using the train-text-from-scratch example. Initially, everything seems to proceed smoothly:
running the command
.\bin\Release\train-text-from-scratch.exe --vocab-model ..\models\ggml-vocab-llama-spm.gguf --train-data .\shakespeare.txt
successfully generates a checkpoint-LATEST.gguf file.

However, I face a problem when executing the prediction command:
.\bin\Release\main.exe -m .\checkpoint-LATEST.gguf
The system returns the following error message:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 149, got 147
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '.\checkpoint-LATEST.gguf'
main: error: unable to load model
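
For context, I believe this error comes from a tensor-count consistency check in llama.cpp's model loader: after it has mapped every tensor the llama architecture asks for, the loader compares how many tensors it actually consumed against how many the GGUF file contains. A rough sketch of that check, paraphrased from my reading of the llama.cpp sources (names and details may differ between versions):

// Paraphrased sketch, not verbatim llama.cpp source.
// n_tensors - tensors stored in the GGUF file (149 in my case)
// n_created - tensors the model graph actually requested (147 in my case)
void done_getting_tensors() const {
    if (n_created != n_tensors) {
        throw std::runtime_error(format(
            "%s: wrong number of tensors; expected %d, got %d",
            __func__, n_tensors, n_created));
    }
}

So, if I read that correctly, my checkpoint file stores two tensors that the plain llama model graph never asks for.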

===

My current setup includes:

Operating System: Windows 11
Processor: Intel i7-12650H
RAM: 16.0 GB
Graphics: NVIDIA GeForce RTX 4060 with 8GB VRAM
llama.cpp version: 3fe0596

===

Here are my steps and results:
.\bin\Release\train-text-from-scratch.exe --vocab-model ..\models\ggml-vocab-llama-spm.gguf --train-data .\shakespeare.txt
main: seed: 1715222748
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
llama_model_loader: loaded meta data with 22 key-value pairs and 0 tensors from ..\models\ggml-vocab-llama-spm.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = llama-spm
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 4096
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 9: general.file_type u32 = 1
llama_model_loader: - kv 10: llama.vocab_size u32 = 32000
llama_model_loader: - kv 11: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.pre str = default
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 15: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 4096
llm_load_print_meta: n_embd_v_gqa = 4096
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 0.00 K
llm_load_print_meta: model size = 0.00 MiB (-nan(ind) BPW)
llm_load_print_meta: general.name = llama-spm
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llama_model_load: vocab only - skipping tensors
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
main: init model
print_params: n_vocab: 32000
print_params: n_ctx: 128
print_params: n_embd: 256
print_params: n_head: 8
print_params: n_ff: 768
print_params: n_layer: 16
print_params: n_rot: 32
main: total train_iterations 0
main: seen train_samples 0
main: seen train_tokens 0
main: completed train_epochs 0
main: model_size = 240309120 bytes (229.2 MB)
main: opt_size = 360288480 bytes (343.6 MB)
main: opt iter 0
main: input_size = 131076128 bytes (125.0 MB)
main: compute_size = 434373216 bytes (414.3 MB)
main: evaluation order = LEFT_TO_RIGHT
main: tokenize training data
tokenize_file: total number of samples: 28321
main: number of training tokens: 28449
main: train data seems to have changed. restarting shuffled epoch.
main: begin training
main: work_size = 768376 bytes (0.7 MB)
train_opt_callback: iter= 0 sample=1/28321 sched=0.000000 loss=0.000000 |->
train_opt_callback: iter= 1 sample=9/28321 sched=0.010000 loss=10.373936 dt=00:00:02 eta=00:08:43 |->
train_opt_callback: iter= 2 sample=17/28321 sched=0.020000 loss=10.373995 dt=00:00:01 eta=00:08:14 |->
train_opt_callback: iter= 3 sample=25/28321 sched=0.030000 loss=10.373626 dt=00:00:01 eta=00:08:08 |->
train_opt_callback: iter= 4 sample=33/28321 sched=0.040000 loss=10.372844 dt=00:00:01 eta=00:08:22 |->
train_opt_callback: iter= 5 sample=41/28321 sched=0.050000 loss=10.372040 dt=00:00:01 eta=00:08:19 |->
train_opt_callback: iter= 6 sample=49/28321 sched=0.060000 loss=10.371148 dt=00:00:01 eta=00:08:04 |->
train_opt_callback: iter= 7 sample=57/28321 sched=0.070000 loss=10.370434 dt=00:00:01 eta=00:08:14 |->
train_opt_callback: iter= 8 sample=65/28321 sched=0.080000 loss=10.369631 dt=00:00:01 eta=00:08:13 |->
train_opt_callback: iter= 9 sample=73/28321 sched=0.090000 loss=10.368422 dt=00:00:01 eta=00:08:07 |->
save_checkpoint_file: saving to checkpoint-10.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 10 sample=81/28321 sched=0.100000 loss=10.367327 dt=00:00:01 eta=00:08:01 |->
train_opt_callback: iter= 11 sample=89/28321 sched=0.110000 loss=10.366339 dt=00:00:02 eta=00:08:16 |->
train_opt_callback: iter= 12 sample=97/28321 sched=0.120000 loss=10.364574 dt=00:00:01 eta=00:07:58 |->
train_opt_callback: iter= 13 sample=105/28321 sched=0.130000 loss=10.363148 dt=00:00:01 eta=00:07:58 |->
train_opt_callback: iter= 14 sample=113/28321 sched=0.140000 loss=10.360633 dt=00:00:01 eta=00:08:03 |->
train_opt_callback: iter= 15 sample=121/28321 sched=0.150000 loss=10.358639 dt=00:00:01 eta=00:07:58 |->
train_opt_callback: iter= 16 sample=129/28321 sched=0.160000 loss=10.356090 dt=00:00:01 eta=00:07:51 |->
train_opt_callback: iter= 17 sample=137/28321 sched=0.170000 loss=10.353806 dt=00:00:01 eta=00:07:33 |->
train_opt_callback: iter= 18 sample=145/28321 sched=0.180000 loss=10.351995 dt=00:00:01 eta=00:07:37 |->
train_opt_callback: iter= 19 sample=153/28321 sched=0.190000 loss=10.349074 dt=00:00:01 eta=00:07:50 |->
save_checkpoint_file: saving to checkpoint-20.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 20 sample=161/28321 sched=0.200000 loss=10.345891 dt=00:00:01 eta=00:07:41 |->
train_opt_callback: iter= 21 sample=169/28321 sched=0.210000 loss=10.342021 dt=00:00:01 eta=00:07:43 |->
train_opt_callback: iter= 22 sample=177/28321 sched=0.220000 loss=10.339085 dt=00:00:01 eta=00:07:46 |->
train_opt_callback: iter= 23 sample=185/28321 sched=0.230000 loss=10.334445 dt=00:00:02 eta=00:07:56 |->
train_opt_callback: iter= 24 sample=193/28321 sched=0.240000 loss=10.330558 dt=00:00:01 eta=00:07:43 |->
train_opt_callback: iter= 25 sample=201/28321 sched=0.250000 loss=10.326590 dt=00:00:01 eta=00:07:37 |->
train_opt_callback: iter= 26 sample=209/28321 sched=0.260000 loss=10.321836 dt=00:00:01 eta=00:07:29 |-->
train_opt_callback: iter= 27 sample=217/28321 sched=0.270000 loss=10.317562 dt=00:00:01 eta=00:07:34 |-->
train_opt_callback: iter= 28 sample=225/28321 sched=0.280000 loss=10.312521 dt=00:00:01 eta=00:07:35 |-->
train_opt_callback: iter= 29 sample=233/28321 sched=0.290000 loss=10.305662 dt=00:00:01 eta=00:07:29 |-->
save_checkpoint_file: saving to checkpoint-30.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 30 sample=241/28321 sched=0.300000 loss=10.300337 dt=00:00:01 eta=00:07:30 |-->
train_opt_callback: iter= 31 sample=249/28321 sched=0.310000 loss=10.294631 dt=00:00:02 eta=00:07:34 |-->
train_opt_callback: iter= 32 sample=257/28321 sched=0.320000 loss=10.288112 dt=00:00:02 eta=00:07:38 |-->
train_opt_callback: iter= 33 sample=265/28321 sched=0.330000 loss=10.279509 dt=00:00:02 eta=00:07:29 |-->
train_opt_callback: iter= 34 sample=273/28321 sched=0.340000 loss=10.274133 dt=00:00:02 eta=00:07:28 |-->
train_opt_callback: iter= 35 sample=281/28321 sched=0.350000 loss=10.267101 dt=00:00:01 eta=00:07:09 |-->
train_opt_callback: iter= 36 sample=289/28321 sched=0.360000 loss=10.258820 dt=00:00:01 eta=00:07:05 |-->
train_opt_callback: iter= 37 sample=297/28321 sched=0.370000 loss=10.248584 dt=00:00:01 eta=00:06:58 |-->
train_opt_callback: iter= 38 sample=305/28321 sched=0.380000 loss=10.240044 dt=00:00:01 eta=00:06:54 |-->
train_opt_callback: iter= 39 sample=313/28321 sched=0.390000 loss=10.231194 dt=00:00:01 eta=00:06:57 |-->
save_checkpoint_file: saving to checkpoint-40.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 40 sample=321/28321 sched=0.400000 loss=10.220399 dt=00:00:01 eta=00:07:04 |--->
train_opt_callback: iter= 41 sample=329/28321 sched=0.410000 loss=10.210823 dt=00:00:01 eta=00:06:54 |--->
train_opt_callback: iter= 42 sample=337/28321 sched=0.420000 loss=10.197856 dt=00:00:01 eta=00:06:32 |--->
train_opt_callback: iter= 43 sample=345/28321 sched=0.430000 loss=10.188459 dt=00:00:01 eta=00:06:51 |--->
train_opt_callback: iter= 44 sample=353/28321 sched=0.440000 loss=10.184780 dt=00:00:01 eta=00:06:42 |--->
train_opt_callback: iter= 45 sample=361/28321 sched=0.450000 loss=10.166400 dt=00:00:01 eta=00:06:34 |--->
train_opt_callback: iter= 46 sample=369/28321 sched=0.460000 loss=10.156301 dt=00:00:01 eta=00:06:48 |--->
train_opt_callback: iter= 47 sample=377/28321 sched=0.470000 loss=10.146538 dt=00:00:01 eta=00:06:55 |--->
train_opt_callback: iter= 48 sample=385/28321 sched=0.480000 loss=10.124640 dt=00:00:02 eta=00:06:58 |--->
train_opt_callback: iter= 49 sample=393/28321 sched=0.490000 loss=10.112006 dt=00:00:02 eta=00:06:59 |---->
save_checkpoint_file: saving to checkpoint-50.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 50 sample=401/28321 sched=0.500000 loss=10.096354 dt=00:00:02 eta=00:07:00 |---->
train_opt_callback: iter= 51 sample=409/28321 sched=0.510000 loss=10.081652 dt=00:00:02 eta=00:07:18 |---->
train_opt_callback: iter= 52 sample=417/28321 sched=0.520000 loss=10.065023 dt=00:00:02 eta=00:07:15 |---->
train_opt_callback: iter= 53 sample=425/28321 sched=0.530000 loss=10.039602 dt=00:00:02 eta=00:07:17 |---->
train_opt_callback: iter= 54 sample=433/28321 sched=0.540000 loss=10.024182 dt=00:00:02 eta=00:07:09 |---->
train_opt_callback: iter= 55 sample=441/28321 sched=0.550000 loss=10.009372 dt=00:00:02 eta=00:07:10 |----->
train_opt_callback: iter= 56 sample=449/28321 sched=0.560000 loss=9.989566 dt=00:00:02 eta=00:07:07 |----->
train_opt_callback: iter= 57 sample=457/28321 sched=0.570000 loss=9.966852 dt=00:00:02 eta=00:07:06 |----->
train_opt_callback: iter= 58 sample=465/28321 sched=0.580000 loss=9.957384 dt=00:00:02 eta=00:07:07 |----->
train_opt_callback: iter= 59 sample=473/28321 sched=0.590000 loss=9.920803 dt=00:00:02 eta=00:07:09 |------>
save_checkpoint_file: saving to checkpoint-60.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 60 sample=481/28321 sched=0.600000 loss=9.898567 dt=00:00:02 eta=00:07:14 |------>
train_opt_callback: iter= 61 sample=489/28321 sched=0.610000 loss=9.881955 dt=00:00:02 eta=00:07:00 |------>
train_opt_callback: iter= 62 sample=497/28321 sched=0.620000 loss=9.854578 dt=00:00:02 eta=00:07:06 |------>
train_opt_callback: iter= 63 sample=505/28321 sched=0.630000 loss=9.828171 dt=00:00:02 eta=00:07:03 |------>
train_opt_callback: iter= 64 sample=513/28321 sched=0.640000 loss=9.800139 dt=00:00:02 eta=00:06:52 |------->
train_opt_callback: iter= 65 sample=521/28321 sched=0.650000 loss=9.778488 dt=00:00:02 eta=00:06:54 |------->
train_opt_callback: iter= 66 sample=529/28321 sched=0.660000 loss=9.731189 dt=00:00:02 eta=00:06:50 |------->
train_opt_callback: iter= 67 sample=537/28321 sched=0.670000 loss=9.711588 dt=00:00:02 eta=00:06:55 |-------->
train_opt_callback: iter= 68 sample=545/28321 sched=0.680000 loss=9.682224 dt=00:00:02 eta=00:06:49 |-------->
train_opt_callback: iter= 69 sample=553/28321 sched=0.690000 loss=9.650393 dt=00:00:02 eta=00:06:46 |-------->
save_checkpoint_file: saving to checkpoint-70.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 70 sample=561/28321 sched=0.700000 loss=9.610378 dt=00:00:02 eta=00:06:39 |--------->
train_opt_callback: iter= 71 sample=569/28321 sched=0.710000 loss=9.581768 dt=00:00:02 eta=00:06:38 |--------->
train_opt_callback: iter= 72 sample=577/28321 sched=0.720000 loss=9.544673 dt=00:00:02 eta=00:06:38 |--------->
train_opt_callback: iter= 73 sample=585/28321 sched=0.730000 loss=9.510117 dt=00:00:02 eta=00:06:38 |---------->
train_opt_callback: iter= 74 sample=593/28321 sched=0.740000 loss=9.482232 dt=00:00:02 eta=00:06:37 |---------->
train_opt_callback: iter= 75 sample=601/28321 sched=0.750000 loss=9.436307 dt=00:00:02 eta=00:06:46 |---------->
train_opt_callback: iter= 76 sample=609/28321 sched=0.760000 loss=9.377270 dt=00:00:02 eta=00:06:38 |----------->
train_opt_callback: iter= 77 sample=617/28321 sched=0.770000 loss=9.328225 dt=00:00:02 eta=00:06:24 |----------->
train_opt_callback: iter= 78 sample=625/28321 sched=0.780000 loss=9.270336 dt=00:00:02 eta=00:06:29 |------------>
train_opt_callback: iter= 79 sample=633/28321 sched=0.790000 loss=9.227901 dt=00:00:02 eta=00:06:29 |------------>
save_checkpoint_file: saving to checkpoint-80.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 80 sample=641/28321 sched=0.800000 loss=9.200651 dt=00:00:02 eta=00:06:28 |------------->
train_opt_callback: iter= 81 sample=649/28321 sched=0.810000 loss=9.126765 dt=00:00:02 eta=00:06:19 |------------->
train_opt_callback: iter= 82 sample=657/28321 sched=0.820000 loss=9.117662 dt=00:00:02 eta=00:06:10 |-------------->
train_opt_callback: iter= 83 sample=665/28321 sched=0.830000 loss=9.030342 dt=00:00:02 eta=00:06:18 |-------------->
train_opt_callback: iter= 84 sample=673/28321 sched=0.840000 loss=8.949368 dt=00:00:02 eta=00:06:19 |--------------->
train_opt_callback: iter= 85 sample=681/28321 sched=0.850000 loss=8.919739 dt=00:00:02 eta=00:06:20 |---------------->
train_opt_callback: iter= 86 sample=689/28321 sched=0.860000 loss=8.865231 dt=00:00:02 eta=00:06:14 |---------------->
train_opt_callback: iter= 87 sample=697/28321 sched=0.870000 loss=8.829300 dt=00:00:02 eta=00:06:03 |---------------->
train_opt_callback: iter= 88 sample=705/28321 sched=0.880000 loss=8.726158 dt=00:00:02 eta=00:05:59 |----------------->
train_opt_callback: iter= 89 sample=713/28321 sched=0.890000 loss=8.644808 dt=00:00:02 eta=00:06:01 |------------------>
save_checkpoint_file: saving to checkpoint-90.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 90 sample=721/28321 sched=0.900000 loss=8.616333 dt=00:00:02 eta=00:05:57 |------------------->
train_opt_callback: iter= 91 sample=729/28321 sched=0.910000 loss=8.507208 dt=00:00:02 eta=00:05:53 |-------------------->
train_opt_callback: iter= 92 sample=737/28321 sched=0.920000 loss=8.485923 dt=00:00:02 eta=00:05:52 |-------------------->
train_opt_callback: iter= 93 sample=745/28321 sched=0.930000 loss=8.388083 dt=00:00:02 eta=00:05:50 |--------------------->
train_opt_callback: iter= 94 sample=753/28321 sched=0.940000 loss=8.336430 dt=00:00:02 eta=00:05:53 |--------------------->
train_opt_callback: iter= 95 sample=761/28321 sched=0.950000 loss=8.292239 dt=00:00:02 eta=00:05:58 |---------------------->
train_opt_callback: iter= 96 sample=769/28321 sched=0.960000 loss=8.217867 dt=00:00:02 eta=00:05:52 |----------------------->
train_opt_callback: iter= 97 sample=777/28321 sched=0.970000 loss=8.140039 dt=00:00:02 eta=00:05:50 |----------------------->
train_opt_callback: iter= 98 sample=785/28321 sched=0.980000 loss=8.075435 dt=00:00:02 eta=00:05:43 |------------------------>
train_opt_callback: iter= 99 sample=793/28321 sched=0.990000 loss=7.974571 dt=00:00:02 eta=00:05:38 |------------------------->
save_checkpoint_file: saving to checkpoint-100.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 100 sample=801/28321 sched=0.977975 loss=7.915875 dt=00:00:02 eta=00:05:41 |-------------------------->
train_opt_callback: iter= 101 sample=809/28321 sched=0.977536 loss=7.814181 dt=00:00:02 eta=00:05:42 |--------------------------->
train_opt_callback: iter= 102 sample=817/28321 sched=0.977093 loss=7.791835 dt=00:00:02 eta=00:05:30 |--------------------------->
train_opt_callback: iter= 103 sample=825/28321 sched=0.976646 loss=7.758586 dt=00:00:02 eta=00:05:26 |--------------------------->
train_opt_callback: iter= 104 sample=833/28321 sched=0.976194 loss=7.616244 dt=00:00:02 eta=00:05:29 |----------------------------->
train_opt_callback: iter= 105 sample=841/28321 sched=0.975738 loss=7.531526 dt=00:00:02 eta=00:05:31 |----------------------------->
train_opt_callback: iter= 106 sample=849/28321 sched=0.975278 loss=7.499986 dt=00:00:02 eta=00:05:25 |------------------------------>
train_opt_callback: iter= 107 sample=857/28321 sched=0.974814 loss=7.424263 dt=00:00:02 eta=00:05:22 |------------------------------>
train_opt_callback: iter= 108 sample=865/28321 sched=0.974346 loss=7.415683 dt=00:00:02 eta=00:05:23 |------------------------------->
train_opt_callback: iter= 109 sample=873/28321 sched=0.973873 loss=7.327204 dt=00:00:02 eta=00:05:17 |------------------------------->
save_checkpoint_file: saving to checkpoint-110.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 110 sample=881/28321 sched=0.973396 loss=7.219823 dt=00:00:02 eta=00:05:23 |--------------------------------->
train_opt_callback: iter= 111 sample=889/28321 sched=0.972915 loss=7.214115 dt=00:00:02 eta=00:05:13 |--------------------------------->
train_opt_callback: iter= 112 sample=897/28321 sched=0.972430 loss=7.203124 dt=00:00:02 eta=00:05:06 |--------------------------------->
train_opt_callback: iter= 113 sample=905/28321 sched=0.971941 loss=7.114408 dt=00:00:02 eta=00:05:11 |---------------------------------->
train_opt_callback: iter= 114 sample=913/28321 sched=0.971447 loss=6.982309 dt=00:00:02 eta=00:05:10 |----------------------------------->
train_opt_callback: iter= 115 sample=921/28321 sched=0.970950 loss=6.949381 dt=00:00:02 eta=00:05:07 |----------------------------------->
train_opt_callback: iter= 116 sample=929/28321 sched=0.970448 loss=6.892489 dt=00:00:02 eta=00:05:04 |------------------------------------>
train_opt_callback: iter= 117 sample=937/28321 sched=0.969942 loss=6.860984 dt=00:00:02 eta=00:04:57 |------------------------------------>
train_opt_callback: iter= 118 sample=945/28321 sched=0.969432 loss=6.854873 dt=00:00:02 eta=00:04:56 |------------------------------------>
train_opt_callback: iter= 119 sample=953/28321 sched=0.968918 loss=6.841059 dt=00:00:02 eta=00:04:54 |------------------------------------>
save_checkpoint_file: saving to checkpoint-120.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 120 sample=961/28321 sched=0.968399 loss=6.658863 dt=00:00:02 eta=00:04:49 |-------------------------------------->
train_opt_callback: iter= 121 sample=969/28321 sched=0.967877 loss=6.838943 dt=00:00:02 eta=00:04:54 |------------------------------------>
train_opt_callback: iter= 122 sample=977/28321 sched=0.967350 loss=6.667257 dt=00:00:02 eta=00:04:55 |-------------------------------------->
train_opt_callback: iter= 123 sample=985/28321 sched=0.966820 loss=6.655440 dt=00:00:02 eta=00:04:51 |-------------------------------------->
train_opt_callback: iter= 124 sample=993/28321 sched=0.966285 loss=6.622534 dt=00:00:02 eta=00:04:47 |--------------------------------------->
train_opt_callback: iter= 125 sample=1001/28321 sched=0.965746 loss=6.544344 dt=00:00:02 eta=00:04:44 |--------------------------------------->
train_opt_callback: iter= 126 sample=1009/28321 sched=0.965203 loss=6.578655 dt=00:00:02 eta=00:04:46 |--------------------------------------->
train_opt_callback: iter= 127 sample=1017/28321 sched=0.964656 loss=6.624712 dt=00:00:02 eta=00:04:40 |-------------------------------------->
train_opt_callback: iter= 128 sample=1025/28321 sched=0.964104 loss=6.389845 dt=00:00:02 eta=00:04:40 |----------------------------------------->
train_opt_callback: iter= 129 sample=1033/28321 sched=0.963549 loss=6.496925 dt=00:00:02 eta=00:04:35 |---------------------------------------->
save_checkpoint_file: saving to checkpoint-130.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 130 sample=1041/28321 sched=0.962990 loss=6.544175 dt=00:00:02 eta=00:04:28 |--------------------------------------->
train_opt_callback: iter= 131 sample=1049/28321 sched=0.962426 loss=6.397351 dt=00:00:02 eta=00:04:24 |----------------------------------------->
train_opt_callback: iter= 132 sample=1057/28321 sched=0.961859 loss=6.442997 dt=00:00:02 eta=00:04:26 |---------------------------------------->
train_opt_callback: iter= 133 sample=1065/28321 sched=0.961287 loss=6.426161 dt=00:00:02 eta=00:04:27 |---------------------------------------->
train_opt_callback: iter= 134 sample=1073/28321 sched=0.960711 loss=6.393265 dt=00:00:02 eta=00:04:21 |----------------------------------------->
train_opt_callback: iter= 135 sample=1081/28321 sched=0.960131 loss=6.394783 dt=00:00:02 eta=00:04:22 |----------------------------------------->
train_opt_callback: iter= 136 sample=1089/28321 sched=0.959548 loss=6.480754 dt=00:00:02 eta=00:04:24 |---------------------------------------->
train_opt_callback: iter= 137 sample=1097/28321 sched=0.958960 loss=6.240848 dt=00:00:02 eta=00:04:19 |------------------------------------------>
train_opt_callback: iter= 138 sample=1105/28321 sched=0.958368 loss=6.290392 dt=00:00:02 eta=00:04:15 |------------------------------------------>
train_opt_callback: iter= 139 sample=1113/28321 sched=0.957772 loss=6.392591 dt=00:00:02 eta=00:04:07 |----------------------------------------->
save_checkpoint_file: saving to checkpoint-140.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 140 sample=1121/28321 sched=0.957172 loss=6.269489 dt=00:00:02 eta=00:04:07 |------------------------------------------>
train_opt_callback: iter= 141 sample=1129/28321 sched=0.956568 loss=6.425651 dt=00:00:02 eta=00:04:07 |---------------------------------------->
train_opt_callback: iter= 142 sample=1137/28321 sched=0.955960 loss=6.342426 dt=00:00:02 eta=00:04:06 |----------------------------------------->
train_opt_callback: iter= 143 sample=1145/28321 sched=0.955348 loss=6.284517 dt=00:00:02 eta=00:04:10 |------------------------------------------>
train_opt_callback: iter= 144 sample=1153/28321 sched=0.954732 loss=6.126918 dt=00:00:02 eta=00:04:06 |------------------------------------------->
train_opt_callback: iter= 145 sample=1161/28321 sched=0.954112 loss=6.294505 dt=00:00:02 eta=00:04:20 |------------------------------------------>
train_opt_callback: iter= 146 sample=1169/28321 sched=0.953488 loss=6.327740 dt=00:00:02 eta=00:04:21 |----------------------------------------->
train_opt_callback: iter= 147 sample=1177/28321 sched=0.952861 loss=6.176448 dt=00:00:02 eta=00:04:11 |------------------------------------------->
train_opt_callback: iter= 148 sample=1185/28321 sched=0.952229 loss=6.288748 dt=00:00:02 eta=00:04:05 |------------------------------------------>
train_opt_callback: iter= 149 sample=1193/28321 sched=0.951593 loss=6.275007 dt=00:00:02 eta=00:03:55 |------------------------------------------>
save_checkpoint_file: saving to checkpoint-150.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 150 sample=1201/28321 sched=0.950953 loss=6.117251 dt=00:00:02 eta=00:03:51 |-------------------------------------------->
train_opt_callback: iter= 151 sample=1209/28321 sched=0.950309 loss=6.235055 dt=00:00:02 eta=00:03:47 |------------------------------------------>
train_opt_callback: iter= 152 sample=1217/28321 sched=0.949661 loss=6.058559 dt=00:00:02 eta=00:03:50 |-------------------------------------------->
train_opt_callback: iter= 153 sample=1225/28321 sched=0.949010 loss=6.167670 dt=00:00:02 eta=00:03:42 |------------------------------------------->
train_opt_callback: iter= 154 sample=1233/28321 sched=0.948354 loss=6.293919 dt=00:00:02 eta=00:03:41 |------------------------------------------>
train_opt_callback: iter= 155 sample=1241/28321 sched=0.947695 loss=6.178796 dt=00:00:02 eta=00:03:43 |------------------------------------------->
train_opt_callback: iter= 156 sample=1249/28321 sched=0.947031 loss=6.188849 dt=00:00:02 eta=00:03:44 |------------------------------------------->
train_opt_callback: iter= 157 sample=1257/28321 sched=0.946364 loss=6.276544 dt=00:00:02 eta=00:03:35 |------------------------------------------>
train_opt_callback: iter= 158 sample=1265/28321 sched=0.945692 loss=6.168988 dt=00:00:02 eta=00:03:30 |------------------------------------------->
train_opt_callback: iter= 159 sample=1273/28321 sched=0.945017 loss=6.117286 dt=00:00:02 eta=00:03:25 |-------------------------------------------->
save_checkpoint_file: saving to checkpoint-160.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 160 sample=1281/28321 sched=0.944338 loss=5.980232 dt=00:00:02 eta=00:03:27 |--------------------------------------------->
train_opt_callback: iter= 161 sample=1289/28321 sched=0.943655 loss=6.071966 dt=00:00:02 eta=00:03:30 |-------------------------------------------->
train_opt_callback: iter= 162 sample=1297/28321 sched=0.942968 loss=6.048306 dt=00:00:02 eta=00:03:31 |-------------------------------------------->
train_opt_callback: iter= 163 sample=1305/28321 sched=0.942277 loss=5.984722 dt=00:00:02 eta=00:03:32 |--------------------------------------------->
train_opt_callback: iter= 164 sample=1313/28321 sched=0.941583 loss=6.087429 dt=00:00:02 eta=00:03:31 |-------------------------------------------->
train_opt_callback: iter= 165 sample=1321/28321 sched=0.940884 loss=6.090616 dt=00:00:02 eta=00:03:30 |-------------------------------------------->
train_opt_callback: iter= 166 sample=1329/28321 sched=0.940182 loss=6.049459 dt=00:00:02 eta=00:03:29 |-------------------------------------------->
train_opt_callback: iter= 167 sample=1337/28321 sched=0.939476 loss=6.061348 dt=00:00:02 eta=00:03:28 |-------------------------------------------->
train_opt_callback: iter= 168 sample=1345/28321 sched=0.938765 loss=6.083156 dt=00:00:02 eta=00:03:27 |-------------------------------------------->
train_opt_callback: iter= 169 sample=1353/28321 sched=0.938052 loss=5.960547 dt=00:00:02 eta=00:03:25 |--------------------------------------------->
save_checkpoint_file: saving to checkpoint-170.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 170 sample=1361/28321 sched=0.937334 loss=5.994433 dt=00:00:02 eta=00:03:20 |--------------------------------------------->
train_opt_callback: iter= 171 sample=1369/28321 sched=0.936612 loss=6.018966 dt=00:00:02 eta=00:03:12 |--------------------------------------------->
train_opt_callback: iter= 172 sample=1377/28321 sched=0.935887 loss=5.987111 dt=00:00:02 eta=00:03:06 |--------------------------------------------->
train_opt_callback: iter= 173 sample=1385/28321 sched=0.935158 loss=5.964705 dt=00:00:02 eta=00:03:05 |--------------------------------------------->
train_opt_callback: iter= 174 sample=1393/28321 sched=0.934425 loss=6.016423 dt=00:00:02 eta=00:03:03 |--------------------------------------------->
train_opt_callback: iter= 175 sample=1401/28321 sched=0.933688 loss=5.967543 dt=00:00:02 eta=00:03:04 |--------------------------------------------->
train_opt_callback: iter= 176 sample=1409/28321 sched=0.932948 loss=5.863286 dt=00:00:02 eta=00:03:11 |---------------------------------------------->
train_opt_callback: iter= 177 sample=1417/28321 sched=0.932203 loss=5.860222 dt=00:00:02 eta=00:03:02 |---------------------------------------------->
train_opt_callback: iter= 178 sample=1425/28321 sched=0.931455 loss=6.080163 dt=00:00:02 eta=00:02:55 |-------------------------------------------->
train_opt_callback: iter= 179 sample=1433/28321 sched=0.930703 loss=5.915224 dt=00:00:02 eta=00:02:52 |---------------------------------------------->
save_checkpoint_file: saving to checkpoint-180.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 180 sample=1441/28321 sched=0.929948 loss=5.848048 dt=00:00:02 eta=00:02:49 |---------------------------------------------->
train_opt_callback: iter= 181 sample=1449/28321 sched=0.929188 loss=5.947498 dt=00:00:02 eta=00:02:48 |--------------------------------------------->
train_opt_callback: iter= 182 sample=1457/28321 sched=0.928425 loss=5.933592 dt=00:00:02 eta=00:02:47 |--------------------------------------------->
train_opt_callback: iter= 183 sample=1465/28321 sched=0.927658 loss=5.981134 dt=00:00:02 eta=00:02:43 |--------------------------------------------->
train_opt_callback: iter= 184 sample=1473/28321 sched=0.926888 loss=5.779394 dt=00:00:02 eta=00:02:40 |----------------------------------------------->
train_opt_callback: iter= 185 sample=1481/28321 sched=0.926113 loss=5.855101 dt=00:00:02 eta=00:02:38 |---------------------------------------------->
train_opt_callback: iter= 186 sample=1489/28321 sched=0.925335 loss=5.932856 dt=00:00:02 eta=00:02:37 |--------------------------------------------->
train_opt_callback: iter= 187 sample=1497/28321 sched=0.924554 loss=5.715711 dt=00:00:02 eta=00:02:34 |------------------------------------------------>
train_opt_callback: iter= 188 sample=1505/28321 sched=0.923768 loss=5.827988 dt=00:00:02 eta=00:02:34 |---------------------------------------------->
train_opt_callback: iter= 189 sample=1513/28321 sched=0.922979 loss=5.728333 dt=00:00:02 eta=00:02:29 |----------------------------------------------->
save_checkpoint_file: saving to checkpoint-190.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 190 sample=1521/28321 sched=0.922186 loss=5.779024 dt=00:00:02 eta=00:02:28 |----------------------------------------------->
train_opt_callback: iter= 191 sample=1529/28321 sched=0.921390 loss=5.835272 dt=00:00:02 eta=00:02:28 |---------------------------------------------->
train_opt_callback: iter= 192 sample=1537/28321 sched=0.920590 loss=5.641994 dt=00:00:02 eta=00:02:26 |------------------------------------------------>
train_opt_callback: iter= 193 sample=1545/28321 sched=0.919786 loss=5.765279 dt=00:00:02 eta=00:02:26 |----------------------------------------------->
train_opt_callback: iter= 194 sample=1553/28321 sched=0.918978 loss=5.766979 dt=00:00:02 eta=00:02:28 |----------------------------------------------->
train_opt_callback: iter= 195 sample=1561/28321 sched=0.918167 loss=5.803986 dt=00:00:02 eta=00:02:22 |----------------------------------------------->
train_opt_callback: iter= 196 sample=1569/28321 sched=0.917353 loss=5.770722 dt=00:00:02 eta=00:02:18 |----------------------------------------------->
train_opt_callback: iter= 197 sample=1577/28321 sched=0.916534 loss=5.574189 dt=00:00:02 eta=00:02:14 |------------------------------------------------->
train_opt_callback: iter= 198 sample=1585/28321 sched=0.915712 loss=5.670218 dt=00:00:02 eta=00:02:14 |------------------------------------------------>
train_opt_callback: iter= 199 sample=1593/28321 sched=0.914887 loss=5.765743 dt=00:00:02 eta=00:02:16 |----------------------------------------------->
save_checkpoint_file: saving to checkpoint-200.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 200 sample=1601/28321 sched=0.914058 loss=5.599796 dt=00:00:02 eta=00:02:13 |------------------------------------------------->
train_opt_callback: iter= 201 sample=1609/28321 sched=0.913225 loss=5.565473 dt=00:00:02 eta=00:02:08 |------------------------------------------------->
train_opt_callback: iter= 202 sample=1617/28321 sched=0.912389 loss=5.592797 dt=00:00:02 eta=00:02:09 |------------------------------------------------->
train_opt_callback: iter= 203 sample=1625/28321 sched=0.911549 loss=5.419349 dt=00:00:02 eta=00:02:03 |--------------------------------------------------->
train_opt_callback: iter= 204 sample=1633/28321 sched=0.910705 loss=5.488876 dt=00:00:02 eta=00:01:59 |-------------------------------------------------->
train_opt_callback: iter= 205 sample=1641/28321 sched=0.909858 loss=5.660602 dt=00:00:02 eta=00:01:57 |------------------------------------------------>
train_opt_callback: iter= 206 sample=1649/28321 sched=0.909007 loss=5.594108 dt=00:00:02 eta=00:01:57 |------------------------------------------------->
train_opt_callback: iter= 207 sample=1657/28321 sched=0.908153 loss=5.543122 dt=00:00:02 eta=00:01:53 |------------------------------------------------->
train_opt_callback: iter= 208 sample=1665/28321 sched=0.907296 loss=5.565341 dt=00:00:02 eta=00:01:53 |------------------------------------------------->
train_opt_callback: iter= 209 sample=1673/28321 sched=0.906434 loss=5.646884 dt=00:00:02 eta=00:01:53 |------------------------------------------------>
save_checkpoint_file: saving to checkpoint-210.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 210 sample=1681/28321 sched=0.905570 loss=5.593593 dt=00:00:02 eta=00:01:52 |------------------------------------------------->
train_opt_callback: iter= 211 sample=1689/28321 sched=0.904702 loss=5.484448 dt=00:00:02 eta=00:01:46 |-------------------------------------------------->
train_opt_callback: iter= 212 sample=1697/28321 sched=0.903830 loss=5.608016 dt=00:00:02 eta=00:01:45 |------------------------------------------------->
train_opt_callback: iter= 213 sample=1705/28321 sched=0.902955 loss=5.617675 dt=00:00:02 eta=00:01:40 |------------------------------------------------->
train_opt_callback: iter= 214 sample=1713/28321 sched=0.902076 loss=5.360622 dt=00:00:02 eta=00:01:37 |--------------------------------------------------->
train_opt_callback: iter= 215 sample=1721/28321 sched=0.901194 loss=5.553976 dt=00:00:02 eta=00:01:35 |------------------------------------------------->
train_opt_callback: iter= 216 sample=1729/28321 sched=0.900308 loss=5.343686 dt=00:00:02 eta=00:01:35 |--------------------------------------------------->
train_opt_callback: iter= 217 sample=1737/28321 sched=0.899419 loss=5.289958 dt=00:00:02 eta=00:01:30 |---------------------------------------------------->
train_opt_callback: iter= 218 sample=1745/28321 sched=0.898526 loss=5.487991 dt=00:00:02 eta=00:01:30 |-------------------------------------------------->
train_opt_callback: iter= 219 sample=1753/28321 sched=0.897630 loss=5.356018 dt=00:00:02 eta=00:01:28 |--------------------------------------------------->
save_checkpoint_file: saving to checkpoint-220.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 220 sample=1761/28321 sched=0.896731 loss=5.530602 dt=00:00:02 eta=00:01:25 |------------------------------------------------->
train_opt_callback: iter= 221 sample=1769/28321 sched=0.895828 loss=5.356627 dt=00:00:02 eta=00:01:20 |--------------------------------------------------->
train_opt_callback: iter= 222 sample=1777/28321 sched=0.894922 loss=5.583591 dt=00:00:02 eta=00:01:18 |------------------------------------------------->
train_opt_callback: iter= 223 sample=1785/28321 sched=0.894012 loss=5.491447 dt=00:00:02 eta=00:01:16 |-------------------------------------------------->
train_opt_callback: iter= 224 sample=1793/28321 sched=0.893099 loss=5.562267 dt=00:00:02 eta=00:01:14 |------------------------------------------------->
train_opt_callback: iter= 225 sample=1801/28321 sched=0.892183 loss=5.336006 dt=00:00:02 eta=00:01:13 |--------------------------------------------------->
train_opt_callback: iter= 226 sample=1809/28321 sched=0.891263 loss=5.361383 dt=00:00:02 eta=00:01:10 |--------------------------------------------------->
train_opt_callback: iter= 227 sample=1817/28321 sched=0.890340 loss=5.268160 dt=00:00:02 eta=00:01:07 |---------------------------------------------------->
train_opt_callback: iter= 228 sample=1825/28321 sched=0.889413 loss=5.191448 dt=00:00:02 eta=00:01:05 |----------------------------------------------------->
train_opt_callback: iter= 229 sample=1833/28321 sched=0.888483 loss=5.374747 dt=00:00:02 eta=00:01:03 |--------------------------------------------------->
save_checkpoint_file: saving to checkpoint-230.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 230 sample=1841/28321 sched=0.887550 loss=5.427733 dt=00:00:02 eta=00:01:01 |-------------------------------------------------->
train_opt_callback: iter= 231 sample=1849/28321 sched=0.886613 loss=5.408707 dt=00:00:02 eta=00:00:58 |--------------------------------------------------->
train_opt_callback: iter= 232 sample=1857/28321 sched=0.885674 loss=5.342621 dt=00:00:02 eta=00:00:56 |--------------------------------------------------->
train_opt_callback: iter= 233 sample=1865/28321 sched=0.884730 loss=5.274462 dt=00:00:02 eta=00:00:53 |---------------------------------------------------->
train_opt_callback: iter= 234 sample=1873/28321 sched=0.883784 loss=5.193012 dt=00:00:02 eta=00:00:52 |----------------------------------------------------->
train_opt_callback: iter= 235 sample=1881/28321 sched=0.882834 loss=5.432597 dt=00:00:02 eta=00:00:49 |-------------------------------------------------->
train_opt_callback: iter= 236 sample=1889/28321 sched=0.881881 loss=5.362617 dt=00:00:02 eta=00:00:47 |--------------------------------------------------->
train_opt_callback: iter= 237 sample=1897/28321 sched=0.880925 loss=5.160692 dt=00:00:02 eta=00:00:44 |----------------------------------------------------->
train_opt_callback: iter= 238 sample=1905/28321 sched=0.879965 loss=5.230626 dt=00:00:02 eta=00:00:43 |---------------------------------------------------->
train_opt_callback: iter= 239 sample=1913/28321 sched=0.879002 loss=5.159585 dt=00:00:02 eta=00:00:40 |----------------------------------------------------->
save_checkpoint_file: saving to checkpoint-240.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 240 sample=1921/28321 sched=0.878036 loss=5.420028 dt=00:00:02 eta=00:00:38 |--------------------------------------------------->
train_opt_callback: iter= 241 sample=1929/28321 sched=0.877067 loss=5.245462 dt=00:00:02 eta=00:00:35 |---------------------------------------------------->
train_opt_callback: iter= 242 sample=1937/28321 sched=0.876094 loss=5.187116 dt=00:00:02 eta=00:00:32 |----------------------------------------------------->
train_opt_callback: iter= 243 sample=1945/28321 sched=0.875118 loss=5.193003 dt=00:00:02 eta=00:00:29 |----------------------------------------------------->
train_opt_callback: iter= 244 sample=1953/28321 sched=0.874139 loss=5.292401 dt=00:00:02 eta=00:00:27 |---------------------------------------------------->
train_opt_callback: iter= 245 sample=1961/28321 sched=0.873157 loss=5.290339 dt=00:00:02 eta=00:00:25 |---------------------------------------------------->
train_opt_callback: iter= 246 sample=1969/28321 sched=0.872171 loss=5.286678 dt=00:00:02 eta=00:00:23 |---------------------------------------------------->
train_opt_callback: iter= 247 sample=1977/28321 sched=0.871183 loss=5.267505 dt=00:00:02 eta=00:00:21 |---------------------------------------------------->
train_opt_callback: iter= 248 sample=1985/28321 sched=0.870191 loss=5.302953 dt=00:00:02 eta=00:00:19 |---------------------------------------------------->
train_opt_callback: iter= 249 sample=1993/28321 sched=0.869196 loss=5.162525 dt=00:00:02 eta=00:00:16 |----------------------------------------------------->
save_checkpoint_file: saving to checkpoint-250.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
train_opt_callback: iter= 250 sample=2001/28321 sched=0.868198 loss=5.052213 dt=00:00:02 eta=00:00:14 |------------------------------------------------------>
train_opt_callback: iter= 251 sample=2009/28321 sched=0.867197 loss=5.261308 dt=00:00:02 eta=00:00:11 |---------------------------------------------------->
train_opt_callback: iter= 252 sample=2017/28321 sched=0.866192 loss=5.191775 dt=00:00:02 eta=00:00:09 |----------------------------------------------------->
train_opt_callback: iter= 253 sample=2025/28321 sched=0.865185 loss=5.049337 dt=00:00:02 eta=00:00:06 |------------------------------------------------------>
train_opt_callback: iter= 254 sample=2033/28321 sched=0.864174 loss=5.121353 dt=00:00:02 eta=00:00:04 |------------------------------------------------------>
train_opt_callback: iter= 255 sample=2041/28321 sched=0.863161 loss=5.211682 dt=00:00:02 eta=00:00:02 |----------------------------------------------------->
train_opt_callback: iter= 256 sample=2049/28321 sched=0.862144 loss=5.230773 dt=00:00:02 eta=0.0ms |---------------------------------------------------->
main: total training time: 00:10:54
save_checkpoint_file: saving to checkpoint-256.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_checkpoint_file: saving to checkpoint-LATEST.gguf
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
save_llama_model_file: saving to ggml-checkpoint-f32.bin
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
PS C:\D\Custom\Dev\CCPP\llama.cpp\build> .\bin\Release\main.exe -m .\checkpoint-LATEST.gguf
Log start
main: build = 2749 (928e0b7)
main: built with MSVC 19.37.32824.0 for x64
main: seed = 1715224429
llama_model_loader: loaded meta data with 40 key-value pairs and 149 tensors from .\checkpoint-LATEST.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: training.type str = train_model
llama_model_loader: - kv 1: general.architecture str = llama
llama_model_loader: - kv 2: general.name str = llama
llama_model_loader: - kv 3: general.file_type u32 = 0
llama_model_loader: - kv 4: llama.context_length u32 = 128
llama_model_loader: - kv 5: llama.embedding_length u32 = 256
llama_model_loader: - kv 6: llama.feed_forward_length u32 = 768
llama_model_loader: - kv 7: llama.attention.head_count u32 = 8
llama_model_loader: - kv 8: llama.block_count u32 = 16
llama_model_loader: - kv 9: llama.rope.dimension_count u32 = 32
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 11: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 12: llama.rope.scale_linear f32 = 1.000000
llama_model_loader: - kv 13: tokenizer.ggml.model str = llama
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 4294967295
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 4294967295
llama_model_loader: - kv 22: training.file_version u32 = 1
llama_model_loader: - kv 23: training.iteration_count u64 = 256
llama_model_loader: - kv 24: training.sample_count u64 = 2056
llama_model_loader: - kv 25: training.token_count u64 = 262144
llama_model_loader: - kv 26: training.epoch_count u64 = 0
llama_model_loader: - kv 27: training.shuffle.samples_hash u64 = 1715152105487630865
llama_model_loader: - kv 28: training.shuffle.rng_state str = 1715222748 1513862194 4267860769 2986...
llama_model_loader: - kv 29: training.shuffle.sample_count u64 = 28321
llama_model_loader: - kv 30: training.shuffle.next_sample u64 = 2056
llama_model_loader: - kv 31: optimizer.file_version u32 = 0
llama_model_loader: - kv 32: optimizer.convergence_past_count u32 = 0
llama_model_loader: - kv 33: optimizer.parameter_count u64 = 30023936
llama_model_loader: - kv 34: optimizer.iteration_count u32 = 256
llama_model_loader: - kv 35: optimizer.just_initialized bool = false
llama_model_loader: - kv 36: optimizer.type str = adam
llama_model_loader: - kv 37: optimizer.adam.best_loss f32 = 10.373936
llama_model_loader: - kv 38: optimizer.adam.previous_loss f32 = 5.340094
llama_model_loader: - kv 39: optimizer.adam.no_improvement_count u32 = 0
llama_model_loader: - type f32: 149 tensors
llm_load_vocab: bad special token: 'tokenizer.ggml.seperator_token_id' = 4294967295d, using default id -1
llm_load_vocab: bad special token: 'tokenizer.ggml.padding_token_id' = 4294967295d, using default id -1
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 128
llm_load_print_meta: n_embd = 256
llm_load_print_meta: n_head = 8
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 16
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_embd_head_k = 32
llm_load_print_meta: n_embd_head_v = 32
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 768
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 128
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 90.07 M
llm_load_print_meta: model size = 343.60 MiB (32.00 BPW)
llm_load_print_meta: general.name = llama
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.08 MiB
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 149, got 147
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '.\checkpoint-LATEST.gguf'
main: error: unable to load model
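
In case it is useful for debugging, the number of tensors stored in the checkpoint can also be read back directly through the gguf API in ggml.h from the same tree. A minimal sketch, assuming gguf_init_from_file and gguf_get_n_tensors behave as in the headers I'm building against:

// Minimal sketch: count the tensors stored in a GGUF file.
// Assumes the gguf_* API declared in ggml.h of this llama.cpp tree.
#include <cstdio>
#include "ggml.h"

int main() {
    struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file("checkpoint-LATEST.gguf", params);
    if (!ctx) {
        fprintf(stderr, "failed to open gguf file\n");
        return 1;
    }
    // the loader output above reports 149 tensors for this checkpoint
    printf("tensors in file: %d\n", gguf_get_n_tensors(ctx));
    gguf_free(ctx);
    return 0;
}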

Any help would be greatly appreciated.
