Hi, first: thank you, and my admiration for your great work and dedication! I'm trying to convert an old H5 GPT-2 model to ggml (a GPT-2 Medium, trained with TensorFlow on Bulgarian texts on Colab/Tesla T4 in 2021) in order to "replay" it a bit, inspired by the impressive speed of your library.
However, I hit a shape-mismatch error during the conversion (first I added from_tf=True at line 73, as an earlier error message suggested).
I haven't tried with another GPT-2 instance to rule out something being wrong with mine: it was not fine-tuned, but instantiated from scratch on a small, growing dataset of at most about 140 MB.
EDIT 2, 27.2.2024: Comparing the numbers, I realized the cause could be that I created the model with a slightly shorter vocabulary of 50255 instead of 50257 (h shape torch.Size([50255, 1024])). Someone in another issue hit a similar problem with a vocabulary of 50259 ("gpt2_model_load: n_vocab = 50259"): gpt2 error #371
Could that be the problem, or shouldn't the vocabulary size matter? (On the other hand, the mismatch reported is [1024, 1024].) I tried editing the vocabulary files: changed the size to 50257, added two more tokens, etc., but then got another mismatch, 51461120 = 50255*1024 vs. 51463168 = 50257*1024. I suspect it is "vocab_size" in config.json; if I revert it to 50257, the previous error returns.
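The two element counts in that mismatch are exactly vocab_size * n_embd, which is easy to verify (plain arithmetic, no assumptions beyond the numbers above):

```python
# Element counts from the mismatch message, factored as vocab_size * n_embd.
n_embd = 1024
assert 50255 * n_embd == 51461120  # tensor stored in the checkpoint
assert 50257 * n_embd == 51463168  # tensor the model (vocab_size 50257) expects
print("difference:", 51463168 - 51461120, "=", 2, "rows of", n_embd)
```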
I guess one solution would be to open it with TensorFlow, create another GPT-2 instance with the correct vocab_size, copy the appropriate weights at the lower level, and save. Or maybe do this on the fly during loading.
I tried the second approach: I managed to pad the initial 50255x1024 tensor to 50257x1024 while preserving 50255 in model.config. The reading of the TF model then passes, but it seems to fail again when the conversion to PyTorch starts, although now with the proper dimension of 50257.
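The first approach (copying the trained rows into a fresh, correctly sized embedding) could look roughly like this; a NumPy sketch of the idea with toy sizes, not the transformers API:

```python
import numpy as np

# Toy sizes standing in for the real ones (50255 -> 50257, n_embd = 1024).
OLD_VOCAB, NEW_VOCAB, N_EMBD = 5, 7, 4

old_wte = np.random.randn(OLD_VOCAB, N_EMBD).astype(np.float32)

# Fresh embedding with the canonical vocab size: copy the trained rows,
# leave the extra token rows zero-initialized.
new_wte = np.zeros((NEW_VOCAB, N_EMBD), dtype=np.float32)
new_wte[:OLD_VOCAB] = old_wte

assert new_wte.shape == (NEW_VOCAB, N_EMBD)
assert np.array_equal(new_wte[:OLD_VOCAB], old_wte)
assert (new_wte[OLD_VOCAB:] == 0).all()
```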
```python
@tf.function
def eager_f(symbolic_weight):
    print("PAD????", symbolic_weight.shape[0] * symbolic_weight.shape[1])
    paddings = tf.constant([[0, 2], [0, 0]])  # add 2 after dim 0
    symbolic_weight = tf.pad(symbolic_weight, paddings, "constant", 0)
    print(symbolic_weight.shape)
    return symbolic_weight
```
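For reference, the same row padding can be checked with plain NumPy (a standalone sketch with toy sizes, not the transformers code):

```python
import numpy as np

w = np.arange(6, dtype=np.float32).reshape(3, 2)  # stand-in for the (50255, 1024) tensor
padded = np.pad(w, ((0, 2), (0, 0)))  # add 2 zero rows after dim 0, like tf.pad above

assert padded.shape == (5, 2)
assert np.array_equal(padded[:3], w)
assert (padded[3:] == 0).all()
```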
In modeling_tf_utils.py:

```python
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    # (...)
    if saved_weight_value is not None:
        print("saved_weight_value=", saved_weight_value)
        print(saved_weight_value.shape)
        # Check if the shape of the current weight and the one from the H5 file are different
        print("SAVED_WEIGHT")
        print(saved_weight_value)
        print(saved_weight_value.shape)
        if saved_weight_value.shape[0] == 50255:
            saved_weight_value = eager_f(saved_weight_value)
            print("AFTER PADDING SAVED_WEIGHT:")
            print(saved_weight_value)
            print(saved_weight_value.shape)
            ss = input("Press a key...")
    # (...)
```
K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
model, loading_info = load_tf2_checkpoint_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
return load_tf2_model_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
return load_tf2_weights_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
return load_tf2_state_dict_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
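For context (my reading, not verified against the converter): in GPT-2 Medium the position-embedding table wpe has shape (n_positions, n_embd) = (1024, 1024), while the token embedding wte is (vocab_size, 1024), so a (50257, 1024) tensor landing on wpe suggests the checkpoint tensors are being matched to the wrong parameters once the shapes disagree:

```python
# GPT-2 Medium configuration values (from the public GPT-2 config).
n_positions, n_embd, vocab_size = 1024, 1024, 50257

wpe_shape = (n_positions, n_embd)  # what the model expects for wpe.weight
wte_shape = (vocab_size, n_embd)   # shape of the tensor the traceback tries to copy

assert wpe_shape == (1024, 1024)
assert wte_shape != wpe_shape      # hence the size-mismatch error above
```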
.../transformers/modeling_tf_utils.py

```python
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    mismatched_layers = []

    # Read the H5 file
    with h5py.File(resolved_archive_file, "r") as sharded_checkpoint_file:
        # Retrieve the name of each layer from the H5 file
        saved_h5_model_layers_name = set(load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names"))
        ...
```
2024-02-23 09:31:49.259251: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-23 09:31:49.259329: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-23 09:31:49.261800: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-23 09:31:51.094485: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-23 09:31:54.027130: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 205844480 exceeds 10% of free system memory.
2024-02-23 09:31:55.341677: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 205844480 exceeds 10% of free system memory.
2024-02-23 09:31:55.623982: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 205844480 exceeds 10% of free system memory.
2024-02-23 09:31:56.140302: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 16777216 exceeds 10% of free system memory.
2024-02-23 09:31:56.229922: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 16777216 exceeds 10% of free system memory.
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:2025: UserWarning: for wte.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
Traceback (most recent call last):
File "/content/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 73, in <module>
model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3817, in from_pretrained
model, loading_info = load_tf2_checkpoint_in_pytorch_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_pytorch_utils.py", line 469, in load_tf2_checkpoint_in_pytorch_model
return load_tf2_model_in_pytorch_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_pytorch_utils.py", line 478, in load_tf2_model_in_pytorch_model
return load_tf2_weights_in_pytorch_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_pytorch_utils.py", line 496, in load_tf2_weights_in_pytorch_model
return load_tf2_state_dict_in_pytorch_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_pytorch_utils.py", line 566, in load_tf2_state_dict_in_pytorch_model
missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for wpe.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.0.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.0.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.0.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.0.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.0.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.1.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.1.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.1.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.1.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.1.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.1.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.2.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.2.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.2.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.2.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.2.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.2.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.3.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.3.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.3.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.3.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.3.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.3.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.4.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.4.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.4.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.4.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.4.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.4.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.5.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.5.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.5.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.5.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.5.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.5.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.6.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.6.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.6.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.6.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.6.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.6.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.7.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.7.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.7.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.7.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.7.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.7.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.8.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.8.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.8.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.8.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.8.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.8.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.9.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.9.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.9.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
The Bulgarian model can be downloaded from here: https://mega.nz/folder/0NpXwbhQ#8mid7QKtsjVxj2a6dP5d8Q
The same error appears both locally and on Google Colab. (I noticed an issue about something related, but it was closed.)
EDIT: This one: convert-h5-to-ggml.py does not match the official convert-ckpt-to-ggml.py #72