
Inappropriate saving of the merged fine tuned llama-2 model #3931

Open · raghavbj24 opened this issue Feb 15, 2024 · 4 comments

@raghavbj24
Hi,
I am trying to fine-tune the llama-2 model using the following config file:

base_model: /home/ubuntu/llama-2-7b-hf_for_merge

quantization:
  bits: 8

adapter:
  type: lora
  r: 8
  dropout: 0.05
  target_modules: null
  alpha: 16
  pretrained_adapter_weights: null
  postprocessor:
    merge_adapter_into_base_model: true
    progressbar: true
  bias_type: none

prompt:
  template: |
    ### Instruction:
    {Instruction}

    ### Context:
    {Context}

    ### Response:


input_features:
  - name: prompt
    type: text
    preprocessing:
      max_sequence_length: 1024

output_features:
  - name: Response
    type: text
    preprocessing:
      max_sequence_length: 512

trainer:
  type: finetune
  learning_rate: 0.0001
  batch_size: 1
  max_batch_size: 1
  gradient_accumulation_steps: 1
  enable_gradient_checkpointing: true
  epochs: 3
  learning_rate_scheduler:
    warmup_fraction: 0.01

preprocessing:
  sample_ratio: 1.0

backend:
  type: local
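
For reference, this config is run through the standard Ludwig Python API; a minimal sketch of how I launch it (the config and dataset paths are placeholders for my actual files):

# Minimal sketch of launching the fine-tune with the config above;
# "config.yaml" and "train.csv" stand in for the actual file paths.
from ludwig.api import LudwigModel

model = LudwigModel(config="config.yaml")
results = model.train(dataset="train.csv")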

The fine-tuning completes successfully, and I can see that the merge-and-unload process also finished, as shown below:

Unloading and merging model:   0%|          | 0/518 [00:00<?, ?it/s]/opt/conda/envs/ludwig_train_env/lib/python3.10/site-packages/peft/tuners/lora/bnb.py:67: UserWarning: Merge lora module to 8-bit linear may get different generations due to rounding errors.
  warnings.warn(

Unloading and merging model:   1%|▏         | 7/518 [00:00<00:07, 66.47it/s]
[... intermediate progress lines elided ...]
Unloading and merging model: 100%|██████████| 518/518 [00:05<00:00, 88.56it/s]
Removed shared tensor {'model.layers.7.self_attn.o_proj.weight_format', 'model.layers.17.self_attn.q_proj.weight_format', 'model.layers.19.self_attn.o_proj.weight_format', ... one '*.weight_format' entry per merged attention/MLP projection, list elided ...} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading

╒══════════╕
│ FINISHED │
╘══════════╛

Finetuning process has been completed..
Saving the finetuned base model..
Saving the finetuned base model completed..

When I checked the disk size of the saved model, it was only 7.6 MB, which indicates that the merge was not saved properly: a merged fp16 llama-2-7b checkpoint should be on the order of 13 GB, while 7.6 MB is roughly the size of the LoRA adapter weights alone.
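
A quick way to confirm what actually landed on disk (the path below is a placeholder for the directory Ludwig reported saving the model to):

# Hypothetical sanity check on the saved directory; '/path/to/saved/model'
# is a placeholder. A merged fp16 llama-2-7b should come to roughly 13 GB,
# while an adapter-only save is just a few MB.
import os

def dir_size_gb(path):
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1e9

print(f"{dir_size_gb('/path/to/saved/model'):.2f} GB")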

Environment:

absl-py==2.0.0
accelerate==0.24.1
aiohttp==3.8.6
aiohttp-cors==0.7.0
aiorwlock==1.4.0
aiosignal==1.3.1
anyio==4.2.0
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
async-timeout==4.0.3
attrs==23.1.0
awscli==1.32.25
backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1687772187254/work
beautifulsoup4==4.12.3
bitsandbytes==0.40.2
bleach==6.1.0
blessed==1.20.0
blinker==1.7.0
blis==0.7.11
botocore==1.34.25
Brotli==1.1.0
cachetools==5.3.2
captum==0.7.0
catalogue==2.0.10
certifi==2023.7.22
charset-normalizer==3.3.2
click==8.1.7
cloudpathlib==0.16.0
cloudpickle==3.0.0
colorama==0.4.4
colorful==0.5.6
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1691044910542/work
commonmark==0.9.1
confection==0.1.3
contourpy==1.2.0
cycler==0.12.1
cymem==2.0.8
Cython==3.0.5
dask==2023.3.2
dataclasses-json==0.6.2
datasets==2.14.6
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1695534290310/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
deepspeed==0.12.3
dill==0.3.7
distlib==0.3.8
docutils==0.16
et-xmlfile==1.1.0
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1692026125334/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1698579936712/work
faiss-cpu==1.7.4
fastapi==0.109.0
filelock==3.13.1
Flask==3.0.1
Flask-Compress==1.14
fonttools==4.47.2
frozenlist==1.4.0
fsspec==2023.9.2
future==0.18.3
getdaft==0.1.20
google-api-core==2.15.0
google-auth==2.23.4
google-auth-oauthlib==1.1.0
googleapis-common-protos==1.62.0
gpustat==1.1.1
GPUtil==1.4.0
grpcio==1.59.2
h11==0.14.0
h5py==3.10.0
hiplot==0.1.33
hjson==3.1.0
html5lib==1.1
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.20.3
hummingbird-ml==0.4.10
hyperopt==0.2.7
idna==3.4
imagecodecs==2024.1.1
importlib-metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1688754491823/work
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1698244021190/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1698846603011/work
ipywidgets==8.1.1
itsdangerous==2.1.2
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
jsonschema==4.6.2
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1699283905679/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1698673647019/work
jupyterlab-widgets==3.0.9
kaggle==1.5.16
kiwisolver==1.4.5
langcodes==3.3.0
lightgbm==4.2.0
lightgbm-ray==0.1.9
lightning-utilities==0.9.0
locket==1.0.0
loguru==0.7.2
loralib==0.1.2
ludwig==0.9.3
lxml==4.9.3
Markdown==3.5.1
MarkupSafe==2.1.3
marshmallow==3.20.1
marshmallow-dataclass==8.5.4
marshmallow-jsonschema==0.13.0
matplotlib==3.8.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work
mpi4py==3.1.5
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
murmurhash==1.0.10
mypy-extensions==1.0.0
nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1697083700168/work
networkx==3.2.1
ninja==1.11.1.1
nltk==3.8.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.535.133
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.52
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
onnx==1.15.0
onnxconverter-common==1.13.0
opencensus==0.11.4
opencensus-context==0.1.3
openpyxl==3.1.2
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1696202382185/work
pandas==2.1.3
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
partd==1.4.1
peft==0.6.2
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1667297516076/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
Pillow==10.1.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1699715570510/work
preshed==3.0.9
prometheus-client==0.19.0
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1699963054032/work
protobuf==3.20.3
psutil==5.9.4
ptitprince==0.2.7
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
py==1.11.0
py-cpuinfo==9.0.0
py-spy==0.3.14
py4j==0.10.9.7
pyarrow==14.0.1
pyasn1==0.5.0
pyasn1-modules==0.3.0
pydantic==1.10.13
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1691408637400/work
pynvml==11.5.0
pyparsing==3.1.1
pyrsistent==0.20.0
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
python-multipart==0.0.6
python-slugify==8.0.1
pytz==2023.3.post1
pyxlsb==1.0.10
PyYAML==6.0
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1698062401223/work
ray==2.3.1
regex==2023.10.3
requests==2.31.0
requests-oauthlib==1.3.1
retry==0.9.2
rich==12.4.4
rsa==4.7.2
s3fs==0.4.2
s3transfer==0.10.0
sacremoses==0.1.1
safetensors==0.4.2
scikit-learn==1.3.2
scipy==1.11.3
seaborn==0.11.0
sentence-transformers==2.2.2
sentencepiece==0.1.99
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smart-open==6.4.0
sniffio==1.3.0
soupsieve==2.5
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.4.8
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.35.1
sympy==1.12
tabulate==0.9.0
tblib==3.0.0
tensorboard==2.15.1
tensorboard-data-server==0.7.2
tensorboardX==2.2
text-unidecode==1.3
thinc==8.2.1
threadpoolctl==3.2.0
tifffile==2024.2.12
tokenizers==0.15.2
toolz==0.12.0
torch==2.1.0
torchaudio==2.1.0
torchdata==0.7.0
torchinfo==1.8.0
torchmetrics==1.2.0
torchtext==0.16.0
torchvision==0.16.0
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1695373560918/work
tqdm==4.66.1
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1698671135544/work
transformers==4.37.2
triton==2.1.0
typer==0.9.0
typing-inspect==0.9.0
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1695040754690/work
tzdata==2023.3
urllib3==1.26.18
uvicorn==0.27.0
virtualenv==20.25.0
wasabi==1.1.2
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1699959196938/work
weasel==0.3.4
webencodings==0.5.1
Werkzeug==3.0.1
widgetsnbextension==4.0.9
wrapt==1.16.0
xgboost==2.0.3
xgboost-ray==0.1.18
xlrd==2.0.1
XlsxWriter==3.1.9
xlwt==1.3.0
xxhash==3.4.1
yarl==1.9.2
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1695255097490/work

Can someone help me solve this?
Thanks in advance.

@raghavbj24 raghavbj24 changed the title Unable to save the fine tuned llama-2 model Inappropriate saving of the merged fine tuned llama-2 model Feb 15, 2024
@alexsherstinsky alexsherstinsky self-assigned this Feb 15, 2024
@alexsherstinsky
Collaborator

Hi @raghavbj24 -- thank you for submitting this issue! Question for you: I see that your base_model is /home/ubuntu/llama-2-7b-hf_for_merge. Would the same "small size" phenomenon happen if you try to use meta-llama/Llama-2-7b-hf from https://huggingface.co/meta-llama/Llama-2-7b-hf? Please let me know. Thank you.

@raghavbj24
Author

Hi @alexsherstinsky -- as per your suggestion, I tried meta-llama/Llama-2-7b-hf from Hugging Face as the base model, but there is no difference: the size of the saved model is still very small.

@alexsherstinsky
Collaborator

@raghavbj24 Could you please point me to the HuggingFace location where your model is saved and enable me to access it with "read" privileges? I am going to look into it thoroughly in the next few days. Thank you.

@alexsherstinsky
Collaborator

@raghavbj24 In parallel, if you do not mind: could you please rerun your experiment using this base model: alexsherstinsky/Mistral-7B-v0.1-sharded -- and let me know here what you see for the merged model size (and please also tell me the location where it will be saved). Thank you very much for your collaboration.
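
In the meantime, a manual merge with PEFT can serve as a cross-check that the adapter weights themselves are fine. A minimal sketch, assuming the adapter was saved in the standard PEFT format (all paths are placeholders, and this bypasses Ludwig's save path entirely):

# Manual merge sketch using PEFT directly; all paths are placeholders.
# Loading the base model in fp16 (instead of 8-bit) also sidesteps the
# rounding-error warning from peft/tuners/lora/bnb.py seen in the log.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "/path/to/lora/adapter")
merged = model.merge_and_unload()
merged.save_pretrained("/path/to/merged_model", safe_serialization=True)

If the resulting directory is around 13 GB, the adapter and merge are sound and the problem is isolated to how Ludwig writes out the merged model.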
