
Inappropriate saving of the merged fine tuned llama-2 model #3931

Open · raghavbj24 opened this issue Feb 15, 2024 · 4 comments

@raghavbj24
Hi,
I am trying to fine-tune the llama-2 model using the following config file:

base_model: /home/ubuntu/llama-2-7b-hf_for_merge

quantization:
  bits: 8

adapter:
  type: lora
  r: 8
  dropout: 0.05
  target_modules: null
  alpha: 16
  pretrained_adapter_weights: null
  postprocessor:
    merge_adapter_into_base_model: true
    progressbar: true
  bias_type: none

prompt:
  template: |
    ### Instruction:
    {Instruction}

    ### Context:
    {Context}

    ### Response:


input_features:
  - name: prompt
    type: text
    preprocessing:
      max_sequence_length: 1024

output_features:
  - name: Response
    type: text
    preprocessing:
      max_sequence_length: 512

trainer:
  type: finetune
  learning_rate: 0.0001
  batch_size: 1
  max_batch_size: 1
  gradient_accumulation_steps: 1
  enable_gradient_checkpointing: true
  epochs: 3
  learning_rate_scheduler:
    warmup_fraction: 0.01

preprocessing:
  sample_ratio: 1.0

backend:
  type: local
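
For reference, this config is run through the standard Ludwig Python API; a minimal sketch of how I launch it (the config and dataset paths are placeholders for my actual files):

# Minimal sketch of launching the fine-tune with the config above;
# "config.yaml" and "train.csv" stand in for the actual file paths.
from ludwig.api import LudwigModel

model = LudwigModel(config="config.yaml")
results = model.train(dataset="train.csv")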

The fine-tuning completes successfully, and I can see that the merge-and-unload process also finished, as shown below:

Unloading and merging model:   0%|          | 0/518 [00:00<?, ?it/s]/opt/conda/envs/ludwig_train_env/lib/python3.10/site-packages/peft/tuners/lora/bnb.py:67: UserWarning: Merge lora module to 8-bit linear may get different generations due to rounding errors.
  warnings.warn(

Unloading and merging model:   1%|▏         | 7/518 [00:00<00:07, 66.47it/s]
[... intermediate progress lines elided ...]
Unloading and merging model: 100%|██████████| 518/518 [00:05<00:00, 88.56it/s]
Removed shared tensor {'model.layers.7.self_attn.o_proj.weight_format', 'model.layers.17.self_attn.q_proj.weight_format', 'model.layers.19.self_attn.o_proj.weight_format', ... one '*.weight_format' entry per merged attention/MLP projection, list elided ...} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading

╒══════════╕
│ FINISHED │
╘══════════╛

Finetuning process has been completed..
Saving the finetuned base model..
Saving the finetuned base model completed..

When I checked the disk size of the saved model, it was only 7.6 MB, which indicates that the merge was not saved properly: a merged fp16 llama-2-7b checkpoint should be on the order of 13 GB, while 7.6 MB is roughly the size of the LoRA adapter weights alone.
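
A quick way to confirm what actually landed on disk (the path below is a placeholder for the directory Ludwig reported saving the model to):

# Hypothetical sanity check on the saved directory; '/path/to/saved/model'
# is a placeholder. A merged fp16 llama-2-7b should come to roughly 13 GB,
# while an adapter-only save is just a few MB.
import os

def dir_size_gb(path):
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1e9

print(f"{dir_size_gb('/path/to/saved/model'):.2f} GB")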

Environment:

absl-py==2.0.0
accelerate==0.24.1
aiohttp==3.8.6
aiohttp-cors==0.7.0
aiorwlock==1.4.0
aiosignal==1.3.1
anyio==4.2.0
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
async-timeout==4.0.3
attrs==23.1.0
awscli==1.32.25
backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1687772187254/work
beautifulsoup4==4.12.3
bitsandbytes==0.40.2
bleach==6.1.0
blessed==1.20.0
blinker==1.7.0
blis==0.7.11
botocore==1.34.25
Brotli==1.1.0
cachetools==5.3.2
captum==0.7.0
catalogue==2.0.10
certifi==2023.7.22
charset-normalizer==3.3.2
click==8.1.7
cloudpathlib==0.16.0
cloudpickle==3.0.0
colorama==0.4.4
colorful==0.5.6
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1691044910542/work
commonmark==0.9.1
confection==0.1.3
contourpy==1.2.0
cycler==0.12.1
cymem==2.0.8
Cython==3.0.5
dask==2023.3.2
dataclasses-json==0.6.2
datasets==2.14.6
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1695534290310/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
deepspeed==0.12.3
dill==0.3.7
distlib==0.3.8
docutils==0.16
et-xmlfile==1.1.0
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1692026125334/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1698579936712/work
faiss-cpu==1.7.4
fastapi==0.109.0
filelock==3.13.1
Flask==3.0.1
Flask-Compress==1.14
fonttools==4.47.2
frozenlist==1.4.0
fsspec==2023.9.2
future==0.18.3
getdaft==0.1.20
google-api-core==2.15.0
google-auth==2.23.4
google-auth-oauthlib==1.1.0
googleapis-common-protos==1.62.0
gpustat==1.1.1
GPUtil==1.4.0
grpcio==1.59.2
h11==0.14.0
h5py==3.10.0
hiplot==0.1.33
hjson==3.1.0
html5lib==1.1
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.20.3
hummingbird-ml==0.4.10
hyperopt==0.2.7
idna==3.4
imagecodecs==2024.1.1
importlib-metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1688754491823/work
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1698244021190/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1698846603011/work
ipywidgets==8.1.1
itsdangerous==2.1.2
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
jsonschema==4.6.2
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1699283905679/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1698673647019/work
jupyterlab-widgets==3.0.9
kaggle==1.5.16
kiwisolver==1.4.5
langcodes==3.3.0
lightgbm==4.2.0
lightgbm-ray==0.1.9
lightning-utilities==0.9.0
locket==1.0.0
loguru==0.7.2
loralib==0.1.2
ludwig==0.9.3
lxml==4.9.3
Markdown==3.5.1
MarkupSafe==2.1.3
marshmallow==3.20.1
marshmallow-dataclass==8.5.4
marshmallow-jsonschema==0.13.0
matplotlib==3.8.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work
mpi4py==3.1.5
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
murmurhash==1.0.10
mypy-extensions==1.0.0
nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1697083700168/work
networkx==3.2.1
ninja==1.11.1.1
nltk==3.8.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.535.133
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.52
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
onnx==1.15.0
onnxconverter-common==1.13.0
opencensus==0.11.4
opencensus-context==0.1.3
openpyxl==3.1.2
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1696202382185/work
pandas==2.1.3
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
partd==1.4.1
peft==0.6.2
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1667297516076/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
Pillow==10.1.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1699715570510/work
preshed==3.0.9
prometheus-client==0.19.0
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1699963054032/work
protobuf==3.20.3
psutil==5.9.4
ptitprince==0.2.7
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
py==1.11.0
py-cpuinfo==9.0.0
py-spy==0.3.14
py4j==0.10.9.7
pyarrow==14.0.1
pyasn1==0.5.0
pyasn1-modules==0.3.0
pydantic==1.10.13
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1691408637400/work
pynvml==11.5.0
pyparsing==3.1.1
pyrsistent==0.20.0
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
python-multipart==0.0.6
python-slugify==8.0.1
pytz==2023.3.post1
pyxlsb==1.0.10
PyYAML==6.0
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1698062401223/work
ray==2.3.1
regex==2023.10.3
requests==2.31.0
requests-oauthlib==1.3.1
retry==0.9.2
rich==12.4.4
rsa==4.7.2
s3fs==0.4.2
s3transfer==0.10.0
sacremoses==0.1.1
safetensors==0.4.2
scikit-learn==1.3.2
scipy==1.11.3
seaborn==0.11.0
sentence-transformers==2.2.2
sentencepiece==0.1.99
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smart-open==6.4.0
sniffio==1.3.0
soupsieve==2.5
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.4.8
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.35.1
sympy==1.12
tabulate==0.9.0
tblib==3.0.0
tensorboard==2.15.1
tensorboard-data-server==0.7.2
tensorboardX==2.2
text-unidecode==1.3
thinc==8.2.1
threadpoolctl==3.2.0
tifffile==2024.2.12
tokenizers==0.15.2
toolz==0.12.0
torch==2.1.0
torchaudio==2.1.0
torchdata==0.7.0
torchinfo==1.8.0
torchmetrics==1.2.0
torchtext==0.16.0
torchvision==0.16.0
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1695373560918/work
tqdm==4.66.1
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1698671135544/work
transformers==4.37.2
triton==2.1.0
typer==0.9.0
typing-inspect==0.9.0
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1695040754690/work
tzdata==2023.3
urllib3==1.26.18
uvicorn==0.27.0
virtualenv==20.25.0
wasabi==1.1.2
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1699959196938/work
weasel==0.3.4
webencodings==0.5.1
Werkzeug==3.0.1
widgetsnbextension==4.0.9
wrapt==1.16.0
xgboost==2.0.3
xgboost-ray==0.1.18
xlrd==2.0.1
XlsxWriter==3.1.9
xlwt==1.3.0
xxhash==3.4.1
yarl==1.9.2
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1695255097490/work

Can someone help me solve this?
Thanks in advance.

@raghavbj24 raghavbj24 changed the title Unable to save the fine tuned llama-2 model Inappropriate saving of the merged fine tuned llama-2 model Feb 15, 2024
@alexsherstinsky alexsherstinsky self-assigned this Feb 15, 2024
@alexsherstinsky
Collaborator

Hi @raghavbj24 -- thank you for submitting this issue! Question for you: I see that your base_model is /home/ubuntu/llama-2-7b-hf_for_merge. Would the same "small size" phenomenon happen if you try to use meta-llama/Llama-2-7b-hf from https://huggingface.co/meta-llama/Llama-2-7b-hf? Please let me know. Thank you.

@raghavbj24
Author

Hi @alexsherstinsky -- as per your suggestion, I tried meta-llama/Llama-2-7b-hf from Hugging Face as the base model, but there is no difference: the size of the saved model is still very small.

@alexsherstinsky
Collaborator

@raghavbj24 Could you please point me to the HuggingFace location where your model is saved and enable me to access it with "read" privileges? I am going to look into it thoroughly in the next few days. Thank you.

@alexsherstinsky
Collaborator

@raghavbj24 In parallel, if you do not mind: could you please rerun your experiment using this base model: alexsherstinsky/Mistral-7B-v0.1-sharded -- and let me know here what you see for the merged model size (and please also tell me the location where it will be saved). Thank you very much for your collaboration.
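
In the meantime, a manual merge with PEFT can serve as a cross-check that the adapter weights themselves are fine. A minimal sketch, assuming the adapter was saved in the standard PEFT format (all paths are placeholders, and this bypasses Ludwig's save path entirely):

# Manual merge sketch using PEFT directly; all paths are placeholders.
# Loading the base model in fp16 (instead of 8-bit) also sidesteps the
# rounding-error warning from peft/tuners/lora/bnb.py seen in the log.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "/path/to/lora/adapter")
merged = model.merge_and_unload()
merged.save_pretrained("/path/to/merged_model", safe_serialization=True)

If the resulting directory is around 13 GB, the adapter and merge are sound and the problem is isolated to how Ludwig writes out the merged model.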
