
How to use your finetuned model? #110

Open
maepopi opened this issue Mar 25, 2024 · 17 comments

@maepopi

maepopi commented Mar 25, 2024

Hey there! I originally posted this question inside a longer thread, but I thought it might be best to post it as a new one because I'm unlikely to be the only one to have this issue.

So I've gone ahead and finetuned my first model, but I don't get how I can use it now. I looked into "fast_inference.py" and saw a checkpoint path at line 88, but I got an OOM when trying to point it to my finetuned ckpt. So I'm not sure I'm doing it right (not surprising, since there is an entire rig to load the model from Hugging Face, so that was kind of a desperate attempt on my part!)

I've also tried changing the checkpoint path in the "inference.py" script, but I got another OOM error.

Thanks!!

@vatsalaggarwal
Contributor

vatsalaggarwal commented Mar 25, 2024

thanks for raising the issue, and apologies about the difficulty you're facing here... that's our bad :)

out of curiosity, how much VRAM do you have?

@lucapericlp -- just fyi -- https://github.com/metavoiceio/metavoice-src/blob/main/fam/llm/finetune.py#L299 ends up storing all the tensors, including optimiser states, in pickled format (and they're on a CUDA device). When you load them via torch.load, all the tensors get allocated in CUDA memory (including the optimiser state)... so if folks don't have enough GPU VRAM for weights + optimiser state, they'll get an OOM
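
Roughly what's going on, as a sketch (stand-in model/optimiser and key names, not necessarily the exact ones in finetune.py):

```python
import torch
import torch.nn as nn

# Stand-ins for the finetuning model + optimiser.
model = nn.Linear(8, 8).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# torch.save pickles tensors together with the device they lived on,
# so this checkpoint "remembers" that everything was on CUDA.
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, "ckpt.pt")

# Default load: every tensor (weights AND optimiser state) is re-materialised
# on the GPU it was saved from, so you need VRAM for weights + optimiser state.
ckpt = torch.load("ckpt.pt")

# Loading onto CPU avoids that; nothing touches the GPU until you move it there.
ckpt = torch.load("ckpt.pt", map_location="cpu")
```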

one of us will send out a fix later tonight :)

@vatsalaggarwal
Contributor

Does finetuning run on that same GPU though? That's a bit strange because it looks like you do have enough VRAM for weights+optimiser states?

@maepopi
Author

maepopi commented Mar 25, 2024

Hey!! Thanks for your answer!

Well yes, I think the finetuning ran on that same GPU: the console showed it running on GPU = 0 (which I took, at the time, to be the index of the device used) and CUDA was called. And the training was relatively fast, which would definitely not have been the case on CPU, I think.

I have an RTX 3060 12 GB; here is the exact message I have:

Traceback (most recent call last):
  File "/home/maelys/AI_PROJECTS/SOUND/TOOLS/metavoice-src/fam/llm/fast_inference.py", line 161, in <module>
    tts = tyro.cli(TTS)
  File "/home/maelys/.cache/pypoetry/virtualenvs/fam-QegjHbyx-py3.10/lib/python3.10/site-packages/tyro/_cli.py", line 187, in cli
    output = _cli_impl(
  File "/home/maelys/.cache/pypoetry/virtualenvs/fam-QegjHbyx-py3.10/lib/python3.10/site-packages/tyro/_cli.py", line 454, in _cli_impl
    out, consumed_keywords = _calling.call_from_args(
  File "/home/maelys/.cache/pypoetry/virtualenvs/fam-QegjHbyx-py3.10/lib/python3.10/site-packages/tyro/_calling.py", line 241, in call_from_args
    return unwrapped_f(*positional_args, **kwargs), consumed_keywords  # type: ignore
  File "/home/maelys/AI_PROJECTS/SOUND/TOOLS/metavoice-src/fam/llm/fast_inference.py", line 80, in __init__
    self.llm_second_stage = Model(
  File "/home/maelys/AI_PROJECTS/SOUND/TOOLS/metavoice-src/fam/llm/inference.py", line 93, in __init__
    self._init_model()
  File "/home/maelys/AI_PROJECTS/SOUND/TOOLS/metavoice-src/fam/llm/inference.py", line 141, in _init_model
    self.model.to(self.config.device)
  File "/home/maelys/.cache/pypoetry/virtualenvs/fam-QegjHbyx-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
    return self._apply(convert)
  File "/home/maelys/.cache/pypoetry/virtualenvs/fam-QegjHbyx-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/maelys/.cache/pypoetry/virtualenvs/fam-QegjHbyx-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/maelys/.cache/pypoetry/virtualenvs/fam-QegjHbyx-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/maelys/.cache/pypoetry/virtualenvs/fam-QegjHbyx-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
    param_applied = fn(param)
  File "/home/maelys/.cache/pypoetry/virtualenvs/fam-QegjHbyx-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB. GPU 0 has a total capacity of 11.76 GiB of which 55.25 MiB is free. Process 4518 has 336.64 MiB memory in use. Including non-PyTorch memory, this process has 10.85 GiB memory in use. Of the allocated memory 10.66 GiB is allocated by PyTorch, and 58.62 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I tried a torch.cuda.empty_cache() but to no avail :'(

@vatsalaggarwal
Contributor

yeah, torch.cuda.empty_cache is unlikely to help here... if you've got 15-16 GB of GPU RAM, you can try adding map_location="cpu" to https://github.com/metavoiceio/metavoice-src/blob/main/fam/llm/fast_inference_utils.py#L243 ...

otherwise, the best thing to do might be to stop saving optimiser states during checkpointing ... you basically need to remove this line: https://github.com/metavoiceio/metavoice-src/blob/main/fam/llm/finetune.py#L288 ... but you might have difficulty resuming your finetuning jobs if you ever want to do that...
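
For reference, a rough sketch of a weights-only checkpoint save (illustrative names and keys; the real dict in finetune.py may be laid out differently):

```python
import torch
import torch.nn as nn

# Illustrative stand-ins; in finetune.py these are the real model/optimiser.
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

checkpoint = {
    "model": model.state_dict(),
    # "optimizer": optimizer.state_dict(),  # omitted: keeps the checkpoint
    #                                       # (and the later torch.load) weights-only
}
torch.save(checkpoint, "ckpt_weights_only.pt")
```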

@maepopi
Author

maepopi commented Mar 25, 2024

Oh, so there's really no way to resume the finetuning on 12 GB of VRAM (i.e. to save the optimiser states, if I understand correctly)?
It's okay, let me just try that for now and see whether it helps.

@maepopi
Author

maepopi commented Mar 25, 2024

I suppose I have to first delete the line in finetune.py, THEN retrain my model, and then try to call it through inference.py? Is that right?
(Sorry I'm still a novice in all this :))

@vatsalaggarwal
Contributor

vatsalaggarwal commented Mar 25, 2024

Yeah, that's the "easiest"... the way to use your existing training is likely to add map_location="cpu" at https://github.com/metavoiceio/metavoice-src/blob/main/fam/llm/fast_inference_utils.py#L243

@maepopi
Author

maepopi commented Mar 25, 2024

Ok, this would then make the inference tap into the CPU to compensate for the lack of VRAM, is that it?
And if I retrain my model without the optimizer save, then I don't need to add map_location="cpu" because I'd have enough VRAM, is that it?

thank you so much for your patience

@vatsalaggarwal
Contributor

it would just mean that the optimiser states remain in CPU RAM instead of living on GPU. Model should still be on GPU.
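
As a sketch of the loading side (illustrative names; the real loading happens in fast_inference_utils.py):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the real model

# Everything in the file lands in CPU RAM first, including any optimiser state.
ckpt = torch.load("ckpt.pt", map_location="cpu")

# Only the model weights get moved to the GPU; the rest of the dict stays on CPU.
model.load_state_dict(ckpt["model"])
model = model.to("cuda")
```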

@maepopi
Author

maepopi commented Mar 25, 2024

OK! let me try both solutions and get back to you later today :)

@maepopi
Author

maepopi commented Mar 25, 2024

Hey again!
So unfortunately I still have the OOM problem :( Here's what I tried:

  • I added the map_location="cpu" argument in "fast_inference_utils.py", pointed "fast_inference.py" at my earlier finetuned checkpoint, and launched poetry run python -i fam/llm/fast_inference.py => OOM
  • I commented out the optimizer save in the finetune.py script, retrained the model, and tried the first step again, this time pointing to the new finetuned model => OOM
  • I ran the fast inference command without the map_location change => OOM

thanks!

@vatsalaggarwal
Contributor

sorry about the issues here :( we'll have a look today/tomorrow!

@maepopi
Author

maepopi commented Mar 27, 2024

No worries, thank you for your involvement :)))

@vatsalaggarwal
Contributor

vatsalaggarwal commented Apr 3, 2024

hey, sorry about the delay here... i'm looking into this now... could you share your finetuned checkpoint with me please (gdrive or anything is good), and also which GPU you were using?

@maepopi
Author

maepopi commented Apr 3, 2024

Hey! Thank you! Of course, I've sent it to you by email through an external host because it's 5 GB ^^ Tell me if you've received it!
The GPU I have is an RTX 3060 12 GB

@maepopi
Author

maepopi commented Apr 16, 2024

Hey @vatsalaggarwal, any news on this question? Have you received my checkpoint?

Thank you! I'm in such a hurry to start finetuning and try my hand at making a French model!

@lucapericlp
Contributor

@vatsalaggarwal anything worth adding here?
