
Export Fine-Tuned LLM after Trainer is Complete #2101

Open
andreyvelich opened this issue May 6, 2024 · 3 comments

Comments

@andreyvelich
Member

We discussed in kubeflow/website#3718 (comment) that our LLM Trainer doesn't export the fine-tuned model, so users can't reuse that model for inference or other purposes.

We should discuss how users can get the fine-tuned artifact after the LLM Trainer is complete.
/cc @kubeflow/wg-training-leads @deepanker13

It would be nice to see integration with Kubeflow Model Registry as well. cc @kubeflow/wg-data-leads

@tarilabs
Member

tarilabs commented May 7, 2024

> Would be nice to see integration with Kubeflow Model Registry as well. cc @kubeflow/wg-data-leads

If there is a tutorial for the part specific to this project that exhibits the metadata we want to capture in Model Registry, I would be very happy to complement that example by indexing those metadata on MR! 🚀👍

@StefanoFioravanzo
Member

@andreyvelich I may have misunderstood the initial context of this API because I was under the impression that you could serve the model once fine-tuned. Can you elaborate on this?

> So user can't re-use that model for inference or other purposes.

@andreyvelich
Member Author

> @andreyvelich I may have misunderstood the initial context of this API because I was under the impression that you could serve the model once fine-tuned. Can you elaborate on this?
>
> > So user can't re-use that model for inference or other purposes.

I think right now the only way is to use output_dir for the model checkpoints.
In that case, the user can get the model from the PVC that we attach to the PyTorchJob,
as in this example: https://github.com/kubeflow/training-operator/blob/master/examples/pytorch/language-modeling/train_api_hf_dataset.ipynb
Right, @johnugeorge @deepanker13?
