
Export Fine-Tuned LLM after Trainer is Complete #2101

Open
andreyvelich opened this issue May 6, 2024 · 3 comments

Comments

@andreyvelich
Member

We discussed in kubeflow/website#3718 (comment) that our LLM Trainer doesn't export the fine-tuned model, so users can't reuse that model for inference or other purposes.

We should discuss how users can get the fine-tuned artifact after the LLM Trainer is complete.
/cc @kubeflow/wg-training-leads @deepanker13

It would be nice to see integration with Kubeflow Model Registry as well. cc @kubeflow/wg-data-leads

@tarilabs
Member

tarilabs commented May 7, 2024

> Would be nice to see integration with Kubeflow Model Registry as well. cc @kubeflow/wg-data-leads

If there is a tutorial for the part specific to this project that exhibits the metadata we want to capture in Model Registry, I would be very happy to complement that example by indexing those metadata on MR! 🚀👍

@StefanoFioravanzo
Member

@andreyvelich I may have misunderstood the initial context of this API because I was under the impression that you could serve the model once fine-tuned. Can you elaborate on this?

> So user can't re-use that model for inference or other purposes.

@andreyvelich
Member Author

> @andreyvelich I may have misunderstood the initial context of this API because I was under the impression that you could serve the model once fine-tuned. Can you elaborate on this?
>
> > So user can't re-use that model for inference or other purposes.

I think right now the only way is to use output_dir for the model checkpoints.
In that case, the user can get the model from the PVC that we attach to the PyTorchJob,
as in this example: https://github.com/kubeflow/training-operator/blob/master/examples/pytorch/language-modeling/train_api_hf_dataset.ipynb
Right, @johnugeorge @deepanker13?
