Not getting Kubeflow Training SDK v1.7 when installing kubeflow-training #2082

Open
JamesKunstle opened this issue Apr 24, 2024 · 13 comments

@JamesKunstle

In a new virtual environment, I'm installing kubeflow-training only.

This is the freeze I get:

cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
google-auth==2.29.0
idna==3.7
kubeflow-training==1.7.0
kubernetes==29.0.0
oauthlib==3.2.2
pyasn1==0.6.0
pyasn1_modules==0.4.0
python-dateutil==2.9.0.post0
PyYAML==6.0.1
requests==2.31.0
requests-oauthlib==2.0.0
retrying==1.3.4
rsa==4.9
setuptools==69.5.1
six==1.16.0
urllib3==2.2.1
websocket-client==1.8.0

However, when I inspect the code that's been installed at new_venv/lib/python3.12/site-packages/kubeflow/training/api/training_client.py the code isn't up to date with the 1.7 SDK release that I can see here on GitHub.

Specifically, the get_job_logs function is different. I need the most up-to-date version.
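One way to confirm which SDK build is actually installed is to print the distribution version and the signature of the method in question, then compare that with the source on GitHub. The snippet below is only a minimal sketch: the helper names installed_version and describe_callable are hypothetical, and it assumes the class defined in training_client.py is named TrainingClient.

```python
import importlib
import inspect
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_callable(dotted_path: str) -> str:
    """Return 'qualname(signature)' for an object given as 'pkg.module:attr.attr'."""
    module_name, _, attr_path = dotted_path.partition(":")
    obj = importlib.import_module(module_name)
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return f"{obj.__qualname__}{inspect.signature(obj)}"


if __name__ == "__main__":
    # Prints None when kubeflow-training is not installed in this environment.
    print(installed_version("kubeflow-training"))
    try:
        # Assumption: the class in training_client.py is TrainingClient.
        print(describe_callable(
            "kubeflow.training.api.training_client:TrainingClient.get_job_logs"
        ))
    except ImportError:
        print("kubeflow-training is not importable in this environment")
```

If the printed signature differs from the one on the branch you are reading, the installed package was built from a different revision than that branch.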

@andreyvelich
Member

Thank you for creating this @JamesKunstle.
We publish the SDK with each Training Operator release: https://pypi.org/project/kubeflow-training/.
For example, the latest version is 1.7, so to see the code for that SDK you need to check the v1.7 release branch:
https://github.com/kubeflow/training-operator/blob/v1.7-branch/sdk/python/kubeflow/training/api/training_client.py
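The mapping from an installed SDK version to the matching source file can be sketched as a small helper. This assumes that other releases follow the same v<major>.<minor>-branch naming seen in the URL above, which the thread only confirms for 1.7; release_branch_url is a hypothetical name.

```python
def release_branch_url(sdk_version: str) -> str:
    """Build the GitHub URL of training_client.py for a given SDK version.

    Assumption: release branches are named 'v<major>.<minor>-branch',
    as in the 1.7 example; other releases may differ.
    """
    major, minor = sdk_version.split(".")[:2]
    return (
        "https://github.com/kubeflow/training-operator/blob/"
        f"v{major}.{minor}-branch/sdk/python/kubeflow/training/api/training_client.py"
    )


if __name__ == "__main__":
    print(release_branch_url("1.7.0"))
```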

@JamesKunstle
Author

What would be the supported way to get the most up-to-date SDK code? The main-branch code does what I want, but the code that gets pulled when I install the kubeflow-training library does not.

@franciscojavierarceo
Contributor

@andreyvelich how do you publish releases to PyPI? I took a look at the code and I didn't see any actions doing a release automatically. I reached out to @tenzen-y about this as well.

@franciscojavierarceo
Contributor

FWIW @andreyvelich, for Feast we have the release process fully automated, and it deploys to PyPI with this action: https://github.com/feast-dev/feast/actions/workflows/release.yml

@franciscojavierarceo
Contributor

Happy to help out and replicate the same here if that would be desirable.

@anishasthana

anishasthana commented Apr 24, 2024

Could you try something like this?

pip install git+https://github.com/kubeflow/training-operator.git@master#subdirectory=sdk/python

I've never installed from a subdirectory before but I think this should work

@andreyvelich
Member

@JamesKunstle If you want to get the latest changes for the SDK, I added the scripts in this PR: kubeflow/website#3719.
Similar to @anishasthana's comment, you can do this:

pip install git+https://github.com/kubeflow/training-operator.git@7345e33b333ba5084127efe027774dd7bed8f6e6#subdirectory=sdk/python
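The install target above has three parts: the repository URL, a Git ref (a branch such as master, or a commit hash for a reproducible install), and the subdirectory that holds the Python package. A minimal sketch of assembling it, including the "name @ url" direct-reference form that can go in a requirements file, is below. The helper name vcs_requirement is hypothetical.

```python
from typing import Optional


def vcs_requirement(
    repo_url: str, ref: str, subdirectory: str, name: Optional[str] = None
) -> str:
    """Build a pip VCS install target for a package living in a repo subdirectory.

    With `name`, returns the direct-reference form 'name @ git+...',
    suitable for a requirements file; without it, the bare URL for `pip install`.
    """
    url = f"git+{repo_url}@{ref}#subdirectory={subdirectory}"
    return f"{name} @ {url}" if name else url


if __name__ == "__main__":
    repo = "https://github.com/kubeflow/training-operator.git"
    commit = "7345e33b333ba5084127efe027774dd7bed8f6e6"
    # Bare form, as used with `pip install` in the comment above.
    print(vcs_requirement(repo, commit, "sdk/python"))
    # Requirements-file form, pinned to the same commit.
    print(vcs_requirement(repo, commit, "sdk/python", name="kubeflow-training"))
```

Pinning to a commit hash rather than master means a later reinstall resolves to the same code.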

@andreyvelich
Member

> @andreyvelich how do you publish releases to PyPI? I took a look at the code and I didn't see any actions doing a release automatically. I reached out to @tenzen-y about this as well.

Currently, we don't have a script to automate the release process for Training Operator, so @johnugeorge publishes the SDK manually after we cut the release.
However, for the Katib SDK we have this script, which we run to publish images and the SDK after the release: https://github.com/kubeflow/katib/blob/master/scripts/v1beta1/release.sh#L85-L97.

> Happy to help out and replicate the same here if that would be desirable.

It would be awesome if you could help us automate releases for Training Operator and Katib.
We created this issue a while ago: kubeflow/katib#2049.
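For anyone picking this up, an automated SDK publish step often boils down to "build the distributions, check them, upload them". The sketch below only prints the commands as a dry run. It is not what the Katib release script does; the use of the build and twine tools, the sdk/python path as the build root, and the helper name publish_commands are all assumptions made for illustration.

```python
from typing import List


def publish_commands(sdk_dir: str, repository: str = "pypi") -> List[List[str]]:
    """Return the commands a publish step might run, without executing them.

    Assumptions: the 'build' and 'twine' packages are installed, and
    `python -m build` writes its artifacts to '<sdk_dir>/dist' (its default).
    """
    dist_glob = f"{sdk_dir}/dist/*"
    return [
        # Build both a source distribution and a wheel.
        ["python", "-m", "build", "--sdist", "--wheel", sdk_dir],
        # Validate the package metadata before uploading.
        ["twine", "check", dist_glob],
        # Upload to the chosen index; credentials come from the environment.
        ["twine", "upload", "--repository", repository, dist_glob],
    ]


if __name__ == "__main__":
    # Dry run: print shell-ready lines (the glob is expanded by the shell).
    for cmd in publish_commands("sdk/python"):
        print(" ".join(cmd))
```

Running these from a CI job triggered by a release tag would keep the PyPI package in step with the operator release without a manual upload.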

@franciscojavierarceo
Contributor

> However, for Katib SDK we have this script that we run to publish Images + SDK after the release: https://github.com/kubeflow/katib/blob/master/scripts/v1beta1/release.sh#L85-L97.

So is publishing the image also manual?

@tenzen-y
Member

> > However, for Katib SDK we have this script that we run to publish Images + SDK after the release: https://github.com/kubeflow/katib/blob/master/scripts/v1beta1/release.sh#L85-L97.
>
> So is publishing the image also manual?

We usually publish the operator image through the CI build configuration below:

    - component-name: training-operator
      dockerfile: build/images/training-operator/Dockerfile
      platforms: linux/amd64,linux/arm64,linux/ppc64le

@JamesKunstle
Author

@andreyvelich @anishasthana Okay yeah that works now, I can see the most recent changes. Would really appreciate a more "pypi"-y way of installing the latest release, I think I was getting a fairly old package when I was installing by name from pypi.

@andreyvelich
Member

> @andreyvelich @anishasthana Okay yeah that works now, I can see the most recent changes. Would really appreciate a more "pypi"-y way of installing the latest release, I think I was getting a fairly old package when I was installing by name from pypi.

Basically, we release the SDK whenever we cut a new Training Operator release, so that the component versions (Controller + SDK) stay consistent. That helps us keep versions stable.
Any thoughts, @JamesKunstle?
